Hierarchical keywording in Lightroom can make a mess; now we can clean it up in Photo Mechanic
A reader sent in a super-clever method, using Photo Mechanic’s Find and Replace dialog and its ability to use Grep and regular expressions, to get rid of the mess that Lightroom can make in our keywords. This post is a sidebar to my post on using Photo Mechanic’s Structured Keywords feature. (Keywording in Photo Mechanic Part 2)
See Keywording – Considerations Before you Start here; Keywording in Photo Mechanic Part 1 here; and Keywording in Photo Mechanic Part 2 here.
(Note that this sidebar post was written in March 2021. I have set the publish date back to match the publication of the original post so that they will appear together in my post listings.)
In that post, we talked about how Photo Mechanic protects you from rogue programs that misuse Lightroom’s proprietary hierarchical keywords XMP field. That can mean some ugly-looking keywords after Photo Mechanic edits metadata on a record that had keywords assigned hierarchically in Lightroom. You can refer back to that post to check it out.
What happens is that you end up with a bunch of duplicate keywords. Or what look like duplicate keywords, anyway.
Normally, this is only an aesthetic problem, because duplicate keywords don’t do any harm, apart from wasting some resources.
… BUT ….
Lightroom has a feature called “non-exporting” keywords. You can use this feature as a hack to make category labels in your Lightroom keyword list. And many Lightroom users have done so. But it’s a kludge. Lightroom doesn’t know that these non-exporting keywords are just labels. It thinks they are real keywords.
If you actually export an image that has non-exporting keywords (used as labels), well, they won’t export and you’re fine.
However, if you use the “Save Metadata to File” function in Lightroom to write your metadata into your original files (or their sidecars), as you probably will, you will then have these darn things written into your keywords.
Most DAM systems will read the labels/non-exporting keywords as keywords and return unexpected search results.
(That includes Photo Mechanic Plus, which didn’t exist when the previous post was originally written but has since become the gold standard for photographers’ DAMs. Lightroom too, for that matter. Whether or not you embed, the bogus keywords will be in Lightroom’s database, and they will affect search results in Lightroom itself.)
Ouch!
If you have this problem, you very well might need to clean up your keywords.
Reader Graham Prentice to the rescue.
Graham’s idea is to make Photo Mechanic understand that Lightroom’s hierarchical keywords really are duplicates, so Photo Mechanic can then automatically eliminate them.
We can use Find and Replace in Photo Mechanic to delete the bits and pieces that keep Photo Mechanic from recognizing the nonsense as duplicates and we’re on our way. If we have some non-exporting keywords used as labels, those we’ll need to recognize and just plain delete.
The simple version of this method is just to use Find and Replace straight-up, the way we are used to.
I exported some test files from Lightroom. My untidy keywords looked like this: “Flowers, Garden, family, home, family | home | Garden | Flowers” (without the quotes, of course). The ones I wanted to get rid of were in the form “keyword-space-pipe-space-and-another-keyword”.
All I needed to find was “space-pipe” and replace it with a comma, so that “…family, home, family | home…” became “…family, home, family, home…”. Photo Mechanic would then deal with the duplicates automatically, leaving just “…family, home…”. We can think of this as a simple proof of the concept.
Remember to back up your images before you do this!
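For anyone who likes to see the logic spelled out, here is a minimal Python sketch of what that simple Find and Replace pass does to my test string. It is only an illustration, not Photo Mechanic's own code: the function name tidy_keywords is made up for this example, and the duplicate-removal step is my stand-in for what Photo Mechanic does automatically (I am assuming an exact, case-sensitive match counts as a duplicate).

```python
def tidy_keywords(raw):
    """Mimic the simple Find and Replace pass on one keyword string."""
    # Find "space-pipe" and replace it with a comma.
    flattened = raw.replace(" |", ",")
    # Split on commas, trim stray spaces, and drop any empty pieces.
    keywords = [k.strip() for k in flattened.split(",") if k.strip()]
    # Drop duplicates, keeping the first occurrence of each keyword.
    # (Assumption: this stands in for Photo Mechanic's automatic de-duplication.)
    return list(dict.fromkeys(keywords))


untidy = "Flowers, Garden, family, home, family | home | Garden | Flowers"
print(tidy_keywords(untidy))
```

Run against the test string, this leaves one copy each of Flowers, Garden, family and home.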
Now again, problem keywords like the ones I fixed in the paragraphs above aren’t really such a problem at all. They’re just ugly. And who spends time looking at their keywords? You could leave them be with a clean conscience.
If we have only a few of those non-exporting-keywords-used-as-category-labels to deal with, we can zap them the same way. Find and delete. Repeat for each label and we’re done.
But what if we have dozens of them? Or if we don’t have a way of knowing exactly what they are so we can search for them?
Well, enter Graham’s regular expression-based method. Using regular expressions allows us to search for multiple values, or “anything that begins with or ends with this or that”, and Hoover them up in one go. Brilliant. And, given that I’m not good with regex, I wouldn’t have thought of it. Didn’t, in fact.
I tested it. It works! It’s a testament to what you can accomplish with the (pretty darn amazing) power built into Photo Mechanic.
Several authors on the internet have suggested delimiting non-exporting keyword “labels” in Lightroom with square brackets. That’s what Graham did. That’s good because if the labels are not delimited in some way, we’re back at Find and Replace for each one, one at a time.
Note that when Graham refers to the “spacer bar” symbol, he’s talking about the pipe character, which is above the carriage return key on most keyboards.
Here’s Graham’s tip, just as he sent it in:
Comment:
A tip for tidying up the keywords duplicated by Lightroom. You need to get rid of the duplicates, which Photo Mechanic does for you if you replace the spacer bar in Lightroom’s output with a comma (or semi-colon if this is your keyword separator). You can easily do this with Find and Replace using Grep.
I have all my non-exporting Lightroom categories enclosed in square brackets.
If you use Find and Replace, with “Grep” and “Treat repeating fields as a single string” both checked, you can tidy up the keywords immediately. The Find box string is ([\|]{1})|(\[(.*?)\]) The Replace box should contain a single comma. Make sure that only Keywords are selected in the list of metadata to be affected.
Running Find and Replace, which can be on as many images as you care to select at once, with differing keywords if you wish, will delete all text within square brackets (including the brackets themselves) and replace all separators with commas. The result is that the keywords are duplicated, but Photo Mechanic automatically deletes the duplicates, leaving you with a single complete set of keywords for each image, comma separated, without the additional messy hierarchical strings introduced by Lightroom.
Note that this means you now have a single set of flat keywords, but Photo Mechanic does not add these back into its Structured Keywords panel. Your structured keywords in Photo Mechanic are therefore untainted by the output from Lightroom. If you later update the metadata for the image(s) in Lightroom, the keywords will show up in Lightroom’s catalogue as additional flat keywords. This kind of forces you to commit to one or the other for maintaining your DAM, as it now gets messy in Lightroom, but PM seems to me to be better for keywords. You unfortunately cannot get away from Lightroom’s catalogue for retaining Lightroom’s edits, but you do not have to use it for keywords.
Graham
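If you want to see what Graham's Find string matches before you point it at real files, you can try it in Python. This is a rough sketch with some assumptions: Python's re module is not the same regex engine as Photo Mechanic's Grep option, although this particular pattern uses only basic syntax; the sample keyword string and the [Plants] label are invented for the example; and the de-duplication step again stands in for what Photo Mechanic does on its own. Joining all the keywords into one string is my approximation of the “Treat repeating fields as a single string” checkbox.

```python
import re

# Graham's Find string, copied from his comment.
FIND = r"([\|]{1})|(\[(.*?)\])"


def clean_keywords(raw):
    """Apply the Find string to one comma-separated keyword string."""
    # Every pipe, and every [bracketed label] including its brackets,
    # is replaced with a single comma (the Replace box value).
    replaced = re.sub(FIND, ",", raw)
    # Split on commas, trim stray spaces, and drop the empty pieces
    # left behind where labels used to be.
    keywords = [k.strip() for k in replaced.split(",") if k.strip()]
    # Keep the first occurrence of each keyword.
    # (Assumption: stands in for Photo Mechanic's automatic de-duplication.)
    return list(dict.fromkeys(keywords))


# Hypothetical sample: [Plants] is a made-up non-exporting category label.
sample = "Flowers, Garden, [Plants], [Plants] | Garden | Flowers"
print(clean_keywords(sample))
```

With that sample, the bracketed label disappears entirely and only Flowers and Garden remain. As with everything else here, try it on copies first.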
Safety first
You should be sure to try any of these ideas first on a small selection of test images before you commit to using them in bulk. I just copy some images from my DAM into a folder and experiment on those.
Graham has some thoughts on that. He has written another comment that addresses testing in more detail and offers a Grep string that will work if tildes (or some other character) are used as delimiters. See his additional comment on the structured keywords post here.
My readers are the best. Thank you Graham!
Keywording in Lightroom is, well, not the best. Now that Photo Mechanic Plus is on the market, I suspect that there will be many photographers migrating their digital asset management away from Lightroom. Some can simply ignore the noise caused by Lightroom’s silly proprietary keywords field. But for others, cleaning it up will be a priority. Now we have a way to do it.
Jump in the comments. Let us know how this method worked for you, or if you have other data migration issues going from Lightroom, or any other software, to a more standards-based, professional cataloging solution and how you solved them.