Any Vision

Lightroom Plugin


Any Vision automatically tags your photos with labels, landmarks, logos, facial expressions, and dominant colors, and extracts embedded text (OCR). You can search these tags in Lightroom, making it much easier to find photos in large catalogs. You can export the tags in photo metadata as keywords and GPS locations or in comma-separated text files. And you can translate the tags into more than 100 different languages.

Any Vision uses Google Cloud Vision, the state-of-the-art machine-learning technology underlying Google image search. Though you'll need a Google Cloud key as well as an Any Vision license, Google's pricing lets you analyze up to 200,000 photos for free in the first 60 days and up to 1,000 photos per month thereafter.

Consider similar services with different features and pricing.

Try it for free (limited to 50 photos). Buy a license at a price you name.

Examples

Here's an example showing labels, landmarks, face expression, and recognized text; Cloud Vision has correctly located the photo within a few meters and extracted much of the visible text from the store signs and the granite plaque on the statue:

Here's an example of logo detection, where the logos are partially obscured but still recognized:

And here's an example showing correctly recognized jersey numbers:

Download and Install

Any Vision requires Lightroom 5.7 / CC 2015 or later.

  1. Download anyvision.1.4.zip. (What's changed in this version)

  2. If you're upgrading from a previous version of Any Vision, exit Lightroom and replace the existing anyvision.lrplugin folder with the new one extracted from the downloaded .zip. Restart Lightroom and you're done.

  3. If this is a new installation, extract the folder anyvision.lrplugin from the downloaded .zip and move it to a location of your choice.

  4. In Lightroom, do File > Plug-in Manager.

  5. Click Add, browse and select the anyvision.lrplugin folder, and click OK (Windows) or Add Plug-in (Mac OS).

The free trial is limited to analyzing 50 photos—after that, you'll need to buy an Any Vision license and a Google Cloud key.

Licensing

To use Any Vision after the free trial ends, you'll need both an Any Vision plugin license and a Google Cloud key tied to a billing account you set up with Google. Cloud Vision costs little or nothing for most users—see Cloud Vision Pricing for details.

Buy a License  

  1. Buy a license at a price you think is fair: Add to Cart
    The license includes unlimited upgrades. Make sure you're satisfied with the free trial before buying.

  2. Copy the license key from the confirmation page or confirmation email.

  3. Do Library > Plug-in Extras > Any Vision > Analyze.

  4. Click Buy.

  5. Paste the key into the License key box and click OK.

Get a Google Cloud Key 

Setting up a Google billing account and getting a Cloud key is a little tedious but straightforward if you follow these steps exactly. I recommend printing this page or arranging two browser windows side by side. (Unfortunately, Google doesn't provide any way for an application like Any Vision to make this simpler.)

  1. In a browser, go to console.cloud.google.com.

  2. Create a Google account or sign in with an existing one (for example, your Gmail account).

  3. Agree to the Terms of Service.

  4. Under "Try Google Cloud Platform for free", click "Sign up":

  5. Agree to the Google Cloud Platform Free Trial Terms of Service.

  6. Enter your billing information.

  7. In "Welcome name!", click "GOT IT".

  8. Click the menu button in the upper left and select "APIs & services":

  9. In the left column, click "Library":

  10. In the "Search for APIs & services" box, type "cloud vision", and then click on "Google Cloud Vision API":

  11. Click "ENABLE":

  12. In the left column, click "Library":

  13. In the "Search for APIs & services" box, type "cloud translation", and then click on "Google Cloud Translation API":

  14. Click "ENABLE":

  15. In the left column, click "Credentials":

  16. Click "Create credentials", then "API key":

  17. Copy the API key by clicking the copy button:

  18. In Lightroom, do Library > Plug-in Extras > Any Vision > Analyze and click Google Key at the bottom:

  19. Paste the key into Google key and click OK:
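Once the key is saved, Any Vision handles all the API calls for you. If you'd like to sanity-check the key outside Lightroom, though, you can call the public Cloud Vision REST endpoint directly. Here's a minimal Python sketch; the request shape follows Google's documented images:annotate API, while the file name and YOUR_API_KEY are placeholders:

```python
import base64

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_annotate_request(image_bytes, features=("LABEL_DETECTION",), max_results=10):
    """Build the JSON body for one Cloud Vision images:annotate call."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": f, "maxResults": max_results} for f in features],
        }]
    }

# To actually send it (needs a valid key and the third-party `requests` library):
#   import requests
#   body = build_annotate_request(open("photo.jpg", "rb").read())
#   resp = requests.post(VISION_ENDPOINT + "?key=YOUR_API_KEY", json=body)
#   print(resp.json())
```

A successful JSON response (rather than a 403 error) confirms the key and the enabled API are working.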

Google Cloud Vision Pricing (as of February 2017)

Google charges your billing account for each photo you analyze. There are no upfront charges, and you can disable your account at any time.

Each of the seven features (Labels, Landmarks, Logos, Faces, Safety, Text, and Dominant Color) costs $0.0015 / photo, except for Safety, which is free if you also select Labels. For example, selecting Labels and Landmarks costs $0.003 / photo ($3.00 / 1000 photos), and selecting all seven features costs $0.009 / photo ($9.00 / 1000 photos).
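The arithmetic can be sketched as follows (a rough estimator using the figures above, not Google's billing code):

```python
PRICE_PER_FEATURE = 0.0015  # USD per photo per feature (February 2017 pricing)

def analysis_cost(photos, features):
    """Estimate the Cloud Vision cost of analyzing a batch of photos.

    Safety is free whenever Labels is also selected; every other
    selected feature is billed at the flat per-photo rate.
    """
    billable = [f for f in features
                if not (f == "Safety" and "Labels" in features)]
    return photos * len(billable) * PRICE_PER_FEATURE

# 1000 photos with Labels + Landmarks -> $3.00
# 1000 photos with all seven features -> $9.00 (Safety rides free with Labels)
```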

Translation of labels and other features to other languages costs $20 per million characters or about 100,000 distinct labels. (My main catalog has 30,000 images with 2500 distinct labels, and translating them to another language cost about $0.50.)

Though this may sound expensive for larger catalogs, Google provides incentives that lower the cost considerably. Most users will pay nothing or very little every month.

The first $1.50 (1,000 feature analyses) is free each month. For example, if you select just Labels and Landmarks, you can analyze 500 photos / month for free.

Google also offers $300 free credit for creating your first billing account (the credit must be used within 60 days). If you select just one feature (e.g. Labels), that's enough for 200,000 photos, or nearly 29,000 photos for all seven features.

Using Any Vision

Select the photos to be analyzed and do Library > Plug-in Extras > Any Vision > Analyze. Select the features you want to tag and click OK:

Any Vision exports reduced-size versions of the photos, sends them to Google, and processes the results. Typically it takes about 4 seconds per photo (more if you have a slower Internet connection). See Advanced for how to reduce this to 1 second per photo, at the expense of making Lightroom less usable interactively while Analyze is running.

Once you've analyzed a photo, by default Any Vision won't reanalyze it. So if you change the set of selected features, doing Analyze again won't have any effect. See Advanced for how to force Any Vision to resend photos to Google to be reanalyzed (at additional cost).

You can see the results in the Metadata panel (in the right column of Library) with the Any Vision tagset:

Following each label and landmark is a numeric score, e.g. "mountain (85)", indicating Google's estimate of the likelihood of that label or landmark. Faces and safety terms have bucketed scores ranging from "very unlikely" to "very likely".

Any Vision also assigns hierarchical keywords using this hierarchy:

For example, if the photo has the label "mountain", then the keyword Any Vision > Labels > mountain is assigned to the photo.

Features

Labels are objects, activities, and qualities, such as mountain, tabby cat, toddler, safari, road cycling, rock climbing, white. In my main catalog of nearly 30,000 photos, Cloud Vision recognized 2500 distinct labels.

Landmarks are specific locations of where the photo was taken or of objects in the photo. Examples include Paris, Eiffel Tower, Denali National Park, Salt Lake Tabernacle Organ, Squaw Valley Ski Resort, Kearsarge Pass. But landmarks aren't necessarily famous or well-known—they can be obscure local landmarks, such as statues or waterfalls. Photos may be tagged with more than one landmark. In my catalog, Cloud Vision recognized 500 distinct locations.

Each landmark has a GPS location, and the arrow button to the right of the Map field will open Google Maps on the first (most likely) landmark location in the photo.

Logos are product or service logos, such as Coca-Cola, Office Depot, SpongeBob SquarePants, Tonka. In my catalog, Cloud Vision recognized 110 distinct logos.

Faces: Cloud Vision identifies the "sentiment", or expression, of each recognized face: Joy, Sorrow, Anger, Surprise. It may also tag a face as Under Exposed, Blurred, or wearing Headwear.

Safety identifies whether the photo is "safe" for Google image search: Adult, Spoof, Medical, and Violence. In my main catalog of 30,000 photos, only 213 received one of these safety tags, and most of them weren't very accurate. Exposed skin triggers "adult", regardless of whether it’s a bikini-clad woman, a shirtless teen, or babies in diapers.

Text contains text recognized in photos using optical character recognition (OCR). Cloud Vision appears to do a reasonable job of recognizing text on signs, plaques, athletic jerseys, etc.

Dominant Color. Cloud Vision identifies the ten most "dominant" colors in a photo, using an undocumented algorithm. Use the Sort by Color command to see those colors for a photo and find other photos with similar colors.

Searching

You can search photos' features using the Library Filter bar, smart collections, or the Keyword List. For example, to find photos assigned the label "mountain", you could do:

Do Library > Enable Filters if the Library Filter bar isn't showing, and then click Text.

To search just the Labels field, use this smart-collection criterion:

Alternatively, in the Keyword List panel, type "mountain" in the Filter Keywords box, then click the arrow to the far right of the "mountain" keyword:

Advanced

The Advanced tab provides more flexibility for using Any Vision:

Score Threshold: For each label, landmark, logo, etc., Cloud Vision assigns a score, an estimate of the probability that it is correct. You can set a per-feature score threshold, and only those labels, landmarks, etc. with at least that score will be assigned to the photo.

Labels, landmarks, and logos have scores from 0 to 100, though in practice, Cloud Vision returns only those with a score of at least 50. Faces and Safety values have bucketed scores ranging from "very unlikely" to "very likely".

Assign Keywords: If checked, Any Vision assigns a keyword for each extracted feature. For example, if the photo has the label "mountain", then the keyword Any Vision > Labels > mountain is assigned to the photo. You can enable or disable this on a per-feature basis.

Text (OCR) copy: Recognized text can be copied from the Text field in the Metadata panel to one of the standard IPTC fields Caption, Headline, Title, or Source. copy always copies the text, copy if empty copies the text only if the destination IPTC field is empty, append appends the text to the end of the destination IPTC field, and don't copy never copies the text.

Text (OCR) pattern replacement: Recognized text can be transformed using patterns whose syntax is documented here. For example, to extract just numbers from the recognized text, placing one number per line:

Replace pattern: [0-9]+ with: %0 separator: \n

To extract the first number only:

Replace pattern: ^.-([0-9]+).*$ with: %1
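The replacement patterns appear to use Lua-style syntax. Outside the plugin, the same two transformations can be illustrated with Python regular expressions (a sketch of the intent, not the plugin's own code):

```python
import re

def all_numbers(text, separator="\n"):
    """First example: extract every run of digits, one per line."""
    return separator.join(re.findall(r"[0-9]+", text))

def first_number(text):
    """Second example: keep only the first number in the text."""
    match = re.search(r"[0-9]+", text)
    return match.group(0) if match else ""
```

For instance, `all_numbers("No. 42, lane 7")` yields the two lines "42" and "7", while `first_number` returns just "42".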

Include scores in fields: If checked, then the score will be included with each extracted feature, e.g. "mountain (83)" or "Eiffel Tower (94)".

Set "Include on Export" attribute of keywords: If checked, then each new keyword created by Any Vision will have the Include on Export attribute set, allowing the keyword to be included in the metadata of exported photos.

If you change this setting and want the change to be applied retroactively to all Any Vision keywords: Delete the root Any Vision keyword in the Keyword List panel. Select all the photos you've previously analyzed. Do Analyze, click Advanced, and select the option Reassign metadata fields. This will recreate all the keywords with the new setting, without actually sending the photos to Cloud Vision for reanalysis.

Use subgroups (A, B, C, …) for Labels, Landmarks, and Logos keywords: When checked, Any Vision will create subkeywords A, B, C, … under the parent keywords Any Vision > Labels, Any Vision > Landmarks, and Any Vision > Logos:

For example, the keyword for the label "mountain" would be placed under Any Vision > Labels > M.

This works around a longstanding (and shameful) Lightroom bug on Windows where it chokes if it tries to display more than about 1500 keywords at once.

Copy landmark location to GPS field: Each landmark assigned to a photo by Cloud Vision has an associated latitude/longitude, displayed in the Location field in the Metadata panel. Selecting Always or When GPS field is empty copies the latitude/longitude of the first landmark (the one with the highest score) to the EXIF GPS field. Once the GPS field is set, the photo will appear on the map in the Map module, and Lightroom will do address lookup to automatically set the photo's Sublocation, City, State / Province, and Country.

Previously analyzed photos: This option tells Any Vision how to handle selected photos that have been previously analyzed:

Skip ignores such photos.

Reanalyze by sending to Google sends the photos to Cloud Vision for reanalysis (and additional cost)—you must choose this if you've added a feature to be analyzed.

Reassign metadata fields reassigns the Any Vision metadata fields and keywords using the previous analysis but the current options. This is useful if you've changed any of the options that control how the Any Vision metadata fields are set, such as Assign Keywords or Include scores in fields. This option doesn't incur additional costs for previously analyzed photos.

Concurrently processed photos: This is the number of photos that will be processed in parallel by Any Vision and Cloud Vision. The default value of 1 will have the least impact on interactive use of Lightroom (though it could still be a little jerky). The maximum value of 6 processes photos about 4 times faster, though interactive use will likely be very jerky. (In my testing, larger values didn't provide any more speedup.)

Translation

By default, labels and other features are returned by Google in English, but the Translation tab lets you translate them to another language. You can specify which features should be translated; features not selected will remain in English.

Any Vision uses Google Cloud Translation, the same technology behind Google Translate. More than 100 languages are supported.

Translation costs extra, but it is quite inexpensive. Any Vision remembers previous translations, so you pay only once for each distinct word or phrase.

You can override the translations of specific words and phrases with an overrides dictionary. Click Edit Overrides to open Finder or File Explorer on the dictionary for the current language (e.g. de.csv for German). The dictionary is in UTF-8 CSV format (comma-separated values), and after the header each following line contains a pair of phrases:

word or phrase in English, word or phrase in target language

The words and phrases are case-sensitive. Make sure you save the file in UTF-8 format:

Excel: Do File > Save As, File Format: CSV UTF-8.

TextEdit (Mac): After opening the file, change it to plain text via Format > Make Plain Text. It will save in UTF-8 format.

Notepad (Windows): Do File > Save As, Encoding: UTF-8.
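You can also generate a well-formed overrides file programmatically. Here's a minimal Python sketch; the entries are hypothetical German translations, "de.csv" follows the naming convention above, and the exact header wording is an assumption:

```python
import csv

# Hypothetical override entries: English phrase -> German phrase.
overrides = [
    ("mountain", "Berg"),
    ("tabby cat", "getigerte Katze"),
]

# newline="" and encoding="utf-8" produce the UTF-8 CSV format described above.
with open("de.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["English", "German"])  # header row (assumed wording)
    writer.writerows(overrides)
```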

Sort by Color

When the Dominant Color feature is selected, Cloud Vision finds the ten most "dominant" colors in a photo, using an undocumented algorithm. You can see those colors by selecting an analyzed photo and doing Library > Plug-in Extras > Sort by Color:

To find other photos containing a similar dominant color, select all the photos you want to search and invoke Sort by Color. Choose one of the dominant colors of the most-selected photo, or use the color picker at the bottom left to choose another color, and click OK. The current source is changed to the collection Any Vision: Sorted by Color, containing those photos with the most similar dominant colors. Do View > Sort > Custom Order to sort the collection by similarity.
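Google doesn't document its dominant-color algorithm or its similarity measure, but the idea of ranking photos by their nearest dominant color can be sketched with plain Euclidean RGB distance (one plausible measure, not necessarily the one Any Vision uses):

```python
def color_distance(c1, c2):
    """Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

def sort_by_similarity(dominant_colors, target):
    """Order photos by the distance of their closest dominant color to `target`.

    `dominant_colors` maps a photo name to its list of dominant RGB colors.
    """
    def best(name):
        return min(color_distance(c, target) for c in dominant_colors[name])
    return sorted(dominant_colors, key=best)
```

A photo with ten dominant colors is ranked by whichever of them lies closest to the chosen color, so a mostly blue photo with one strong red patch still matches a red target.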

Here are the photos from my main catalog of 30,000 photos with dominant colors closest to the orange-brown chosen in the example photo above:

As another example, here are photos from my catalog labeled "sunset" by Cloud Vision, sorted by similarity to the yellow-orange from the first photo:

Export to File

To export the analyzed metadata fields to a comma-separated (CSV) text file, one row per photo, select one or more analyzed photos and do Library > Plug-in Extras > Export to File. Open the file in Excel or another spreadsheet program.
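The export can also be post-processed outside a spreadsheet. Here's a short Python sketch that loads such a file; the column names depend on which features you analyzed, so nothing is assumed beyond a header row:

```python
import csv

def load_export(path):
    """Read an Any Vision CSV export into a list of dicts, one per photo."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# rows = load_export("export.csv")   # "export.csv" is a placeholder name
# print(len(rows), "photos; columns:", list(rows[0]))
```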

Similar Services

You may wish to consider these alternative products:

Cloud Tagger also uses Google Cloud Vision, but it is intended as an aid to careful keywording of smaller numbers of photos rather than searching large catalogs. It currently has no recurring charges.

Excire is a Lightroom plugin with capabilities similar to Cloud Vision. But it doesn't use the cloud and it has a higher one-time upfront cost with no recurring charges.

MyKeyworder is intended as an aid to careful keywording of smaller numbers of photos rather than searching large catalogs. It has recurring charges.

Adobe provides search for Lightroom Web using similar technology, but it doesn't search your desktop catalog, just photos you've synced with Lightroom Web and Mobile. Adobe's upcoming Project Nimbus will likely have this search capability as well, but there is no indication it will be available to Lightroom Desktop.

Keyboard Shortcuts

Windows: You can use the standard menu keystrokes to invoke Any Vision > Analyze. ALT+L opens the Library menu, U selects the Plug-in Extras submenu, and A invokes the Any Vision > Analyze command.

To assign a single keystroke as the shortcut, download and install the free, widely used AutoHotkey. Then, in File Explorer, navigate to the plugin folder anyvision.lrplugin. Double-click Install-Keyboard-Shortcut.bat and restart your computer. This defines the shortcut Alt+A to invoke the Analyze command. To change the shortcut, edit the file Any-Vision-Keyboard-Shortcut.ahk in Notepad and follow the instructions in that file.

Mac OS: You can use the standard mechanism for assigning application shortcuts to plugin menu commands. In System Preferences > Keyboard > Keyboard Shortcuts > Application Shortcuts, select Adobe Lightroom. Click "+" to add a new shortcut. In Menu Title, type "Analyze" (case matters) preceded by three spaces ("<space><space><space>Analyze"), and in Keyboard Shortcut type the desired key or key combination.

Support

Please send problems, bugs, suggestions, and feedback to

I'll gladly provide free licenses in exchange for reports of new, reproducible bugs.

Known limitations and issues:

  • Any Vision requires Lightroom 5.7 or later—it relies on features missing from earlier versions.

  • Cloud Vision infrequently returns errors such as "The request timed out" and "Image processing error!". If this occurs, just rerun Analyze. I don't know why these errors occur—they appear spurious.

  • If you upgrade from versions 1.2 or 1.3 and get the error, "Problem getting supported languages. The Google Cloud Translation API...", you'll need to enable the Translation API. In your browser, go to console.cloud.google.com, log in if necessary, and follow steps 12–14 of Get a Google Cloud Key.

Version History

1.2
  • Initial release.
1.3
  • The Use subgroups for keywords option now correctly handles keywords starting with non-English characters.
  • Recognized text can optionally be copied to an IPTC field.
1.4
  • Better handling of errors reported by Google, such as an expired credit card.
  • Pattern replacement for transforming recognized text, e.g. to extract jersey numbers.
  • Translation of features to other languages. See Support for how to enable the Google Cloud Translation API.

Copyright 2017 John R. Ellis