Auto Tag image files with AI

Xiaoye Chen

Apr 21, 2024

If you have a large collection of images, finding the right file when you need it is challenging. Tagging the images well helps a great deal, but doing it by hand is time-consuming. Wouldn't it be nice if AI could help with the tagging? This is exactly what Ritt's Auto Tag feature does.

How to use Auto Tag?

Say you have a collection of interior design image files and want to tag them by room type (e.g. living room, kitchen) and style (e.g. industrial, Scandinavian). You can use Auto Tag by following these three simple steps:

Drag and drop potential tags into the Auto Tag box in groups. For example, drag all room tags as one group and all style tags as a separate group. Enabling Multiselect mode makes this easier.
Select all files to be tagged.
Click on Auto Tag.

The first time Auto Tag is used on a file, processing takes some time because Ritt has to compute the image embedding. On a typical laptop CPU, Auto Tag can process about three images per second. Ritt then caches the embedding and reuses it whenever the same file is Auto Tagged again.
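The caching behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Ritt's actual implementation: the `EmbeddingCache` class, the choice of a SHA-256 content hash as the cache key, and the `fake_embedding` stand-in for a real image encoder are all assumptions made for the example.

```python
import hashlib
from typing import Callable, Dict, List

Embedding = List[float]


class EmbeddingCache:
    """Caches image embeddings by a hash of the file's bytes (hypothetical design)."""

    def __init__(self, compute: Callable[[bytes], Embedding]) -> None:
        self._compute = compute              # the slow step, e.g. an image encoder
        self._store: Dict[str, Embedding] = {}
        self.misses = 0                      # how often the slow step actually ran

    def get(self, image_bytes: bytes) -> Embedding:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(image_bytes)
        return self._store[key]


def fake_embedding(image_bytes: bytes) -> Embedding:
    # Stand-in for a real image encoder; only here to keep the sketch runnable.
    return [len(image_bytes) / 10.0, float(image_bytes[0])]


cache = EmbeddingCache(fake_embedding)
first = cache.get(b"pretend-image-data")     # computed on first use
second = cache.get(b"pretend-image-data")    # served from the cache
```

At about three images per second, a first pass over 1,000 images would take roughly 1,000 / 3 ≈ 333 seconds, or about five and a half minutes. Later runs on the same files skip that cost.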

Accuracy

So, how well does Auto Tag perform? Here is a random sample of 20 files that Auto Tag labelled "Living room". Of these, 15 were correct, 2 were inconclusive (?) and 3 were wrong. In short, the accuracy is roughly 80%, depending on how the inconclusive cases are counted.

Living (Auto Tag result): 80%
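The living room figure depends on how the two inconclusive files are treated. The short calculation below works through three possible conventions. The function name and the three mode names are made up for this example, and the post does not say which convention was used.

```python
def accuracy(correct: int, inconclusive: int, wrong: int, mode: str) -> float:
    """Accuracy of a sample under one of three illustrative conventions."""
    total = correct + inconclusive + wrong
    if mode == "strict":      # inconclusive counts as wrong
        return correct / total
    if mode == "exclude":     # inconclusive files are dropped from the sample
        return correct / (correct + wrong)
    if mode == "half":        # inconclusive counts as half correct
        return (correct + 0.5 * inconclusive) / total
    raise ValueError(f"unknown mode: {mode}")


# Living room sample from the post: 15 correct, 2 inconclusive, 3 wrong.
strict = accuracy(15, 2, 3, "strict")     # 15 / 20
exclude = accuracy(15, 2, 3, "exclude")   # 15 / 18
half = accuracy(15, 2, 3, "half")         # 16 / 20
```

The three conventions give 75%, about 83% and 80% respectively, so "approximately 80%" is a fair summary of the range.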

For a random sample of 20 "Kitchen" images, Auto Tag performed even better, with 100% accuracy.

Kitchen (Auto Tag result): 100%

Styles are by nature a lot more subjective, so it is difficult to judge the accuracy of style tags. Here are the results for the "Industrial" and "Scandinavian" tags.

Industrial (Auto Tag result)

Scandinavian (Auto Tag result)

Group vs Individual modes

In this example, we use Auto Tag in Group mode. In this mode, the suitability of a tag is judged relative to others in the group. This mode works best when all (or most) images fall under one or more tags in the group.

In Individual mode, the suitability of each tag is judged individually. To save time, you can still drag and drop potential tags as a group.
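One way to picture the difference between the two modes is sketched below. This is an assumption about how such modes could work, not a description of Ritt's code: here Group mode is modelled as a softmax over the similarity scores of the tags in a group, and Individual mode as an independent threshold per tag. The function names, the `temperature` and `threshold` values, and the scores are all hypothetical.

```python
import math
from typing import Dict, List


def group_mode(scores: Dict[str, float], temperature: float = 0.01) -> Dict[str, float]:
    """Softmax over the group: each tag is judged relative to the others."""
    top = max(scores.values())               # subtract the max for numerical stability
    exps = {tag: math.exp((s - top) / temperature) for tag, s in scores.items()}
    total = sum(exps.values())
    return {tag: e / total for tag, e in exps.items()}


def individual_mode(scores: Dict[str, float], threshold: float = 0.25) -> List[str]:
    """Each tag is accepted or rejected on its own score, ignoring the others."""
    return [tag for tag, s in scores.items() if s >= threshold]


# Made-up similarity scores for an image that really is a living room.
living = {"living room": 0.27, "kitchen": 0.21, "bedroom": 0.22}
living_probs = group_mode(living)
living_best = max(living_probs, key=living_probs.get)
living_accepted = individual_mode(living)

# Made-up scores for an image that matches none of the tags well.
off_topic = {"living room": 0.15, "kitchen": 0.14, "bedroom": 0.13}
off_topic_probs = group_mode(off_topic)
off_topic_best = max(off_topic_probs, key=off_topic_probs.get)
off_topic_accepted = individual_mode(off_topic)
```

In this sketch both modes agree on the living room image. For the off-topic image, the relative comparison still produces a winner while the per-tag threshold accepts nothing, which illustrates why Group mode suits collections where most images belong to some tag in the group.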

Under the hood

Ritt's Auto Tag uses CLIP, an OpenAI model for zero-shot image classification, where the tag names (and aliases) are the labels. This approach generally works well for tagging based on what an image is. However, it is not great at picking out fine details of what is in an image. Other models might be included in Auto Tag in the future to complement CLIP.
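Zero-shot classification with a CLIP-style model comes down to comparing an image embedding with the text embeddings of the candidate labels. The sketch below shows that comparison using toy three-dimensional vectors in place of real CLIP embeddings. The function names, the vectors, the "lounge" alias, and the rule of scoring a tag by its best-matching name or alias are assumptions for illustration only.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def tag_scores(
    image_vec: Sequence[float],
    label_vecs: Dict[str, Dict[str, Sequence[float]]],
) -> Dict[str, float]:
    """Score each tag by its best-matching label (tag name or alias)."""
    return {
        tag: max(cosine(image_vec, vec) for vec in labels.values())
        for tag, labels in label_vecs.items()
    }


# Toy text embeddings: tag -> {label text -> vector}. Real ones would come
# from the model's text encoder; these numbers are invented.
label_vecs = {
    "living room": {"living room": [0.9, 0.1, 0.0], "lounge": [0.8, 0.3, 0.1]},
    "kitchen": {"kitchen": [0.1, 0.9, 0.2]},
}
image_vec = [0.7, 0.35, 0.1]   # toy image embedding

scores = tag_scores(image_vec, label_vecs)
predicted = max(scores, key=scores.get)
```

Because the labels are plain text, adding a new tag needs no retraining, only a new text embedding. The same property explains the limitation noted above: a single embedding summarises the whole image, so small details tend to contribute little to the score.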

Future development

Ritt is still under active development, and we have several ideas for improving Auto Tag in future versions:

Include zero-shot object detection models to complement CLIP
Add facial recognition models to help tag photos of people

Conclusion

Auto Tag can be a huge time-saver when dealing with many images. In the room-type samples above, its accuracy ranged from roughly 80% to 100%, while style tags are harder to judge.

If you have any comments or suggestions, please join our Discord.