Automatic Alternative Text (AAT), which generates photo descriptions for visually impaired users on Facebook, can now detect 10x more concepts in a photo and describe them in greater detail with the help of AI.
Because many photos are uploaded without alt text, AAT uses object recognition to generate it, and with improved AI, Facebook can now identify activities, landmarks, and types of animals, and produce more detailed descriptions.
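As a rough illustration of that flow (not Facebook's actual implementation), alt text can be assembled from a list of detected concept labels. The concept names, confidence scores, and threshold below are hypothetical.

```python
# Illustrative sketch only: turning hypothetical detected concepts into an
# alt-text string in the style AAT uses ("May be an image of ...").
def compose_alt_text(concepts, threshold=0.8):
    """concepts: list of (label, confidence) pairs from an object-recognition model."""
    confident = [label for label, score in concepts if score >= threshold]
    if not confident:
        return "May be an image."
    return "May be an image of " + ", ".join(confident) + "."

# Example with made-up model output
print(compose_alt_text([("2 people", 0.97), ("a dog", 0.91), ("the Eiffel Tower", 0.85)]))
# -> "May be an image of 2 people, a dog, the Eiffel Tower."
```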
Visually impaired users experience imagery through a screen reader that describes a photo using the alternative text it is tagged with. The improvements in AI technology should result in fewer photos without descriptions.
AAT can also detect the position and relative size of elements. For instance, Facebook can report where the people in a photo are standing (left, right, center, or scattered) and compare the relative sizes of two objects, such as a car and a mountain, noting how large the mountain appears in comparison.
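To make that concrete, here is a minimal sketch, assuming detector bounding boxes normalized to the image, of how positional wording and size comparisons could be derived. The thresholds and example boxes are made up, not Facebook's published method.

```python
# Hypothetical sketch: deriving positional and relative-size information
# from detector bounding boxes (x0, y0, x1, y1) normalized to [0, 1].
def horizontal_position(box):
    center_x = (box[0] + box[2]) / 2
    if center_x < 1 / 3:
        return "left"
    if center_x > 2 / 3:
        return "right"
    return "center"

def relative_size(box_a, box_b):
    """Ratio of box_a's area to box_b's area."""
    area = lambda b: max(b[2] - b[0], 0) * max(b[3] - b[1], 0)
    return area(box_a) / max(area(box_b), 1e-9)

# Made-up boxes: a car in the foreground and a mountain filling the background
car = (0.05, 0.6, 0.30, 0.9)
mountain = (0.2, 0.0, 1.0, 0.7)
print(horizontal_position(car))                                   # "left"
print(f"mountain is {relative_size(mountain, car):.1f}x the car's area")
```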
Users get short descriptions for all photos by default, with detailed descriptions available on photos of specific interest, such as photos from friends and family. Detailed descriptions include positional information and the prominence of each object, described as primary, secondary, or minor. Alt-text descriptions are available in 45 different languages.
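One simple way to picture the primary/secondary/minor distinction is to rank objects by how much of the image they occupy. The thresholds and detections below are assumptions for illustration, not Facebook's actual criteria.

```python
# Hypothetical sketch: ranking detected objects by prominence for a
# detailed description. Thresholds and labels are assumptions.
def prominence(area_fraction):
    if area_fraction >= 0.25:
        return "primary"
    if area_fraction >= 0.05:
        return "secondary"
    return "minor"

detections = [                      # (label, fraction of image area) - made up
    ("a person", 0.30),
    ("a dog", 0.12),
    ("a frisbee", 0.01),
]
for label, frac in sorted(detections, key=lambda d: d[1], reverse=True):
    print(f"{label}: {prominence(frac)}")
# a person: primary / a dog: secondary / a frisbee: minor
```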
To develop the improved technology, the platform started with supervised learning on human-labeled data, then trained a model on weakly supervised data in the form of billions of public Instagram images and their hashtags. The model was then fine-tuned on concepts spanning geographies, genders, languages, and more, and finally coupled with object detectors and repurposed machine-learning models to produce the descriptions.
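The sketch below illustrates that two-stage idea, noisy hashtag-derived labels for pretraining followed by a smaller curated set for fine-tuning, using a toy multi-label classifier. The architecture, data, and hyperparameters are placeholders, not Facebook's actual pipeline.

```python
# Minimal sketch of two-stage training: weakly supervised pretraining on
# hashtag-derived labels, then fine-tuning on curated labels. All data,
# shapes, and hyperparameters are placeholders.
import torch
import torch.nn as nn

NUM_CONCEPTS = 1200                 # assumed size of the expanded concept vocabulary

model = nn.Sequential(              # stand-in for a large image backbone
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_CONCEPTS),
)
criterion = nn.BCEWithLogitsLoss()  # multi-label: one sigmoid per concept

def train(images, targets, lr, steps):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        opt.step()

# Stage 1: weak supervision - hashtags mapped to noisy concept indicators
hashtag_images = torch.randn(32, 3, 64, 64)
hashtag_labels = (torch.rand(32, NUM_CONCEPTS) > 0.99).float()
train(hashtag_images, hashtag_labels, lr=1e-2, steps=5)

# Stage 2: fine-tune on a smaller curated set covering diverse geographies,
# genders, and languages (represented here by random placeholders)
curated_images = torch.randn(8, 3, 64, 64)
curated_labels = (torch.rand(8, NUM_CONCEPTS) > 0.995).float()
train(curated_images, curated_labels, lr=1e-3, steps=5)
```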