Google’s PaliGemma 2: Advancing Emotion Recognition Amidst Regulatory Concerns


Emotion and facial recognition are sensitive areas; in the EU, both are largely prohibited under the AI Act, with only a few exceptions. Google’s new, publicly accessible Vision-Language Model, PaliGemma 2, can be taught to recognize emotions. The model has a basic ability to recognize emotions out of the box, and fine-tuning apparently makes it easy to strengthen this capability: the AI is given images paired with the corresponding emotion labels. The same is likely possible with other open AI models as well.
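
As a rough illustration of what such fine-tuning could look like, here is a minimal sketch that pairs images with emotion labels and runs a single training step through the Hugging Face transformers interface for PaliGemma. The checkpoint name, file paths, prompt wording, and label set are illustrative assumptions, not details from Google’s announcement.

```python
# Hypothetical sketch: teaching PaliGemma 2 emotion labels from annotated images.
# Checkpoint name, file paths, prompt, and labels are illustrative assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-pt-224"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

# Toy training pairs: each image is annotated with an emotion word.
pairs = [("face_001.jpg", "happiness"), ("face_002.jpg", "anger")]
images = [Image.open(path).convert("RGB") for path, _ in pairs]
prompts = ["<image>What emotion does this person show?"] * len(pairs)
targets = [emotion for _, emotion in pairs]

# `suffix` turns the emotion words into training targets (labels).
inputs = processor(text=prompts, images=images, suffix=targets,
                   return_tensors="pt", padding=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
loss = model(**inputs).loss  # cross-entropy against the suffix tokens
loss.backward()
optimizer.step()
```

In practice such fine-tuning would run over a much larger labelled dataset and many optimization steps; the point here is only how an image and its emotion annotation are paired into a training example.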

Google has just introduced the second version of PaliGemma. The model can process both text and images, so users can ask questions about an image. According to Google’s blog post, the AI model can “extract detailed and contextually relevant information” from an image, going beyond mere object recognition, and can describe “actions, emotions, and the narrative of a scene.”
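
To make this concrete, the following sketch shows how such a question about an image might be posed via the Hugging Face transformers interface; the checkpoint name, image path, and prompt are assumptions for illustration, not Google’s documented usage.

```python
# Hypothetical sketch: asking PaliGemma 2 about a scene in an image.
# Checkpoint name, image path, and prompt are illustrative assumptions.
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-pt-224"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("scene.jpg").convert("RGB")
prompt = "<image>Describe the actions and emotions in this scene."

inputs = processor(text=prompt, images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, not the echoed prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(processor.decode(output[0][prompt_len:], skip_special_tokens=True))
```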

Because PaliGemma 2 is an open model and therefore easily accessible, there is concern about its use for emotion recognition. Under the European AI regulation, such use is largely banned: employers, schools, and private individuals may not deploy it. Exceptions include border protection agencies and safety systems that monitor whether a pilot is tired or alert.

Facial and emotion recognition is not as straightforward as it might seem. A smile is easy to detect, but interpreting it correctly depends on its context, and this is where many errors occur. Facial recognition software is also suspected of strong bias, often incorrectly associating negative traits and emotions with people of darker skin tones.

Besides visual emotion recognition, artificial intelligence has also long been able to discern emotions from a voice, which can reveal a great deal about a person’s emotional state and even illnesses. These systems are not freely permitted either: they are used in call centers, for example, but under the AI Act they are subject to transparency and documentation requirements, so their use must be disclosed.

It is an open question to what extent the ability to recognize emotions will lead to stricter scrutiny and categorization of a model like PaliGemma 2 under the regulation. So far, Vision-Language Models count as General Purpose AI and are subject to relatively lenient special rules, which provide for reclassification if significant risks are identified.

