Google’s Gemini AI Enhances Interaction with Videos, Images, and PDFs

Google has introduced new features for its Gemini AI that let users interact with YouTube videos, images, and PDF files in natural language. The features were announced during the Galaxy S25 event, and while the new Galaxy models have yet to ship, owners of the Galaxy S24, Pixel 9, and some other devices can already access them.

With Gemini Live, you can now have conversations about various types of content. When watching a YouTube video, for instance, you can ask questions about it or discuss it. The same applies to PDF files and images. To use the feature, bring up the Gemini overlay while viewing the content, either by pressing the power button or by swiping in from a corner of the screen.

Once activated, the Gemini overlay offers options to talk about the video or ask questions about it. For PDFs and images, you may need to select the file first; a button labeled “Talk about it with Gemini Live” then appears, letting you start a conversation about the content.

This is particularly useful for summarizing YouTube videos and answering specific questions about them. In a video by MKBHD about Samsung’s “Project Moohan,” an XR headset running Android XR, for example, Gemini can explain what the YouTuber likes about the product and what he thinks needs improvement. You can ask detailed questions about the hardware or software, but Gemini can only answer them if the YouTuber actually addressed them in the video.

Gemini can also summarize PDF files, answer questions, and even create quizzes to test your knowledge. However, be cautious with large PDF files, as processing them can take a while.

Currently, you cannot have full conversations about web articles with Gemini Live, but you can ask questions about the content displayed on your screen, and Gemini will provide a brief summary of the article. Like all AI models, Gemini Live sometimes produces incorrect information: in one test, it misidentified an image of elephants, showing that the AI can make mistakes.

To stay in control of your data, you can decide whether on-screen content is sent to Gemini automatically. Pressing and holding any Gemini prompt lets you toggle this setting. When automatic transmission is disabled, the “Talk Live” button only appears after you manually submit the content.

When it is enabled, on-screen content is sent to Gemini automatically as soon as you tap a prompt. To reactivate automatic transmission, press and hold the “Ask about…” chip and select “Enable automatic transmission.”

These new features are part of Gemini’s path toward Project Astra, which Google first showed at the I/O 2024 event and announced as part of Gemini 2.0 in December 2024. In the coming months, more Project Astra components, such as screen sharing and live video streaming in the Gemini app, will become available. In the long term, Google plans to bring these features to XR headsets and AR glasses; Google and Samsung have announced corresponding plans, and other hardware partners such as Sony, Lynx, and Xreal are also involved.