One of the internal slides showed an example conversation in which someone asked, “Do I have a pet?” The AI’s response explained that the person once owned a dog. It then went on to describe additional details such as the pet’s name, the red coat the dog wore, and even the names of the family members frequently captured with the animal. As implausible as that might sound, Google’s own demo of Gemini’s real-time image comprehension skills suggests that Project Ellmann is not too far off in the future.
Seeing some qs on what Gemini *is* (beyond the zodiac). Best way to understand Gemini’s underlying amazing capabilities is to see them in action, take a look ⬇️ pic.twitter.com/OiCZSsOnCc
— Sundar Pichai (@sundarpichai) December 6, 2023
The AI is capable of telling what food items a person prefers, the products they are looking to purchase, their potential travel plans based on information gleaned from screenshots saved in the gallery, and more. And based on the web search history, it will even spit out details such as a person’s favorite sites and apps. Google Photos already does that to a certain extent. If I enter my bank’s name in the app’s search field, it automatically pulls up images of my credit card and banking screenshots saved on the phone.
Moreover, the foundation for implementing Project Ellmann locally on a smartphone has already been laid. The Gemini Nano model is already powering a couple of features on the Google Pixel 8 Pro, thanks to the dedicated AI engine on its Tensor G3 chip. It’s a breakthrough achievement because on-device AI processing means your data doesn’t need to travel to Google’s servers over the internet. Plus, it significantly speeds up the task at hand.