Did Google cheat with the impressive Gemini demo video?

Google’s video showing off its new model Gemini’s capabilities was nothing short of amazing. Unfortunately, the truth about how good Gemini is and what it can do falls short of the marketing hype.

When we first watched the demo video showing Gemini interacting in real-time with the presenter we were blown away. We were so excited that we missed some key disclaimers in the beginning and accepted the video at face value.

The text in the first few seconds of the video says “We’ve been capturing footage to test it on a wide range of challenges, showing it a series of images, and asking it to reason about what it sees.”

What really happened behind the scenes is the cause of the criticism Google got and the ethical questions it raises.

Gemini was not watching a live video of the presenter drawing a duck or moving cups around, nor was it responding to the voice prompts you heard. The video was a stylized marketing presentation of a simpler truth.

In reality, Gemini was presented with still images and text prompts that were more detailed than the questions you hear the presenter asking.

A Google spokesperson confirmed that the words you hear spoken in the video are “real excerpts from the actual prompts used to produce the Gemini output that follows.”

So, detailed text prompts, still images, and text responses. What Google actually demonstrated was functionality that GPT-4 has had for months.

GPT-4 identifying the duck drawing. Source: X / Ethan Mollick

Google’s blog post shows the still images and text prompts that were actually used.

In the example with the cars, the presenter asks, “Based on their design, which of these would go faster?”

The actual prompt that was used was, “Which of these cars is more aerodynamic? The one on the left or the right? Explain why, using specific visual details.”

And when you recreate the experiment on Bard, which Gemini now powers, it doesn’t always get it right.

Bard chooses the wrong car. Source: Bard

I really wanted to believe that Gemini could follow the ball as the three cups were moved around but sadly that’s not true either.

Google’s blog post shows that a lot of prompting and explanation was required for the cup shuffling demo.

Cup shuffle prompts. Source: Google

It’s still impressive that an AI model can do this, but it’s not what we were sold in the video.

Is that it, Google?

We’re just speculating here, but the demo was most likely showing results Google obtained using Gemini Ultra, which still hasn’t been released.

So when Gemini Ultra is eventually released, it looks like it will be capable of what GPT-4 has been doing for months. The implications aren’t great.

Are we hitting a ceiling as far as AI capabilities are concerned? If the best AI minds are working at Google, surely they’d be driving cutting-edge innovation.

Or was Google not only slow to enter the race but also struggling to keep up with the rest? The benchmark numbers Google proudly displayed show its yet-to-be-released model marginally beating GPT-4 on some tests. How will it fare against GPT-5?

Or maybe Google’s marketing department made a judgment error with the video, and Gemini Ultra will still be better than we think. Google says Gemini is truly multimodal and that it understands video, which would be a first for LLMs.

We’ve not seen an LLM demonstrate video comprehension yet, but when we do it will be worth getting excited about. Will it be Gemini Ultra or GPT-5 that shows us first?
