Google Launches Gemini, an AI Model It Says Outperforms ChatGPT
After a year of trailing behind OpenAI in the AI arms race, Google is finally heating up the competition with the launch of Gemini – an AI model it says outperforms ChatGPT.
The AI model, which CEO Sundar Pichai says represents “the beginning of a new era of AI,” is Google’s newest and most capable large language model (LLM) yet.
Pichai claims Gemini has advanced "reasoning capabilities" to "think more carefully" when answering difficult questions – reducing the risk of “hallucinations” that other AI models, including Google’s own, have struggled with.
The model comes in three versions and is “multimodal”, which means it can comprehend text, audio, images, video and computer code simultaneously.
It will be integrated into Google products including its search engine, and is being released initially in more than 170 countries including the US on Wednesday as an upgrade to Bard. But the upgrade will not be released in the UK and Europe as Google seeks clearance from regulators.
In a statement, Demis Hassabis, the chief executive of DeepMind, the London-based Google unit that developed Gemini, said: “We’ve done a very thorough analysis of the systems side by side, and the benchmarking. It’s been the most complicated project we’ve ever worked on, I would say the biggest undertaking. It’s been an enormous effort.”
“Google Gemini’s benchmark numbers absolutely CRUSH GPT-4!!!! We have a war on our hands.” — Deedy (@debarghya_das), December 6, 2023
Two smaller versions of Gemini, Pro and Nano, will be released this Wednesday. The Pro model can be accessed on Google’s Bard chatbot and the Nano version will be on mobile phones using Google’s Android system.
The most powerful iteration, Ultra, is being tested externally and will not be released publicly until early 2024, when it will also be integrated into a version of Bard called Bard Advanced.
Hassabis said the Ultra model would undergo external “red team” testing – where experts test the security and safety of a product – and Google would share the results with the US government, in line with an executive order issued by Joe Biden in October.
Gemini’s most basic models are currently text in and text out. But Hassabis said more powerful models, including Ultra, would be able to work with images, video, and audio.
“It’s going to get even more general than that,” he said. “There are still things like action and touch – more like robotics-type things. These models just sort of understand better about the world around them.”
The launch of Gemini AI
To demonstrate Gemini’s capabilities, Google released several videos today. These included one showing the Ultra model understanding a student’s handwritten physics homework answers and giving complex tips on how to solve the questions, including showing equations.
Another showed Gemini’s Pro version analysing and identifying a drawing of a duck as well as answering correctly which film a person was enacting in a smartphone video.
In one of these videos, Eli Collins, head of product at Google DeepMind, said Gemini’s most powerful model had shown “advanced reasoning” and could show “novel capabilities” – an ability to perform tasks that have not been shown in other AI models, including ChatGPT.
Gemini AI vs ChatGPT: Which is Better?
Google has so far struggled to attract as much attention as OpenAI's breakout chatbot ChatGPT. But it claims Gemini Ultra performs better than ChatGPT on 30 of the 32 academic benchmarks in reasoning and understanding it was tested on.
Google also said Gemini Ultra was the first AI model to outperform human experts on these benchmark tests. It scored 90% on a multitasking test called MMLU, which covers 57 subjects including maths, physics, law, medicine and ethics, beating all other current AI models, including OpenAI's GPT-4.
The less powerful Gemini Pro model also outperformed GPT-3.5, the LLM behind the free-to-access version of ChatGPT, on six out of eight tests.
Still, Google warned that “hallucinations” were still a problem with every version of the model. “It’s still, I would say, an unresolved research problem,” said Collins.
Comparing Gemini AI and ChatGPT Benchmarks
According to Google, Gemini AI beat GPT-4 on almost all of its academic benchmarks. Here's a comparison between Gemini Ultra and GPT-4 using the benchmarks tested by Google:
1. General Understanding:
- Gemini Ultra scored 90.0% on the Massive Multitask Language Understanding (MMLU) benchmark, which covers 57 subjects spanning STEM, the humanities and more.
- GPT-4 achieved 86.4% (5-shot) on the same benchmark.
2. Reasoning Abilities:
- Gemini Ultra scored 83.6% on the Big-Bench Hard benchmark, demonstrating proficiency in a wide range of multi-step reasoning tasks.
- GPT-4 showed similar performance with 83.1% (3-shot) on the same benchmark.
3. Reading Comprehension:
- Gemini Ultra scored an 82.4 F1 score on the DROP reading comprehension benchmark.
- GPT-4 scored slightly lower, at 80.9 (3-shot), in a similar scenario.
4. Commonsense Reasoning:
- Gemini Ultra scored 87.8% (10-shot) on the HellaSwag benchmark.
- GPT-4 showed a notably higher 95.3% (10-shot) on the same benchmark – one of the few where it retained the lead.
5. Mathematical Proficiency:
- Gemini Ultra scored 94.4% on GSM8K, a benchmark of grade-school maths word problems.
- GPT-4 scored 92.0% (5-shot) on the same problems.
6. Math Problems:
- Gemini Ultra could tackle complex maths problems with a 53.2% (4-shot) score on the MATH benchmark.
- GPT-4 scored slightly lower, with 52.9% (4-shot), in a similar context.
7. Code Generation:
- Gemini Ultra could generate Python code with a commendable 74.4% score on the HumanEval benchmark.
- GPT-4 did not perform as well, scoring 67.0% on a similar benchmark.
8. Natural Language to Code (Natural2Code):
- Gemini Ultra showed proficiency in generating Python code from natural-language descriptions, scoring 74.9% (0-shot).
- GPT-4 was close behind at 73.9% (0-shot) on a similar benchmark.
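The head-to-head figures above can be summarised programmatically. A minimal sketch (scores transcribed from the comparison above as reported by Google; the dictionary and helper function are illustrative, not part of any published tooling):

```python
# Benchmark scores reported by Google: (Gemini Ultra, GPT-4)
SCORES = {
    "MMLU": (90.0, 86.4),
    "Big-Bench Hard": (83.6, 83.1),
    "DROP (F1)": (82.4, 80.9),
    "HellaSwag": (87.8, 95.3),
    "GSM8K": (94.4, 92.0),
    "MATH": (53.2, 52.9),
    "HumanEval": (74.4, 67.0),
    "Natural2Code": (74.9, 73.9),
}

def leader(gemini: float, gpt4: float) -> str:
    """Return which model reports the higher score on a benchmark."""
    return "Gemini Ultra" if gemini > gpt4 else "GPT-4"

for name, (g, o) in SCORES.items():
    print(f"{name:15s} Gemini {g:5.1f} vs GPT-4 {o:5.1f} -> {leader(g, o)}")

# Of the eight benchmarks listed here, Gemini Ultra leads on all but HellaSwag.
wins = sum(g > o for g, o in SCORES.values())
```

Note that several of the reported margins (Big-Bench Hard, MATH, Natural2Code) are under a single percentage point, so "leads" here says nothing about whether the difference is meaningful.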
What makes Gemini AI different from ChatGPT?
Hassabis said that, like other AI models including ChatGPT, Gemini had been trained on data taken from a range of sources including the open web. However, there are some key differences between the two models that make Gemini AI a more versatile and powerful tool, if what has been announced today is to be believed.
For instance, the GPT-3.5 model used in the free version of ChatGPT was trained on data up to September 2022, meaning it can only provide accurate information up to that point. The same is technically true of GPT-4, but it is better than GPT-3.5 at learning from and responding to current information provided through ChatGPT prompts.
Gemini AI, however, is trained on more recent data from the internet, meaning it can answer questions using up-to-date information. The model is trained on a massive dataset of text and code, making it larger and more powerful than ChatGPT.
This means that it can generate more complex and nuanced text, and it can also perform more demanding tasks, such as translation and summarization.