Google's new AI can find lost specs
AI systems able to interpret images seen through a phone camera, as well as video, sound and spoken language, have been shown off by Google.
In one demo, a prototype AI-powered assistant running on a phone was able to answer the age-old question "Where did I put my glasses?"
It comes a day after rival OpenAI's launch of its latest AI system, GPT-4o, which included an eye-catching presentation in which it read human expressions via a phone camera, and chatted - and flirted - fluently.
Google appears keen to stress that its tools are just as capable of this kind of so-called "multimodal" understanding as its rival's.
As a sign of this "anything you can do I can do better" style competition, Google had teased the capability of its systems running on a phone just ahead of OpenAI's announcement.
Scam spotter
The firm showcased multimodal features in Gemini Nano, an AI assistant that runs "on device" on its Pixel phone, and in the Gemini App.
It also demonstrated a prototype scam-alert feature being tested for Gemini Nano, which could listen to a phone call and warn the user if it appeared to be a scam, without any information about the call leaving the phone.
The new AI-powered demos were revealed at Google I/O, the firm's annual presentation for software developers.
A quick AI-powered transcription of proceedings, by BBC News, suggested that the word "multimodal" came up at least 22 times.
Speakers such as Sir Demis Hassabis, the head of Google DeepMind, repeatedly stressed the firm's long-running interest in multimodal AI, emphasising that its models were "natively" able to handle images, video and sound and draw connections between them.
He showcased Project Astra, which is exploring the future of AI assistants. In a demo video of its capabilities, the assistant answered spoken questions about what it was seeing through a phone camera.

At the end of the demo a Google employee asked the virtual assistant where they had left their specs, to which it replied that it had just seen them on a nearby desk.
There was also a "live" demo of using video when searching Google: having been shown a malfunctioning record player, Google Search was able to suggest ways to fix it.
Also in the announcement:
- AI-generated overviews - text that answers search questions before the listed results - will be rolled out across the US and brought to more countries soon. These are currently being tested in the UK.
- AI-powered search for Google Photos, to make it easier to search your collection of snaps.
- New image, video and music-generating AI systems, to be released as a preview to selected musicians, artists and film-makers.
- New AI features, such as summarising all the emails on a certain topic, coming to Google stalwarts such as Gmail.
And, looking much further into the future, there was also a demo of a prototype system that would create a virtual "team-mate" which could be told to perform certain tasks, such as attending multiple online meetings at once.