Google I/O 2024 – Here are the AI highlights Google revealed

May 15, 2024

  • Google announced new AI product releases and prototypes at its I/O 2024 event
  • Gemini 1.5 Pro will get a 2M-token context upgrade and be integrated into Google Workspace
  • Several tools gained multimodal capabilities, and new image, music, and video generators were showcased

Google’s I/O 2024 event kicked off on Tuesday with multiple new AI product advancements announced.

OpenAI may have tried to upstage Google with the release of GPT-4o on Monday, but the Google I/O 2024 keynote was full of exciting announcements.

Here’s a look at the standout AI advancements, new tools, and prototypes Google is experimenting with.

Ask Photos

Google Photos, Google’s photo storage and sharing service, will be searchable using natural language queries with Ask Photos. Users can already search for specific items or people in their photos, but Ask Photos takes this to the next level.

Google CEO Sundar Pichai showed how you could use Ask Photos to remind you what your car’s license plate number was or provide feedback on how a child’s swimming capabilities had progressed.

Powered by Gemini, Ask Photos understands context across images and can extract text, create highlight compilations, or answer queries about stored images.

With more than 6 billion images uploaded to Google Photos daily, Ask Photos will need a huge context window to be useful.

Gemini 1.5 Pro

Pichai announced that Gemini 1.5 Pro with a 1M token context window will be available to Gemini Advanced users. This equates to around 1,500 pages of text, hours of audio, and a full hour of video.
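The 1,500-pages figure checks out as a rough back-of-envelope estimate. Here’s a sketch, assuming (these numbers are not from the article) roughly 500 words per page and about 1.3 tokens per English word:

```python
# Rough sanity check of the "1M tokens ~ 1,500 pages" figure.
# Assumptions (not from the article): ~500 words per page,
# ~1.3 tokens per word for typical English text.
WORDS_PER_PAGE = 500
TOKENS_PER_WORD = 1.3

tokens_per_page = WORDS_PER_PAGE * TOKENS_PER_WORD  # ~650 tokens per page
pages = 1_000_000 / tokens_per_page

print(round(pages))  # in the neighborhood of 1,500 pages
```

Actual token counts vary with the tokenizer and the text, but the order of magnitude matches Google’s estimate.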

Developers can sign up for a waitlist to try Gemini 1.5 Pro with an impressive 2M-token context window, which will soon be generally available. Pichai says this is the next step in Google’s journey toward the ultimate goal of infinite context.

Gemini 1.5 Pro has also had a performance boost in translation, reasoning, and coding and will be truly multimodal with the ability to analyze uploaded video and audio.

Google Workspace

The expanded context and multimodal capabilities enable Gemini to be extremely useful when integrated with Google Workspace.

Users can ask Gemini natural language questions about their emails. The demo gave an example of a parent asking for a summary of recent emails from their child’s school.

Gemini will also be able to extract highlights from and answer questions about Google Meet meetings of up to an hour.

NotebookLM – Audio Overview

Google released NotebookLM last year. It allows users to upload their own notes and documents, which NotebookLM then becomes an expert on.

This makes it extremely useful as a research guide or tutor, and Google demonstrated an experimental upgrade called Audio Overview.

Audio Overview uses the input source documents and generates an audio discussion based on the content. Users can join the conversation and use speech to query NotebookLM and steer the discussion.

There’s no word on when Audio Overview will be rolled out, but it could be a huge help for anyone wanting a tutor or sounding board to work through a problem.

Google also announced LearnLM, a new family of models based on Gemini and fine-tuned for learning and education. LearnLM will power NotebookLM, YouTube, Search, and other educational tools, making them more interactive.

The demo was very impressive, but it seems some of the mistakes Google made with its original Gemini launch videos crept into this event too.

AI agents and Project Astra

Pichai says that AI agents powered by Gemini will soon be able to handle our mundane day-to-day tasks. Google is prototyping agents that will be able to work across platforms and browsers.

The example Pichai gave was of a user instructing Gemini to return a pair of shoes and then having the agent work through multiple emails to find the relevant details, log the return with the online store, and book the collection with a courier.

Demis Hassabis introduced Project Astra, Google’s prototype conversational AI assistant. The demo of its multimodal capabilities gave a glimpse of a future where an AI answers questions in real time based on live video and remembers details from earlier in the feed.

Hassabis said some of these features would roll out later this year.

Generative AI

Google gave us a peek at the image, music, and video generative AI tools it’s been working on.

Google introduced Imagen 3, its most advanced image generator. It reportedly responds more accurately to details in nuanced prompts and delivers more photorealistic images.

Hassabis said Imagen 3 is Google’s “best model yet for rendering text, which has been a challenge for image generation models.”

Music AI Sandbox is an AI music generator designed to be a professional collaborative music creation tool, rather than a full track generator. This looks like a great example of how AI could be used to make good music with a human driving the creative process.

Veo is Google’s video generator that turns text, image, or video prompts into minute-long clips at 1080p. It also lets users make video edits with text prompts. Will Veo be as good as Sora?

Google will roll out its SynthID digital watermarking to text, audio, images, and video.



Trillium

All these new multimodal capabilities need a lot of processing power to train the models. Pichai unveiled Trillium, the 6th generation of Google’s Tensor Processing Units (TPUs). Trillium delivers more than 4 times the compute of the previous TPU generation.

Trillium will be available to Google Cloud customers later this year, and Google will also offer NVIDIA’s Blackwell GPUs in early 2025.

AI Search

Google will integrate Gemini into its search platform as it moves toward using generative AI in answering queries.

With AI Overviews, a search query returns a comprehensive answer collated from multiple online sources. This turns Google Search into more of a research assistant than a simple index of websites that might contain the answer.

Gemini enables Google Search to use multistep reasoning to break down complex multipart questions and return the most relevant information from multiple sources.

Gemini’s video understanding will soon allow users to use a video to query Google Search.

This will be great for users of Google Search, but it’ll likely mean far less traffic for the sites Google sources its information from.

Gemini 1.5 Flash

Google announced a lightweight, faster, cheaper model called Gemini 1.5 Flash. Google says the model is “optimized for narrower or high-frequency tasks where the speed of the model’s response time matters the most.”

Gemini 1.5 Flash will cost $0.35 per million tokens, a lot less than the $7 you’d have to pay to use Gemini 1.5 Pro.

Each of these advancements and new products deserves a post of its own. We’ll post updates as more information becomes available or when we get to try them out ourselves.


Eugene van der Watt

Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.

