Google’s ‘Astra’ Marks the Beginning of Autonomous AI Agents

OpenAI tried to steal the 'autonomous agents' thunder with GPT-4o. Now, Google I/O 2024 has answered (or maybe not!).

Published on May 15, 2024

by Sukriti Gupta

A new era of autonomous AI agents has begun. At Google I/O 2024, the tech giant unveiled Project Astra, a first-of-its-kind initiative to develop universal AI agents capable of perceiving, reasoning, and conversing in real time.

“Building on Gemini, we’ve developed prototype agents that can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this information for efficient recall,” said Google DeepMind chief Demis Hassabis, in a blog post.

Hassabis added that, with technology like this, it is easy to envision a future in which people have an expert AI assistant by their side, via a phone or glasses.
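The pipeline Hassabis describes (encode incoming frames, merge video and speech into one timeline, cache it for recall) can be pictured with a toy sketch. This is only an illustration under loud assumptions, not Google's implementation: the `Event`, `Timeline`, and `merge_streams` names are hypothetical, plain strings stand in for encoded frames and transcripts, and the sample data is made up.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since the session started
    modality: str     # "video" or "speech"
    content: str      # stand-in for an encoded frame or a transcript


class Timeline:
    """Bounded cache of recent events; the oldest are evicted first."""

    def __init__(self, max_events: int = 1000):
        self.events = deque(maxlen=max_events)

    def add(self, event: Event) -> None:
        self.events.append(event)

    def recall(self, keyword: str):
        """Return the most recent event mentioning keyword, or None."""
        for event in reversed(self.events):
            if keyword.lower() in event.content.lower():
                return event
        return None


def merge_streams(video, speech):
    """Combine two event lists into one list ordered by timestamp."""
    return sorted(video + speech, key=lambda e: e.timestamp)


# Made-up sample input, purely for illustration.
video = [
    Event(1.0, "video", "desk with a laptop"),
    Event(3.0, "video", "glasses next to a red apple"),
]
speech = [Event(2.0, "speech", "what does this code do?")]

timeline = Timeline(max_events=100)
for event in merge_streams(video, speech):
    timeline.add(event)

hit = timeline.recall("glasses")
if hit is not None:
    print(f"Last saw 'glasses' at t={hit.timestamp}s: {hit.content}")
```

A real agent would presumably store learned embeddings and search them by similarity rather than by keyword; the bounded deque here only illustrates the idea of keeping a recent, ordered record that can be queried later.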

https://twitter.com/GoogleDeepMind/status/1790433540548558853

The release comes just a day after OpenAI unveiled GPT-4o, which won hearts online with its ‘omni’ capabilities across text, vision, and audio. OpenAI’s demos, which included a real-time translator, coding assistant, AI tutor, friendly companion, poet, and singer, soon became the talk of the town.

GPT-4o’s agentic capabilities, in particular, have caught everyone’s attention, with some even calling them ‘the biggest part of the update’ and ‘a step closer to autonomous agents’.

The GPT-4o desktop app can read your screen in real time and interact with your OS, which could change the way people work. The app allows for voice conversations, screenshot discussions, and instant access to ChatGPT. It’s like having an AI teammate on your device who can help you with whatever you’re working on.

The ChatGPT desktop app just became the best coding assistant on the planet.

Simply select the code, and GPT-4o will take care of it.

Combine this with audio/video capability, and you get your own engineer teammate. pic.twitter.com/g4fWcbhXy2
— Pietro Schirano (@skirano) May 13, 2024

OpenAI president and co-founder Greg Brockman also demonstrated human-computer interactions (and even human-computer-computer interactions), giving users a glimpse of pre-AGI vibes.

Introducing GPT-4o, our new model which can reason across text, audio, and video in real time.

It's extremely versatile, fun to play with, and is a step towards a much more natural form of human-computer interaction (and even human-computer-computer interaction): pic.twitter.com/VLG7TJ1JQx
— Greg Brockman (@gdb) May 13, 2024

You can get different instances of GPT-4o to interact with each other. The model can be interrupted in real time, can change its emotional tone, and can adjust its responses with little to no latency. All this is a big breakthrough for building AI agents.

Real-time conversation with a voice agent that understands the emotion in a person’s voice, and that can be interrupted without lag, makes GPT-4o extremely helpful for building voice- and vision-enabled smart agents.

A promising application is customer service, including a new type of technical support where customers can walk through their problems via video stream, allowing the agent to troubleshoot them in real time with the customer.

This was a fun one! Take a look at 2 AI agents resolving a customer service claim with #OpenAI new #GPT4o.

Working with customers to build transformational solutions always gets me fired up. The potential solutions we can build with this new SOTA model has my head spinning! pic.twitter.com/86SNgNI6Tl
— Joe Beutler (@JoeBeutler) May 14, 2024

These developments show that with GPT-4o, the future is poised to be agent-to-agent. However, with its latest release, it’s clear that Google has also gone all in on AI agents, deploying them across the company’s product ecosystem.

From an agent that can continuously organise all the receipts in your inbox into a spreadsheet to an agent that can return your orders, Google has it all. Use cases for the assistant also include multi-step research and reasoning, and even shopping to prepare a meal plan. Need to do something as tedious as updating your address? Google has you covered there too, with a browser agent that works across external websites to handle tasks such as updating your address on dozens of sites.

Google also introduced an AI Teammate that lives inside Google Workspace and handles collaborative tasks.

TLDR: Google is ALL IN on AI agents

AI agents are deployed across their whole product ecosystem.

8 wild demos from Google I/O today:

1. An email agent to continuously organise all receipts in your inbox into a spreadsheet pic.twitter.com/A4ij23uOV1
— Chief AI Officer (@chiefaioffice) May 14, 2024

Despite all this, Google’s Project Astra and AI agent developments have received mixed responses online.

On the one hand, people appreciate Astra’s long context support, memory ‘to remember where the glasses were’, and native video processing capabilities compared to GPT-4o, which some contend only processes a single frame at a time.

Many are also saying that with these advanced AI email, browser, and search agent demos, as well as an AI Teammate, Google will likely obliterate many startups focusing on email- and browser-based agents.

“One thing Google is doing right: they are finally making serious efforts to integrate AI into the search box. I sense the agent flow: planning, real-time browsing, and multimodal input, all from the landing page. Google’s strongest moat is distribution. Gemini doesn’t have to be the best model to be the most used one in the world,” wrote a user on X.

Not just perplexity but every other vertically focused tool that google touches

Google meet + Gemini – competes against zoom, even gong

Gmail + Gemini – competes against any other AI email assistant

Directly competing against any one of googles offerings will be challenging
— Jerry Liu (@jerryjliu0) May 14, 2024

On the other hand, some are not impressed with Google Astra’s slightly longer latency and are even sceptical about whether the real product will match the ‘too good to be true’ promises made in the demo.

“Remember the last time Google demo’d (sic) their AI it was all a complete lie,” wrote a user online. “Google promised a lot in events but never released like OpenAI does,” added another. Many even went so far as to call the demo an advertisement, rather than an actual demo.

Unlike OpenAI, which ran its demo live, Google relied on a pre-recorded demo, which has some taking its claims with a pinch of salt, at least until they get to test the product.

https://twitter.com/bindureddy/status/1790435529194107330

Everybody Is Bullish on ‘AI Agents’

Regardless of who wins this race, the important thing is that everybody seems to be bullish on AI agents, and soon, we might see a lot more interesting developments take shape.

Recently, venture capitalist Vinod Khosla predicted that most consumer access to the internet will happen through agents that act on consumers’ behalf, carrying out tasks and fending off marketers and bots. “Tens of billions of agents on the internet will be normal,” he wrote.

Similarly, Meta CEO Mark Zuckerberg highlighted the evolving role of AI agents in customer interactions, envisioning a future where businesses and creators each have their own AI to represent their interests.

“A lot of people talk about the ‘ChatGPT moment’, where you’re like ‘Wow, never seen anything like this’. Many people will have kind of a ‘Wow, I couldn’t imagine an AI agent doing this’ moment,” said DeepLearning.AI founder Andrew Ng at Sequoia Capital’s AI Ascent.

Looks like it is finally happening.

Sukriti Gupta

Having done her undergrad in engineering and master’s in journalism, Sukriti likes combining her technical know-how and storytelling to simplify seemingly complicated tech topics in a way everyone can understand.