Google’s ‘Astra’ Marks the Beginning of Autonomous AI Agents

OpenAI tried to steal the 'autonomous agents' thunder with GPT-4o. Now, Google I/O 2024 has answered (or maybe not!).

A new era of autonomous AI agents has begun. At Google I/O 2024, the tech giant unveiled Project Astra, a first-of-its-kind initiative to develop universal AI agents capable of perceiving, reasoning, and conversing in real time.

“Building on Gemini, we’ve developed prototype agents that can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this information for efficient recall,” said Google DeepMind chief Demis Hassabis in a blog post.

Hassabis added that, with this release, it is easy to envision a future where people have an expert AI assistant by their side via a phone or glasses.

https://twitter.com/GoogleDeepMind/status/1790433540548558853

The release comes just a day after OpenAI unveiled GPT-4o, which won hearts online with its ‘omni’ capabilities across text, vision, and audio. OpenAI’s demos, which included a real-time translator, coding assistant, AI tutor, friendly companion, poet, and singer, soon became the talk of the town.

GPT-4o’s agentic capabilities in particular caught everyone’s attention, with some even calling them ‘the biggest part of the update’ and ‘a step closer to autonomous agents’.

The GPT-4o desktop app can read your screen in real time and interact with your OS, which could change the way people work. The app allows for voice conversations, screenshot discussions, and instant access to ChatGPT. It’s like having an AI teammate on your device who can help you with whatever you’re working on.

OpenAI president and co-founder Greg Brockman also demonstrated human-computer interactions (and even human-computer-computer interactions), giving users a glimpse of pre-AGI vibes. 

You can get different instances of GPT-4o to interact with each other. The model can be interrupted in real time, change its emotional tone, and even adjust its response with little to no latency. All this is a big breakthrough for building AI agents.

Real-time conversation with a voice agent that understands the emotion in a person’s voice and can be interrupted without lag makes GPT-4o extremely helpful for building voice- and vision-enabled smart agents.

A promising application is customer service, including a new type of technical support where customers can walk through their problems via video stream, allowing the agent to troubleshoot them in real time with the customer.

These developments show that with GPT-4o, the future is poised to be agent-to-agent. However, with its latest release, it’s clear that Google has also gone all in on AI agents, deploying them across the company’s product ecosystem.

From an agent that can continuously organise all the receipts in your inbox into a spreadsheet to an agent that can return your orders, Google has it all. Use cases for the assistant also include multi-step research and reasoning, and even shopping to prepare a meal plan. Need to do something as tedious as updating your address? It has you covered there too, with a browser agent that works across dozens of external websites to make the change on each of them.

Google also introduced an AI Teammate that lives inside Google Workspace and handles collaborative tasks.

Despite all this, Google’s Project Astra and AI agent developments have received mixed responses online. 

On the one hand, people appreciate Astra’s long context support, memory ‘to remember where the glasses were’, and native video processing capabilities compared to GPT-4o, which some contend only processes a single frame at a time.

Many are also saying that with these advanced AI email, browser, and search agent demos, as well as an AI Teammate, Google will likely obliterate many startups focusing on email- and browser-based agents.

“One thing Google is doing right: they are finally making serious efforts to integrate AI into the search box. I sense the agent flow: planning, real-time browsing, and multimodal input, all from the landing page. Google’s strongest moat is distribution. Gemini doesn’t have to be the best model to be the most used one in the world,” wrote a user on X.

On the other hand, some are not impressed with Astra’s slightly longer latency and are even sceptical about whether the real product will match the ‘too good to be true’ promises made in the demo.

“Remember the last time Google demo’d (sic) their AI it was all a complete lie,” wrote a user online. “Google promised a lot in events but never released like OpenAI does,” added another. Many even went so far as to call the demo an advertisement, rather than an actual demo.

Unlike OpenAI’s live demo, Google’s was pre-recorded, which has some taking the claims with a pinch of salt, at least until they get to test the product.

https://twitter.com/bindureddy/status/1790435529194107330

Everybody Is Bullish on ‘AI Agents’

Regardless of who wins this race, the important thing is that everybody seems to be bullish on AI agents, and soon, we might see a lot more interesting developments take shape.

Recently, venture capitalist Vinod Khosla predicted a future in which most consumer access to the internet will be through agents acting for consumers, doing tasks and fending off marketers and bots. “Tens of billions of agents on the internet will be normal,” he wrote.

Similarly, Meta CEO Mark Zuckerberg highlighted the evolving role of AI agents in customer interactions, envisioning a future where businesses and creators each have their own AI to represent their interests. 

“A lot of people talk about the ‘ChatGPT moment’, where you’re like ‘Wow, never seen anything like this’. Many people will have kind of a ‘Wow, I couldn’t imagine an AI agent doing this’ moment,” said DeepLearning.AI founder Andrew Ng at Sequoia Capital’s AI Ascent.

Looks like it is finally happening.

Sukriti Gupta

Having done her undergrad in engineering and master’s in journalism, Sukriti likes combining her technical know-how and storytelling to simplify seemingly complicated tech topics in a way everyone can understand.