Imagen 3 and Grok 2 (with FLUX.1 integration) are both advanced AI text-to-image generators, yet they differ significantly in their approaches and applications, with the latter having little to no guardrails or safeguards.
The outcome: this feature of Grok 2 allows users to create uncensored deepfakes, like Vice President Kamala Harris and Donald Trump posing as a couple, sparking deep concern.
Google, on the other hand, can only dream of doing such things with Imagen 3. Gmail creator Paul Buchheit, in a recent interview, said AI has the potential to provoke regulators, a significant concern for Google, given the tech giant has already been hit with lawsuits worth billions of dollars.
“They had a version of DALL-E called Imagen, and it was prohibited from making human form,” said Buchheit, discussing Google’s Imagen and how the company has struggled to dominate the AI landscape despite having all the necessary resources and an early start in AI.
Cautious or Controversial?
In the case of Imagen 3, there are some restrictions in place as the tool refuses to generate images of public figures or weapons. While it won’t generate named copyrighted characters, you can bypass this by describing the character you want to create.
Earlier this year, Alphabet lost $90 billion in market value amid controversy over Gemini, its generative AI product. Users reported it was producing racially inaccurate images and claimed that the chatbot refused to identify negative historical figures.
With Imagen 3, the tech giant does not want a repeat of the fiasco. However, Reddit users have criticised Imagen 3 for being too restrictive in what images it is allowed to generate.
The model is trained on a large dataset comprising images, text and associated annotations. To ensure quality and safety standards, the company employs a multi-stage filtering process. This process begins by removing unsafe, violent, or low-quality images. Then it eliminates AI-generated images to prevent the model from learning artefacts or biases commonly found in such images.
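In code, that first pass might look something like the sketch below. It is a minimal illustration only: the classifier helpers (is_unsafe, is_low_quality, looks_ai_generated) are hypothetical stand-ins, as Google has not published its actual filtering code.

```python
# A minimal sketch of the multi-stage filter described above.
# The three classifiers are hypothetical placeholders, not Google's tools.

def is_unsafe(image: dict) -> bool:
    # Placeholder: a real pipeline would run a safety/violence classifier.
    return image.get("safety_score", 0.0) > 0.5

def is_low_quality(image: dict) -> bool:
    # Placeholder: a real pipeline would score resolution and aesthetics.
    return image.get("quality_score", 1.0) < 0.3

def looks_ai_generated(image: dict) -> bool:
    # Placeholder: a real pipeline would use an AI-image detector.
    return image.get("ai_detector_score", 0.0) > 0.5

def filter_images(images: list[dict]) -> list[dict]:
    # Stage 1: drop unsafe, violent, or low-quality images.
    stage1 = [im for im in images if not (is_unsafe(im) or is_low_quality(im))]
    # Stage 2: drop suspected AI-generated images so the model does not
    # learn their artefacts or biases.
    return [im for im in stage1 if not looks_ai_generated(im)]
```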
The team employs deduplication pipelines and strategically down-weights similar images to minimise the risk of outputs overfitting specific elements of the training data.
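A rough sketch of that idea follows, grouping near-duplicates by a hypothetical perceptual_hash() helper and scaling each image's sampling weight by the size of its group; the pipeline's actual deduplication method is not public.

```python
from collections import Counter

def perceptual_hash(image: dict) -> str:
    # Placeholder: a real pipeline would compute a pHash or an
    # embedding-based cluster ID from the image pixels.
    return image["phash"]

def assign_sample_weights(images: list[dict]) -> list[dict]:
    # Near-identical images share a hash; weighting each by 1/n for a
    # group of size n down-weights frequent images instead of letting
    # the model overfit to them.
    counts = Counter(perceptual_hash(im) for im in images)
    for im in images:
        im["sample_weight"] = 1.0 / counts[perceptual_hash(im)]
    return images
```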
Each image in the dataset is paired with its original caption and with synthetic captions generated by multiple Gemini models using diverse prompts. Filters are applied to ensure safety and remove personally identifiable information.
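Sketched in code, that captioning-and-filtering step might resemble the following; caption_with_gemini() and contains_pii() are illustrative placeholders rather than Google's actual tooling.

```python
import re

# Diverse prompts for generating synthetic captions (illustrative).
PROMPTS = [
    "Describe this image in one sentence.",
    "List the main objects in this image and their attributes.",
    "Write a detailed caption for this image.",
]

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def caption_with_gemini(image: dict, prompt: str) -> str:
    # Placeholder for a call to a Gemini captioning model.
    return f"synthetic caption for image {image['id']}"

def contains_pii(text: str) -> bool:
    # Placeholder: real filters would also catch names, phone numbers, etc.
    return bool(EMAIL_RE.search(text))

def build_captions(image: dict) -> list[str]:
    # Pair the original caption with several synthetic ones, then filter.
    candidates = [image.get("original_caption", "")]
    candidates += [caption_with_gemini(image, p) for p in PROMPTS]
    return [c for c in candidates if c and not contains_pii(c)]
```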
Grok, on the other hand, developed by Musk’s xAI, offers a more open-ended approach.
Grok, like most LLMs today, was pre-trained on a diverse range of text data from publicly available sources on the internet up to Q3 2023, as well as datasets curated by ‘AI tutors’, who are human reviewers. Importantly, Grok was not pre-trained on data from X, including public posts. Its training journey is outlined on the xAI website and model card.
When responding to user queries, Grok has a unique feature that enables it to decide whether to search public X posts or conduct a real-time web search. This capability allows Grok to provide up-to-date information and insights on a wide range of topics by accessing real-time public X posts.
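As a rough illustration of that routing idea, a decision step might look like the sketch below; the heuristics and names here are invented for illustration and are not xAI's implementation.

```python
FRESHNESS_CUES = ("today", "latest", "breaking", "right now")

def choose_source(query: str) -> str:
    # Illustrative routing: pick real-time X posts, a live web search,
    # or the model's own pre-trained knowledge.
    q = query.lower()
    if "on x" in q or "tweet" in q or "trending" in q:
        return "x_posts"      # search real-time public X posts
    if any(cue in q for cue in FRESHNESS_CUES):
        return "web_search"   # fall back to a real-time web search
    return "model_only"       # answer from pre-trained knowledge

print(choose_source("What is trending on X?"))        # x_posts
print(choose_source("Latest news on AI regulation"))  # web_search
print(choose_source("Explain diffusion models"))      # model_only
```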
This flexibility makes the model more versatile, but also more controversial, as it can create a wider range of content, including potentially misleading or inappropriate images.
xAI’s Grok chatbot now lets users create images from text prompts and publish them to X, and so far, the rollout has been chaotic.
It has been used to generate all sorts of wild content, including images with drugs, violence, and public figures doing questionable things.
Other experiments conducted by users on X show that even when Grok does refuse to generate something, loopholes are easy to find. According to X user Christian Montessori, that leaves very few safeguards against the tool spitting out gory images when given the right prompts.
And while Musk is aware of these issues, he seems to find them amusing, saying the tool is allowing people “to have some fun.”
As a startup, xAI can afford to release an unfiltered model; it is not as exposed to consequences as a much larger, publicly listed company like Google.
To be clear, Grok isn't the only source of misleading AI images. Open-source tools like Stable Diffusion can be modified to create a wide range of content with few restrictions. However, it's uncommon for a major tech company to take this approach with an online chatbot; Google even halted Gemini's image generation feature altogether after an embarrassing attempt to overcorrect for racial and gender stereotypes.
What next?
The text accompanying some AI-generated images suggests that Grok is integrated with FLUX.1, a new diffusion model developed by Black Forest Labs, an AI startup founded by ex-Stability AI engineers.
xAI announced plans to make the latest versions of Grok available to developers through its API in the coming weeks. The company also aims to launch Grok-2 and Grok-2 mini on X to enhance search capabilities, post analytics, and reply functions.
With Imagen 3, safety protocols aside, Google says it offers greater versatility and understanding of prompts, higher-quality images, and better text rendering, which remains a pesky ongoing problem for all AI image models.