OpenAI chief Sam Altman has hinted in a cryptic post that the AI startup is working on a project known internally as “Project Strawberry.” On X, Altman shared a post saying, “I love summer in the garden,” accompanied by an image of a pot with strawberries.
Project Strawberry, also referred to as Q*, was recently revealed in a Reuters report, which said that it will significantly enhance the reasoning capabilities of OpenAI’s AI models. “Some at OpenAI believe Q* could be a breakthrough in the startup’s search for artificial general intelligence (AGI),” said the report.
Project Strawberry involves a novel approach that allows AI models to plan ahead and navigate the internet autonomously to perform in-depth research. This advancement could address current limitations in AI reasoning, such as common sense problems and logical fallacies, which often lead to inaccurate outputs.
An AI insider who goes by the name Jimmy Apples recently claimed that Q* hasn’t been released yet because OpenAI isn’t happy with the latency and other ‘little things’ it wants to optimise further.
OpenAI’s teams are working on Strawberry to improve the models’ ability to perform long-horizon tasks (LHT), which require planning and executing a series of actions over an extended period.
The project involves a specialised “post-training” phase, adapting the base models for enhanced performance. This method resembles Stanford’s 2022 “Self-Taught Reasoner” (STaR), which enables AI to iteratively create its own training data to reach higher intelligence levels.
OpenAI recently announced DevDay 2024, a global developer event series scheduled to take place in San Francisco on October 1, London on October 30, and Singapore on November 21. While the company has stated that the focus will be on advancements in the API and developer tools, there is speculation that OpenAI might also preview its next frontier model.
Recently, a new model in the LMSYS chatbot arena showed strong performance in math. Interestingly, GPT-4o and GPT-4o Mini were also spotted in the chatbot arena a few days before their official releases.
The internal document indicates that Project Strawberry includes a “deep-research” dataset for training and evaluating the models, though the contents of this dataset remain undisclosed.
This innovation is expected to enable AI to conduct research autonomously, using a “computer-using agent” (CUA) to take actions based on its findings. Additionally, OpenAI plans to test Strawberry’s capabilities in performing tasks typically done by software and machine learning engineers.
Last year, it was reported that Jakub Pachocki and Szymon Sidor, two leading OpenAI researchers, used Ilya Sutskever’s work to develop a model called Q* (pronounced “Q-Star”) that achieved an important milestone by solving math problems it had not previously encountered.
Sutskever’s work reportedly raised concerns among some staff that the company didn’t have proper safeguards in place to commercialise such advanced AI models. Notably, he left OpenAI and recently founded his own company, Safe Superintelligence. Following his departure, Pachocki was appointed as the new chief scientist.
What is Q*?
Q* is probably a combination of Q-learning and A* search. OpenAI’s Q* algorithm is considered a breakthrough in AI research, particularly in the development of AI systems with human-like reasoning capabilities. By combining elements of Q-learning and A* search, Q* improves goal-oriented thinking and solution finding.
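To make the A* half of that pairing concrete, here is a minimal sketch of A* search in Python. This is a generic textbook implementation for illustration only, not OpenAI’s code; the grid environment, `neighbors` callback, and Manhattan heuristic are all assumptions chosen for the example. A* expands the node with the lowest f(n) = g(n) + h(n), where g is the cost so far and h is a heuristic estimate of the remaining cost.

```python
import heapq

def a_star(start, goal, neighbors, heuristic):
    """Minimal A* search. Expands the frontier node with the lowest
    f(n) = g(n) + h(n); returns the path from start to goal, or None."""
    frontier = [(heuristic(start, goal), 0, start, [start])]
    visited = set()
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, step_cost in neighbors(node):
            if nxt not in visited:
                g2 = g + step_cost
                f2 = g2 + heuristic(nxt, goal)
                heapq.heappush(frontier, (f2, g2, nxt, path + [nxt]))
    return None

# Hypothetical example: a 5x5 grid, unit step cost, 4-connected moves.
def grid_neighbors(node):
    x, y = node
    return [((x + dx, y + dy), 1)
            for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

path = a_star((0, 0), (4, 4), grid_neighbors, manhattan)
print(len(path) - 1)  # shortest path length on the open grid: 8
```

The heuristic steers the search toward the goal, so far fewer nodes are expanded than in a blind search; this goal-directed pruning is the property commentators speculate Q* borrows from A*.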
This algorithm shows impressive capabilities in solving complex mathematical problems (without prior training data) and symbolizes an evolution towards general artificial intelligence (AGI).
Q-learning is a foundational concept in the field of AI, specifically in the area of reinforcement learning. It is categorised as a model-free reinforcement learning algorithm and is designed to learn the value of an action within a specific state.
The ultimate goal of Q-learning is to find an optimal policy that defines the best action to take in each state, maximising the cumulative reward over time.
Q-learning is based on the notion of a Q-function, aka the state-action value function. This function operates with two inputs: a state and an action. It returns an estimate of the total reward expected, starting from that state, alongside taking that action, and thereafter following the optimal policy.
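The standard tabular form of this idea can be sketched in a few lines of Python. This is a generic Q-learning implementation for illustration, not anything from OpenAI; the five-state chain environment and all hyperparameters (`alpha`, `gamma`, `epsilon`) are assumptions. Each step applies the classic update Q(s,a) ← Q(s,a) + α[r + γ·maxₐ′ Q(s′,a′) − Q(s,a)].

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2, r, done = step(s, a)
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Hypothetical toy environment: a chain of states 0..4. Action 1 moves
# right, action 0 moves left; reaching state 4 gives reward 1 and ends
# the episode.
def chain_step(s, a):
    s2 = min(4, s + 1) if a == 1 else max(0, s - 1)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

Q = q_learning(5, 2, chain_step)
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
print(policy)  # greedy policy per state; should be "move right" everywhere
```

After training, reading off the greedy action in each state recovers the optimal policy described above: always move toward the rewarding state, which maximises the discounted cumulative reward.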
OpenAI has recently unveiled a five-level classification system to track progress towards achieving artificial general intelligence (AGI) and superintelligent AI. The company currently considers itself at Level 1 and anticipates reaching Level 2 in the near future.
Other tech giants like Google, Meta, and Microsoft are also exploring techniques to enhance AI reasoning. However, experts like Meta’s Yann LeCun argue that large language models may not yet be capable of human-like reasoning.