Since the release of Anthropic’s Claude 3.5 model family, social media platforms, particularly X, have been abuzz with talk of Claude 3.5 Sonnet. Of all its new features, Artifacts is among the most talked about, and many users argue it puts the model well ahead of OpenAI’s GPT-4o.
Artifacts enhances user interaction by presenting generated content in a dedicated window alongside the conversation. It also significantly improves data interpretation and visualisation, making it easier for users to work with and understand the model’s output.
Claude 3.5 Sonnet Rules
Recently, Swami Sivasubramanian, head of AI services and data at AWS, also spoke about the feature, highlighting Claude 3.5 Sonnet’s strengths in data science and analysis, alongside its vision capabilities.
He said that when given access to a coding environment, the model produces high-quality statistical visualisations and actionable predictions, ranging from business strategies to real-time product trends.
He further noted that Claude 3.5 Sonnet performs well when processing images, particularly when interpreting charts and graphs that require visual understanding.
“It can accurately transcribe text from imperfect images—a core capability for industries such as retail, logistics, healthcare, and financial services, where AI may be able to garner more insights from an image, graphic or illustration than from text alone, for use cases like trend analysis, patient triage, and research summaries,” added Sivasubramanian.
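To make the coding-environment point concrete, here is a minimal sketch of the kind of statistical visualisation and simple trend projection such a setup could produce. The monthly sales figures and file name are invented purely for illustration.

```python
# A minimal sketch: plot observed monthly sales and project the next
# quarter with a linear fit. All numbers here are invented examples.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render headlessly, no display needed
import matplotlib.pyplot as plt

months = np.arange(1, 13)
sales = np.array([210, 225, 240, 238, 260, 275, 290, 285, 310, 330, 340, 355])

# Fit a simple linear trend and extend it three months forward.
slope, intercept = np.polyfit(months, sales, 1)
future = np.arange(13, 16)
forecast = slope * future + intercept

plt.plot(months, sales, "o-", label="observed sales")
plt.plot(future, forecast, "s--", label="projected next quarter")
plt.xlabel("month")
plt.ylabel("units sold")
plt.legend()
plt.savefig("sales_trend.png")
```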
AI researcher Razia Aliani also recently experimented with the model by turning research papers into actionable insights: identifying key concepts, visualising relationships, and extracting relevant data. “I made it possible with this AI agent (Claude 3.5 Sonnet). It turns information overload into actionable insights,” she said.
There is no shortage of such examples on X:
https://x.com/TheAIAdvantage/status/1809236767951708204
What About GPT-4o?
Users on X have praised GPT-4o’s data visualisation capabilities. For instance, Aadit Sheth posted that it took him less than 30 seconds to create high-quality graphs.
In a Reddit thread, users shared how much they loved GPT-4o’s data visualisation capabilities. One user also mentioned that the feature works within the same chat session and provides relevant prompt suggestions after each reply.
Even as users praised GPT-4o’s visualisation capabilities, they pointed out limitations as well.
A user noted that while the feature is available, it’s not always reliable for complex data analysis and visualisation tasks, especially when compared to specialised tools like R or other plotting software.
Both Struggle
YouTube presenter Jordan Wilson compared how Claude 3 and GPT-4 fared at data analysis on YouTube channel statistics.
The analysis used a dataset containing 16,000 cells of information from Wilson’s YouTube channel, including metrics for about 500 videos.
Both AI models were tasked with analysing various aspects of the channel’s performance, such as optimal publish times, content types, and top-performing videos.
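As a rough illustration of what such an analysis involves, the sketch below expresses Wilson’s questions in a few lines of pandas. The CSV file and column names are hypothetical, since the actual export schema was not shared.

```python
# A hedged sketch of the channel analysis described above, assuming a
# hypothetical CSV export with columns: title, publish_hour, views,
# watch_time_hours and content_type (~500 rows, one per video).
import pandas as pd

videos = pd.read_csv("channel_stats.csv")

# Optimal publish times: average views per hour of publication.
best_hours = (
    videos.groupby("publish_hour")["views"]
    .mean()
    .sort_values(ascending=False)
)

# Performance by content type (e.g. tutorial vs. news vs. interview).
by_type = videos.groupby("content_type")["watch_time_hours"].sum()

# Top-performing videos by raw view count.
top_videos = videos.nlargest(10, "views")[["title", "views"]]

print(best_hours.head(), by_type, top_videos, sep="\n\n")
```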
Both Claude and GPT-4 showed capability in data analysis, each with its own strengths and weaknesses. Claude provided more creative and analytical insights for future video strategies, whereas GPT-4 offered interactive charts and more detailed explanations of its analysis process.
However, both models ran into errors and limitations, particularly with complex visualisation requests.
For example, Claude initially had issues with its Artifacts feature, requiring a second attempt, while GPT-4 hit limits with certain types of interactive charts, responding “interactive charts of this type are not supported” to some requests.
Furthermore, a research paper by the Generative AI Research Lab showed that GPT-4o slightly outperforms Claude 3.5 Sonnet on visual reasoning tasks overall, though the difference is minimal.
Benchmarking for Data Interpretation and Visualisation
When it comes to data visualisation, there is no widely established benchmark for evaluation. However, some research efforts, such as VisEval, have developed benchmarks specifically for evaluating the visualisations LLMs generate.
Some findings from the paper indicate that LLMs struggle with complex visualisations requiring multiple visual channels and that performance decreases with increasing query complexity.
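As a rough illustration of what such an evaluation involves, here is a much-simplified, hypothetical harness in that spirit. It is not VisEval’s actual pipeline: generate_vis_code() is a stand-in for a model call, and the scoring is reduced to two crude checks, whether the generated code runs and whether it uses the requested chart type.

```python
# A hypothetical, much-reduced sketch of benchmarking LLM-generated
# visualisation code. generate_vis_code() stands in for a model call.
import ast

def generate_vis_code(query):
    # Stand-in: a real harness would send `query` to an LLM and get
    # plotting code back. Here we return a canned matplotlib snippet.
    return (
        "import matplotlib\n"
        "matplotlib.use('Agg')\n"
        "import matplotlib.pyplot as plt\n"
        "plt.bar(['A', 'B'], [3, 5])\n"
        "plt.xlabel('category')\n"
        "plt.ylabel('sales')\n"
    )

def score_generation(code, expected_mark="bar"):
    """Two crude gates: does the code execute, and does it call the
    chart type the query asked for?"""
    result = {"parses": False, "runs": False, "right_mark": False}
    try:
        tree = ast.parse(code)
        result["parses"] = True
    except SyntaxError:
        return result
    result["right_mark"] = f".{expected_mark}(" in code
    try:
        exec(compile(tree, "<generated>", "exec"), {})
        result["runs"] = True
    except Exception:
        pass
    return result

code = generate_vis_code("bar chart of sales by category")
print(score_generation(code))  # {'parses': True, 'runs': True, 'right_mark': True}
```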
It is also possible that, given the lack of benchmarks specific to data visualisation, the capability was simply not taken into account in the Claude 3.5 evaluation research paper.