Microsoft researchers have developed SPREADSHEETLLM, a framework that enables large language models (LLMs) to process and analyse complex spreadsheet data effectively. The approach substantially improves performance on spreadsheet understanding tasks while sharply reducing computational costs.
Key innovations of SPREADSHEETLLM include SHEETCOMPRESSOR, an encoding method that reduces the tokens needed to represent a spreadsheet by up to 96%, allowing LLMs to handle much larger sheets within their token limits. SHEETCOMPRESSOR uses structural anchor extraction, which identifies the rows and columns that mark table boundaries and drops repetitive rows and columns far from them, preserving critical layout information.
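The article does not spell out the exact heuristics, so the sketch below only illustrates the idea of structural anchors. It assumes a rectangular grid of strings, treats a row or column as an anchor when its pattern of coarse cell types (empty, numeric, text) differs from a neighbour's, and keeps only lines within a hypothetical distance `k` of some anchor. The names `extract_skeleton`, `anchor_lines`, `keep_near`, and `k` are illustrative, not SPREADSHEETLLM's actual API.

```python
from typing import List, Optional, Sequence, Tuple

Grid = List[List[Optional[str]]]


def cell_kind(value: Optional[str]) -> str:
    """Coarse cell type used to compare neighbouring rows or columns."""
    if value is None or value == "":
        return "empty"
    try:
        float(value)
        return "num"
    except ValueError:
        return "text"


def anchor_lines(lines: Sequence[Sequence[Optional[str]]]) -> List[int]:
    """Indices of lines (rows or columns) whose type pattern differs from a neighbour."""
    sigs = [tuple(cell_kind(v) for v in line) for line in lines]
    anchors = []
    for i, sig in enumerate(sigs):
        differs_prev = i == 0 or sigs[i - 1] != sig
        differs_next = i == len(sigs) - 1 or sigs[i + 1] != sig
        if differs_prev or differs_next:
            anchors.append(i)
    return anchors


def keep_near(n: int, anchors: List[int], k: int) -> List[int]:
    """Keep every index within k lines of some anchor; drop the rest."""
    keep = set()
    for a in anchors:
        keep.update(range(max(0, a - k), min(n, a + k + 1)))
    return sorted(keep)


def extract_skeleton(grid: Grid, k: int = 0) -> Tuple[Grid, List[int], List[int]]:
    """Return the compressed grid plus the row and column indices that survived."""
    rows = keep_near(len(grid), anchor_lines(grid), k)
    columns = [list(col) for col in zip(*grid)]  # assumes a rectangular grid
    cols = keep_near(len(columns), anchor_lines(columns), k)
    skeleton = [[grid[r][c] for c in cols] for r in rows]
    return skeleton, rows, cols


# Hypothetical sheet: a header, five homogeneous data rows, a blank row, a total.
grid: Grid = [
    ["Region", "Q1", "Q2"],
    ["North", "10", "12"],
    ["South", "11", "13"],
    ["East", "9", "14"],
    ["West", "8", "15"],
    ["Central", "7", "16"],
    ["", "", ""],
    ["Total", "45", "70"],
]

skeleton, rows, cols = extract_skeleton(grid, k=0)
print(rows)  # [0, 1, 5, 6, 7]: rows 2-4 repeat the pattern of rows 1 and 5, so they are dropped
```

The boundary rows (header, first and last data rows, blank separator, total) survive, while the homogeneous interior is discarded; a larger `k` trades compression for more retained context around each anchor.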
Inverted-index translation encodes cells as a mapping from each value to the addresses where it appears, which removes redundancy from repeated values and empty cells, while data-format-aware aggregation groups adjacent cells with similar number formats to reduce token usage further.
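To make the two techniques in the paragraph above concrete, here is a minimal Python sketch. It assumes cells arrive as strings, uses A1-style addresses, merges only vertically contiguous runs into ranges, and classifies formats with a few hypothetical regular expressions (`IntNum`, `FloatNum`, `Percentage`, `Date`); SHEETCOMPRESSOR's real format categories and merging rules may differ, and `encode` is an illustrative name rather than the framework's API.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column), both 0-based


def col_letter(idx: int) -> str:
    """0 -> 'A', 25 -> 'Z', 26 -> 'AA' (spreadsheet column naming)."""
    name, idx = "", idx + 1
    while idx:
        idx, rem = divmod(idx - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def address(r: int, c: int) -> str:
    return f"{col_letter(c)}{r + 1}"


def merge_runs(cells: List[Cell]) -> List[str]:
    """Collapse vertically contiguous cells in one column into ranges like 'B2:B4'."""
    cells = sorted(cells, key=lambda rc: (rc[1], rc[0]))
    out: List[str] = []
    start = prev = cells[0]
    for rc in [*cells[1:], None]:
        if rc is not None and rc[1] == prev[1] and rc[0] == prev[0] + 1:
            prev = rc  # still inside the same vertical run
            continue
        first, last = address(*start), address(*prev)
        out.append(first if start == prev else f"{first}:{last}")
        if rc is not None:
            start = prev = rc
    return out


def format_type(value: str) -> Optional[str]:
    """Hypothetical format classes; None means 'keep the literal text'."""
    patterns = [
        ("IntNum", r"-?\d+"),
        ("FloatNum", r"-?\d+\.\d+"),
        ("Percentage", r"-?\d+(\.\d+)?%"),
        ("Date", r"\d{4}-\d{2}-\d{2}"),
    ]
    for label, pattern in patterns:
        if re.fullmatch(pattern, value):
            return label
    return None


def encode(grid: List[List[str]], aggregate: bool = False) -> Dict[str, List[str]]:
    """Inverted index {value or format label: [addresses]}; empty cells are omitted."""
    buckets: Dict[str, List[Cell]] = defaultdict(list)
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if not value:
                continue
            key = (format_type(value) or value) if aggregate else value
            buckets[key].append((r, c))
    return {key: merge_runs(cells) for key, cells in buckets.items()}


sheet = [
    ["Region", "Q1", "Growth"],
    ["North", "10", "5%"],
    ["South", "10", "7%"],
    ["East", "12", "7%"],
]
print(encode(sheet)["10"])                       # ['B2:B3']
print(encode(sheet, aggregate=True)["IntNum"])   # ['B2:B4']
```

Comparing the two calls shows the trade-off: the plain inverted index still lists every distinct number, while aggregation replaces them with a single type label and range, which is where the extra token savings come from at the cost of losing the exact values.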
In experiments, SPREADSHEETLLM achieved state-of-the-art results on spreadsheet table detection, outperforming previous methods by 12.3%. It also demonstrated strong capabilities on spreadsheet question-answering tasks.
The researchers tested SPREADSHEETLLM with various LLMs, including GPT-4, GPT-3.5, Llama 2, and others. Fine-tuned models showed particular promise: a fine-tuned GPT-4 reached an F1 score of 78.9% on table detection.
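For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it for table detection under a simplifying assumption: a predicted table counts as correct only if its range exactly matches a ground-truth range. The function name `detection_f1` and the A1-style ranges are illustrative, and the paper's own matching criterion may differ.

```python
from typing import Iterable


def detection_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """F1 over table ranges, counting only exact range matches as true positives."""
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


# One of two predicted tables matches exactly; one of two real tables is found.
print(detection_f1(["A1:C8", "E1:G5"], ["A1:C8", "E1:G6"]))  # 0.5
```

Under this strict criterion, a prediction that is off by a single row (`E1:G5` versus `E1:G6`) earns no credit, which is one reason table-detection scores can look low even when a model finds most tables roughly.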
Beyond improving accuracy, SPREADSHEETLLM’s compression techniques reduced token usage, and with it processing costs, by 96% compared with standard encoding methods.
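A percentage reduction and a compression ratio describe the same saving. Writing $T$ for the number of tokens a standard encoding needs (a symbol introduced here for illustration), a 96% reduction leaves $0.04\,T$ tokens, so the compression ratio $r$ is:

```latex
r = \frac{T}{(1 - 0.96)\,T} = \frac{1}{0.04} = 25
```

For example, a sheet that needs 10,000 tokens under the standard encoding would need roughly 400 after compression; assuming cost scales linearly with token count, the cost falls by the same factor of 25.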
While some limitations remain, such as handling complex formatting, the framework represents a major step forward in applying LLMs to spreadsheet analysis. The researchers suggest it could enable more intelligent and efficient interactions with spreadsheet data across various applications.
Microsoft's Spreadsheet Upgrades
Last year, Microsoft Excel introduced a public preview of Python integration, eliminating the need for additional software by bundling built-in connectors and Power Query support for Python. Similarly, the SPREADSHEETLLM research framework aims to simplify complex data analysis without third-party apps by leveraging LLMs, making intricate spreadsheet tasks easier for users to handle.
Read the full paper here.