Flagship smartphones have long been known for their outstanding cameras, displays and battery life. But now, a groundbreaking feature defines the latest flagship models—the ability to run large language models (LLMs) locally on the device.
A good example of this flagship-only feature is the recent announcement of iOS 18, where Apple unveiled Apple Intelligence for devices powered by the A17 Pro, its flagship SoC. Similarly, Samsung announced Galaxy AI for its flagship smartphones last year.
A recent report published by Canalys suggests that by the end of 2024, 16% of new smartphones shipped will be generative AI-capable, with this figure expected to rise to 54% by 2028.
Jensen Huang, the CEO of NVIDIA, has said that with on-device generation you won't need to search online for information as much, since answers can be generated locally on your phone, saving a lot of energy and time.
Hardware Advancements Are on the Way
It’s true that running LLMs locally requires capable hardware. To address this problem, developers came up with SLMs (small language models), stripped-down versions of LLMs that require far fewer resources.
But even after stripping down the parameters, you still need flagship hardware to run LLMs locally. For example, the MLC Chat app, which happens to be the easiest way to run LLMs on smartphones, is only available for the Samsung Galaxy S23 with the Snapdragon 8 Gen 2 chip, a flagship device from Samsung.
One user ran Mixtral 8x7B at 11 tokens/s on a mobile phone through PowerInfer-2, but only on a OnePlus device with 24GB of RAM.
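A quick back-of-envelope calculation, sketched below, shows why that much memory matters. The parameter count and the 4-bit assumption are approximations, and PowerInfer-2 itself relies on sparsity and flash offloading rather than keeping the whole model resident, so treat this purely as an upper bound for intuition.

```python
# Rough estimate of the memory needed just for Mixtral 8x7B's weights.
# Assumptions: ~46.7B total parameters and 4-bit quantisation (half a byte
# per weight). PowerInfer-2 uses sparsity and flash offloading, so its real
# working set is smaller; this is only an upper bound.
total_params = 46.7e9      # approximate total parameter count of Mixtral 8x7B
bytes_per_param = 0.5      # 4-bit quantisation = half a byte per weight

weights_gb = total_params * bytes_per_param / 1e9
print(f"Quantised weights alone: ~{weights_gb:.0f} GB")   # ~23 GB

# With the KV cache, activations and the OS on top of that, it is clear why
# only a phone with 24GB of RAM (plus clever offloading) could host the run.
```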
This simply means you need capable hardware to run LLMs locally on your smartphone. For now, it remains a flagship feature and will only be available on top-tier products until more SoCs like the Qualcomm Snapdragon 7+ Gen 3 become available.
The Snapdragon 7+ Gen 3 is a midrange SoC that uses the Qualcomm Hexagon NPU to deliver improved AI performance and is advertised as running GenAI capabilities locally with lower latency than its predecessors.
On the other hand, flagship SoCs are getting more powerful. In a recent demonstration, the MediaTek Dimensity 9300, a flagship offering from MediaTek, was able to execute text-to-video generation, real-time GenAI animation and a lot more on-device.
Furthermore, the Dimensity 9300 will feature a software stack optimised to run Meta’s Llama 2. This is possible because the APU 790 AI processor integrated into the chip significantly improves generative AI performance and energy efficiency, enabling faster and more secure edge computing.
To power the next generation of AI smartphones, Arm recently unveiled the Cortex-X925 and Cortex-A725, designed for sustained performance and allowing smartphones to handle a wide range of AI tasks efficiently. By enabling more powerful and efficient on-device AI processing, these CPUs reduce the need for cloud-based AI computation.
Why Running LLMs Locally Is Important
Privacy.
Cloud-based LLMs require data to be sent to external servers for processing, which increases the risk of data breaches and unauthorised access. According to a report by HiddenLayer, 77% of businesses experienced breaches in their AI systems over the past year.
Reportedly, breaches via third-party vendors increase data exposure risks by over 15% on average. This is particularly concerning given that AI models often handle vast amounts of sensitive information, such as personal identifiers, financial data, and proprietary business information.
All of this can be avoided by running LLMs locally, as the data never leaves your device. Sure, performance will vary based on the hardware, and yes, there are ways to stack multiple devices together to run larger models, but that is not the most practical route for a general audience.
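To make the privacy argument concrete, here is a minimal sketch of fully offline inference using the llama-cpp-python bindings. The model file path and model choice are placeholders for whatever small quantised model is stored on the device; the point is that the prompt and the response never touch the network.

```python
# Minimal sketch of fully local inference with llama-cpp-python.
# Assumptions: llama-cpp-python is installed and a quantised GGUF model has
# already been downloaded to local storage. The path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="/models/small-model.Q4_K_M.gguf",  # local file only, no network calls
    n_ctx=2048,                                    # context window size
)

prompt = "Summarise this private note: the meeting has moved to Friday at 3pm."
output = llm(prompt, max_tokens=64)

# The prompt, the weights and the generated text all stay on the device.
print(output["choices"][0]["text"])
```

On a smartphone, the same idea is delivered through apps such as MLC Chat or vendor NPU runtimes rather than Python, but the data flow is identical: everything stays local.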