Flagship smartphones have long been known for their outstanding cameras, displays and battery life. But now, a groundbreaking feature defines the latest flagship models—the ability to run large language models (LLMs) locally on the device.
A good example of this flagship-only feature is the recent announcement of iOS 18, where Apple unveiled Apple Intelligence for devices powered by the A17 Pro, its flagship SoC. Similarly, Samsung announced Galaxy AI for its flagship smartphones last year.
A recent report published by Canalys suggests that by the end of 2024, 16% of new smartphones shipped will be generative AI-capable, with this figure expected to rise to 54% by 2028.
Jensen Huang, the CEO of NVIDIA, has said that with on-device generation you won't need to search online for information as much, since answers can be generated locally on your phone, saving a lot of energy and time.
Hardware Advancements Are on the Way
It’s true that running LLMs locally requires capable hardware. To address this problem, developers came up with SLMs (small language models), stripped-down versions of LLMs that require far fewer resources.
But even after stripping down the parameters, you still need flagship hardware to run LLMs locally. For example, the MLC Chat app, which happens to be the easiest way to run LLMs on smartphones, is only available for the Samsung Galaxy S23 with the Snapdragon 8 Gen 2 chip, a flagship device from Samsung.
One user ran Mixtral 8x7B at 11 tokens/s on a mobile phone through PowerInfer-2, but only on a OnePlus device with 24GB of RAM.
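A quick back-of-envelope calculation, sketched below, shows why that much memory matters. The parameter count and the 4-bit assumption are approximations, and PowerInfer-2 itself relies on sparsity and flash offloading rather than keeping the whole model resident, so treat this purely as an upper bound for intuition.

```python
# Rough estimate of the memory needed just for Mixtral 8x7B's weights.
# Assumptions: ~46.7B total parameters and 4-bit quantisation (half a byte
# per weight). PowerInfer-2 uses sparsity and flash offloading, so its real
# working set is smaller; this is only an upper bound.
total_params = 46.7e9      # approximate total parameter count of Mixtral 8x7B
bytes_per_param = 0.5      # 4-bit quantisation = half a byte per weight

weights_gb = total_params * bytes_per_param / 1e9
print(f"Quantised weights alone: ~{weights_gb:.0f} GB")   # ~23 GB

# With the KV cache, activations and the OS on top of that, it is clear why
# only a phone with 24GB of RAM (plus clever offloading) could host the run.
```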
This simply means you need capable hardware to run LLMs locally on your smartphone. For now, it remains a flagship feature and will only be available on top-tier products until more SoCs like the Qualcomm Snapdragon 7+ Gen 3 become available.
The Snapdragon 7+ Gen 3 is a midrange SoC that uses the Qualcomm Hexagon NPU to deliver improved AI performance and is advertised as running GenAI capabilities locally with lower latency than its predecessors.
On the other hand, flagship SoCs are getting more powerful. In a recent demonstration, the MediaTek Dimensity 9300, a flagship offering from MediaTek, was able to execute text-to-video generation, real-time GenAI animation and a lot more on-device.
Furthermore, the Dimensity 9300 will feature a software stack optimised to run Meta’s Llama 2. This is possible because the APU 790 AI processor integrated into the chip significantly improves generative AI performance and energy efficiency, enabling faster and more secure edge computing.
To power the next generation of AI smartphones, Arm recently unveiled the Cortex-X925 and Cortex-A725, designed for sustained performance and allowing smartphones to handle a wide range of AI tasks efficiently. By enabling more powerful and efficient on-device AI processing, these CPUs reduce the need for cloud-based AI computation.
Why Running LLMs Locally Is Important
Privacy.
Cloud-based LLMs require data to be sent to external servers for processing, which increases the risk of data breaches and unauthorised access. According to a report by HiddenLayer, 77% of businesses experienced breaches in their AI systems over the past year.
Reportedly, breaches via third-party vendors increase data exposure risks by over 15% on average. This is particularly concerning given that AI models often handle vast amounts of sensitive information, such as personal identifiers, financial data, and proprietary business information.
All of this can be avoided by running LLMs locally, as the data never leaves your device. Sure, performance will vary based on the hardware, and yes, there are ways to stack multiple devices together to run larger models, but that is not the most practical route for a general audience.
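To make the privacy argument concrete, here is a minimal sketch of fully offline inference using the llama-cpp-python bindings. The model file path and model choice are placeholders for whatever small quantised model is stored on the device; the point is that the prompt and the response never touch the network.

```python
# Minimal sketch of fully local inference with llama-cpp-python.
# Assumptions: llama-cpp-python is installed and a quantised GGUF model has
# already been downloaded to local storage. The path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="/models/small-model.Q4_K_M.gguf",  # local file only, no network calls
    n_ctx=2048,                                    # context window size
)

prompt = "Summarise this private note: the meeting has moved to Friday at 3pm."
output = llm(prompt, max_tokens=64)

# The prompt, the weights and the generated text all stay on the device.
print(output["choices"][0]["text"])
```

On a smartphone, the same idea is delivered through apps such as MLC Chat or vendor NPU runtimes rather than Python, but the data flow is identical: everything stays local.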