The Power of Efficiency: Edge AI’s Role in Sustainable Generative AI Adoption
When it comes to Generative AI, power efficiency matters. Whether it’s the carbon emissions from inference in data center compute (often referred to as the cloud) or battery life on mobile devices, efficiency cannot be ignored. According to Scientific American, “NVIDIA is forecasted to ship 1.5 million AI server units per year by 2027. These 1.5 million servers, running at full capacity, would consume at least 85.4 terawatt-hours of electricity annually—more than what many small countries use in a year, according to the new assessment.” Efficiency is going to be the key to scaling AI affordably while minimizing environmental risk.
Last November, Carnegie Mellon and Hugging Face worked together to publish a study tracking CO2 emissions based on energy usage for different AI models. They used tools like CodeCarbon, ML CO2, and LLM Carbon to estimate the power consumed by the CPU, GPU, and RAM during inference, and converted that energy into emissions using the regional carbon intensity of electricity. The most power-intensive of these were image generation models, such as Stable Diffusion.
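For context, this is roughly how a tool like CodeCarbon can be wrapped around an inference loop to estimate energy use and emissions. It is a minimal sketch, not the study’s exact setup: the pipeline, prompt, and query count below are illustrative placeholders.

```python
# Minimal sketch: estimating energy/CO2 for an inference workload with CodeCarbon.
# The prompt and loop count are placeholders, not the study's exact configuration.
import torch
from codecarbon import EmissionsTracker
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

tracker = EmissionsTracker(project_name="sd15-inference")  # applies regional carbon intensity
tracker.start()
for _ in range(1000):  # 1,000 generations, matching the study's per-1,000-query reporting
    pipe("a photo of a mountain lake at sunrise")
emissions_kg = tracker.stop()  # estimated kg of CO2-equivalent for the tracked run
print(f"Estimated emissions: {emissions_kg:.3f} kg CO2e")
```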
According to the “Power Hungry Processing: Watts Driving the Cost of AI Deployment?” study, the mean energy consumption of image generation models was 2.907 kWh per 1,000 inference queries (generations, in this case), with a standard deviation of 3.31. For Stable Diffusion v1.5 from RunwayML specifically, the study reports 1.38 kWh per 1,000 generations.
Analyzing the power consumption of these models running in the cloud raised an interesting question: is it more efficient to run Gen AI inference on the edge than in the data center? There is an argument to be made that running many generative AI models on local hardware is more energy- and cost-efficient. Since Stable Diffusion has been one of the prime examples of local Generative AI for nearly a year, we decided to test this thesis.
To compare data center inference against local inference, we put two Qualcomm systems to the test: Snapdragon 8 Gen 3 and Snapdragon X Elite, both running Stable Diffusion v1.5 inference on the NPU.
For Snapdragon 8 Gen 3, we used a Xiaomi 14 Pro smartphone and ran SD1.5 in the background with the display off and the device unplugged, running it from 100% battery until the phone died. We repeated this test three times, averaging 467 generations per full charge. With a battery capacity of 21.06 Wh, the phone generated 22.17 images per Wh of power. With the data center using 1,380 Wh per 1,000 generations, the equivalent figure is 0.74 generations per Wh, making Snapdragon 8 Gen 3 30.60x more efficient on a per-Wh basis than data center generation.
For Snapdragon X Elite, we took a similar approach. Using a Snapdragon X Elite reference design laptop with a 58.84 Wh battery, we ran Stable Diffusion v1.5 on the NPU from 100% to 0% battery. Over three full charge cycles, the laptop averaged 1,185 generations, or 20.14 generations per Wh of power. This makes Snapdragon X Elite 27.79x more efficient on a per-Wh basis than data center generation.
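For readers who want to reproduce the arithmetic, here is a small script that uses only the figures quoted above; no new measurements or assumptions are introduced.

```python
# Back-of-the-envelope efficiency comparison using the numbers reported above.
DATACENTER_WH_PER_1000_GENS = 1380.0  # 1.38 kWh per 1,000 generations (SD v1.5, per the study)

def gens_per_wh(avg_generations: float, battery_wh: float) -> float:
    """Generations produced per watt-hour over one full battery discharge."""
    return avg_generations / battery_wh

datacenter_eff = 1000.0 / DATACENTER_WH_PER_1000_GENS  # ~0.74 generations per Wh

phone_eff = gens_per_wh(467, 21.06)    # Snapdragon 8 Gen 3: ~22.17 gens/Wh
laptop_eff = gens_per_wh(1185, 58.84)  # Snapdragon X Elite: ~20.14 gens/Wh

print(f"Snapdragon 8 Gen 3: {phone_eff:.2f} gens/Wh, {phone_eff / datacenter_eff:.2f}x data center")
print(f"Snapdragon X Elite: {laptop_eff:.2f} gens/Wh, {laptop_eff / datacenter_eff:.2f}x data center")
```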
Why this matters
These results clearly show how much more efficient local NPU inference is compared to GPU inference in the data center, and that matters for the environment. The study offers a few striking comparisons: “the least efficient image generation model uses as much energy as 950 smartphone charges (11.49 kWh) [per 1000 generations], or nearly 1 charge per image generation” and “the most carbon-intensive image generation model (stable-diffusion-xl-base-1.0) generates 1,594 grams of CO2 for 1,000 inferences, which is roughly the equivalent to 4.1 miles driven by an average gasoline-powered passenger vehicle.”
The carbon output of these AI models is not insignificant, and as data centers scale to larger, higher-quality models and to higher inference demand, energy usage and carbon emissions will grow rapidly. Economists estimate that data centers powering Gen AI could draw 3 to 4% of the USA’s power grid within the next five years. Grid limitations, in fact, are one of the main concerns that could become a roadblock to the rapid adoption of generative AI. One good way to mitigate this is edge AI, specifically on the NPU, which is designed for more efficient AI inference.
I know it’s been said a few times, but the NPU story is just getting started, and its importance in enterprise and consumer Gen AI will only grow with time.