Local and mobile AI image generation could be the wave of the future.
Most of us have dabbled a bit in creating generative images by now. We enter a weird little prompt into a text field, and an online platform like Stable Diffusion, Midjourney, or DALL-E spits out something cool, bizarre, or both. One thing all of these platforms have in common is the need for an online connection. What if there were a way to create a similar, perhaps better, generative AI image with just the phone in hand and no internet or cloud connection? Qualcomm thinks it has a solution in ControlNet, a model with a less-than-scary name.
Introduced this week at the Computer Vision and Pattern Recognition Conference (CVPR) in Vancouver, Canada, ControlNet is a new mobile AI imaging model with two major advantages: 1) The model is local, so ControlNet can run on nearly any platform without an online connection. 2) Instead of using only text to generate an AI image, ControlNet starts with a user-provided seed image and then manipulates it based on a text prompt.
In some ways, this is similar to Adobe’s Firefly AI, which can generate AI portions to enhance existing images. However, even that model needs an online connection to work.
Qualcomm's introduction of this open-source model is not pure altruism. ControlNet is partly based on Stable Diffusion, adds half a billion parameters to that model's existing one billion, and can be freely used by third-party companies.
Sure, ControlNet could conceivably run on Windows, Mac, iOS, and Android, but it won't be that fast unless it runs on Qualcomm's Snapdragon platform, and specifically on the Hexagon digital signal processor (DSP) in the Snapdragon 8 Gen 2, the chip inside the Samsung Galaxy S23 Ultra.
In the demos I’ve seen, ControlNet transformed a boring image of an office space into a ’70s theme complete with orange walls, and then turned the streets of Barcelona into flowing canals. The office image was stunning in its fidelity. The Barcelona one looked like the work of a feverish Van Gogh.
ControlNet does its job by taking the basic shapes and textures it finds in images and drawing around them. The speed and quality of the output, however, mean that third-party hardware and software developers are sure to take an interest, mainly because of the obvious advantages of local computing (something Apple already famously favors for much of its AI work).
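To make that "drawing around shapes" idea concrete: edge-conditioned ControlNet variants first extract an edge map from the seed image and feed it to the diffusion model as a spatial constraint, so the output follows the original outlines. Below is a minimal, hedged sketch of just that preprocessing step. Real pipelines typically use a Canny detector; a simple gradient-magnitude threshold stands in for it here to keep the example dependency-light, and the function name and threshold are illustrative assumptions, not Qualcomm's implementation.

```python
import numpy as np

def edge_map(image: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Extract a rough edge map from a grayscale image with values in [0, 1].

    Mimics the conditioning input of an edge-based ControlNet variant:
    the binary map marks the outlines the generator must respect.
    A gradient-magnitude threshold is used here in place of a full
    Canny detector (illustrative simplification).
    """
    gy, gx = np.gradient(image.astype(np.float64))  # per-axis intensity gradients
    magnitude = np.hypot(gx, gy)                    # edge strength at each pixel
    return (magnitude > threshold).astype(np.uint8)

# Toy "seed image": a bright square on a dark background.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0

edges = edge_map(img)  # 1s along the square's outline, 0s elsewhere
```

In a full pipeline, this binary map (not the raw photo) is what conditions the diffusion model, which is why even a rough sketch can steer generation.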
ControlNet doesn’t necessarily need complete photos to generate new or modified images. Even a rough sketch and a text suggestion can produce something interesting and perhaps useful. In a demo image provided by Qualcomm, the company shows a rough sketch of a kitten transformed into a surrealist cat that still somehow resembles the original drawing.
With local AI generation, your original seed image isn’t pushed to the cloud, nor is the prompt shared with third parties or stored on remote servers. It is, as most privacy advocates would prefer, a closed loop.
Qualcomm is distributing the ControlNet SDKs to developers who want to start programming and testing on Hexagon. As for who might unveil ControlNet-based products in the future, it’s hard to say. Qualcomm won’t because it doesn’t sell anything directly to consumers.
Longtime partner Samsung, however, is a real possibility. Imagine the Samsung Galaxy S24 or S25 Ultra with a native ControlNet-based app. Or, perhaps, Samsung integrates it directly into its photo or camera app. For what it’s worth, the demo I saw was running on a Samsung Galaxy S23 Ultra.