Running local language models on a Raspberry Pi 5 now has another path that does not route through Hailo silicon. M5Stack's LLM-8850 Kit pairs an M.2 accelerator card with a PiHat adapter that plugs into the Pi 5's PCIe expansion connector, adding a 24 TOPS INT8 NPU for on-device inference, video analytics, and multimodal workloads.

At the center of the card is the Axera AX8850, an SoC that combines eight Arm Cortex-A55 cores running at up to 1.7GHz with the 24 TOPS NPU, dual VDSP blocks, a CV engine, and hardware H.264/H.265 codec units. The 4GB card carries 64-bit LPDDR4x memory at a 4266Mbps data rate plus 32Mbits of QSPI NOR flash for the bootloader, and it talks to the host over PCIe 2.0 x2 in an M.2 M-Key 2242 form factor. On the video side, the kit lists 8K/30fps H.264/H.265 encode and decode with scaling and cropping, up to 16 channels of parallel 1080p decode, and simultaneous encode, decode, and transcode.
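To put the link and memory figures side by side, here is a back-of-envelope sketch. It assumes standard PCIe 2.0 signalling (5 GT/s per lane with 8b/10b encoding) and reads "4266Mbps" as 4266 MT/s per pin on a 64-bit bus; the helper names are illustrative, and protocol overhead is ignored, so real throughput will be lower.

```python
# Back-of-envelope bandwidth figures for the AX8850 card, derived from the
# listed specs. Assumptions: standard PCIe 2.0 signalling, no protocol
# overhead, and peak (not sustained) DRAM bandwidth.

PCIE2_GT_PER_S = 5.0      # PCIe 2.0 signalling rate per lane, GT/s
PCIE2_ENCODING = 8 / 10   # PCIe 2.0 uses 8b/10b line encoding


def pcie2_bandwidth_gbytes(lanes: int) -> float:
    """Usable one-direction link bandwidth in GB/s, before protocol overhead."""
    gbits = PCIE2_GT_PER_S * PCIE2_ENCODING * lanes
    return gbits / 8


def dram_bandwidth_gbytes(bus_bits: int, mt_per_s: float) -> float:
    """Peak DRAM bandwidth in GB/s for a bus width (bits) and transfer rate (MT/s)."""
    return bus_bits / 8 * mt_per_s / 1000


if __name__ == "__main__":
    print(f"PCIe 2.0 x2: {pcie2_bandwidth_gbytes(2):.1f} GB/s per direction")
    print(f"PCIe 2.0 x1: {pcie2_bandwidth_gbytes(1):.1f} GB/s per direction")
    print(f"LPDDR4x 64-bit @ 4266 MT/s: {dram_bandwidth_gbytes(64, 4266):.1f} GB/s")
```

This works out to 1.0 GB/s for x2, 0.5 GB/s for x1, and roughly 34.1 GB/s for the on-card memory. The Raspberry Pi 5's external PCIe connector exposes a single lane, so on a Pi the link can be at most x1; the x2 figure applies to hosts that wire two or more lanes to the M.2 socket. The large gap between link and memory bandwidth is presumably why model weights live in the card's own LPDDR4x rather than being streamed from the host.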

For open-source builders, the appeal is the tooling stack rather than the raw numbers. Axera publishes its ax-llm and AXCL projects on GitHub for deploying CNN, Transformer, CLIP, Whisper, and large language models on its chips, and the company maintains pre-converted weights on Hugging Face, including Qwen3-1.7B quantized to w8a16 for the NPU. The product listing names Llama 3.2, Qwen3, and InternVL3 among the supported full-stack models, and the AXCL catalog has since grown to include DeepSeek-R1-Distill-Qwen-1.5B, SmolVLM2-500M-Video-Instruct, MiniCPM4-0.5B, audio models covering Whisper, SenseVoice, and CosyVoice2, and a Stable Diffusion 1.5 port. Officially, the kit supports hosts running Ubuntu 20.04, 22.04, and 24.04, Debian 12 and 13, and Windows 10 and 11; macOS, WSL, VMware, and VirtualBox are listed as unsupported.
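A quick way to see why an 8-bit weight format suits a 4GB card is to estimate the weight footprint. The sketch below is illustrative only: it takes the nominal 1.7 billion parameters from the Qwen3-1.7B name, assumes every weight is stored at the quantized width, and ignores KV cache, activations, and runtime buffers. The helper names `weight_bytes` and `fits` are hypothetical.

```python
# Rough weight-memory estimate for a quantized LLM on the 4GB card.
# Assumptions (illustrative, not vendor figures): every parameter is stored
# at the quantized weight width; KV cache, activations, runtime buffers and
# any layers kept at higher precision are ignored.

GIB = 1024 ** 3


def weight_bytes(params: float, bits_per_weight: int) -> float:
    """Bytes needed to hold `params` weights at `bits_per_weight` bits each."""
    return params * bits_per_weight / 8


def fits(params: float, bits_per_weight: int, budget_bytes: float) -> bool:
    """True if the weights alone fit inside `budget_bytes`."""
    return weight_bytes(params, bits_per_weight) <= budget_bytes


if __name__ == "__main__":
    params = 1.7e9    # nominal count taken from the model name
    card = 4 * GIB    # assumption: treat all 4GB as available
    for label, bits in (("w8a16, 8-bit weights", 8), ("16-bit weights", 16)):
        size = weight_bytes(params, bits)
        print(f"{label}: {size / GIB:.2f} GiB, fits={fits(params, bits, card)}")
```

Both variants fit on paper (about 1.58 GiB versus 3.17 GiB), but 8-bit weights leave roughly 2.4 GiB of headroom for the KV cache and runtime, against well under 1 GiB at 16 bits. The usable amount will be lower still once the runtime's own allocations are counted.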

M5Stack has published Frigate NVR integration documentation that routes the card's multi-channel hardware video decode and NPU through Frigate's open-source surveillance pipeline, giving self-hosted NVR setups a path to local AI object detection without a separate accelerator. The card also appears in Jeff Geerling's raspberry-pi-pcie-devices compatibility tracker, a widely used community reference for what works on the Pi 5's PCIe connector. The AXCL host runtime is not limited to the Pi 5, either: the same stack supports Rockchip RK3588 SBCs and x86 machines with a spare M.2 Key-M socket.
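For sizing an NVR, it helps to note that the card's 8K/30fps and 16-channel 1080p decode figures describe the same raw pixel rate, since an 8K frame is exactly sixteen 1080p frames. The sketch below assumes the 16 channels run at 30 fps (the listing does not give a per-stream frame rate) and treats decode capacity as a pure pixel-rate budget; `max_streams` is a hypothetical helper, and real decoders also cap stream counts and per-stream resolutions, so treat its results as upper bounds.

```python
# Relates the listed decode figures by raw pixel throughput.
# Assumption: decode capacity scales purely with pixels per second and the
# 16-channel 1080p figure is at 30 fps per stream (not stated in the listing).


def pixel_rate(width: int, height: int, fps: int, streams: int = 1) -> int:
    """Decoded pixels per second across `streams` identical streams."""
    return width * height * fps * streams


def max_streams(budget_px_per_s: int, width: int, height: int, fps: int) -> int:
    """How many identical streams fit in a pixel-rate budget (hypothetical helper)."""
    return budget_px_per_s // pixel_rate(width, height, fps)


if __name__ == "__main__":
    budget = pixel_rate(7680, 4320, 30)               # one 8K/30fps stream
    print(budget == pixel_rate(1920, 1080, 30, 16))   # 8K equals 16 x 1080p
    print(max_streams(budget, 1920, 1080, 15))        # 1080p cameras at 15 fps
    print(max_streams(budget, 2560, 1440, 15))        # 1440p cameras at 15 fps
```

Under these assumptions the budget covers 32 streams of 1080p at 15 fps or 18 streams of 1440p at 15 fps; the per-channel limit in the listing still applies on top of this arithmetic.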

Cooling is active: a micro turbine fan sits over a CNC-machined aluminum finned heatsink, and an onboard embedded controller manages fan speed through a temperature-current-speed control loop. The PiHat adapter draws power over USB-C PD 3.0 at 9V, 12V, or 20V, requires a supply of at least 9V/3A (27W), and feeds 5V/4A to the Pi 5 and 3.3V/6A to the M.2 card; the accelerator itself is rated at 7W. M5Stack warns that the system must be powered from the adapter's USB-C input rather than from the Pi 5 side to avoid damage. One other constraint: because of the Pi 5's limited PCIe interrupt resources, an M.2 SSD and the LLM-8850 cannot be used at the same time on a four-lane M.2 expansion.
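The power figures can be checked with simple arithmetic. This sketch assumes lossless conversion (real regulators waste some input power) and uses only the voltages, currents, and 7W rating quoted above; the variable names are illustrative.

```python
# Power arithmetic for the PiHat adapter, using the listed figures.
# Assumption: lossless conversion; real regulators waste some input power.


def watts(volts: float, amps: float) -> float:
    """Electrical power in watts for a given voltage and current."""
    return volts * amps


min_input = watts(9, 3)     # minimum USB-C PD supply: 27 W
pi_rail = watts(5, 4)       # 5V/4A rail to the Pi 5: 20 W
m2_rail = watts(3.3, 6)     # 3.3V/6A rail to the M.2 card: about 19.8 W
card_rated = 7.0            # LLM-8850 rated draw, W

print(f"rail ceilings: {pi_rail + m2_rail:.1f} W vs supply {min_input:.0f} W")
print(f"card current at 3.3 V: {card_rated / 3.3:.2f} A of 6 A available")
print(f"left for the Pi 5 at minimum supply: {min_input - card_rated:.0f} W")
```

The two rail ratings add up to 39.8W, more than the 27W minimum supply, so they read as per-rail ceilings rather than a simultaneous load. With the card at its rated 7W (about 2.1A on the 3.3V rail), a 27W supply leaves roughly 20W for the Pi 5 before conversion losses, which happens to match the 5V/4A rail.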

The 4GB version, SKU AI-002-4G, is priced at $185 (€171). It lands as a direct alternative to Raspberry Pi's own AI HAT+ 2, which launched on January 15, 2026, with a 40 TOPS Hailo-10H and 8GB of dedicated memory at $130, giving Pi 5 owners a second hardware route for self-hosted generative AI.
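For a rough sense of the price gap, the list prices can be normalized by the headline specs. This is a crude sketch: the vendors' TOPS figures may not be quoted at the same numeric precision, and the ratios ignore what each product bundles, so the `ratios` helper and its output are illustrative only.

```python
# Simple cost ratios from the two list prices quoted above.
# Caveat: vendor TOPS figures may be measured at different precisions,
# so dollars per TOPS is only a rough comparison.


def ratios(usd: float, tops: float, mem_gb: float) -> tuple[float, float]:
    """Return (dollars per TOPS, dollars per GB of accelerator memory)."""
    return usd / tops, usd / mem_gb


BOARDS = {
    "M5Stack LLM-8850 (4GB)": (185, 24, 4),
    "Raspberry Pi AI HAT+ 2": (130, 40, 8),
}

for name, spec in BOARDS.items():
    per_tops, per_gb = ratios(*spec)
    print(f"{name}: ${per_tops:.2f}/TOPS, ${per_gb:.2f}/GB")
```

On these headline numbers the LLM-8850 costs about $7.71 per TOPS and $46.25 per GB, against $3.25 and $16.25 for the AI HAT+ 2. The ratios do not capture that the LLM-8850 kit includes the PiHat adapter and active cooling, and that its card carries its own eight-core CPU and hardware video codecs.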