advanced AI
-
Qwen3-30B Model Runs on Raspberry Pi in Real Time
The ShapeLearn GGUF release of Qwen3-30B-A3B-Instruct-2507 runs on hardware as small as a Raspberry Pi 5 with 16GB RAM, reaching 8.03 tokens per second while retaining 94.18% of BF16 quality. Rather than minimizing model size alone, the approach optimizes for tokens per second (TPS) at a given quality level, and it shows that quantization formats behave differently on CPUs and GPUs. On CPUs, smaller models generally run faster. On GPUs, speed depends more on which kernels a given format uses, so some configurations perform better than others regardless of size. The authors invite community feedback and testing to refine the evaluation process and adapt the model to other setups and workloads. This matters because it demonstrates that advanced AI models can run efficiently on consumer-grade hardware, broadening accessibility and application possibilities.
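The trade-off described above, picking the fastest quantization that still meets a quality floor, can be sketched in a few lines of Python. This is only an illustration: the `QuantResult` type, the `pick_fastest` helper, and the variant names are hypothetical, and only the 8.03 TPS and 94.18% figures come from the summary; the other rows are made-up placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QuantResult:
    name: str               # hypothetical label for a quantized variant
    tokens_generated: int   # tokens produced during the benchmark run
    seconds: float          # wall-clock time of the run
    quality_vs_bf16: float  # fraction of BF16 quality retained (0..1)

    @property
    def tps(self) -> float:
        # Tokens per second is simply tokens divided by elapsed time.
        return self.tokens_generated / self.seconds


def pick_fastest(results: List[QuantResult], min_quality: float) -> Optional[QuantResult]:
    """Return the highest-TPS variant that keeps at least min_quality."""
    eligible = [r for r in results if r.quality_vs_bf16 >= min_quality]
    return max(eligible, key=lambda r: r.tps, default=None)


results = [
    QuantResult("variant-a", 803, 100.0, 0.9418),  # figures from the summary
    QuantResult("variant-b", 1000, 100.0, 0.90),   # placeholder: faster, lower quality
    QuantResult("variant-c", 500, 100.0, 0.98),    # placeholder: slower, higher quality
]

best = pick_fastest(results, min_quality=0.94)
if best is not None:
    print(f"{best.name}: {best.tps:.2f} TPS at {best.quality_vs_bf16:.2%} of BF16")
```

Filtering on quality first and then maximizing TPS mirrors the stated goal of optimizing speed without giving up output quality; a real evaluation would also need to fix the hardware, prompt set, and quality metric.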
-
LFM2 2.6B-Exp: AI on Android with 40+ TPS
LiquidAI's LFM2 2.6B-Exp model is reported to rival GPT-4 on several benchmarks and supports advanced reasoning. Its hybrid design combines gated convolutions with grouped query attention, which keeps the KV cache footprint small and allows fast, long-context local inference on mobile devices. The model is available through cloud services, or it can be downloaded from platforms like Hugging Face and run locally on Android with applications such as "PocketPal AI" or "Maid". With the recommended sampler settings, the model reasons effectively on mobile hardware. This matters because it democratizes access to advanced AI capabilities, enabling more people to use powerful tools directly from their smartphones.
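To see why fewer attention layers and grouped query attention shrink the KV cache, a rough back-of-the-envelope estimate helps. The sketch below uses the standard size formula for a key/value cache; the function name and every layer, head, and context figure are hypothetical values chosen for illustration, not LFM2's actual configuration.

```python
def kv_cache_bytes(n_attn_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    Only attention layers contribute; convolution layers are assumed to
    keep a small state that does not grow with sequence length.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


GIB = 1024 ** 3
CONTEXT = 32_768  # hypothetical context length in tokens

# Hypothetical dense transformer: attention in every layer, one KV head per query head.
dense = kv_cache_bytes(n_attn_layers=32, n_kv_heads=32, head_dim=128, seq_len=CONTEXT)

# Hypothetical hybrid: only a few attention layers, with grouped query attention
# sharing a small number of KV heads across many query heads.
hybrid = kv_cache_bytes(n_attn_layers=8, n_kv_heads=8, head_dim=128, seq_len=CONTEXT)

print(f"dense:  {dense / GIB:.1f} GiB")
print(f"hybrid: {hybrid / GIB:.1f} GiB")
print(f"reduction: {dense // hybrid}x")
```

Under these assumed numbers the cache drops from 16 GiB to 1 GiB at 16-bit precision, which is the kind of difference that decides whether a long context fits in a phone's memory.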
