Gemma 3

  • Benchmarking Small LLMs on a 16GB Laptop


    I benchmarked 7 Small LLMs on a 16GB Laptop. Here is what is actually usable.Running small language models (LLMs) on a standard 16GB RAM laptop reveals varying levels of usability, with Qwen 2.5 (14B) offering the best coding performance but consuming significant RAM, leading to crashes when multitasking. Mistral Small (12B) provides a balance between speed and resource demand, though it still causes Windows to swap memory aggressively. Llama-3-8B is more manageable but lacks the reasoning abilities of newer models, while Gemma 3 (9B) excels in instruction following but is resource-intensive. With rising RAM prices, upgrading to 32GB allows for smoother operation without swap lag, presenting a more cost-effective solution than investing in high-end GPUs. This matters because understanding the resource requirements of LLMs can help users optimize their systems without overspending on hardware upgrades.

    Read Full Article: Benchmarking Small LLMs on a 16GB Laptop

  • Google’s FunctionGemma: AI for Edge Function Calling


    From Gemma 3 270M to FunctionGemma, How Google AI Built a Compact Function Calling Specialist for Edge WorkloadsGoogle has introduced FunctionGemma, a specialized version of the Gemma 3 270M model, designed specifically for function calling and optimized for edge workloads. FunctionGemma retains the Gemma 3 architecture but focuses on translating natural language into executable API actions rather than general chat. It uses a structured conversation format with control tokens to manage tool definitions and function calls, ensuring reliable tool use in production. The model, trained on 6 trillion tokens, supports a 256K vocabulary optimized for JSON and multilingual text, enhancing token efficiency. FunctionGemma's primary deployment target is edge devices like phones and laptops, benefiting from its compact size and quantization support for low-latency, low-memory inference. Demonstrations such as Mobile Actions and Tiny Garden showcase its ability to perform complex tasks on-device without server calls, achieving up to 85% accuracy after fine-tuning. This development signifies a step forward in creating efficient, localized AI solutions that can operate independently of cloud infrastructure, crucial for privacy and real-time applications.

    Read Full Article: Google’s FunctionGemma: AI for Edge Function Calling