language models

  • Open Source Code for Refusal Steering Paper Released


    An open source implementation of that refusal steering paperThe release of an open-source code for the refusal steering paper introduces a method for surgical refusal removal using statistical validation rather than intuition-based steering. Key features include judge scores for validating training data, automatic selection of optimal layers through correlation analysis, and confidence-weighted steering vectors. The implementation also offers auto alpha optimization with early stopping and the ability to merge changes permanently into model weights. Although it requires a more complex setup than simpler steering repositories, it provides robust statistical validation at each step, enhancing reliability and precision in machine learning models. This matters because it advances the precision and reliability of machine learning model adjustments, reducing reliance on guesswork.

    Read Full Article: Open Source Code for Refusal Steering Paper Released

  • Hierarchical LLM Decoding for Efficiency


    Idea: Hierarchical LLM Decoding: Let Small Models Generate, Large Models Intervene Only When NeededThe proposal suggests a hierarchical decoding architecture for language models, where smaller models handle most token generation, while larger models intervene only when necessary. This approach aims to reduce latency, energy consumption, and costs associated with using large models for every token, by having them act as supervisors that monitor for errors or critical reasoning steps. The system could involve a Mixture-of-Experts (MoE) architecture, where a gating mechanism determines when the large model should step in. This method promises lower inference latency, reduced energy consumption, and a better cost-quality tradeoff while maintaining reasoning quality. It raises questions about the best signals for intervention and how to prevent over-reliance on the larger model. This matters because it offers a more efficient way to scale language models without compromising performance on reasoning tasks.

    Read Full Article: Hierarchical LLM Decoding for Efficiency

  • Fine-tuning LM for Browser Control with GRPO


    Fine-tuning a Small LM for browser control with GRPO and OpenEnvFine-tuning a small language model (LM) for browser control involves using reinforcement learning techniques to teach the model how to navigate websites and perform tasks such as clicking buttons, filling forms, and booking flights. This process leverages tools like GRPO, BrowserGym, and LFM2-350M to create a training pipeline that starts with basic tasks and progressively scales in complexity. The approach focuses on learning through trial and error rather than relying on perfect demonstrations, allowing the model to develop practical skills for interacting with web environments. This matters because it opens up possibilities for automating complex web tasks, enhancing efficiency and accessibility in digital interactions.

    Read Full Article: Fine-tuning LM for Browser Control with GRPO

  • Exploring Llama 3.2 3B’s Hidden Dimensions


    Llama 3.2 3B fMRI (updated findings)A local interpretability tool has been developed to visualize and intervene in the hidden-state activity of the Llama 3.2 3B model during inference, revealing a persistent hidden dimension (dim 3039) that influences the model's commitment to its generative trajectory. Systematic tests across various prompt types and intervention conditions showed that increasing intervention magnitude led to more confident responses, though not necessarily more accurate ones. This dimension acts as a global commitment gain, affecting how strongly the model adheres to its chosen path without altering which path is selected. The findings suggest that magnitude of intervention is more impactful than direction, with significant implications for understanding model behavior and improving interpretability. This matters because it sheds light on how AI models make decisions and the factors influencing their confidence, which is crucial for developing more reliable AI systems.

    Read Full Article: Exploring Llama 3.2 3B’s Hidden Dimensions

  • Activation Functions in Language Models


    Day 20: 21 Days of Building a Small Language Model: Activation FunctionsActivation functions are crucial components in neural networks, enabling them to learn complex, non-linear patterns beyond simple linear transformations. They introduce non-linearity, allowing networks to approximate any function, which is essential for tasks like image recognition and language understanding. The evolution of activation functions has moved from ReLU, which helped overcome vanishing gradients, to more sophisticated functions like GELU and SwiGLU, which offer smoother transitions and better gradient flow. SwiGLU, with its gating mechanism, has become the standard in modern language models due to its expressiveness and ability to improve training stability and model performance. Understanding and choosing the right activation function is vital for building effective and stable language models. Why this matters: Activation functions are fundamental to the performance and stability of neural networks, impacting their ability to learn and generalize complex patterns in data.

    Read Full Article: Activation Functions in Language Models

  • Exploring Language Model Quirks with Em Dashes


    Never thought it was this easy to break itExperimenting with language models can lead to unexpected and amusing results, as demonstrated by a user who discovered a peculiar behavior when prompting a model to generate text with excessive em dashes. By instructing the model to replace all em dashes with words and vice versa, the user observed that the model would enter a loop of generating em dashes until manually stopped. This highlights the quirky and sometimes unpredictable nature of language models when given unconventional prompts, showcasing both their creative potential and limitations. Understanding these behaviors is crucial for refining AI interactions and improving user experiences.

    Read Full Article: Exploring Language Model Quirks with Em Dashes

  • ModelCypher: Exploring LLM Geometry


    ModelCypher: A toolkit for the geometry of LLMs (open source) [P]ModelCypher is an open-source toolkit designed to explore the geometry of small language models, challenging the notion that these models are inherently black boxes. It features cross-architecture adapter transfer and jailbreak detection using entropy divergence, implementing methods from over 46 recent research papers. Although the hypothesis that Wierzbicka's "Semantic Primes" would show unique geometric invariance was disproven, the toolkit reveals that distinct concepts have a high convergence across different models. The tools are documented with analogies to aid understanding, though they primarily provide raw metrics rather than user-friendly outputs. This matters because it provides a new way to understand and potentially improve language models by examining their geometric properties.

    Read Full Article: ModelCypher: Exploring LLM Geometry

  • DS-STAR: Versatile Data Science Agent


    DS-STAR: A state-of-the-art versatile data science agentDS-STAR is a cutting-edge data science agent designed to enhance performance through its versatile components. Ablation studies highlight the importance of its Data File Analyzer, which significantly improves accuracy by providing detailed data context, as evidenced by a sharp drop in performance when this component is removed. The Router agent is crucial for determining when to add or correct steps, preventing the accumulation of flawed steps and ensuring efficient planning. Additionally, DS-STAR demonstrates adaptability across different language models, with tests using GPT-5 showing promising results, particularly on easier tasks, while the Gemini-2.5-Pro version excels in handling more complex challenges. This matters because it showcases the potential for advanced data science agents to improve task performance across various complexities and models.

    Read Full Article: DS-STAR: Versatile Data Science Agent

  • Hosting Language Models on a Budget


    Hosting Language Models on a BudgetRunning your own large language model (LLM) can be surprisingly affordable and straightforward, with options like deploying TinyLlama on Hugging Face for free. Understanding the costs involved, such as compute, storage, and bandwidth, is crucial, as compute is typically the largest expense. For beginners or those with limited budgets, free hosting options like Hugging Face Spaces, Render, and Railway can be utilized effectively. Models like TinyLlama, DistilGPT-2, Phi-2, and Flan-T5-Small are suitable for various tasks and can be run on free tiers, providing a practical way to experiment and learn without significant financial investment. This matters because it democratizes access to advanced AI technology, enabling more people to experiment and innovate without prohibitive costs.

    Read Full Article: Hosting Language Models on a Budget

  • Linguistic Bias in ChatGPT: Dialect Discrimination


    Linguistic Bias in ChatGPT: Language Models Reinforce Dialect DiscriminationChatGPT exhibits linguistic biases that reinforce dialect discrimination by favoring Standard American English over non-"standard" varieties like Indian, Nigerian, and African-American English. Despite being used globally, the model's responses often default to American conventions, frustrating non-American users and perpetuating stereotypes and demeaning content. Studies show that ChatGPT's responses to non-"standard" varieties are rated worse in terms of stereotyping, comprehension, and naturalness compared to "standard" varieties. These biases can exacerbate existing inequalities and power dynamics, making it harder for speakers of non-"standard" English to effectively use AI tools. This matters because as AI becomes more integrated into daily life, it risks reinforcing societal biases against minoritized language communities.

    Read Full Article: Linguistic Bias in ChatGPT: Dialect Discrimination