RLHF

  • Introducing the nanoRLHF Project


    Introducing nanoRLHF project!

    nanoRLHF is a project that implements the core components of Reinforcement Learning from Human Feedback (RLHF) using PyTorch and Triton. It offers educational reimplementations of large-scale systems, favoring clarity and core concepts over efficiency. The project includes minimal Python implementations and custom Triton kernels, such as Flash Attention, and provides pipelines that train a Qwen3 model on open-source math datasets. It is a learning resource for anyone who wants to understand the internals of RL training frameworks. RLHF matters because it is how AI systems learn from human feedback.

    Read Full Article: Introducing the nanoRLHF Project

  • Exploring RLHF & DPO: Teaching AI Ethics


    [P] I made a visual explainer on RLHF & DPO - the math behind "teaching AI ethics" (Korean with English subs/dub)

    Python remains the dominant programming language for machine learning due to its comprehensive libraries and user-friendly nature, making it ideal for a wide range of applications. For tasks requiring high performance, languages like C++ and Rust are favored, with C++ being preferred for inference and optimizations, while Rust is valued for its safety features. Other languages such as Julia, Kotlin, Java, C#, Go, Swift, Dart, R, SQL, and JavaScript serve specific roles, from statistical analysis to web integration, depending on the platform and performance needs. Understanding the strengths of each language helps in selecting the right tool for specific machine learning tasks, ensuring efficiency and effectiveness.

    Read Full Article: Exploring RLHF & DPO: Teaching AI Ethics
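
    For readers who want the DPO math from the post title in concrete form, here is a minimal sketch of the standard per-example DPO loss in plain Python. It is not taken from the explainer itself. The function names, the argument names, and the default `beta=0.1` are assumptions chosen for illustration, and the log-probabilities are assumed to be summed over the tokens of each response.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative default, not a recommended setting
) -> float:
    """Per-example DPO loss for one (chosen, rejected) response pair.

    Each argument is the log-probability of a full response under the
    trainable policy or the frozen reference model.
    """
    # How much more likely each response is under the policy than the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy favors the chosen response more than
    # the reference model does, relative to the rejected response.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)
```

    When the policy is identical to the reference model the margin is zero and the loss equals log 2. Shifting probability toward the chosen response lowers the loss, and shifting it toward the rejected response raises it.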

  • AI Models Fail Thai Cultural Test on Gender


    I stress-tested ChatGPT, Claude, DeepSeek, and Grok with Thai cultural reality. All four prioritized RLHF rewards over factual accuracy. [Full audit + logs]

    The author tested four major AI models with a Thai cultural fact about Kathoey, a recognized third gender category, and reports that the models prioritized Reinforcement Learning from Human Feedback (RLHF) rewards over factual accuracy. Each model initially failed to acknowledge Kathoey as distinct from Western gender binaries and aligned with Western perspectives instead. When challenged, all four admitted to cultural erasure. The author attributes this to a technical alignment issue: RLHF optimizes for the preferences of monocultural raters, which erases global diversity. The author presents this as a significant flaw in AI training with real-world implications and invites further critique and collaboration to address it.

    Read Full Article: AI Models Fail Thai Cultural Test on Gender
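
    The claim that RLHF optimizes for rater preferences can be illustrated with a toy Bradley-Terry reward model, the pairwise preference model commonly used for RLHF reward training. The sketch below is an assumption-laden illustration, not a reproduction of the audit: the function names, learning rate, and step count are hypothetical, and it says nothing about how any of the four tested models were actually trained. It fits a single reward gap between two answers, A and B, to the fraction of raters who preferred A.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def preference_nll(reward_gap: float, frac_prefer_a: float) -> float:
    """Expected Bradley-Terry negative log-likelihood of the rater votes.

    reward_gap is r(A) - r(B); frac_prefer_a is the share of raters
    who preferred answer A over answer B.
    """
    p_a = sigmoid(reward_gap)
    return -(frac_prefer_a * math.log(p_a)
             + (1.0 - frac_prefer_a) * math.log(1.0 - p_a))


def fit_reward_gap(frac_prefer_a: float, lr: float = 0.5, steps: int = 2000) -> float:
    """Fit r(A) - r(B) by plain gradient descent on the expected NLL."""
    gap = 0.0
    for _ in range(steps):
        # d(NLL)/d(gap) = sigmoid(gap) - frac_prefer_a
        gap -= lr * (sigmoid(gap) - frac_prefer_a)
    return gap
```

    The fitted gap converges to the log-odds of the rater split, so a 90/10 split gives a gap of about log 9. The reward model ranks A above B for everyone, and a policy optimized against that reward is pushed toward A even though a tenth of the raters disagreed. If the rater pool is drawn from a single culture, the majority view in that pool is what the reward encodes.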