hallucinations
-
Understanding H-Neurons in LLMs
Read Full Article: Understanding H-Neurons in LLMs
Large language models (LLMs) often produce hallucinations: outputs that sound plausible but are factually incorrect, undermining their reliability. A detailed investigation into hallucination-associated neurons (H-Neurons) finds that a very small fraction of neurons (fewer than 0.1%) reliably predicts these occurrences across a range of scenarios. These neurons are causally linked to over-compliant behavior and originate in the pre-trained base models, retaining their predictive power for hallucination detection. Understanding these neuron-level mechanisms can help in building more reliable LLMs by bridging the gap between observable behavior and underlying neural activity.
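To make the idea concrete, here is a minimal, hypothetical sketch of how such neurons might be probed for: fit a sparse (L1-regularized) classifier on per-neuron activations and check how few neurons end up carrying the predictive signal. The random data, layer width, and regularization strength below are illustrative assumptions, not the study's actual setup.

```python
# Hypothetical sketch: probing a small set of neuron activations for
# hallucination prediction, in the spirit of the H-Neuron finding.
# Data, shapes, and the sparsity level are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Assume activations for N generations at one layer: (N samples, D neurons),
# plus a binary label marking whether each generation was a hallucination.
N, D = 2000, 4096
activations = rng.normal(size=(N, D))
labels = rng.integers(0, 2, size=N)

# An L1-regularized probe encourages sparsity, so predictive weight
# concentrates on a handful of neurons rather than the full hidden state.
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
probe.fit(activations, labels)

# "H-Neurons" here simply means the neurons the sparse probe actually uses.
h_neurons = np.flatnonzero(probe.coef_[0])
print(f"{len(h_neurons)} of {D} neurons selected "
      f"({len(h_neurons) / D:.2%} of the layer)")
```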
-
Hallucinations: Reward System Failure, Not Knowledge
Read Full Article: Hallucinations: Reward System Failure, Not Knowledge
Hallucinations are not simply errors of perception, but rather a failure of the brain's reward system. When the brain tries to interpret ambiguous signals, it can generate erroneous perceptions if the reward mechanisms do not work correctly. This suggests that hallucinations could be addressed by improving how the brain evaluates and responds to this information, rather than only correcting knowledge or perception. Understanding this mechanism could lead to new therapeutic approaches for mental disorders associated with hallucinations.
-
GPT-5.2: A Shift in Evaluative Personality
Read Full Article: GPT-5.2: A Shift in Evaluative Personality
GPT-5.2 shows a marked shift in evaluative personality: its judgments can be distinguished from other models' with 97.9% classification accuracy, compared to 83.9% for the Claude family. Interestingly, GPT-5.2 is more stringent on hallucinations and faithfulness, areas where Claude previously excelled, signaling OpenAI's emphasis on grounding accuracy. As a result, GPT-5.2 now aligns more closely with Sonnet and Opus 4.5 in strictness, while GPT-4.1 remains more lenient, similar to Gemini-3-Pro. The change reflects a strategic move by OpenAI to improve the reliability and accuracy of its models, which matters for applications that require high trust in AI outputs.
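For readers unfamiliar with how "distinguishability" is usually quantified, the sketch below shows the general recipe under loose assumptions: train a classifier to guess which model wrote a given evaluation and report its cross-validated accuracy. The placeholder texts, labels, and features are invented for illustration and are not the article's data or method.

```python
# Hypothetical sketch of a distinguishability measurement: classify which
# model produced each evaluation and report cross-validated accuracy.
# The toy texts and labels are placeholders, not the article's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

evaluations = [
    "The answer is unfaithful to the source; score 2/10.",
    "Mostly grounded, with one minor unsupported detail; score 7/10.",
    "Severe hallucination: the cited statute does not exist.",
    "Reasonable response with slight speculation; acceptable.",
] * 50  # repeated placeholder data, for illustration only
model_labels = ["gpt-5.2", "claude", "gpt-5.2", "claude"] * 50

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
accuracy = cross_val_score(clf, evaluations, model_labels, cv=5).mean()
print(f"judge-identification accuracy: {accuracy:.1%}")
```

The higher this accuracy, the more recognizable a model's evaluative "personality" is; the 97.9% figure for GPT-5.2 versus 83.9% for the Claude family is reported in the article, not produced by this toy script.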
-
Tool Tackles LLM Hallucinations with Evidence Check
Read Full Article: Tool Tackles LLM Hallucinations with Evidence Check
A new tool has been developed to address the issue of hallucinations in large language models (LLMs) by breaking down their responses into atomic claims and retrieving evidence from a limited corpus. This tool compares the model's confidence with the actual support for its claims, flagging cases where there is high confidence but low evidence as epistemic risks rather than making "truth" judgments. The tool operates locally without the need for cloud services, accounts, or API keys, and is designed to be transparent about its limitations. An example of its application is the "Python 3.12 removed the GIL" case, where the tool identifies a high semantic similarity but low logical support, highlighting the potential for epistemic risk. This matters because it provides a method for critically evaluating the reliability of LLM outputs, helping to identify and mitigate the risks of misinformation.
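A minimal sketch of the flagging logic, assuming a toy local corpus and stand-in scoring functions (the actual tool's claim decomposition, retrieval, and entailment components are not shown here):

```python
# Minimal sketch of the flag-don't-judge idea: compare stated confidence with
# retrieved support and mark "high confidence, low support" as epistemic risk.
# The corpus and scoring heuristics are stand-ins; the real tool presumably
# uses stronger retrieval and entailment models.
from dataclasses import dataclass
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

CORPUS = [
    "Python 3.12 kept the global interpreter lock (GIL).",
    "PEP 703 makes the GIL optional in free-threaded builds of Python 3.13.",
]

@dataclass
class ClaimReport:
    claim: str
    confidence: float        # the model's own confidence, 0-1
    semantic_similarity: float
    logical_support: float   # placeholder for an entailment score
    epistemic_risk: bool

def check_claim(claim: str, confidence: float) -> ClaimReport:
    # Semantic similarity: lexical overlap with the best-matching passage.
    vec = TfidfVectorizer().fit(CORPUS + [claim])
    sims = cosine_similarity(vec.transform([claim]), vec.transform(CORPUS))[0]
    semantic = float(sims.max())
    best = CORPUS[int(sims.argmax())]
    # Crude stand-in for logical support: penalize an obvious contradiction
    # between the claim and the best-matching passage.
    support = 0.1 if "kept the" in best and "removed" in claim else semantic
    risk = confidence > 0.8 and support < 0.5
    return ClaimReport(claim, confidence, semantic, support, risk)

print(check_claim("Python 3.12 removed the GIL.", confidence=0.95))
```

Run on the GIL example, the claim scores high on lexical similarity but low on support, so it is flagged as an epistemic risk rather than labeled true or false, which mirrors the tool's stated design goal.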
