AI & Technology Updates

  • 7900 XTX + ROCm: Llama.cpp vs vLLM Benchmarks


    7900 XTX + ROCm: A Year Later. Llama.cpp vs vLLM Benchmarks (TB3 eGPU)
    After a year of using the 7900 XTX with ROCm, improvements have been noted, though the experience remains less seamless than with NVIDIA cards. A comparison of llama.cpp and vLLM benchmarks on this hardware, connected via a Thunderbolt 3 eGPU enclosure, shows varying performance across models, all of which fit within VRAM to mitigate the TB3 bandwidth limitation. Llama.cpp generation speeds range from 22.95 t/s to 87.09 t/s, while vLLM ranges from 14.99 t/s to 94.19 t/s, highlighting both the ongoing challenges and the progress in running newer models on AMD hardware. This matters as it provides insight into the current capabilities and limitations of AMD GPUs for local machine learning tasks.
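
    A minimal sketch of how tokens-per-second numbers like these can be measured with vLLM's offline Python API, assuming a ROCm-enabled vLLM build; the model name, prompts, and token budget below are illustrative placeholders, not values from the benchmark.

      # Rough tokens/second measurement with vLLM's offline API.
      # Model name, prompts, and max_tokens are illustrative placeholders.
      import time
      from vllm import LLM, SamplingParams

      llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any model that fits in VRAM
      params = SamplingParams(temperature=0.0, max_tokens=256)
      prompts = ["Explain Thunderbolt 3 bandwidth limits for eGPUs."] * 8

      start = time.perf_counter()
      outputs = llm.generate(prompts, params)
      elapsed = time.perf_counter() - start

      generated = sum(len(o.outputs[0].token_ids) for o in outputs)
      print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.2f} t/s")

    Note that this measures aggregate batched throughput; single-stream generation speed, as typically reported in llama.cpp comparisons, will be lower.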


  • Exploring Hidden Dimensions in Llama-3.2-3B


    Llama 3.2 3B fMRI LOAD BEARING DIMS FOUND
    A local interpretability toolchain has been developed to explore the coupling of hidden dimensions in small language models, specifically Llama-3.2-3B-Instruct. By using deterministic decoding and stratified prompts, the toolchain reduces noise and identifies key dimensions that significantly influence model behavior. A causal test showed that perturbing one critical dimension, DIM 1731, collapses semantic commitment while preserving fluency, suggesting a role in decision stability. The finding points to high-centrality dimensions that are crucial to model functionality and opens a path for replication across models. Understanding these dimensions is essential for improving the reliability and interpretability of AI models.
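
    A minimal sketch of one way to run this kind of perturbation test with Hugging Face transformers: zero a single hidden-state dimension via a forward hook and compare greedy (deterministic) generations with and without the hook. The layer index and prompt are assumptions for illustration; 1731 is the dimension reported in the post, and this is not the author's actual toolchain.

      # Ablate one hidden-state dimension via a forward hook, then compare
      # greedy generations. LAYER and the prompt are illustrative assumptions.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "meta-llama/Llama-3.2-3B-Instruct"
      tok = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
      model.eval()

      DIM = 1731      # dimension under test (from the post)
      LAYER = 16      # arbitrary mid-stack layer chosen for illustration

      def zero_dim(module, inputs, output):
          # Decoder layers return the hidden states (possibly inside a tuple);
          # edit the residual stream in place.
          hidden = output[0] if isinstance(output, tuple) else output
          hidden[..., DIM] = 0.0
          return output

      prompt = "Is Paris the capital of France? Answer yes or no."
      ids = tok(prompt, return_tensors="pt").to(model.device)

      def generate():
          with torch.no_grad():
              out = model.generate(**ids, max_new_tokens=40, do_sample=False)
          return tok.decode(out[0], skip_special_tokens=True)

      baseline = generate()
      handle = model.model.layers[LAYER].register_forward_hook(zero_dim)
      ablated = generate()
      handle.remove()

      print("baseline:", baseline)
      print("ablated: ", ablated)

    Greedy decoding keeps the comparison deterministic, so any divergence between the two outputs can be attributed to the ablated dimension rather than sampling noise.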


  • Semantic Caching for AI and LLMs


    Semantic Caching Explained: A Complete Guide for AI, LLMs, and RAG Systems
    Semantic caching is a technique used to enhance the efficiency of AI, large language model (LLM), and retrieval-augmented generation (RAG) systems by storing and reusing previously computed results. Unlike traditional caching, which relies on exact matching of queries, semantic caching leverages the meaning and context of queries, enabling systems to handle similar or related queries effectively. This approach reduces computational overhead and improves response times, making it particularly valuable where quick access to information is crucial. Understanding semantic caching is essential for optimizing the performance of AI systems and ensuring they can scale to meet increasing demands.
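
    A minimal sketch of the idea, assuming a user-supplied embedding function and an in-memory store: a cache hit is decided by cosine similarity against embeddings of previously answered queries rather than by exact string match. The names embed_fn, compute_answer, and the 0.9 threshold are illustrative, not from any specific library.

      # Minimal in-memory semantic cache: hit if a past query's embedding is
      # similar enough to the new query's embedding. embed_fn and compute_answer
      # are placeholders for an embedding model and an LLM/RAG call.
      import numpy as np

      class SemanticCache:
          def __init__(self, embed_fn, threshold=0.9):
              self.embed_fn = embed_fn          # text -> 1-D numpy vector
              self.threshold = threshold        # cosine-similarity cutoff
              self.embeddings = []              # cached query vectors
              self.answers = []                 # cached results

          def lookup(self, query):
              if not self.embeddings:
                  return None
              q = self.embed_fn(query)
              mat = np.stack(self.embeddings)
              sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-9)
              best = int(np.argmax(sims))
              return self.answers[best] if sims[best] >= self.threshold else None

          def add(self, query, answer):
              self.embeddings.append(self.embed_fn(query))
              self.answers.append(answer)

      def answer(query, cache, compute_answer):
          cached = cache.lookup(query)
          if cached is not None:
              return cached                     # reuse previous result
          result = compute_answer(query)        # expensive LLM / RAG call
          cache.add(query, result)
          return result

    The threshold is the main tuning knob: set it too low and unrelated queries return stale answers; set it too high and the cache degenerates into exact-match caching.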


  • Public Domain 2026: Iconic Works Set Free


    Public domain 2026: Betty Boop, Pluto, and Nancy Drew set free
    As of 2026, numerous iconic works from 1930 have entered the public domain in the US, allowing their free use and repurposing. Notable entries include Betty Boop's first appearance in "Dizzy Dishes" and the early version of Pluto, then known as Rover, in "The Picnic." The transition also covers films like "Morocco," which featured content that would later be restricted by the Hays Code. These newly available works give creators the opportunity to incorporate classic characters and stories into new projects, fostering creativity and innovation. This matters because it opens up a wealth of cultural content for public use, inspiring new creative endeavors and preserving historical media.


  • From Tools to Organisms: AI’s Next Frontier


    Unpopular Opinion: The "Death of the Tool". The "Glass Box" (newcomer) is just a prettier trap. We need to stop building Tools and start building Organisms.
    The ongoing debate around autonomous agents revolves around two main philosophies: the "Black Box" approach, in which big tech companies like OpenAI and Google ask users to trust their smart models, and the "Glass Box" approach, which offers transparency and auditability. While the Glass Box is celebrated for its openness, the author criticizes it as static and reliant on human prompts, lacking true autonomy. The argument is that tools, whether black or glass, cannot achieve real-world autonomy without a system architecture that supports self-creation and dynamic adaptation. The future, in this view, lies in "Living Operating Systems" that run continuously, self-reproduce, and evolve by integrating successful strategies into their own codebase, moving beyond mere tools toward autonomous organisms. This matters because it challenges the current trajectory of AI development and proposes a paradigm shift toward truly autonomous systems.
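
    As a purely hypothetical toy illustration of the loop the author describes (continuous operation, trial of new strategies, persistence of the successful ones), and not anything from the post itself, a sketch might look like this; every name and file path below is invented for illustration.

      # Toy "living system" loop: run continuously, try candidate strategies,
      # and persist the ones that score well so later runs inherit them.
      # All names and the scoring logic are hypothetical placeholders.
      import json, random, time
      from pathlib import Path

      STATE = Path("strategies.json")   # persisted pool of surviving strategies

      def load_strategies():
          return json.loads(STATE.read_text()) if STATE.exists() else []

      def save_strategies(strategies):
          STATE.write_text(json.dumps(strategies, indent=2))

      def mutate(strategies):
          # Self-modification stand-in: derive a new candidate from an old one.
          base = random.choice(strategies) if strategies else {"score": 0.0, "params": {}}
          return {"score": 0.0, "params": {**base["params"], "knob": random.random()}}

      def evaluate(candidate):
          # Placeholder for a real task evaluation (agent run, benchmark, etc.).
          return candidate["params"].get("knob", 0.0)

      def run_forever(cycle_seconds=60, keep=10):
          strategies = load_strategies()
          while True:                              # continuous operation
              candidate = mutate(strategies)
              candidate["score"] = evaluate(candidate)
              strategies = sorted(strategies + [candidate],
                                  key=lambda s: s["score"], reverse=True)[:keep]
              save_strategies(strategies)          # successful strategies survive
              time.sleep(cycle_seconds)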