AMD hardware
-
7900 XTX + ROCm: Llama.cpp vs vLLM Benchmarks
Read Full Article: 7900 XTX + ROCm: Llama.cpp vs vLLM Benchmarks
After a year of running the 7900 XTX with ROCm, the author reports real improvements, though the experience still lags the polish of NVIDIA cards. The benchmarks compare llama.cpp and vLLM on this card, attached over Thunderbolt 3 as an eGPU, with every model chosen to fit entirely in VRAM so the link's limited bandwidth does not dominate the results. Llama.cpp delivers generation speeds from 22.95 t/s to 87.09 t/s across the tested models, while vLLM ranges from 14.99 t/s to 94.19 t/s, illustrating both the remaining friction and the steady progress in running newer models on AMD hardware. This matters because it gives a concrete picture of what AMD GPUs can and cannot yet do for local machine learning workloads.
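As a rough illustration of how such t/s numbers can be reproduced client-side, here is a minimal Python sketch that times a completion request against an OpenAI-compatible endpoint, which both llama.cpp's llama-server and vLLM expose. The endpoint URL, port, and model behavior here are assumptions about a local setup, not details from the article; its figures likely come from dedicated tooling such as llama-bench, which separates prompt processing from token generation, so a crude end-to-end timer like this will understate pure generation speed.

```python
import time
import requests

# Assumed local endpoint: llama-server defaults to port 8080, vLLM's
# OpenAI-compatible server to port 8000; adjust for your setup.
ENDPOINT = "http://localhost:8080/v1/completions"

def measure_throughput(prompt: str, max_tokens: int = 256) -> float:
    """Return an end-to-end generation speed estimate in tokens/second."""
    start = time.perf_counter()
    resp = requests.post(
        ENDPOINT,
        json={"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.0},
        timeout=600,
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    # Both servers report OpenAI-style token counts in the "usage" field.
    generated = resp.json()["usage"]["completion_tokens"]
    return generated / elapsed

if __name__ == "__main__":
    tps = measure_throughput("Explain ROCm in one paragraph.")
    print(f"{tps:.2f} t/s")
```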
-
Boosting GPU Utilization with WoolyAI’s Software Stack
Read Full Article: Boosting GPU Utilization with WoolyAI’s Software Stack
Traditional GPU orchestration assigns one job per GPU, leaving hardware idle whenever a single job cannot saturate the card. WoolyAI's software stack addresses this by running multiple jobs concurrently on one GPU with deterministic performance, dynamically scheduling the GPU's streaming multiprocessors (SMs) to keep them fully occupied. The same stack lets machine learning jobs launch from CPU-only infrastructure by executing their kernels remotely on a shared GPU pool, and it runs existing CUDA PyTorch jobs on AMD hardware without code changes. This matters because higher GPU utilization translates directly into lower cost and better throughput for computational workloads.
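WoolyAI's SM-level scheduler is proprietary, so the sketch below does not use its API; it only illustrates the underlying idea of co-locating independent workloads on one GPU, using standard PyTorch CUDA streams (which map to HIP streams on ROCm builds). Unlike WoolyAI's approach, plain streams give no deterministic performance isolation, only best-effort overlap when neither workload saturates the device.

```python
import torch

# Two independent "jobs" sharing a single GPU via CUDA streams.
assert torch.cuda.is_available()
device = torch.device("cuda:0")

stream_a = torch.cuda.Stream(device)
stream_b = torch.cuda.Stream(device)

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

with torch.cuda.stream(stream_a):
    for _ in range(50):               # "job A": repeated matmuls
        a = a @ a
        a = a / a.norm()              # renormalize to keep values bounded

with torch.cuda.stream(stream_b):
    for _ in range(50):               # "job B": an unrelated workload
        b = torch.relu(b @ b.t())
        b = b / b.norm()

torch.cuda.synchronize(device)        # wait for both streams to drain
print("both jobs finished on one GPU")
```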
