benchmarks
-
DGX Spark: Discrepancies in Nvidia’s LLM Benchmarks
Read Full Article: DGX Spark: Discrepancies in Nvidia’s LLM Benchmarks
DGX Spark, Nvidia's platform for large language model (LLM) development, has been found to perform significantly slower than Nvidia's advertised benchmarks. While Nvidia claims high token processing speeds using advanced frameworks like Unsloth, real-world tests show much lower performance, suggesting potential discrepancies in Nvidia's reported figures. The tests indicate that Nvidia may be using specialized low precision training methods not commonly accessible, or possibly overstating their benchmarks. This discrepancy is crucial for developers and researchers to consider when planning investments in AI hardware, as it impacts the efficiency and cost-effectiveness of LLM training.
-
7900 XTX + ROCm: Llama.cpp vs vLLM Benchmarks
Read Full Article: 7900 XTX + ROCm: Llama.cpp vs vLLM Benchmarks
After a year of using the 7900 XTX with ROCm, improvements have been noted, though the experience remains less seamless compared to NVIDIA cards. A comparison of llama.cpp and vLLM benchmarks on this hardware, connected via Thunderbolt 3, reveals varying performance with different models, all fitting within VRAM to mitigate bandwidth limitations. Llama.cpp shows a range of generation speeds from 22.95 t/s to 87.09 t/s, while vLLM demonstrates speeds from 14.99 t/s to 94.19 t/s, highlighting the ongoing challenges and progress in running newer models on AMD hardware. This matters as it provides insight into the current capabilities and limitations of AMD GPUs for local machine learning tasks.
