ROCm

  • ROCm on ROG Ally X: Innovation or Overreach?


    The exploration of running ROCm, AMD's open-source platform for GPU compute, on a ROG Ally X handheld raises questions about the practicality and necessity of such an endeavor. Getting ROCm working on this gaming handheld is technically intriguing, but it prompts reflection on the actual benefits and drawbacks: the excitement of pushing technological boundaries has to be balanced against usability and performance in a handheld gaming context. This matters because it highlights the importance of aligning technological ambition with user needs and device capabilities.

    Read Full Article: ROCm on ROG Ally X: Innovation or Overreach?

  • AMD iGPUs Use 128GB Memory on Linux via GTT


    AMD's integrated GPUs (iGPUs) on Linux can address up to 128 GB of system memory as GPU memory through the Graphics Translation Table (GTT). Because GTT allocations are made on demand, the memory stays available to the CPU until the GPU actually claims it, which makes iGPUs practical for tasks like kernel development and profiling without dedicating RAM up front. iGPUs are slower for inference, but they are a cost-effective complement to a main GPU, particularly for developing large-memory AMD GPU kernels and experimenting with hybrid CPU/GPU architectures (a sketch for inspecting GTT usage follows the link below). This matters because it opens up affordable and efficient computational development on standard hardware.

    Read Full Article: AMD iGPUs Use 128GB Memory on Linux via GTT
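
    On Linux the amdgpu driver exposes the GTT and VRAM pools through sysfs, so you can check how much of that 128 GB ceiling is actually in use. Below is a minimal sketch (not from the article) that assumes the iGPU is card0; the kernel-side limit is typically raised with the amdgpu.gttsize= module parameter (in MiB) or, on newer kernels, TTM's pages_limit.

    ```python
    from pathlib import Path

    # amdgpu exposes memory counters (in bytes) under the DRM device node.
    # Assumption: the iGPU is card0 -- the index may differ on your system.
    DEV = Path("/sys/class/drm/card0/device")

    def read_gib(counter: str) -> float:
        """Read one of amdgpu's mem_info_* sysfs counters and convert to GiB."""
        return int((DEV / counter).read_text()) / 2**30

    for counter in ("mem_info_vram_total", "mem_info_vram_used",
                    "mem_info_gtt_total", "mem_info_gtt_used"):
        print(f"{counter}: {read_gib(counter):.1f} GiB")
    ```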

  • 7900 XTX + ROCm: Llama.cpp vs vLLM Benchmarks


    After a year of using the 7900 XTX with ROCm over a Thunderbolt 3 eGPU enclosure, the experience has clearly improved, though it remains less seamless than on NVIDIA cards. Benchmarks comparing llama.cpp and vLLM on this setup, with every model fitting entirely in VRAM to sidestep the TB3 bandwidth limit, show llama.cpp generating at 22.95 to 87.09 t/s and vLLM at 14.99 to 94.19 t/s depending on the model (a throughput-measurement sketch follows the link below), highlighting both the ongoing challenges and the progress in running newer models on AMD hardware. This matters as it provides insight into the current capabilities and limitations of AMD GPUs for local machine learning tasks.

    Read Full Article: 7900 XTX + ROCm: Llama.cpp vs vLLM Benchmarks
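
    For a rough sense of how such tokens-per-second figures are measured, here is a minimal sketch using the llama-cpp-python bindings rather than the article's actual harness; the model path is a placeholder, and n_gpu_layers=-1 offloads every layer so the Thunderbolt link isn't the bottleneck during generation.

    ```python
    import time
    from llama_cpp import Llama  # pip install llama-cpp-python (ROCm/hipBLAS build)

    # Placeholder model path: any GGUF that fits entirely in the 24 GB of VRAM.
    llm = Llama(model_path="models/example-7b-q4_k_m.gguf",
                n_gpu_layers=-1,  # offload all layers to the GPU
                n_ctx=4096,
                verbose=False)

    start = time.perf_counter()
    out = llm("Briefly explain what ROCm is.", max_tokens=256)
    elapsed = time.perf_counter() - start

    generated = out["usage"]["completion_tokens"]
    print(f"{generated} tokens in {elapsed:.2f} s -> {generated / elapsed:.2f} t/s")
    ```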

  • Optimizing 6700XT GPU with ROCm and Openweb UI


    For users running a 6700XT GPU (gfx1031) with ROCm and Open WebUI, a custom configuration has been shared, built with help from Google AI Studio. The setup requires Python 3.12.x, with text generation on ROCm 7.1.1 and image generation on rocBLAS 6.4.2. Services start automatically on boot via batch files and run in the background, accessible through Open WebUI; Docker is deliberately avoided to conserve resources. The configuration reaches 22-25 t/s on ministral3-14b-instruct Q5_XL with a 16k context, and a similar custom build runs stable-diffusion.cpp successfully (a sketch of the usual gfx1031 workaround follows the link below). Sharing this configuration could help others achieve similar gains. This matters because it provides a practical guide for optimizing GPU setups for specific tasks, potentially improving performance and efficiency for users with similar hardware.

    Read Full Article: Optimizing 6700XT GPU with ROCm and Openweb UI
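
    The write-up doesn't spell out how the unsupported gfx1031 chip is made visible to ROCm, but the commonly used workaround is to present it as gfx1030 via an environment variable set before the runtime loads. A minimal sketch, assuming a ROCm build of PyTorch:

    ```python
    import os

    # gfx1031 (RX 6700 XT) is absent from ROCm's official support list; the
    # usual workaround is to report it as gfx1030, which shares the RDNA2 ISA.
    os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

    import torch  # ROCm builds of PyTorch expose HIP devices via the CUDA API

    if torch.cuda.is_available():
        print("HIP device:", torch.cuda.get_device_name(0))
    else:
        print("No HIP device visible; check the ROCm install and the override.")
    ```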