
  • AMD iGPUs Use 128GB Memory on Linux via GTT


    AMD's integrated GPUs (iGPUs) on Linux can use up to 128 GB of system memory as VRAM through the Graphics Translation Table (GTT). Because GTT allocations are dynamic, memory is only taken from the CPU's pool when the GPU actually needs it, which makes iGPUs practical for tasks like kernel optimization and profiling without permanently reserving RAM. While iGPUs are slower for inference tasks, they offer a cost-effective development target, especially when used alongside a main GPU, and are particularly useful for developing large-memory AMD GPU kernels on hybrid CPU/GPU architectures. This matters because it opens up affordable, efficient computational development on standard hardware; a minimal allocation sketch follows the link below.

    Read Full Article: AMD iGPUs Use 128GB Memory on Linux via GTT
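
    A minimal sketch of what this enables, assuming a ROCm build of PyTorch and an amdgpu kernel configured with a large GTT (e.g. amdgpu.gttsize=131072, in MiB, on the kernel command line); the allocation size here is illustrative:

      import torch

      # ROCm devices are exposed through PyTorch's CUDA API.
      assert torch.cuda.is_available()

      # Allocate ~32 GiB on the iGPU. With GTT, this is carved out of
      # ordinary system RAM on demand rather than a fixed VRAM partition.
      x = torch.empty(32 * 1024**3 // 4, dtype=torch.float32, device="cuda")
      print(torch.cuda.memory_allocated() / 1024**3, "GiB allocated")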

  • RPC-server llama.cpp Benchmarks


    The llama.cpp RPC server enables distributed inference of large language models (LLMs) by offloading computation to remote instances spread across multiple machines or GPUs. Benchmarks were run on a local gigabit network using three systems and five GPUs, measuring performance across different model sizes and parameters. The systems mixed AMD and Intel CPUs with GPUs such as a GTX 1080 Ti, an Nvidia P102-100, and a Radeon RX 7900 GRE, for a combined 53 GB of VRAM. Tests covered models including Nemotron-3-Nano-30B and DeepSeek-R1-Distill-Llama-70B, showing the server's ability to manage complex computations across distributed environments. This matters because it demonstrates scalable, efficient LLM deployment on distributed commodity hardware, which is crucial for advancing AI applications; a launcher sketch follows the link below.

    Read Full Article: RPC-server llama.cpp Benchmarks
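
    A hedged launcher sketch of the setup the benchmarks describe: the rpc-server and llama-cli binaries and the --rpc flag come from llama.cpp's rpc example, while the hosts, ports, and model path below are placeholders (verify flags against your build):

      import subprocess

      # One rpc-server backend runs on each GPU host, started there with
      # something like:  rpc-server --host 0.0.0.0 --port 50052
      WORKERS = ["192.168.1.10:50052", "192.168.1.11:50052"]  # placeholder hosts

      # The client splits layers across all pooled backends.
      subprocess.run([
          "./llama-cli",
          "-m", "models/model.gguf",      # placeholder model path
          "--rpc", ",".join(WORKERS),     # comma-separated list of backends
          "-ngl", "99",                   # offload as many layers as possible
          "-p", "Hello",
      ], check=True)

    At the time of writing, the upstream example runs one rpc-server instance per GPU, so a host with two cards would expose two backends on different ports.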

  • Automated Algorithmic Optimization with AlphaEvolve


    The AlphaEvolve concept proposes a novel approach to algorithmic optimization: use neural networks to learn a continuous space that represents a combinatorial space of algorithms. Algorithms are mapped into a learnable embedding space with a BERT-like objective, so that functional closeness corresponds to Euclidean proximity. A learned mapping from embeddings to performance then turns algorithm invention into an optimization problem that seeks to maximize predicted performance gains. Optimized vectors are decoded back into executable code by steering the activations of a code-generation model, potentially changing how algorithms are discovered and optimized. This matters because it could significantly enhance the efficiency and capability of algorithm development, leading to breakthroughs in computational tasks; a conceptual sketch follows the link below.

    Read Full Article: Automated Algorithmic Optimization with AlphaEvolve
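
    A conceptual sketch of the proposed loop, not the post's actual code: a stand-in encoder embeds a program, a learned head predicts performance, and gradient ascent moves the embedding toward higher predicted performance before decoding (the decoding step, steering a code-generation model toward the vector, is the speculative part):

      import torch
      import torch.nn as nn

      EMB = 256

      # Stand-in for a BERT-like code encoder and a performance predictor.
      encoder = nn.Sequential(nn.Linear(512, EMB), nn.ReLU(), nn.Linear(EMB, EMB))
      perf_head = nn.Sequential(nn.Linear(EMB, 64), nn.ReLU(), nn.Linear(64, 1))

      tokens = torch.randn(1, 512)  # placeholder for a featurized program
      z = encoder(tokens).detach().requires_grad_(True)

      # Gradient ascent on predicted performance in embedding space.
      opt = torch.optim.Adam([z], lr=1e-2)
      for _ in range(100):
          opt.zero_grad()
          loss = -perf_head(z).sum()  # negate: optimizer minimizes
          loss.backward()
          opt.step()

      # z would then be decoded into executable code, e.g. by steering a
      # code-generation model's activations toward it, as the post proposes.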