  • DeepSeek V3.2: Dense Attention Model


    DeepSeek V3.2 with dense attention (sparse attention disabled) is now available as a GGUF and runs on regular llama.cpp builds without requiring extra support. The model is compatible with the Q8_0 and Q4_K_M quantization levels and can be run with a specific Jinja chat template. Performance testing with lineage-bench on the Q4_K_M quant showed impressive results: the model made only two errors at the most challenging graph size of 128, outperforming the original sparse-attention version. Disabling sparse attention does not appear to hurt the model's intelligence, making this a robust alternative for users. This matters because it lets the model run on standard tooling, broadening its usability without sacrificing performance.
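    As a rough illustration of the workflow described above, here is a minimal sketch using the llama-cpp-python bindings. The filename, context size, and prompt are placeholders rather than values from the original post, and the post's specific Jinja template would need to be checked against (or supplied in place of) whatever template is embedded in the GGUF.

    ```python
    # Minimal sketch: loading a Q4_K_M GGUF with llama-cpp-python and running one chat turn.
    # The model path and parameters below are placeholders, not values from the original post.
    from llama_cpp import Llama

    llm = Llama(
        model_path="DeepSeek-V3.2-dense-Q4_K_M.gguf",  # hypothetical filename
        n_ctx=8192,          # context window; adjust to available memory
        n_gpu_layers=-1,     # offload all layers to GPU when possible
    )

    # llama-cpp-python falls back to the chat template stored in the GGUF metadata;
    # since the post calls for a specific Jinja template, verify the embedded one
    # matches before relying on this default.
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize the benefits of dense attention."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])
    ```

    When using llama.cpp's own CLI tools instead, a chat template can be supplied directly via flags such as --chat-template-file, which may be the more direct route for the template mentioned in the post.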

    Read Full Article: DeepSeek V3.2: Dense Attention Model