Dense Attention
-
DeepSeek V3.2: Dense Attention Model
Read Full Article: DeepSeek V3.2: Dense Attention Model
DeepSeek V3.2 with dense attention now runs on regular llama.cpp builds without requiring extra support. The model works at the Q8_0 and Q4_K_M quantization levels and can be launched with a suitable jinja chat template. Performance testing with lineage-bench on the Q4_K_M quant was strong: the model made only two errors at the most challenging graph size of 128, outperforming the original sparse-attention version. Disabling sparse attention does not appear to hurt the model's intelligence, making the dense configuration a robust alternative. This matters because it shows the model can be run more broadly, on standard tooling, without sacrificing performance.
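For readers who want to try this, a run along the lines described above might look like the sketch below. The `--jinja` and `--chat-template-file` flags are real llama.cpp options; the model and template file names are placeholders, and the command is printed rather than executed here because the quantized weights are far too large to run in a quick check.

```shell
# Placeholder paths -- substitute your own converted GGUF and saved template.
MODEL=DeepSeek-V3.2-dense.Q4_K_M.gguf
TEMPLATE=deepseek-v3.jinja

# Build the llama-cli invocation as a string and print it instead of
# executing it (the Q4_K_M quant of a model this size is hundreds of GB).
CMD="llama-cli -m $MODEL --jinja --chat-template-file $TEMPLATE -p \"Hello\" -n 128"
echo "$CMD"
```

In practice you would drop the `echo` and run the command directly against your local GGUF file.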
-
Exploring DeepSeek V3.2 with Dense Attention
Read Full Article: Exploring DeepSeek V3.2 with Dense Attention
DeepSeek V3.2 was tested with dense attention in place of its usual sparse attention, using a patch to convert and run the model with llama.cpp. The patch overrides certain tokenizer settings and skips unsupported tensors. Because DeepSeek V3.2 does not ship a jinja chat template, the model was run with a template saved from DeepSeek V3. The assistant then held a conversation and solved a multiplication problem step by step, demonstrating that the dense-attention configuration handles text-based tasks competently. This matters because it shows how adaptable these models are to alternative configurations, potentially broadening their usability and functionality.
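Tokenizer settings like those mentioned above can also be overridden at load time rather than by patching the converter: llama.cpp accepts `--override-kv key=type:value` entries. The sketch below combines that with the saved DeepSeek V3 template; the specific key and value shown are illustrative assumptions, not the exact settings the article's patch changed, and the command is printed rather than executed.

```shell
# Placeholder path -- substitute your own converted GGUF.
MODEL=DeepSeek-V3.2-dense.Q4_K_M.gguf

# Sketch: override one tokenizer metadata key at load time and supply the
# chat template saved from DeepSeek V3. The key/value pair is only an
# example of the --override-kv syntax, not a setting from the article.
CMD="llama-server -m $MODEL \
  --override-kv tokenizer.ggml.add_bos_token=bool:false \
  --jinja --chat-template-file deepseek-v3.jinja"
echo "$CMD"
```

Load-time overrides are convenient for experimentation because they leave the GGUF file untouched; changes that the converter must see (such as skipping unsupported tensors) still require patching the conversion script itself.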
