several types of attention modules written in PyTorch for learning purposes
Image Captioning With MobileNet-LLaMA 3
Examines cost-effective methods for optimizing GQA configurations, comparing its performance with counterparts such as Multi-Head Attention (MHA) and Multi-Query Attention (MQA).
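For context on how these variants relate, here is a minimal sketch (not taken from any of the listed repos) of grouped-query attention in PyTorch. It relies only on standard `torch` APIs; setting `n_kv_heads == n_heads` recovers MHA, `n_kv_heads == 1` recovers MQA, and intermediate values give GQA.

```python
import torch
import torch.nn.functional as F
from torch import nn

class GroupedQueryAttention(nn.Module):
    """Sketch of GQA: groups of query heads share a smaller set of KV heads."""

    def __init__(self, d_model: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads
        self.head_dim = d_model // n_heads
        self.wq = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Each group of query heads shares one KV head, so K/V are repeated.
        rep = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(rep, dim=1)
        v = v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))

# n_kv_heads=8 -> MHA, n_kv_heads=1 -> MQA, n_kv_heads=2 -> GQA
gqa = GroupedQueryAttention(d_model=64, n_heads=8, n_kv_heads=2)
print(gqa(torch.randn(1, 16, 64)).shape)  # torch.Size([1, 16, 64])
```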
(Unofficial) PyTorch implementation of Hugging Face SmolLM, a blazingly fast and small language model, built with grouped query attention (GQA).
Transformer models for humorous text generation, fine-tuned on a Russian jokes dataset with ALiBi, RoPE, GQA, and SwiGLU, plus a custom byte-level BPE tokenizer.
Building a Transformer model from scratch with positional encoding / trainable positions, Multi-Head Attention, KV Cache, and Grouped Attention, trained on a few Brazilian books.
A single-file implementation of LLaMA 3, with support for jitting, KV caching and prompting
My LLaMA 3 implementation.
Building a Transformer model from scratch with variants such as Multi-Head Attention and Grouped Query Attention, trained on works by Machado de Assis.
This repository shows how to build a DeepSeek language model from scratch using PyTorch. It includes clean, well-structured implementations of advanced attention techniques such as key–value caching for fast decoding, multi-query attention, grouped-query attention, and multi-head latent attention.
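To illustrate the key–value caching idea mentioned above, here is a minimal sketch (assumptions only, not the repository's actual code) of incremental decoding: previously computed K/V tensors are kept in a cache so each new token contributes a single query row instead of recomputing attention over the whole prefix.

```python
import torch
import torch.nn.functional as F

def decode_step(q_new, k_new, v_new, cache):
    """Append the new token's K/V to the cache and attend over the full history."""
    if cache is None:
        k_all, v_all = k_new, v_new
    else:
        k_all = torch.cat([cache[0], k_new], dim=2)
        v_all = torch.cat([cache[1], v_new], dim=2)
    # q_new has sequence length 1, so per-step attention cost grows linearly.
    out = F.scaled_dot_product_attention(q_new, k_all, v_all)
    return out, (k_all, v_all)

# Shapes are (batch, heads, seq, head_dim); decode 4 tokens one at a time.
cache = None
for _ in range(4):
    q = torch.randn(1, 8, 1, 16)
    k = torch.randn(1, 8, 1, 16)
    v = torch.randn(1, 8, 1, 16)
    out, cache = decode_step(q, k, v, cache)
print(cache[0].shape)  # torch.Size([1, 8, 4, 16])
```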
Decoder-only LLM trained on the Harry Potter books.