Lumenspark2 is the second-generation implementation of Lumenspark, a lightweight transformer for efficient large-scale language modeling. It integrates modern architectural components, Hugging Face’s training ecosystem, and parameter-efficient fine-tuning methods to provide a flexible, research-friendly framework.
**🔥 Modern Transformer Architecture**

- Rotary Position Embeddings (RoPE)
- RMSNorm normalization
- SwiGLU feed-forward networks
- Efficient scaled dot-product attention (SDPA)
**⚡ Training Framework**

- Hugging Face `Trainer` integration
- Streaming dataset support (FineWeb-Edu)
- Gradient accumulation & mixed precision (`bf16`)
- Custom callback for loss plots and inline text generation
**🧩 Extensible & Modular**

- LoRA adapters for efficient fine-tuning
- Dynamic sequence chunking collator
- Configurable via `LumensparkConfig`
**📊 Evaluation & Monitoring**

- Live loss plotting (`training_loss_plot.png`)
- Text generation evaluation during training
- Parameter counting utility
Clone and install dependencies:

```bash
git clone https://github.com/anto18671/lumenspark2.git
cd lumenspark2
pip install -r requirements.txt
```

Dependencies: `torch`, `transformers`, `datasets`, `safetensors`, `huggingface_hub`, `matplotlib`
Config Parameters (`LumensparkConfig`):

- `seq_length`: 1536
- `d_model`: 1024
- `n_layers`: 12
- `n_heads`: 16
- `ffn_mult`: 4.0
- `dropout`: 0.1
- `rope_theta`: 10,000
- `adapter_rank`: 0 (LoRA disabled by default)
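For example, a run with a shorter context and LoRA enabled could be configured as below. This is a minimal sketch that assumes `LumensparkConfig` accepts the fields above as keyword arguments; check `lumenspark_model.py` for the exact signature.

```python
from lumenspark_model import LumensparkConfig

# Sketch only: assumes LumensparkConfig takes the documented fields as kwargs.
config = LumensparkConfig(
    seq_length=1024,    # shorter context than the 1536 default
    d_model=1024,
    n_layers=12,
    n_heads=16,
    ffn_mult=4.0,
    dropout=0.1,
    rope_theta=10000,
    adapter_rank=8,     # non-zero rank enables LoRA adapters
)
```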
Core Components:
- Token embeddings (tied with LM head)
- Transformer blocks with RMSNorm + RoPE + SDPA
- SwiGLU feed-forward networks
- Causal LM head
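The SwiGLU feed-forward in each block follows the standard gated formulation, roughly `down(silu(gate(x)) * up(x))`. The sketch below is illustrative only; the actual implementation (and its weight names) lives in `lumenspark_model.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Illustrative SwiGLU FFN: down-project silu(gate(x)) * up(x)."""

    def __init__(self, d_model: int, ffn_mult: float = 4.0):
        super().__init__()
        hidden = int(d_model * ffn_mult)
        self.gate_proj = nn.Linear(d_model, hidden, bias=False)
        self.up_proj = nn.Linear(d_model, hidden, bias=False)
        self.down_proj = nn.Linear(hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gated activation: SiLU on the gate branch, elementwise product with the up branch.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```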
Run training with:

```bash
python train.py
```

Default hyperparameters:

- Batch size: `8`
- Gradient accumulation: `20`
- Learning rate: `1e-4`
- Weight decay: `1e-2`
- Dataset: FineWeb-Edu (streaming)
- Steps: `10,000` (via `MAX_STEPS`)
Training outputs:

- Loss curves → `training_loss_plot.png`
- Generated samples printed at evaluation intervals
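The defaults above map roughly onto Hugging Face `TrainingArguments` as sketched below. This is illustrative; the actual wiring, including the `MAX_STEPS` constant, the output directory, and the custom callback, lives in `train.py`.

```python
from transformers import TrainingArguments

MAX_STEPS = 10_000

# Illustrative mapping of the documented defaults; the output directory and
# logging interval here are placeholders, not the repo's actual values.
training_args = TrainingArguments(
    output_dir="checkpoints",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=20,
    learning_rate=1e-4,
    weight_decay=1e-2,
    max_steps=MAX_STEPS,
    bf16=True,            # mixed precision
    logging_steps=50,     # placeholder logging interval
)
```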
Lumenspark2 has a built-in .generate() method supporting top-k, top-p, temperature, and repetition penalty.
```python
from lumenspark_model import LumensparkModel, LumensparkConfig
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

config = LumensparkConfig()
model = LumensparkModel(config, tokenizer=tokenizer)

prompt = "The year is 2050, and humans have colonized Mars."
print(model.generate(prompt, max_length=64, top_p=0.9, temperature=0.7))
```

Count model parameters with:

```python
from utils import count_parameters

count_parameters(model)
```

Outputs total, trainable, and non-trainable parameter counts.
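Internally, a helper like this amounts to summing `numel()` over the model's parameters. The sketch below approximates what `utils.count_parameters` does; the repo's output format may differ.

```python
import torch.nn as nn

def count_parameters_sketch(model: nn.Module) -> None:
    # Approximation of utils.count_parameters; output formatting may differ.
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total parameters:         {total:,}")
    print(f"Trainable parameters:     {trainable:,}")
    print(f"Non-trainable parameters: {total - trainable:,}")
```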
```
lumenspark2/
├── train.py              # Training loop with Hugging Face Trainer
├── lumenspark_model.py   # Transformer architecture, config, generate()
├── utils.py              # Helper functions (collator, parameter counting)
├── requirements.txt      # Dependencies
├── README.md             # Documentation
└── LICENSE               # MIT License
```
MIT License – see LICENSE.
- Hugging Face `transformers` & `datasets`
- FineWeb-Edu dataset
- OpenAI GPT-2 tokenizer