
fix: complete Falcon model support and add Pile dataset#450

Open
Medhatt21 wants to merge 1 commit into ModelTC:main from Medhatt21:fix/falcon-model-and-pile-dataset

Conversation


@Medhatt21 Medhatt21 commented Mar 3, 2026

Summary

  • Falcon model: The existing Falcon class in llmc/models/falcon.py was non-functional due to several issues. This PR fixes them so Falcon models (both old RWForCausalLM and new FalconForCausalLM architectures) work correctly with all quantization algorithms.
  • Pile dataset: Adds pile as a supported calibration and evaluation dataset, loading from mit-han-lab/pile-val-backup. Several quantization papers (SmoothQuant, LLM.int8()) use Pile for calibration, but it was not available as a dataset option.

Falcon Fixes

| Issue | Before | After |
| --- | --- | --- |
| `skip_layer_name()` | Missing (abstract); caused instantiation failure | Returns `['lm_head']` |
| `find_embed_layers()` | `self.model.model.rotary_emb` (wrong path) | `self.model.transformer.rotary_emb` |
| `get_layers_except_blocks()` | Missing `lm_head` | Includes `lm_head` |
| `has_bias()` | Hardcoded `False` | Reads from `model_config.bias` |
| Architecture detection | `block.config.architectures[0]` string comparison (fragile, could raise on unknown arch) | `model_config.new_decoder_architecture` with `getattr` fallback |
| Old-arch layernorms (non-parallel) | Only returned `post_attention_layernorm` | Returns both `input_layernorm` and `post_attention_layernorm` |
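The config-driven checks above can be sketched roughly as follows. This is a hypothetical, simplified rendering, not the actual class in `llmc/models/falcon.py`; only the attribute names `bias` and `new_decoder_architecture` come from the Hugging Face Falcon config.

```python
from types import SimpleNamespace

class FalconModelSketch:
    """Simplified sketch of the fixed accessors; not the real llmc class."""

    def __init__(self, model_config):
        self.model_config = model_config

    def skip_layer_name(self):
        # Previously missing (abstract), which made instantiation fail.
        return ['lm_head']

    def has_bias(self):
        # Read from the model config instead of hardcoding False.
        return getattr(self.model_config, 'bias', False)

    def is_new_arch(self):
        # Old RWForCausalLM configs may lack this attribute entirely, so
        # getattr with a default avoids the error a string comparison on
        # config.architectures[0] could trigger on an unknown architecture.
        return getattr(self.model_config, 'new_decoder_architecture', False)

# Old-arch-style config: the missing attribute falls back to False.
old_cfg = SimpleNamespace(bias=True)
model = FalconModelSketch(old_cfg)
print(model.is_new_arch())  # False
print(model.has_bias())     # True
```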

Pile Dataset

  • base_dataset.py: Added 'pile' to field map and build_calib_dataset() (loads mit-han-lab/pile-val-backup validation split)
  • specified_preproc.py: Added pile_gptq preprocessor (same pattern as wikitext2_gptq)
  • eval_base.py: Added 'pile' to supported dataset list, download logic, and tokenization
  • Also fixed a minor bug in eval_base.py error message: `self.dataset` -> `self.eval_dataset_name`
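A minimal sketch of the GPTQ-style sampling a pile_gptq preprocessor performs. Function and parameter names here are illustrative, not the exact llmc signatures; in a real run the texts would come from `load_dataset('mit-han-lab/pile-val-backup', split='validation')`.

```python
import random

def pile_gptq_preproc(texts, tokenize, n_samples, seq_len, seed=42):
    """Draw n_samples random seq_len-token windows from calibration texts,
    mirroring the wikitext2_gptq pattern. `tokenize` maps text -> token list."""
    rng = random.Random(seed)
    calib = []
    for _ in range(n_samples):
        # Resample documents until one is long enough to hold a full window.
        while True:
            ids = tokenize(texts[rng.randrange(len(texts))])
            if len(ids) > seq_len:
                break
        start = rng.randrange(len(ids) - seq_len)
        calib.append(ids[start:start + seq_len])
    return calib

# Toy usage with a whitespace "tokenizer"; a real run uses the model's tokenizer.
docs = ['the quick brown fox jumps over the lazy dog ' * 8, 'too short']
batch = pile_gptq_preproc(docs, str.split, n_samples=4, seq_len=16)
```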

Test plan

  • Verify tiiuae/falcon-7b (old arch, parallel_attn=True) loads and quantizes with RTN/GPTQ
  • Verify tiiuae/falcon-40b (new arch, new_decoder_architecture=True) loads and quantizes
  • Verify pile calibration dataset loads and preprocesses correctly
  • Verify pile evaluation dataset loads and computes perplexity
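The old-vs-new architecture cases in the test plan come down to which layernorms each block exposes. A hypothetical sketch of that selection (the `ln_attn`/`ln_mlp` and `input_layernorm`/`post_attention_layernorm` names follow the Hugging Face Falcon implementation; the function itself is illustrative):

```python
from types import SimpleNamespace

def block_layernorms(block, new_arch, parallel_attn):
    """Pick the per-block layernorms a quantization algorithm should see."""
    if new_arch:
        # FalconForCausalLM (e.g. falcon-40b) uses ln_attn / ln_mlp.
        return {'ln_attn': block.ln_attn, 'ln_mlp': block.ln_mlp}
    if parallel_attn:
        # Old parallel-attention blocks (e.g. falcon-7b) have one layernorm.
        return {'input_layernorm': block.input_layernorm}
    # Old non-parallel blocks: return BOTH layernorms (the pre-fix code
    # returned only post_attention_layernorm here).
    return {
        'input_layernorm': block.input_layernorm,
        'post_attention_layernorm': block.post_attention_layernorm,
    }

# Stand-in block with attribute names only; real blocks are nn.Modules.
blk = SimpleNamespace(input_layernorm='ln_in', post_attention_layernorm='ln_post',
                      ln_attn='ln_a', ln_mlp='ln_m')
print(sorted(block_layernorms(blk, new_arch=False, parallel_attn=False)))
# ['input_layernorm', 'post_attention_layernorm']
```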

Made with Cursor

Falcon model:
- Add missing skip_layer_name() (was abstract, caused instantiation failure)
- Fix rotary_emb path: model.model.rotary_emb -> model.transformer.rotary_emb
- Add lm_head to get_layers_except_blocks()
- Read has_bias() from model config instead of hardcoding False
- Use model_config.new_decoder_architecture instead of fragile architectures[0] check
- Return both layernorms for old arch non-parallel_attn case
- Use getattr with defaults for safer config attribute access

Pile dataset:
- Add 'pile' as calibration dataset (loads mit-han-lab/pile-val-backup)
- Add pile_gptq preprocessor for GPTQ-style calibration sampling
- Add 'pile' to eval_base supported datasets with data loading and encoding
- Fix eval_base error message: self.dataset -> self.eval_dataset_name

Tested with tiiuae/falcon-7b (old arch) and tiiuae/falcon-40b (new arch).

Made-with: Cursor
