fix: complete Falcon model support and add Pile dataset #450
Open
Medhatt21 wants to merge 1 commit into ModelTC:main from
Conversation
Falcon model:
- Add missing `skip_layer_name()` (was abstract, caused instantiation failure)
- Fix rotary_emb path: `model.model.rotary_emb` -> `model.transformer.rotary_emb`
- Add `lm_head` to `get_layers_except_blocks()`
- Read `has_bias()` from model config instead of hardcoding `False`
- Use `model_config.new_decoder_architecture` instead of fragile `architectures[0]` check
- Return both layernorms for old-arch non-`parallel_attn` case
- Use `getattr` with defaults for safer config attribute access

Pile dataset:
- Add `'pile'` as calibration dataset (loads `mit-han-lab/pile-val-backup`)
- Add `pile_gptq` preprocessor for GPTQ-style calibration sampling
- Add `'pile'` to eval_base supported datasets with data loading and encoding
- Fix eval_base error message: `self.dataset` -> `self.eval_dataset_name`

Tested with `tiiuae/falcon-7b` (old arch) and `tiiuae/falcon-40b` (new arch).

Made-with: Cursor
Summary
The `Falcon` class in `llmc/models/falcon.py` was non-functional due to several issues. This PR fixes them so Falcon models (both the old `RWForCausalLM` and the new `FalconForCausalLM` architectures) work correctly with all quantization algorithms. It also adds `pile` as a supported calibration and evaluation dataset, loading from `mit-han-lab/pile-val-backup`. Several quantization papers (SmoothQuant, LLM.int8()) use Pile for calibration, but it was not previously available as a dataset option.

Falcon Fixes
- `skip_layer_name()`: was abstract (blocked instantiation); now returns `['lm_head']`
- `find_embed_layers()`: used `self.model.model.rotary_emb` (wrong path); now uses `self.model.transformer.rotary_emb`
- `get_layers_except_blocks()`: was missing `lm_head`; now includes `lm_head`
- `has_bias()`: hardcoded `False`; now reads `model_config.bias`
- Architecture detection: replaced the `block.config.architectures[0]` string comparison (fragile, could raise on unknown arch) with `model_config.new_decoder_architecture` and a `getattr` fallback
- Old-arch non-`parallel_attn` case: returned only `post_attention_layernorm`; now returns both `input_layernorm` and `post_attention_layernorm`
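In sketch form, the config-access fixes above look like the following. This is a hypothetical, minimal stand-in (the `FalconAdapter` class, the `SimpleNamespace` configs, and the `is_new_arch` name are illustrative, not llmc's actual code); it shows how `getattr` with defaults lets one code path serve both the old and new Falcon configs:

```python
from types import SimpleNamespace

class FalconAdapter:
    """Illustrative stand-in for the Falcon model wrapper; method names
    follow the PR, but this class and its fields are hypothetical."""

    def __init__(self, model_config):
        self.model_config = model_config

    def skip_layer_name(self):
        # Was left abstract before the fix, so the class could not be
        # instantiated; now it names the layer to skip during quantization.
        return ['lm_head']

    def has_bias(self):
        # Read the bias flag from the model config instead of hardcoding False.
        return getattr(self.model_config, 'bias', False)

    def is_new_arch(self):
        # Replaces the fragile architectures[0] string comparison; getattr
        # keeps old-arch configs (which lack the attribute) working.
        return getattr(self.model_config, 'new_decoder_architecture', False)

# Old-arch (falcon-7b-style) config lacks new_decoder_architecture entirely.
old_cfg = SimpleNamespace(parallel_attn=True, bias=False)
# New-arch (falcon-40b-style) config sets it explicitly.
new_cfg = SimpleNamespace(new_decoder_architecture=True, bias=False)

print(FalconAdapter(old_cfg).is_new_arch())     # False
print(FalconAdapter(new_cfg).is_new_arch())     # True
print(FalconAdapter(new_cfg).skip_layer_name()) # ['lm_head']
```

The same `getattr`-with-default pattern avoids `AttributeError` on any config attribute that only one of the two architectures defines.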
Pile Dataset

- `base_dataset.py`: added `'pile'` to the field map and `build_calib_dataset()` (loads the `mit-han-lab/pile-val-backup` validation split)
- `specified_preproc.py`: added a `pile_gptq` preprocessor (same pattern as `wikitext2_gptq`)
- `eval_base.py`: added `'pile'` to the supported dataset list, plus download logic and tokenization
- `eval_base.py`: fixed the error message to report `self.eval_dataset_name` instead of `self.dataset`
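The GPTQ-style sampling pattern that `pile_gptq` follows can be sketched as below. The function name, signature, and toy tokenizer here are assumptions for illustration, not llmc's actual implementation; only the sampling pattern (pick a random document, tokenize it, cut a random fixed-length token window) mirrors what the PR describes:

```python
import random

def gptq_style_calib(texts, tokenize, n_samples, seq_len, seed=0):
    """Hypothetical sketch of GPTQ-style calibration sampling: draw random
    documents and slice a random seq_len-token window out of each one."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n_samples:
        ids = tokenize(rng.choice(texts))
        if len(ids) < seq_len:
            continue  # document too short to yield a full window
        start = rng.randint(0, len(ids) - seq_len)
        samples.append(ids[start:start + seq_len])
    return samples

# Toy tokenizer so the sketch runs without downloading anything; in llmc the
# real tokenizer would be used, and `texts` would come from the
# mit-han-lab/pile-val-backup validation split mentioned in the PR.
toy_tokenize = lambda text: [hash(w) % 1000 for w in text.split()]
texts = ['one short doc', 'a much longer document ' * 50]

calib = gptq_style_calib(texts, toy_tokenize, n_samples=4, seq_len=16)
print(len(calib), len(calib[0]))  # 4 16
```

Short documents are simply skipped rather than padded, matching the usual GPTQ calibration setup where every sample is a full-length window.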
Test plan

- `tiiuae/falcon-7b` (old arch, `parallel_attn=True`) loads and quantizes with RTN/GPTQ
- `tiiuae/falcon-40b` (new arch, `new_decoder_architecture=True`) loads and quantizes
- The `pile` calibration dataset loads and preprocesses correctly
- The `pile` evaluation dataset loads and computes perplexity

Made with Cursor