Develop #7
+20 −34
Conversation
qwang70 pushed a commit to DRL36/pytorch-pretrained-BERT that referenced this pull request on Mar 2, 2019
LysandreJik added a commit that referenced this pull request on Apr 10, 2020
* Initial commit to get BERT + run_glue.py on TPU
* Add README section for TPU and address comments.
* Cleanup TPU bits from run_glue.py (#3)
  TPU runner is currently implemented in: https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py. We plan to upstream this directly into `huggingface/transformers` (either `master` or `tpu`) branch once it's been more thoroughly tested.
* Cleanup TPU bits from run_glue.py
  TPU runner is currently implemented in: https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py. We plan to upstream this directly into `huggingface/transformers` (either `master` or `tpu`) branch once it's been more thoroughly tested.
* No need to call `xm.mark_step()` explicitly (#4)
  Since for gradient accumulation we're accumulating on batches from `ParallelLoader` instance which on next() marks the step itself.
* Resolve R/W conflicts from multiprocessing (#5)
* Add XLNet in list of models for `run_glue_tpu.py` (#6)
* Add RoBERTa to list of models in TPU GLUE (#7)
* Add RoBERTa and DistilBert to list of models in TPU GLUE (#8)
* Use barriers to reduce duplicate work/resources (#9)
* Shard eval dataset and aggregate eval metrics (#10)
* Shard eval dataset and aggregate eval metrics
  Also, instead of calling `eval_loss.item()` every time do summation with tensors on device.
* Change defaultdict to float
* Reduce the pred, label tensors instead of metrics
  As brought up during review some metrics like f1 cannot be aggregated via averaging. GLUE task metrics depends largely on the dataset, so instead we sync the prediction and label tensors so that the metrics can be computed accurately on those instead.
* Only use tb_writer from master (#11)
* Apply huggingface black code formatting
* Style
* Remove `--do_lower_case` as example uses cased
* Add option to specify tensorboard logdir
  This is needed for our testing framework which checks regressions against key metrics writtern by the summary writer.
* Using configuration for `xla_device`
* Prefix TPU specific comments.
* num_cores clarification and namespace eval metrics
* Cache features file under `args.cache_dir`
  Instead of under `args.data_dir`. This is needed as our test infra uses data_dir with a read-only filesystem.
* Rename `run_glue_tpu` to `run_tpu_glue`

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
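For context on the `ParallelLoader` point above: the per-device loader yields device-resident batches and triggers the XLA step each time the next batch is fetched, so a gradient-accumulation loop never needs an explicit `xm.mark_step()`. A minimal sketch under assumptions (a toy model and dataset stand in for BERT and GLUE; this is not the PR's actual `run_glue_tpu.py`):

```python
# Sketch only: gradient accumulation on a TPU core with torch_xla.
# ParallelLoader marks the XLA step when the next batch is fetched,
# so the loop does not call xm.mark_step() directly.
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as pl
from torch.utils.data import DataLoader, TensorDataset

device = xm.xla_device()
model = torch.nn.Linear(16, 2).to(device)         # toy stand-in for a GLUE classifier
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
data = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
loader = pl.ParallelLoader(DataLoader(data, batch_size=8), [device]).per_device_loader(device)

accumulation_steps = 4
for step, (inputs, labels) in enumerate(loader):  # next() on the loader marks the step
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        xm.optimizer_step(optimizer)              # reduces gradients, then optimizer.step()
        optimizer.zero_grad()
```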
rraminen pushed a commit to rraminen/transformers that referenced this pull request on Jun 3, 2022
Updating GPT2-TF2 Scripts
jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this pull request on Jun 1, 2023
younesbelkada referenced this pull request in younesbelkada/transformers on Mar 14, 2024
LysandreJik pushed a commit that referenced this pull request on Mar 15, 2024
* Cohere Model Release (#1)
  Cohere Model Release
* Remove unnecessary files and code (#2)
  Some cleanup
* Delete cohere-model directory (#3)
* Make Fix (#5)
* Pr fixes (#6)
* fixes for pr
* pr fixes for the format
* pr fixes for the format
* src/transformers/models/auto/tokenization_auto.py
* Tokenizer test (#8)
* tokenizer test
* format fix
* Adding Docs and other minor changes (#7)
* Add modeling tests (#9)
* Smol Fix (#11)
* tokenization tests are fixed
* format fixes
* fix pr doc tests
* fix pr doc tests
* fix pr doc tests
* fix pr style check
* small changes in cohere.md
* FIX: Address final comments for transformers integration (#13)
* fix modeling final nits and add proper test file
* for now leave empty tests
* add integration test
* push new test
* fix modeling cohere (#14)
* Update chat templates to use the new API (#15)

---------

Co-authored-by: ahmetustun <ahmetustun89@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
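Not part of the commit above, but for orientation: once the Cohere model and its chat-template support landed, usage follows the standard transformers pattern. A hedged sketch; the checkpoint id `CohereForAI/c4ai-command-r-v01`, dtype, and generation settings below are assumptions rather than anything this PR pins down:

```python
# Sketch: loading a Cohere checkpoint (assumed id) and using its chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-v01"  # assumed; substitute any Cohere checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

messages = [{"role": "user", "content": "Summarize what a chat template does."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```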
SunMarc added a commit that referenced this pull request on Jan 15, 2025
* gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update readme Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * gptqmodel need use checkpoint_format (#1) * gptqmodel need use checkpoint_format * fix quantize * Update quantization_config.py * Update quantization_config.py * Update quantization_config.py --------- Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * Revert quantizer_gptq.py (#2) * revert quantizer_gptq.py change * pass **kwargs * limit gptqmodel and optimum version Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix warning Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix version check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * revert unrelated changes Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * enable gptqmodel tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix requires gptq Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Fix Transformer compat (#3) * revert quantizer_gptq.py change * pass **kwargs * add meta info * cleanup * cleanup * Update quantization_config.py * hf_select_quant_linear pass checkpoint_format and meta * fix GPTQTestCUDA * Update test_gptq.py * gptqmodel.hf_select_quant_linear() now does not select ExllamaV2 * cleanup * add backend * cleanup * cleanup * no need check exllama version * Update quantization_config.py * lower checkpoint_format and backend * check none * cleanup * Update quantization_config.py * fix self.use_exllama == False * spell * fix unittest * fix unittest --------- Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format again Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update gptqmodel version (#6) * update gptqmodel version * update gptqmodel version * fix unit test (#5) * update gptqmodel version * update gptqmodel version * "not self.use_exllama" is not equivalent to "self.use_exllama==False" * fix unittest * update gptqmodel version * backend is loading_attibutes (#7) * fix format and tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix memory check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix device mismatch Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix result check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * update tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * review: update docs (#10) * review: update docs (#12) * review: update docs * fix typo * update tests for gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update document (#9) * update overview.md * cleanup * Update overview.md * Update overview.md * Update overview.md * update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md --------- Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * typo * doc note for asymmetric quant * typo with apple silicon(e) * typo for marlin * column name revert: review 
* doc rocm support * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com> Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
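As a rough illustration of what the GPTQModel integration above plugs into (not code from this PR): `GPTQConfig` is the existing transformers entry point for GPTQ quantization, and with the `gptqmodel` package installed it serves as the backing library. The model id, calibration dataset, and bit width below are illustrative assumptions:

```python
# Sketch: 4-bit GPTQ quantization through transformers' GPTQConfig (illustrative values).
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # small model chosen only to keep the example cheap
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Calibrate and quantize on load; with gptqmodel installed it is used as the backend.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
quantized = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=gptq_config)

# The quantized checkpoint can be saved and reloaded without re-running calibration.
quantized.save_pretrained("opt-125m-gptq")
reloaded = AutoModelForCausalLM.from_pretrained("opt-125m-gptq", device_map="auto")
```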
elvircrn pushed a commit to elvircrn/transformers that referenced this pull request on Feb 7, 2025
MekkCyber added a commit that referenced this pull request on Feb 13, 2025
* Resolve vptq conflict * Rename spqr package to spqr_quant * Get rid of aqlm mention * Start working on tests * Resolve ruff code checks * Ruff format * Isort * Test updates * Add gpu tag * Rename to modules_to_not_convert * Config update * Docs and config update * Docs and config update * Update to update_torch_dtype * spqr config parameter validation * Ruff update * Apply ruff fixes * Test fixes * Ruff update * Mark tests as @slow again; Ruff; Docstring update * Ruff * Remove absolute path * Resolve typo * Remove redundandt log * Check accelerate/spqr availability * Ruff fix * Check if the config contains proper shapes * Ruff test * Documentation update * overview update * Ruff checks * Ruff code quality * Make style * Update docs/source/en/quantization/spqr.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update spqr.md * Enable gptqmodel (#35012) * gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update readme Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * gptqmodel need use checkpoint_format (#1) * gptqmodel need use checkpoint_format * fix quantize * Update quantization_config.py * Update quantization_config.py * Update quantization_config.py --------- Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * Revert quantizer_gptq.py (#2) * revert quantizer_gptq.py change * pass **kwargs * limit gptqmodel and optimum version Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix warning Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix version check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * revert unrelated changes Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * enable gptqmodel tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix requires gptq Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Fix Transformer compat (#3) * revert quantizer_gptq.py change * pass **kwargs * add meta info * cleanup * cleanup * Update quantization_config.py * hf_select_quant_linear pass checkpoint_format and meta * fix GPTQTestCUDA * Update test_gptq.py * gptqmodel.hf_select_quant_linear() now does not select ExllamaV2 * cleanup * add backend * cleanup * cleanup * no need check exllama version * Update quantization_config.py * lower checkpoint_format and backend * check none * cleanup * Update quantization_config.py * fix self.use_exllama == False * spell * fix unittest * fix unittest --------- Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format again Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update gptqmodel version (#6) * update gptqmodel version * update gptqmodel version * fix unit test (#5) * update gptqmodel version * update gptqmodel version * "not self.use_exllama" is not equivalent to "self.use_exllama==False" * fix unittest * update gptqmodel version * backend is loading_attibutes (#7) * fix format and tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix memory check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix device mismatch Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix result check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update 
src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * update tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * review: update docs (#10) * review: update docs (#12) * review: update docs * fix typo * update tests for gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update document (#9) * update overview.md * cleanup * Update overview.md * Update overview.md * Update overview.md * update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md --------- Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * typo * doc note for asymmetric quant * typo with apple silicon(e) * typo for marlin * column name revert: review * doc rocm support * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com> Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix : Nemotron Processor in GGUF conversion (#35708) * fixing nemotron processor * make style * Update docs/source/en/quantization/spqr.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add missing TOC to doc --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com> Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
gante pushed a commit that referenced this pull request on Feb 26, 2025
* move `TestAssistedCandidateGeneratorDifferentTokenizers` into a new testing file * refactor * NOTHING. add space to rerun github actions tests * remove it... * `UniversalSpeculativeDecodingGenerator` * Use `UniversalSpeculativeDecodingGenerator` when `generation_config.do_sample=True` * assistant tokenizes only the target's new suffix * formatting * fix code * fix code * formatting * add `TestGenerateWithDifferentModels` * `TestGenerateWithDifferentModels` parameterize on `do_sample` * `AssistantVocabMapping` & `AssistantVocabMappingCache` * formatting * `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits` * improve `_get_assistant_to_target_input_ids` & formatting * renaming * WIP: debugging `min_new_tokens` * fix get_target_ids * `UniversalSpeculativeDecodingGenerator` * assistant tokenizes only the target's new suffix * formatting * fix code * fix code * formatting * `TestGenerateWithDifferentModels` parameterize on `do_sample` * `AssistantVocabMapping` & `AssistantVocabMappingCache` * formatting * `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits` * improve `_get_assistant_to_target_input_ids` & formatting * renaming * WIP: debugging `min_new_tokens` * fix get_target_ids * fix device issue * fix get_assistant_input_ids * add `TestAssistedCandidateGeneratorDifferentTokenizers` * formatting * `AssistantVocabTranslatorCache` refactor & tests * revert changes in `src/transformers/generation/logits_process.py` * refactor `AssistedCandidateGenerator` * refactor `AssistedCandidateGeneratorDifferentTokenizers` * formatting * refactor `UniversalSpeculativeDecodingGenerator` * fix negative value for max_new_tokens * fix generation length target + attention_mask vs. assistant + attent * fix device * fix negative max_new_tokens bug * fix UAG * minor * formatting * `AssistedCandidateGeneratorDifferentTokenizers` `lookbehind`s init * resolve conflict & formatting * rerun CI tests * remove space... 
* remove old code * fix candidate_input_ids device * minor * formatting * Fix prepare + apply (#7) * fix prepare + apply * move to cpu * simplity suppress_tokens * fix bugs and refacatoring * device move * handle self.config.vocab_size > len(target_tokenizer.get_vocab()) * no need to normalize in candidate_generator * address Nadav's comments + minor * optimize device move + SuppressTokensLogitsProcessor * AssistantToTargetTranslator, SuppressTokensLogitsProcessor and tokenizers mapping improvements * padding size * padding improvement * fix and simplify get_target_logits * renaming in get_target_logits * minor * add filter_value and suppress_tokens_id * style + rename * remove TODO * restore original SelectTokensLogitsProcessor with modification * fix style * fix _update_past_and_masks and optimize code * remove assistant_vocab_size arg * fix attention_mask * call _prepare_attention_mask also if not has_past_key_values * handling attention mask for first generation * comment * restore test * remove SelectTokensLogitsProcessor * _update_past_and_masks implementation for USD * Add unittests for Universal Assisted generation * fix style * update tests * Remove unused import and fix `test_speculation_depth` test * exclude special and reserved tokens from tokenizer for UAG * mv `test_universal_assisted_generation.py` to `generation/test_candidate_generator.py` * Remove unused imports and fix style using `make style` (#9) * formatting * Swap gated `meta-llama/llama-3.2` with `allenai/llama` (#10) * Fix space sign disagreement (#12) * default values for AssistantToTargetTranslator fileds * fix space sign * minor * fix test + style * Default values for some fields of assistant to target translator (#11) * default values for AssistantToTargetTranslator fileds * fix * add support to empty logit_processors * Update candidate_generator.py (#15) fix typo * BUG fix in _prepare_assistant_input_ids (#14) * fix _prepare_assistant_input_ids * target_to_assistant_input_ids * Update src/transformers/generation/candidate_generator.py Co-authored-by: Nadav Timor <nadav.timor@weizmann.ac.il> --------- Co-authored-by: Nadav Timor <nadav.timor@weizmann.ac.il> * typo (`target_to_assistant_input_ids`) * formatting * merge upstream/main * Fix minor review comments (#16) * Fix: `token_ids.to(torch.int64)` (#18) * tok ids to `torch.int64` (reference: https://huggingface.co/docs/transformers.js/en/api/tokenizers) * `LongTensor` * fix dtype * `assistant_input_ids.to(dtype=torch.long)` * Remove unused import from test_candidate_generator.py * Remove unused import from test_candidate_generator.py * Remove `numpy` import * resolve pr comments (#19) * `AssistantToTargetTranslator` docstring * (per gante's comment) `filter_value` and `suppress_tokens_id` to class constants * update `AssistantToTargetTranslator` docstring * (gante's comment) replace `match-case` * formatting * Fix Joao's comments (#21) * remove threading * fix logits_processor * fix test device * fix style (#23) * Move atm (#24) * move AssistantToTargetTranslator * fixup * fix logit_processor * add atm_translator test * refactor test * remove threading from test * add require_torch in tests * move AssistantVocabTranslatorCache + add tests * ruff fix --------- Co-authored-by: jmamou <jonathan.mamou@intel.com> Co-authored-by: Gaurav <gauravj@d-matrix.ai> Co-authored-by: Gaurav Jain <gaurjain14@gmail.com> Co-authored-by: gauravjain14 <41287729+gauravjain14@users.noreply.github.com>
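For readers unfamiliar with the feature being built in the commit above: universal assisted generation lets a small assistant model with a *different* tokenizer draft candidate tokens for the target model, which is what the vocabulary translator and mapping-cache classes implement. A hedged usage sketch; the two checkpoint ids are placeholders (both may be gated or large), and the keyword arguments follow the public `generate()` interface for assisted decoding:

```python
# Sketch: assisted generation where target and assistant use different tokenizers.
# Checkpoint ids are placeholders; any compatible causal LM pair should work in principle.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id, assistant_id = "google/gemma-2-9b", "double7/vicuna-68m"  # illustrative pair
tokenizer = AutoTokenizer.from_pretrained(target_id)
assistant_tokenizer = AutoTokenizer.from_pretrained(assistant_id)
model = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.bfloat16, device_map="auto")
assistant = AutoModelForCausalLM.from_pretrained(assistant_id, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Speculative decoding works by", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    assistant_model=assistant,                 # drafts candidate tokens
    tokenizer=tokenizer,                       # both tokenizers are needed so candidates
    assistant_tokenizer=assistant_tokenizer,   # can be re-encoded between vocabularies
    do_sample=True,
    max_new_tokens=32,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```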
jmamou added a commit to jmamou/transformers that referenced this pull request on Feb 27, 2025
* fix prepare + apply
* move to cpu
* simplity suppress_tokens
* fix bugs and refacatoring
* device move
* handle self.config.vocab_size > len(target_tokenizer.get_vocab())
* no need to normalize in candidate_generator
* address Nadav's comments + minor
* optimize device move + SuppressTokensLogitsProcessor
* AssistantToTargetTranslator, SuppressTokensLogitsProcessor and tokenizers mapping improvements
* padding size
* padding improvement
* fix and simplify get_target_logits
* renaming in get_target_logits
* minor
* add filter_value and suppress_tokens_id
* style + rename
* remove TODO
* restore original SelectTokensLogitsProcessor with modification
* fix style
* fix _update_past_and_masks and optimize code
* remove assistant_vocab_size arg
* fix attention_mask
* call _prepare_attention_mask also if not has_past_key_values
* handling attention mask for first generation
* comment
* restore test
* remove SelectTokensLogitsProcessor
* _update_past_and_masks implementation for USD
RyanMullins pushed a commit to RyanMullins/transformers that referenced this pull request on Mar 12, 2025
Add chat template to tokenizer
ArthurZucker pushed a commit that referenced this pull request on Apr 5, 2025
* 128 experts
* Use default rope
* Unfuse mlp
* Address feedback
* Use None "default" for rope_scaling. Add eot.
* Meta/llama quant compat (#7)
* add quant compatible model & conversion code for llama4
* fix a few issues
* fix a few issues
* minor type mapping fix

---------

Co-authored-by: Lu Fang <fanglu@fb.com>

* use a new config parameter to determine which model definition to use for MoE

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Lu Fang <fanglu@fb.com>
ArthurZucker added a commit that referenced this pull request on Apr 5, 2025
* remove one of the last deps * update fast image processor after refactor * styling * more quality of life improvements * nit * update * cleanups * some cleanups * vllm updates * update fake image token * [convert] Fix typo * [convert] Strip extraneous bytes from shards * [convert] Minor fixes * [convert] Use num_experts * multi-image fixes in modeling + processor * fixup size * 128 experts * Use default rope * Unfuse mlp * simplify a lot inputs embeds merging * remove .item() 👀 * fix from review * Address feedback * Use None "default" for rope_scaling. Add eot. * set seed * return aspect ratios and bug fixes * Moe 128 rebased (#8) * 128 experts * Use default rope * Unfuse mlp * Address feedback * Use None "default" for rope_scaling. Add eot. * Meta/llama quant compat (#7) * add quant compatible model & conversion code for llama4 * fix a few issues * fix a few issues * minor type mapping fix --------- Co-authored-by: Lu Fang <fanglu@fb.com> * use a new config parameter to determine which model definition to use for MoE --------- Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Lu Fang <fanglu@fb.com> * un-comment write_tokenizer from converting script * remove un-used imports * [llama4] Pop aspect_ratios from image processor output in Llama4Processor Signed-off-by: Jon Swenson <jmswen@gmail.com> * Fix parameter_count name * Update src/transformers/models/llama4/configuration_llama4.py * nit * Add changes for no_rope, moe_layers, chunked attention. Just need to test all * Update src/transformers/models/llama4/image_processing_llama4_fast.py * nit * fix post merge with main * support flex attention * fixes * fix * add layer * small updates * rebase and delete llm_compressor * nit * [llama4/mm] Add back <|image|> token that delimits global tile * [llama4/mm] Fix Llama 4 image processing unit tests * add explicit dtype Signed-off-by: Jon Swenson <jmswen@gmail.com> * sdpa works * comment todo small * fix model loading Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * revert * nits * small fix for TP on 1 node * Read new params from config * Add <|eom|> * lol don't know how this got here * adding fp8 * Save processor, fix chat template * style * Add boi/eoi tokens We don't use them. * fixes for now flex seems to work :) * updates * nits * updates * missking keys * add context parallel * update * update * fix * nits * add worldsize and make eager attn work for vision * Ignore new key present in base models * add tp_plan * fix nope Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * minor fix Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * Clean up Llama4 vision model * current updates * add support for `attn_temperature_tuning` * add floor scale * add missing attn scales * push what works, dirty trick for the device synch * oups * Fix pad_token_id See https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files Confirmed in the original codebase. * fix causallml loading * rm * fix tied-weights * fix sdpa * push current version * should work with both short and long * add compressed_tensos & fix fbgemm tp * Fix flex impl * style * chunking * try to revert the potentially breaking change * fix auto factory * fix shapes in general * rm processing * commit cache utils cleanup * Fix context length * fix * allocate * update tp_plan * fix SDPA! 
* Add support for sparse `Llama4TextMoe` layer from the kernel hub * cleanup * better merge * update * still broken fixing now * nits * revert print * Write max_position_embeddings and max_model_length * Update modeling_llama4.py * Save attention_chunk_size * Sync eos terminators * Read initializer_range * style * remove `dict` * fix * eager should use `chunked_attention_mask` * revert * fixup * fix config * Revert "Merge pull request #36 from huggingface/sparse-llama4-moe" This reverts commit ccda19f, reversing changes made to a515579. * Fix typo and remove warning with compiled flex and chunked prefill * Fix MoE vs FF (#41) * fix * Use correct no_rope_layers if provided one is empty list * update tests * fix * skipping some tests * fix fp8 loading Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * fix text geneartion pipeline Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> * eager needs 4D mask * fix * Some cleanup * fix * update * fix * replace correctly module * patch * modulelist * update * update * clean up * Don't move to `cuda:0` in distributed mode * restrict to compressed tensors for now * rm print * Docs! * Fixes * Update docs/source/en/model_doc/llama4.md Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Fixes * cuda graph fix * revert some stuff * fixup * styling * Update src/transformers/models/llama4/modeling_llama4.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * commit licence, cleanup here and there and style * more styling changes * fix dummies * fix and clean docstrings * remove comment * remove warning * Only fast image processor is supported * nit * trigger CI * fix issue with flex encoder * fix dynamic cache * Code quality * Code quality * fix more tests for now * Code quality * Code quality * Nuke bunch of failing stuff * Code quality * Code quality * cleanup removal of slow image processor * ruff fix fast image processor * fix * fix styling * Docs * Repo consistency * Repo consistency * fix sliding window issue * separate llama cache * styling * Repo consistency * Repo consistency * push waht works * L4 Repo consistency * Docs * fix last last alst alst alst alstsaltlsltlaslt --------- Signed-off-by: Jon Swenson <jmswen@gmail.com> Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: Keyun Tong <tongkeyun@gmail.com> Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com> Co-authored-by: Lu Fang <fanglu@fb.com> Co-authored-by: Zijing Liu <liuzijing2014@gmail.com> Co-authored-by: Jon Swenson <jmswen@gmail.com> Co-authored-by: jmswen <jmswen@users.noreply.github.com> Co-authored-by: MekkCyber <mekk.cyber@gmail.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com> Co-authored-by: Yong Hoon Shin <yhshin@meta.com> Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: drisspg <drisspguessous@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Daniël de Kok <me@danieldk.eu> Co-authored-by: Lysandre <hi@lysand.re> Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
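Not from the diff above, just orientation for what this release wires up end to end: the Llama 4 text-plus-vision model loads through the usual processor/model pair. A sketch under assumptions — the checkpoint id, dtype, and `attn_implementation` choice are illustrative, and Llama 4 checkpoints are large and gated:

```python
# Sketch: loading a Llama 4 multimodal checkpoint (assumed id) and running one prompt.
import torch
from transformers import AutoProcessor, Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed, gated checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="flex_attention",  # the commit above adds flex-attention support
)

messages = [{"role": "user", "content": [{"type": "text", "text": "Describe chunked attention in one sentence."}]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=48)
print(processor.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True)[0])
```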
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request on May 14, 2025
ydshieh added a commit that referenced this pull request on May 26, 2025
* Create push-important-models.yml
* feat: add falcon-h1
* fixup
* address comment
* fix
* fix copies
* fix copies
* fix
* fix
* fix
* fix
* fix copies
* fix
* fix copies
* fix test import to at least trigget the cis
* yups
* update
* fix make fix copies
* fix inits?
* fix style
* skip annoying test
* add integration test for Falcon H1
* fix copies
* fix
* fix typo
* make style
* fix slow path generations
* clean debug traces
* debug
* remove debug traces final confirmation
* clean debug traces final
* fix format and lineup
* make style
* debug
* Update src/transformers/models/falcon_h1/modular_falcon_h1.py
  Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* adress comments
* fix fix-copies
* fix integration test
* Merge pull request #7 from ydshieh/fix-slow-path
  update
* another update (#8)
* update
* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: younesbelkada <younes.belkada@tii.ae>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
redmoe-moutain pushed a commit to redmoe-moutain/transformers that referenced this pull request on Jun 10, 2025
RyanMullins added a commit to RyanMullins/transformers that referenced this pull request on Jun 26, 2025
* initial commit of Gemma 3n scaffold * Fixing param pass through on Gemm3p5RMSNorm * Adds Einsum layer to Gemma 3n * Updating EinsumLayer API * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Adds AltUp to Gemma 3n * Adding Gemma3p5 overall and text config with vision and audio config placeholders (huggingface#3) * Adding gemma3p5 text configs * Adding audio config placeholders * Adding a placeholder for vision configs * Updating MobileNetVisionConfig, inheriting TimmWrapperConfig * Updating text configs * Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Removing altup configs to accept the suggested configs * Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating altup config * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Addressing review comments and updating text configs * Adding a config for activation sparsity * Updating configs to pass through options to super class init and adjust some name prefixes * Updating laurel and altup with corrected config values * Normalizing sub_config initializers --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating MLP with activation sparsity (#2) * Updating DecoderBlock for Gemma 3n (huggingface#3) * Initial Gemm3nTextModel (huggingface#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference. 
* Adding KV Cache Sharing * Adds Einsum layer to Gemma 3n * Updating EinsumLayer API * Refactored kv cache sharing in attention * Adding KVStore for cache sharing * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Updating KV Cache Sharing implementation * Updating the q and k norm definitions in the attention module * Fixing name error for q,k,v RMS norm to use the right 3n module * Updating MLP with activation sparsity * Updating DecoderBlock for Gemma 3.5 * Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code * Isolating KV Cache logic to relevant components * Fixing logic error in Gemma3nAttention.forward * Refactoring caching contributions and fixing kv_store initialization * Simplifying Configs * Remove errant self from super init call * Bug fix in the Attention module - changing self.head_dim to config.head_dim * Bug fixes in the LaurelBlock and RMS Norm super init call * removing redundant code from a merge * Adding per_layer_inputs to TextModel * Adding preprocess embeddings with altup * Adds per-layer-to-single output and a host of TODOs * Integrating altup predict with the model workflow and other minor bug fixes * Using nn.Embedding temporarily for text model * It goes forward * Minor refactor of attention sparsity and RoPE initialization * Fixing duplicate rope_scaling param bug when loading from pretrained --------- Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Normalizing on altup_num_inputs config option * regenerating modeling file after syncing to HEAD * Use torch.std(..., unbiased=False) for activation sparsity (huggingface#8) * Refactoring to a single QVK Norm (huggingface#13) * AltUp: support scale_corrected_output (huggingface#14) * Converts einsums to nn.Linear (huggingface#7) * Converts einsums to nn.Linear * Removing unused variables * Aligning SharedKVCache with HybridCache (huggingface#11) * Alinging SharedKVStore with HybridCache * Remove KVStore. Refactor apply_rotary_pos_emb for sharing * Addressing review comments * Supporting split modality embeddings in Gemma3n (huggingface#10) * Adding the Embedder class * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation * Apply suggestions from code review Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Addressing review comments, prop drilling audio and vision configs to the text config * Removing TODO's that have been addressed * Simplify Embedder init and add audio embeddings * Embeddings refactor. 
Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder * Refactoring vision and audio embeddings into ConditionalGeneration model --------- Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating attention mask for Gemma 3.5 (huggingface#15) * xxx_token_index to xxx_token_id * remvoing deprecated last_cache_position * Removing references to SigLIP * Always init per-layer inputs * Using torch.finfo().min for epsilon_tensor * Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas * fix modular GEMMA3N_INPUTS_DOCSTRING * Gemma3nAttention inherits from Gemma3Attention * Modular inheritance fixes * CausalLM conversion script for 4B model (huggingface#16) * Add Gemma3n Audio Encoder (huggingface#6) * initial commit of Gemma 3.5 scaffold * Fixing param pass through on Gemm3nRMSNorm * Adds Einsum layer to Gemma 3.5 * Updating EinsumLayer API * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Adds AltUp to Gemma 3n * Adding Gemma3n overall and text config with vision and audio config placeholders (huggingface#3) * Adding gemma3n text configs * Adding audio config placeholders * Adding a placeholder for vision configs * Updating MobileNetVisionConfig, inheriting TimmWrapperConfig * Updating text configs * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Removing altup configs to accept the suggested configs * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating altup config * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Addressing review comments and updating text configs * Adding a config for activation sparsity * Updating configs to pass through options to super class init and adjust some name prefixes * Updating laurel and altup with corrected config values * Normalizing sub_config initializers --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating MLP with activation sparsity (#2) * Updating DecoderBlock for Gemma 3.5 (huggingface#3) * Initial Gemm3nTextModel (huggingface#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference. 
* Adding KV Cache Sharing * Adds Einsum layer to Gemma 3.5 * Updating EinsumLayer API * Refactored kv cache sharing in attention * Adding KVStore for cache sharing * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Updating KV Cache Sharing implementation * Updating the q and k norm definitions in the attention module * Fixing name error for q,k,v RMS norm to use the right Gemma 3n module * Updating MLP with activation sparsity * Updating DecoderBlock for Gemma 3.5 * Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code * Isolating KV Cache logic to relevant components * Fixing logic error in Gemma3nAttention.forward * Refactoring caching contributions and fixing kv_store initialization * Simplifying Configs * Remove errant self from super init call * Bug fix in the Attention module - changing self.head_dim to config.head_dim * Bug fixes in the LaurelBlock and RMS Norm super init call * removing redundant code from a merge * Adding per_layer_inputs to TextModel * Adding preprocess embeddings with altup * Adds per-layer-to-single output and a host of TODOs * Integrating altup predict with the model workflow and other minor bug fixes * Using nn.Embedding temporarily for text model * It goes forward * Minor refactor of attention sparsity and RoPE initialization * Fixing duplicate rope_scaling param bug when loading from pretrained --------- Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Normalizing on altup_num_inputs config option * Adding audio encoder config * Adds high-level components for Audio Encoder * Implement uniform reducer for Audio Encoder * Adding placeholders for Conformer components in Audio Encoder * Adding placeholders for SubSampleConvProjection components in Audio Encoder * Adding SequenceLayer component placeholders * Implementing Gemma3nAudioEncoder with nn.Sequential * Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential * Implementing Conformer model with SequenceLayers * Use OrderedDict in nn.Sequential initializers * Implements sl.Residual in Torch with nn.Sequential and OrderedDict * Adopting a base SequenceLayer class with default forward() method * Implementing sl.GatedLinearUnit in Torch * Implementing sl.Swish in Torch * Implementing sl.ReLU in Torch * Implementing sl.Scale in Torch * Removing sl.Dropout after tree-shaking * Implementing sl.RMSNorm in Torch with fake shape * Implementing sl.GroupNorm in Torch * Implementing sl.Conv2d in Torch * Implementing sl.Dense in Torch * Removing sl.Delay layers, which act as pass-throughs * Connecting shapes to configs in initializers * Removing sl.Emit * Implementing sl.ExpandDims in Torch * Adding sl.GradientClipping to Torch * Implementing sl.DenseShaped in Torch * Implementing sl.LDPA in Torch * Removing unused sl.CombinedQKVProj class * Fixing erroneous type hint * Implemnenting sl.DepthwiseConv1D in Torch * Implementing sl.MaskInvalid in Torch * Fixes for initialization * Fixes for saving weights * Removing einsums per feedback from HF staff * Removing Sequence Layers idioms from audio encoder * 
Fixes for reviewer comments * CausalLM conversion script for 4B model * inv_timescales to non-persistent buffer * Addressing audio encoder Attention feedback * Addressing Gemma3nAudioSSCPConvBlock feedback * Addressing Gemma3nAudioConformerAttention feedback * Addressing padding feedback * Weights conversion loads audio state dict * Always use vision_config so saving works * Token id updates for configs * Stubs for interleaving audio embs * Addressing reviewer feedback --------- Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> * Fixing cache access error * Removing duplicate code from a bad merge * Gemma 3n Text + Vision Part 1 (huggingface#17) * testing utilities for numerics comparisons * Corrected einsum to nn.Linear weights conversion * Inherit scaled word embs from Gemma3 not Bart * Fixing transposes for collapsed linears * More transpose fixes * numpy api fix * RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True * Force AltUp to float32 * Updating debugging script for AudioEncoder debugging * Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs * Correcting attention einsum conversions * RMSNorm in type of x * Fixing douplicate laurel norm/gating * KV sharing using the right previous indices * Refactor kv shared index computation. Correct frac_shared_layers * Use num_shared_layers instead of inferring from a fraction * fixing a bug for logging * Fix shared data_ptrs in altup inits * rope: adjust proj -> norm -> rope to preserve computation (huggingface#20) * rope: adjust proj -> norm -> rope to preserve computation * Removing some breaking language model fluff in ConditionalGeneration * Consolidate query_states transforms --------- Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Vectorize the loops in AltUp (huggingface#19) * Vectorize the loops in AltUp * fix typo * Expanding to support batched inputs * remove extra debug script * Fix AltUp.forward --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel * Convert norm to 1/sqrt (huggingface#21) * Convert norm to 1/sqrt * Scale shift change per Phil's rec * Adding default activation sparsity * Fixing 2B config in weights conversion script * Fixing RMSNorm parameters - adding scale_shift and with_scale * Correcting query pre-attention scaling * Adding query_rescale_scalar to text config * Adding layer_idx to MLP * Permafix for input_layernorm * Use 1/sqrt instead of rsqrt in DecoderLayer * Fix o_proj conversion * Conversion script update for vision encoder * Removing logging for debugging timm model * Fixing bugs in Gemma3nForConditionalGeneration for text generation * Generating the modeling_gemma3n.py file * Removing the addition of an erroneous line in the modeling file * Adding gemma3n text model to modeling_auto * Bugfix: Updating the interleaving of inputs_embeds and vision_embeds * Updating the modeling file with the latest bugfix changes * Updating models/auto for Gemma 3n * using AutoTokenizer in forward test * Adding processing_gemma3n.py * Gemma 3n configured for AutoModel. Conversion script updated. 
* Removing errant merge artifacts --------- Co-authored-by: Mayank Chaturvedi <imayank@google.com> Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com> Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> * Removing errant debugging statements from Gemma 3 * Gemma3n audio model (huggingface#18) * testing utilities for numerics comparisons * Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock * Add audio version of forward script based on RyanMullins' implementation * Updating to match encoder tests. WIP: config question needs resolving * Updates to audio classes to enable end-to-end running * Removing vestigial classes, cleaning up print statements * Adding SiLU / Swish to audio conformer feed forward block * Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio * Adding outputs to audio test * Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model * Update forward test to load from local weights * Update conversion to process / output audio layers * Update __all__ to export audio encoder * AutoModel registration for Gemma 3n Audio * Use AutoModel for ConditionalGeneration.audio_tower * Fixing input_proj_linear transpose * Fixing Gemma3NanoAudioConformerAttention.post conversion * Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion * Correcting indentation issue on Gemma3p5RMSNorm --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Text + Vision Part 2 (huggingface#23) * Updates for ConditionalGeneration.get_image_features * Adding a WIP draft of image_processing_gemma3p5.py * Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Modular conversion after github suggested change * Text + image gives good results * Fixing image size preset * Updating configs for the 2B variant in the conversion script * Using final generation config in conversion script --------- Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Audio Integration (huggingface#12) * initial commit of Gemma 3n scaffold * Fixing param pass through on Gemm3nRMSNorm * Adds Einsum layer to Gemma 3n * Updating EinsumLayer API * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Adds AltUp to Gemma 3n * Adding Gemma 3n overall and text config with vision and audio config placeholders (huggingface#3) * Adding Gemma 3n text configs * Adding audio config placeholders * Adding a placeholder for vision configs * Updating MobileNetVisionConfig, inheriting TimmWrapperConfig * Updating text configs * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Removing altup configs to accept the suggested configs * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating altup config * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Addressing review comments and updating text configs * Adding a config for activation sparsity * Updating configs to pass through options to super class init and adjust some name 
prefixes * Updating laurel and altup with corrected config values * Normalizing sub_config initializers --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating MLP with activation sparsity (#2) * Updating DecoderBlock for Gemma 3n (huggingface#3) * Initial Gemma3nTextModel (huggingface#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference. * Adding KV Cache Sharing * Adds Einsum layer to Gemma 3n * Updating EinsumLayer API * Refactored kv cache sharing in attention * Adding KVStore for cache sharing * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Updating KV Cache Sharing implementation * Updating the q and k norm definitions in the attention module * Fixing name error for q,k,v RMS norm to use the right 3n module * Updating MLP with activation sparsity * Updating DecoderBlock for Gemma 3n * Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code * Isolating KV Cache logic to relevant components * Fixing logic error in Gemma3nAttention.forward * Refactoring caching contributions and fixing kv_store initialization * Simplifying Configs * Remove errant self from super init call * Bug fix in the Attention module - changing self.head_dim to config.head_dim * Bug fixes in the LaurelBlock and RMS Norm super init call * removing redundant code from a merge * Adding per_layer_inputs to TextModel * Adding preprocess embeddings with altup * Adds per-layer-to-single output and a host of TODOs * Integrating altup predict with the model workflow and other minor bug fixes * Using nn.Embedding temporarily for text model * It goes forward * Minor refactor of attention sparsity and RoPE initialization * Fixing duplicate rope_scaling param bug when loading from pretrained --------- Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Normalizing on altup_num_inputs config option * Adding audio encoder config * Adds high-level components for Audio Encoder * Implement uniform reducer for Audio Encoder * Adding placeholders for Conformer components in Audio Encoder * Adding placeholders for SubSampleConvProjection components in Audio Encoder * Adding SequenceLayer component placeholders * Implementing Gemma3nAudioEncoder with nn.Sequential * Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential * Implementing Conformer model with SequenceLayers * Use OrderedDict in nn.Sequential initializers * Implements sl.Residual in Torch with nn.Sequential and OrderedDict * Adopting a base SequenceLayer class with default forward() method * Implementing sl.GatedLinearUnit in Torch * Implementing sl.Swish in Torch * Implementing sl.ReLU in Torch * Implementing sl.Scale in Torch * Removing sl.Dropout after tree-shaking * Implementing sl.RMSNorm in Torch with fake shape * Implementing sl.GroupNorm in Torch * Implementing sl.Conv2d in Torch * Implementing sl.Dense in Torch * Removing sl.Delay layers, which act as pass-throughs * Connecting shapes to configs in 
initializers * Removing sl.Emit * Implementing sl.ExpandDims in Torch * Adding sl.GradientClipping to Torch * Implementing sl.DenseShaped in Torch * Implementing sl.LDPA in Torch * Removing unused sl.CombinedQKVProj class * Fixing erroneous type hint * Implemnenting sl.DepthwiseConv1D in Torch * Implementing sl.MaskInvalid in Torch * Fixes for initialization * Fixes for saving weights * Removing einsums per feedback from HF staff * Removing Sequence Layers idioms from audio encoder * Fixes for reviewer comments * Converting sl.Frontend to FeatureExtractor * Updates for ConditionalGeneration.get_image_features * Adding a WIP draft of image_processing_gemma3n.py * Update modular Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Modular conversion after github suggested change * Text + image gives good results * Fixing image size preset * Draft of audio data in chat template * Removing image processing. Using SigLIP instead. * Audio input going end-to-end * Fixing dtype issues in audio encoder * x-lib formatting consistency * Adding example data * Save preprocessor_config.json from conversion script * Instrumentaiton for debugging * Additional instrumentation for preprocessing debugging * Updates to preprocessor, padding; produces correct end-to-end results on sample * Tackling configuraiton TODOs * Start of feature extractor refatcor * Adds Numpy version of USM extractor, removes Torch version and dependencies * Fixing AltUp.correct coef permute * Supporting batches of single audio segment inputs * Docstrings updates for config * In-lining audio feature extraction * Adjustments to conversion script and smoke test script --------- Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: pculliton <phillipculliton@gmail.com> * Gemma 3n renaming * Removing test data and utilities * Renaming test files * Gemma 3n refactor * Fix tokenizer config in conversion script * Address reviewer feedback * FeatureExtractor returns float32 by default * Adding basic tests for audio, and input name for audio encoder * Audio integration test, updates to model_id for other integration tests * Use scales for q and k norms (huggingface#26) * Update audio integration test to use HF dataset * Reviewer feedback * Expand embedding table to full vocab size in weights conversion * Mix-n-match MatFormers for Gemma 3n (huggingface#25) * Remove in-place operations (huggingface#30) * chore: removing inplace ops * remove [tensor] * n pattern * chore: reviewer feedback in AudioEncoder and AltUp * More grad clipping * Dynamo compatibility * fix: cache slicing error * chore: simplify shared kv cache slicing * chore: vision encoder rename in timm * fix: image processor do_normalize=False * fixup: style * chore: model_doc * fix: docs for code quality * chore: repo consistency * fix: RMSNorm in float as in prior Gemmas * fix: per_layer_inputs = None * chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint * chore: repo consistency * Add initial unit tests for Gemma3nAudioFeatureExtractor (huggingface#27) * Add initial unit tests for Gemma3nAudioFeatureExtractor * Add basic unit tests for Gemma3nProcessor (huggingface#28) Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> * parameterize tests --------- Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> * chore: code style * fix: test cases * style and consistency * 
fix config in the test to be coherent with layer cache sharing * fix hidden states in tests and code * inits and mappings * fix modality prefixes * test order and prefixes * fix test exception * fix class order and reduce model size for faster tests * restore _checkpoint_conversion_mapping to load Caual from Conditional * fix config mapping! * fix: reviewer feedback --------- Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Mayank Chaturvedi <imayank@google.com> Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com> Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by: pculliton <phillipculliton@gmail.com> Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
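The commit history above repeatedly iterates on KV cache sharing ("Adding KV Cache Sharing", "KV sharing using the right previous indices", "Use num_shared_layers instead of inferring from a fraction"), where the last few decoder layers reuse key/value states produced by an earlier layer instead of caching their own. The snippet below is only a minimal sketch of that idea under assumed names (`num_shared_layers`, `last_local_layer`, etc.), not the Gemma 3n implementation, which additionally has to respect sliding-window layers and the Cache API touched in these commits.

```python
import torch

# Minimal sketch of layer-wise KV sharing (assumed names, not the Gemma 3n code):
# the final `num_shared_layers` decoder layers reuse the key/value states written
# by the last "local" layer instead of building their own cache entries.
class SharedKVCacheSketch:
    def __init__(self, num_layers: int, num_shared_layers: int):
        self.last_local_layer = num_layers - num_shared_layers - 1
        self.keys = [None] * num_layers
        self.values = [None] * num_layers

    def update(self, layer_idx: int, k: torch.Tensor, v: torch.Tensor):
        if layer_idx <= self.last_local_layer:
            # Regular layers append their new KV states along the sequence axis.
            self.keys[layer_idx] = k if self.keys[layer_idx] is None else torch.cat([self.keys[layer_idx], k], dim=-2)
            self.values[layer_idx] = v if self.values[layer_idx] is None else torch.cat([self.values[layer_idx], v], dim=-2)
            return self.keys[layer_idx], self.values[layer_idx]
        # Shared layers ignore their own k/v and read the last local layer's cache.
        return self.keys[self.last_local_layer], self.values[self.last_local_layer]


cache = SharedKVCacheSketch(num_layers=4, num_shared_layers=2)
k = v = torch.randn(1, 8, 3, 16)  # [batch, heads, seq, head_dim]
for layer_idx in range(4):
    shared_k, shared_v = cache.update(layer_idx, k, v)
```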
ArthurZucker pushed a commit that referenced this pull request Jun 26, 2025
* Converts einsums to nn.Linear * Removing unused variables
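This commit replaces einsum-based projections with `nn.Linear`. As a hedged illustration (shapes and names made up; this is not the conversion script itself), the equivalence hinges on `nn.Linear` storing its weight as `[out_features, in_features]`, so the einsum weight has to be transposed when it is copied over:

```python
import torch
from torch import nn

# Illustrative equivalence check: an einsum projection "btd,dh->bth" with a
# weight of shape [d, h] matches nn.Linear(d, h, bias=False) once the weight is
# copied in transposed, because nn.Linear keeps [out_features, in_features].
b, t, d, h = 2, 5, 16, 32
x = torch.randn(b, t, d)
w = torch.randn(d, h)

linear = nn.Linear(d, h, bias=False)
with torch.no_grad():
    linear.weight.copy_(w.T)

torch.testing.assert_close(torch.einsum("btd,dh->bth", x, w), linear(x))
```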
Cyrilvallez added a commit that referenced this pull request Jun 26, 2025
* Gemma 3n * initial commit of Gemma 3n scaffold * Fixing param pass through on Gemm3p5RMSNorm * Adds Einsum layer to Gemma 3n * Updating EinsumLayer API * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Adds AltUp to Gemma 3n * Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3) * Adding gemma3p5 text configs * Adding audio config placeholders * Adding a placeholder for vision configs * Updating MobileNetVisionConfig, inheriting TimmWrapperConfig * Updating text configs * Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Removing altup configs to accept the suggested configs * Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating altup config * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Addressing review comments and updating text configs * Adding a config for activation sparsity * Updating configs to pass through options to super class init and adjust some name prefixes * Updating laurel and altup with corrected config values * Normalizing sub_config initializers --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating MLP with activation sparsity (#2) * Updating DecoderBlock for Gemma 3n (#3) * Initial Gemm3nTextModel (#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference. 
* Adding KV Cache Sharing * Adds Einsum layer to Gemma 3n * Updating EinsumLayer API * Refactored kv cache sharing in attention * Adding KVStore for cache sharing * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Updating KV Cache Sharing implementation * Updating the q and k norm definitions in the attention module * Fixing name error for q,k,v RMS norm to use the right 3n module * Updating MLP with activation sparsity * Updating DecoderBlock for Gemma 3.5 * Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code * Isolating KV Cache logic to relevant components * Fixing logic error in Gemma3nAttention.forward * Refactoring caching contributions and fixing kv_store initialization * Simplifying Configs * Remove errant self from super init call * Bug fix in the Attention module - changing self.head_dim to config.head_dim * Bug fixes in the LaurelBlock and RMS Norm super init call * removing redundant code from a merge * Adding per_layer_inputs to TextModel * Adding preprocess embeddings with altup * Adds per-layer-to-single output and a host of TODOs * Integrating altup predict with the model workflow and other minor bug fixes * Using nn.Embedding temporarily for text model * It goes forward * Minor refactor of attention sparsity and RoPE initialization * Fixing duplicate rope_scaling param bug when loading from pretrained --------- Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Normalizing on altup_num_inputs config option * regenerating modeling file after syncing to HEAD * Use torch.std(..., unbiased=False) for activation sparsity (#8) * Refactoring to a single QVK Norm (#13) * AltUp: support scale_corrected_output (#14) * Converts einsums to nn.Linear (#7) * Converts einsums to nn.Linear * Removing unused variables * Aligning SharedKVCache with HybridCache (#11) * Alinging SharedKVStore with HybridCache * Remove KVStore. Refactor apply_rotary_pos_emb for sharing * Addressing review comments * Supporting split modality embeddings in Gemma3n (#10) * Adding the Embedder class * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation * Apply suggestions from code review Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Update modular Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * Addressing review comments, prop drilling audio and vision configs to the text config * Removing TODO's that have been addressed * Simplify Embedder init and add audio embeddings * Embeddings refactor. 
Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder * Refactoring vision and audio embeddings into ConditionalGeneration model --------- Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating attention mask for Gemma 3.5 (#15) * xxx_token_index to xxx_token_id * remvoing deprecated last_cache_position * Removing references to SigLIP * Always init per-layer inputs * Using torch.finfo().min for epsilon_tensor * Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas * fix modular GEMMA3N_INPUTS_DOCSTRING * Gemma3nAttention inherits from Gemma3Attention * Modular inheritance fixes * CausalLM conversion script for 4B model (#16) * Add Gemma3n Audio Encoder (#6) * initial commit of Gemma 3.5 scaffold * Fixing param pass through on Gemm3nRMSNorm * Adds Einsum layer to Gemma 3.5 * Updating EinsumLayer API * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Adds AltUp to Gemma 3n * Adding Gemma3n overall and text config with vision and audio config placeholders (#3) * Adding gemma3n text configs * Adding audio config placeholders * Adding a placeholder for vision configs * Updating MobileNetVisionConfig, inheriting TimmWrapperConfig * Updating text configs * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Removing altup configs to accept the suggested configs * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating altup config * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Addressing review comments and updating text configs * Adding a config for activation sparsity * Updating configs to pass through options to super class init and adjust some name prefixes * Updating laurel and altup with corrected config values * Normalizing sub_config initializers --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating MLP with activation sparsity (#2) * Updating DecoderBlock for Gemma 3.5 (#3) * Initial Gemm3nTextModel (#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference. 
* Adding KV Cache Sharing * Adds Einsum layer to Gemma 3.5 * Updating EinsumLayer API * Refactored kv cache sharing in attention * Adding KVStore for cache sharing * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Updating KV Cache Sharing implementation * Updating the q and k norm definitions in the attention module * Fixing name error for q,k,v RMS norm to use the right Gemma 3n module * Updating MLP with activation sparsity * Updating DecoderBlock for Gemma 3.5 * Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code * Isolating KV Cache logic to relevant components * Fixing logic error in Gemma3nAttention.forward * Refactoring caching contributions and fixing kv_store initialization * Simplifying Configs * Remove errant self from super init call * Bug fix in the Attention module - changing self.head_dim to config.head_dim * Bug fixes in the LaurelBlock and RMS Norm super init call * removing redundant code from a merge * Adding per_layer_inputs to TextModel * Adding preprocess embeddings with altup * Adds per-layer-to-single output and a host of TODOs * Integrating altup predict with the model workflow and other minor bug fixes * Using nn.Embedding temporarily for text model * It goes forward * Minor refactor of attention sparsity and RoPE initialization * Fixing duplicate rope_scaling param bug when loading from pretrained --------- Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Normalizing on altup_num_inputs config option * Adding audio encoder config * Adds high-level components for Audio Encoder * Implement uniform reducer for Audio Encoder * Adding placeholders for Conformer components in Audio Encoder * Adding placeholders for SubSampleConvProjection components in Audio Encoder * Adding SequenceLayer component placeholders * Implementing Gemma3nAudioEncoder with nn.Sequential * Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential * Implementing Conformer model with SequenceLayers * Use OrderedDict in nn.Sequential initializers * Implements sl.Residual in Torch with nn.Sequential and OrderedDict * Adopting a base SequenceLayer class with default forward() method * Implementing sl.GatedLinearUnit in Torch * Implementing sl.Swish in Torch * Implementing sl.ReLU in Torch * Implementing sl.Scale in Torch * Removing sl.Dropout after tree-shaking * Implementing sl.RMSNorm in Torch with fake shape * Implementing sl.GroupNorm in Torch * Implementing sl.Conv2d in Torch * Implementing sl.Dense in Torch * Removing sl.Delay layers, which act as pass-throughs * Connecting shapes to configs in initializers * Removing sl.Emit * Implementing sl.ExpandDims in Torch * Adding sl.GradientClipping to Torch * Implementing sl.DenseShaped in Torch * Implementing sl.LDPA in Torch * Removing unused sl.CombinedQKVProj class * Fixing erroneous type hint * Implemnenting sl.DepthwiseConv1D in Torch * Implementing sl.MaskInvalid in Torch * Fixes for initialization * Fixes for saving weights * Removing einsums per feedback from HF staff * Removing Sequence Layers idioms from audio encoder * 
Fixes for reviewer comments * CausalLM conversion script for 4B model * inv_timescales to non-persistent buffer * Addressing audio encoder Attention feedback * Addressing Gemma3nAudioSSCPConvBlock feedback * Addressing Gemma3nAudioConformerAttention feedback * Addressing padding feedback * Weights conversion loads audio state dict * Always use vision_config so saving works * Token id updates for configs * Stubs for interleaving audio embs * Addressing reviewer feedback --------- Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> * Fixing cache access error * Removing duplicate code from a bad merge * Gemma 3n Text + Vision Part 1 (#17) * testing utilities for numerics comparisons * Corrected einsum to nn.Linear weights conversion * Inherit scaled word embs from Gemma3 not Bart * Fixing transposes for collapsed linears * More transpose fixes * numpy api fix * RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True * Force AltUp to float32 * Updating debugging script for AudioEncoder debugging * Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs * Correcting attention einsum conversions * RMSNorm in type of x * Fixing douplicate laurel norm/gating * KV sharing using the right previous indices * Refactor kv shared index computation. Correct frac_shared_layers * Use num_shared_layers instead of inferring from a fraction * fixing a bug for logging * Fix shared data_ptrs in altup inits * rope: adjust proj -> norm -> rope to preserve computation (#20) * rope: adjust proj -> norm -> rope to preserve computation * Removing some breaking language model fluff in ConditionalGeneration * Consolidate query_states transforms --------- Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Vectorize the loops in AltUp (#19) * Vectorize the loops in AltUp * fix typo * Expanding to support batched inputs * remove extra debug script * Fix AltUp.forward --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel * Convert norm to 1/sqrt (#21) * Convert norm to 1/sqrt * Scale shift change per Phil's rec * Adding default activation sparsity * Fixing 2B config in weights conversion script * Fixing RMSNorm parameters - adding scale_shift and with_scale * Correcting query pre-attention scaling * Adding query_rescale_scalar to text config * Adding layer_idx to MLP * Permafix for input_layernorm * Use 1/sqrt instead of rsqrt in DecoderLayer * Fix o_proj conversion * Conversion script update for vision encoder * Removing logging for debugging timm model * Fixing bugs in Gemma3nForConditionalGeneration for text generation * Generating the modeling_gemma3n.py file * Removing the addition of an erroneous line in the modeling file * Adding gemma3n text model to modeling_auto * Bugfix: Updating the interleaving of inputs_embeds and vision_embeds * Updating the modeling file with the latest bugfix changes * Updating models/auto for Gemma 3n * using AutoTokenizer in forward test * Adding processing_gemma3n.py * Gemma 3n configured for AutoModel. Conversion script updated. 
* Removing errant merge artifacts --------- Co-authored-by: Mayank Chaturvedi <imayank@google.com> Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com> Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> * Removing errant debugging statements from Gemma 3 * Gemma3n audio model (#18) * testing utilities for numerics comparisons * Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock * Add audio version of forward script based on RyanMullins' implementation * Updating to match encoder tests. WIP: config question needs resolving * Updates to audio classes to enable end-to-end running * Removing vestigial classes, cleaning up print statements * Adding SiLU / Swish to audio conformer feed forward block * Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio * Adding outputs to audio test * Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model * Update forward test to load from local weights * Update conversion to process / output audio layers * Update __all__ to export audio encoder * AutoModel registration for Gemma 3n Audio * Use AutoModel for ConditionalGeneration.audio_tower * Fixing input_proj_linear transpose * Fixing Gemma3NanoAudioConformerAttention.post conversion * Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion * Correcting indentation issue on Gemma3p5RMSNorm --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Text + Vision Part 2 (#23) * Updates for ConditionalGeneration.get_image_features * Adding a WIP draft of image_processing_gemma3p5.py * Update src/transformers/models/gemma3p5/modular_gemma3p5.py Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Modular conversion after github suggested change * Text + image gives good results * Fixing image size preset * Updating configs for the 2B variant in the conversion script * Using final generation config in conversion script --------- Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Audio Integration (#12) * initial commit of Gemma 3n scaffold * Fixing param pass through on Gemm3nRMSNorm * Adds Einsum layer to Gemma 3n * Updating EinsumLayer API * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Adds AltUp to Gemma 3n * Adding Gemma 3n overall and text config with vision and audio config placeholders (#3) * Adding Gemma 3n text configs * Adding audio config placeholders * Adding a placeholder for vision configs * Updating MobileNetVisionConfig, inheriting TimmWrapperConfig * Updating text configs * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Removing altup configs to accept the suggested configs * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating altup config * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Addressing review comments and updating text configs * Adding a config for activation sparsity * Updating configs to pass through options to super class init and adjust some name prefixes * Updating laurel and altup with 
corrected config values * Normalizing sub_config initializers --------- Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Updating MLP with activation sparsity (#2) * Updating DecoderBlock for Gemma 3n (#3) * Initial Gemma3nTextModel (#4) NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference. * Adding KV Cache Sharing * Adds Einsum layer to Gemma 3n * Updating EinsumLayer API * Refactored kv cache sharing in attention * Adding KVStore for cache sharing * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update modular Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Update src/transformers/cache_utils.py Co-authored-by: Ryan Mullins <ryanmullins@google.com> * Undoing erroneous force push * Reverting RMSNorm to with_scale by default * Adds LAuReL to Gemma 3n * Updating KV Cache Sharing implementation * Updating the q and k norm definitions in the attention module * Fixing name error for q,k,v RMS norm to use the right 3n module * Updating MLP with activation sparsity * Updating DecoderBlock for Gemma 3n * Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code * Isolating KV Cache logic to relevant components * Fixing logic error in Gemma3nAttention.forward * Refactoring caching contributions and fixing kv_store initialization * Simplifying Configs * Remove errant self from super init call * Bug fix in the Attention module - changing self.head_dim to config.head_dim * Bug fixes in the LaurelBlock and RMS Norm super init call * removing redundant code from a merge * Adding per_layer_inputs to TextModel * Adding preprocess embeddings with altup * Adds per-layer-to-single output and a host of TODOs * Integrating altup predict with the model workflow and other minor bug fixes * Using nn.Embedding temporarily for text model * It goes forward * Minor refactor of attention sparsity and RoPE initialization * Fixing duplicate rope_scaling param bug when loading from pretrained --------- Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Normalizing on altup_num_inputs config option * Adding audio encoder config * Adds high-level components for Audio Encoder * Implement uniform reducer for Audio Encoder * Adding placeholders for Conformer components in Audio Encoder * Adding placeholders for SubSampleConvProjection components in Audio Encoder * Adding SequenceLayer component placeholders * Implementing Gemma3nAudioEncoder with nn.Sequential * Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential * Implementing Conformer model with SequenceLayers * Use OrderedDict in nn.Sequential initializers * Implements sl.Residual in Torch with nn.Sequential and OrderedDict * Adopting a base SequenceLayer class with default forward() method * Implementing sl.GatedLinearUnit in Torch * Implementing sl.Swish in Torch * Implementing sl.ReLU in Torch * Implementing sl.Scale in Torch * Removing sl.Dropout after tree-shaking * Implementing sl.RMSNorm in Torch with fake shape * Implementing sl.GroupNorm in Torch * Implementing sl.Conv2d in Torch * Implementing sl.Dense in Torch * Removing sl.Delay layers, which act as pass-throughs * Connecting shapes to configs in initializers * Removing sl.Emit * Implementing sl.ExpandDims in 
Torch * Adding sl.GradientClipping to Torch * Implementing sl.DenseShaped in Torch * Implementing sl.LDPA in Torch * Removing unused sl.CombinedQKVProj class * Fixing erroneous type hint * Implemnenting sl.DepthwiseConv1D in Torch * Implementing sl.MaskInvalid in Torch * Fixes for initialization * Fixes for saving weights * Removing einsums per feedback from HF staff * Removing Sequence Layers idioms from audio encoder * Fixes for reviewer comments * Converting sl.Frontend to FeatureExtractor * Updates for ConditionalGeneration.get_image_features * Adding a WIP draft of image_processing_gemma3n.py * Update modular Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> * Modular conversion after github suggested change * Text + image gives good results * Fixing image size preset * Draft of audio data in chat template * Removing image processing. Using SigLIP instead. * Audio input going end-to-end * Fixing dtype issues in audio encoder * x-lib formatting consistency * Adding example data * Save preprocessor_config.json from conversion script * Instrumentaiton for debugging * Additional instrumentation for preprocessing debugging * Updates to preprocessor, padding; produces correct end-to-end results on sample * Tackling configuraiton TODOs * Start of feature extractor refatcor * Adds Numpy version of USM extractor, removes Torch version and dependencies * Fixing AltUp.correct coef permute * Supporting batches of single audio segment inputs * Docstrings updates for config * In-lining audio feature extraction * Adjustments to conversion script and smoke test script --------- Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: pculliton <phillipculliton@gmail.com> * Gemma 3n renaming * Removing test data and utilities * Renaming test files * Gemma 3n refactor * Fix tokenizer config in conversion script * Address reviewer feedback * FeatureExtractor returns float32 by default * Adding basic tests for audio, and input name for audio encoder * Audio integration test, updates to model_id for other integration tests * Use scales for q and k norms (#26) * Update audio integration test to use HF dataset * Reviewer feedback * Expand embedding table to full vocab size in weights conversion * Mix-n-match MatFormers for Gemma 3n (#25) * Remove in-place operations (#30) * chore: removing inplace ops * remove [tensor] * n pattern * chore: reviewer feedback in AudioEncoder and AltUp * More grad clipping * Dynamo compatibility * fix: cache slicing error * chore: simplify shared kv cache slicing * chore: vision encoder rename in timm * fix: image processor do_normalize=False * fixup: style * chore: model_doc * fix: docs for code quality * chore: repo consistency * fix: RMSNorm in float as in prior Gemmas * fix: per_layer_inputs = None * chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint * chore: repo consistency * Add initial unit tests for Gemma3nAudioFeatureExtractor (#27) * Add initial unit tests for Gemma3nAudioFeatureExtractor * Add basic unit tests for Gemma3nProcessor (#28) Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> * parameterize tests --------- Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> * chore: code style * fix: test cases * style and consistency * fix config in the test to be coherent with layer cache sharing * fix hidden states in tests and code * inits and mappings 
* fix modality prefixes * test order and prefixes * fix test exception * fix class order and reduce model size for faster tests * restore _checkpoint_conversion_mapping to load Caual from Conditional * fix config mapping! * fix: reviewer feedback --------- Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Mayank Chaturvedi <imayank@google.com> Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com> Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by: pculliton <phillipculliton@gmail.com> Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * fix import test * add model args * auto_docstring * replace test path * consistency * skip tests for now * fix docstring for doc builder * skip unused attr --------- Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com> Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com> Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Mayank Chaturvedi <imayank@google.com> Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com> Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by: pculliton <phillipculliton@gmail.com> Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Arthur <arthur.zucker@gmail.com>
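Several entries in the commit message above ("Updating MLP with activation sparsity", "Use torch.std(..., unbiased=False) for activation sparsity") refer to sparsifying the gate activations with a statistics-based cutoff. The following is a hedged sketch of that general idea with an illustrative multiplier, not the exact Gemma 3n formula:

```python
import torch

# Hedged sketch of statistics-based activation sparsity: activations below a
# per-token cutoff derived from the mean and (biased) standard deviation are
# zeroed out. The multiplier is illustrative, not a real config value.
def sparsify_activations(x: torch.Tensor, std_multiplier: float = 1.0) -> torch.Tensor:
    mean = x.mean(dim=-1, keepdim=True)
    std = x.std(dim=-1, keepdim=True, unbiased=False)  # cf. "torch.std(..., unbiased=False)" above
    cutoff = mean + std_multiplier * std
    return torch.relu(x - cutoff)  # everything below the cutoff becomes exactly zero

x = torch.randn(2, 8, 64)
print((sparsify_activations(x) == 0).float().mean())  # fraction of zeroed activations
```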
ArthurZucker pushed a commit that referenced this pull request Aug 5, 2025
feat: add megablocks moe mlp kernel
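This one-line commit wires in an optimized MoE MLP kernel from megablocks. For orientation only, here is a naive dense reference of what an MoE MLP computes; an optimized kernel performs the same math with grouped/blocked GEMMs, and all sizes and the class name below are illustrative:

```python
import torch
from torch import nn

# Naive reference for an MoE MLP: route each token to its top-k experts and
# combine the expert outputs with the (renormalized) routing weights.
class NaiveMoEMLP(nn.Module):
    def __init__(self, hidden: int = 64, ffn: int = 128, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, ffn), nn.GELU(), nn.Linear(ffn, hidden))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [tokens, hidden]
        weights, chosen = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (chosen == e).nonzero(as_tuple=True)
            if token_idx.numel():
                out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

print(NaiveMoEMLP()(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```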
ArthurZucker pushed a commit that referenced this pull request Aug 22, 2025
* merge opensource_hunyuan * add head_dim * fix assertion error * fix seen_tokens * ready_for_upstream (merge request !17) Squash merge branch 'ready_for_upstream' into 'main' * fix configuration type&docstring * fix style * ready_for_upstream (merge request !18) Squash merge branch 'ready_for_upstream' into 'main' * add doc * fix testcode * fix configuration type&docstring * rename base model * remove assert * update * remove tiktoken * update * fix moe and code style (#3) * update * fix format * update * revert makefile * fix moe config * fix numel() * remove prepare_inputs_for_generation * fix kv_seq_len * add docs/toctree * remove unused paramter&add licence * add licence * remove unused paramter * fix code * dense modular update import fix fix use mistralmodel fix qknorm add sliding_window make style fix dense done hunyuan moe fix import fix modular fixup fixup * update model path * fix mlp_bias * fix modular * Fix modeling (#5) * fix attention * use llamamodel * fix code * Fix qk (#6) * fix qk_norm * fix * fix modual * Fix moe (#7) * fix some moe code * fix einsum * try top1 * use top1 * Fix rotary (#8) * fix rotary * fix modeling * fix modular * fix testcode * remove A13B unit test * Fix moe v1 (#9) fix moe & gate * Fix gate norm (#10) * add norm_topk_prob * Fix testcase (#11) * fix&skip test * Fix testcase (#12) * skip testcase * Fix norm topk (#13) * hardcode norm_topk_prob * fix testcase --------- Co-authored-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: Mingji Han <mingjihan@tencent.com>
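Among the MoE fixes in this commit, `norm_topk_prob` decides whether the selected top-k routing probabilities are renormalized before weighting the expert outputs. A small illustrative snippet of that toggle (tensor sizes are made up):

```python
import torch

# Illustrative only: what a `norm_topk_prob` switch typically controls in an MoE
# router. When enabled, the selected top-k routing probabilities are rescaled to
# sum to 1 before being used to combine the experts' outputs.
logits = torch.randn(4, 8)  # [tokens, num_experts]
topk_probs, topk_idx = logits.softmax(dim=-1).topk(2, dim=-1)

norm_topk_prob = True
if norm_topk_prob:
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)

print(topk_probs.sum(dim=-1))  # all ones when norm_topk_prob is True
```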
S1ro1 pushed a commit that referenced this pull request Aug 22, 2025
* merge opensource_hunyuan * add head_dim * fix assertion error * fix seen_tokens * ready_for_upstream (merge request !17) Squash merge branch 'ready_for_upstream' into 'main' * fix configuration type&docstring * fix style * ready_for_upstream (merge request !18) Squash merge branch 'ready_for_upstream' into 'main' * add doc * fix testcode * fix configuration type&docstring * rename base model * remove assert * update * remove tiktoken * update * fix moe and code style (#3) * update * fix format * update * revert makefile * fix moe config * fix numel() * remove prepare_inputs_for_generation * fix kv_seq_len * add docs/toctree * remove unused paramter&add licence * add licence * remove unused paramter * fix code * dense modular update import fix fix use mistralmodel fix qknorm add sliding_window make style fix dense done hunyuan moe fix import fix modular fixup fixup * update model path * fix mlp_bias * fix modular * Fix modeling (#5) * fix attention * use llamamodel * fix code * Fix qk (#6) * fix qk_norm * fix * fix modual * Fix moe (#7) * fix some moe code * fix einsum * try top1 * use top1 * Fix rotary (#8) * fix rotary * fix modeling * fix modular * fix testcode * remove A13B unit test * Fix moe v1 (#9) fix moe & gate * Fix gate norm (#10) * add norm_topk_prob * Fix testcase (#11) * fix&skip test * Fix testcase (#12) * skip testcase * Fix norm topk (#13) * hardcode norm_topk_prob * fix testcase --------- Co-authored-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: Mingji Han <mingjihan@tencent.com>
burcgokden pushed a commit to burcgokden/transformers that referenced this pull request Aug 27, 2025
* merge opensource_hunyuan * add head_dim * fix assertion error * fix seen_tokens * ready_for_upstream (merge request !17) Squash merge branch 'ready_for_upstream' into 'main' * fix configuration type&docstring * fix style * ready_for_upstream (merge request !18) Squash merge branch 'ready_for_upstream' into 'main' * add doc * fix testcode * fix configuration type&docstring * rename base model * remove assert * update * remove tiktoken * update * fix moe and code style (huggingface#3) * update * fix format * update * revert makefile * fix moe config * fix numel() * remove prepare_inputs_for_generation * fix kv_seq_len * add docs/toctree * remove unused paramter&add licence * add licence * remove unused paramter * fix code * dense modular update import fix fix use mistralmodel fix qknorm add sliding_window make style fix dense done hunyuan moe fix import fix modular fixup fixup * update model path * fix mlp_bias * fix modular * Fix modeling (huggingface#5) * fix attention * use llamamodel * fix code * Fix qk (huggingface#6) * fix qk_norm * fix * fix modual * Fix moe (huggingface#7) * fix some moe code * fix einsum * try top1 * use top1 * Fix rotary (huggingface#8) * fix rotary * fix modeling * fix modular * fix testcode * remove A13B unit test * Fix moe v1 (huggingface#9) fix moe & gate * Fix gate norm (huggingface#10) * add norm_topk_prob * Fix testcase (huggingface#11) * fix&skip test * Fix testcase (huggingface#12) * skip testcase * Fix norm topk (huggingface#13) * hardcode norm_topk_prob * fix testcase --------- Co-authored-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: Mingji Han <mingjihan@tencent.com>
Guo-Chenxu pushed a commit to Guo-Chenxu/transformers that referenced this pull request Aug 28, 2025
change user api
Guo-Chenxu added a commit to Guo-Chenxu/transformers that referenced this pull request Aug 28, 2025
* Update notification service MI325 (#40078) add mi325 to amd_daily_ci_workflows * Fix PerceptionLM image preprocessing for non-tiled image input. (#40006) * Fix PerceptionLM image preprocessing for non-tiled image input. * Add test for single tile vanilla image processing. * ruff format * recover missing test skip * Simplify test. * minor test name fix * Revert FA2 kwargs construction (#40029) * revert * use imports * went way too high in imports level * style * [fix] batch inference for llava_onevision (#40021) * [fix] llava onevision batch inference * style * cannot pass inconsistent list & handle text-only case * [docs] Zero Shot Object Detection Task (#40096) * refactor zsod task docs * keeping the image guided od section * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Update docs/source/en/tasks/zero_shot_object_detection.md Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> * Update Glm4V processor and add tests (#39988) * update GLm4V and add tests * Update tests/models/glm4v/test_processor_glm4v.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * remove min/max pixels for BC * fix video tests --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * Add glm4.5&&glm4.5V doc (#40095) * Docs: GLM-4-MoE & GLM-4V-MoE pages * Docs: polish GLM-4V-MoE intro, remove placeholders; pin image * Docs --------- Co-authored-by: wujiahan <lambert@gmail.com> * Causal loss for `ForConditionalGeneration` (#39973) * feat: add ForConditionalGeneration loss to LOSS_MAPPING * consistent spelling of "recognized" * Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock (#39743) audio encodings now match conv weight dtype in Gemma3nAudioSSCPConvBlock * New DynamicSlidingWindowLayer & associated Cache (#40039) * start adding the layer * style * improve * modular * fix * fix * improve * generate integration * comment * remove old one * remove * fix * fix * fix * fix all recompiles * fix * doc * fix * add text config check * fix encoderdecoder cache * add it for all models with sliding/hybrid support * revert * start fixing * prophetnet * fsmt * fix ddp_data * add test for mistral * improve mistral test and add gemma2 test * docstrings * Enable SIM rules (#39806) * Enable SIM rules Signed-off-by: cyy <cyyever@outlook.com> * More fixes Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com> * feat: add `is_fast` to ImageProcessor (#39603) * feat: add `is_fast` to ImageProcessor * test_image_processing_common.py 업데이트 Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> * feat: add missing BaseImageProcessorFast import * fix: `issubclass` for discriminating subclass of BaseImageProcessorFast --------- Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> * Re-apply make style (#40106) make style * Replace `logger.warning` with `logger.warning_once` in `GradientCheckpointingLayer` (#40091) * Fix regression in mllama vision encoder (#40083) fix mllama vision encoder Signed-off-by: Isotr0py <2037008807@qq.com> * Switch the order of args in StaticCache (for BC and future logic) (#40100) * switch order for BC and future logic * in generate as well * Fix Qwen3 MoE GGUF architecture mismatch (#39976) * fix qwen3moe 
gguf architecture * Fix Qwen3Moe GGUF loading --------- Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Jinuk Kim <jusjinuk@snu.ac.kr> * Fix error on importing unavailable torch.distributed (#40038) Currently model_debugging_utils.py would have an unguarded `import torch.distributed.tensor`. This PR ensures that the distributed module is available before including its tensor module. * Default to dequantize if cpu in device_map for mxfp4 (#39993) * default to dq if cpu * an other check * style * revert some changes * [`Flash Attention`] Fix flash attention integration (#40002) * fix flash attention * i got a stroke reading that comment * change dropout kwarg back to before * rename _fa3... as it's used for multiple variants and should work as fallback instead * simplify imports and support kwargs for fa * style * fix comments order * small fix * skip kernels test (causes cuda illegal memories w/o cleanup), fix fa test in general esp for models like bart * style * allow fullgraph by preloading on init * make globals "private" * ci pls be happy * change skip conditions based on backend flag (indicating missing mask interface) * move globals support to a function to prepare kwargs * style * generalize supported kwargs * small change to doc * fix * add comments * style * revert prep during generate * style * revert weird style changes * add fa kwarg prep during generate with fixes back * how did this even happen * how * add comment * [trainer] ensure special tokens in model configs are aligned with tokenizer at train time (#38441) * tmp commit * add test * make fixup * reset warns/info in test * Fix Causality Handling in Flash Attention to Support Bidirectional Attention (#39707) Fix the is_causal logic to enable bidirectional attention Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * [docs] Add reference to HF-maintained `custom_generate` collections (#39894) decoding -> generation; add collections * Add model card for MobileViT (#40033) * Add model card for MobileViT * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update mobilevit.md * Update mobilevit.md * Update mobilevit.md * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update mobilevit.md * Update mobilevit.md * Update mobilevit.md * Update mobilevit.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * remove sequence parallel in llama4 (#40084) * 🌐 [i18n-KO] Translated `tiny_agents.md` to Korean (#39913) * docs: ko: tiny_agents.md * feat: nmt draft * fix: manual edits * fix: manual edits * [bugfix] Fix tensor device in Idefics2, Idefics3, and SmolVLM (#39975) * [bugfix] ensure correct tensor device in Idefics2, Idefics3, and SmolVLM models * to cuda * changed xLSTMRMSNorm to RMSNorm (#40113) * changed 
xLSTMRMS.. to RMS... * fix linter error --------- Co-authored-by: Nikita <nikita@Nikitas-MacBook-Pro.local> * Fix QuantoQuantizedCache import issues (#40109) * fix quantoquantized * [serve] allow array `content` inputs for LLMs (#39829) fix bug; add tests * `decoding_method` argument in generate (#40085) * factor out expand inputs * callable arg * improve docs, add test * Update docs/source/en/generation_strategies.md Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Collated reports (#40080) * Add initial collated reports script and job definition * provide commit hash for this run. Also use hash in generated artifact name. Json formatting * tidy * Add option to upload collated reports to hf hub * Add glob pattern for test report folders * Fix glob * Use machine_type as path filter instead of glob. Include machine_type in collated report * DOCS: Add missing space in SECURITY.md (#40087) * [trainer] handle case where EOS token is None in `generation_config` (#40127) * handle case where EOS token is None in gen config * update eli5 dataset * Fix hidden torchvision>=0.15 dependency issue (#39928) * use pil_torch_interpolation_mapping for NEAREST/NEAREST_EXACT * fix min torchvision version * use InterpolationMode directly * remove unused is_torchvision_greater_or_equal, * nit * 🌐 [i18n-KO] Translated `main_classes/processors.md` to Korean (#39519) * docs: ko: processors.md * feat: nmt draft * fix: manual edits * Update docs/source/ko/main_classes/processors.md Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> * Update docs/source/ko/main_classes/processors.md Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> --------- Co-authored-by: TaskerJang <bymyself103@naver.com> Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> * 🌐 [i18n-KO] Translated `jamba.md` to Korean (#39890) * docs: ko: jamba.md * feat: nmt draft * fix: manual edits * fix: resolve suggestion Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com> --------- Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com> * 🌐 [i18n-KO] Translated `main_classes/optimizer_schedules.md` to Korean (#39713) * docs: ko: main_classes/optimizer_schedules * feat: nmt draft * fix: improve TOC anchors and expressions in optimizer_schedules - Add TOC anchors to all section headers - Fix terminology and improve Korean expressions * fix: Correct translation of 'weight decay fixed' to '가중치 감쇠가 적용된' Changed '가중치 감쇠가 수정된' to '가중치 감쇠가 적용된' for more accurate translation of 'weight decay fixed' in the context of optimization. * fix: Use more natural Korean inheritance expression Changed '에서 상속받는' to '을 상속받는' to follow natural Korean grammar patterns for inheritance terminology. * fix: Use consistent '미세 조정' translation for 'finetuned models' Changed '파인튜닝된' to '미세 조정된 모델' to follow the established translation glossary for 'finetuned models' terminology. * 🚨🚨 [generate] ignore `cache_implementation="hybrid"` hub defaults (#40135) * working? 
* fix tests * 🌐 [i18n-KO] Translated `gpt2.md` to Korean (#39808) * docs: ko: bamba.md * feat: nmt draft * fix: manual edits * docs: ko: gpt2.md * feat: nmt draft * fix: manual edits * Remove bamba.md from docs/source/ko/model_doc/ * Update _toctree.yml * 🌐 [i18n-KO] Translated `optimizers.md` to Korean (#40011) * docs: ko: optimizers.md * feat: optimizers draft * fix: manual edits * docs: ko: update optimizers.md * Update docs/source/ko/optimizers.md Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com> * Update docs/source/ko/optimizers.md Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com> * Update docs/source/ko/optimizers.md Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com> * docs: ko: final updates to optimizers and toctree --------- Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com> Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com> * 🌐 [i18n-KO] Translated grounding-dino.md to Korean (#39861) * docs: ko: grounding-dino.md * feat: nmt draft * fix: manual edits * Update docs/source/ko/model_doc/grounding-dino.md Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com> * Update docs/source/ko/model_doc/grounding-dino.md Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com> * Update docs/source/ko/model_doc/grounding-dino.md Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com> * docs: add AP explanation for better readability --------- Co-authored-by: TaskerJang <bymyself103@naver.com> Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> * 🚨 Use lru_cache for sine pos embeddings MaskFormer (#40007) * use lru_cache for sine pos embeddings maskformer * fix calls to pos embed * change maxsize to 1 * 🌐 [i18n-KO] Translated `pipelines.md` to Korean (#39577) * docs: ko: pipelines.md * feat: gpt draft * Update docs/source/ko/main_classes/pipelines.md Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> * Update docs/source/ko/main_classes/pipelines.md Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> * Update docs/source/ko/main_classes/pipelines.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/ko/main_classes/pipelines.md Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> * Update docs/source/ko/main_classes/pipelines.md Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> * Update _toctree.yml * Update _toctree.yml 번역 문서 수정 * Update pipelines.md ToC 수정 * Update pipelines.md --------- Co-authored-by: xhaktm <tnwjd318@hs.ac.kr> Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * gpt oss is important (#40139) * Fix Janus (#40140) fix * Add Segment Anything 2 (SAM2) (#32317) * initial comment * test * initial conversion for outline * intermediate commit for configuration * chore:init files for sam2 * adding arbitary undefined config * check * add vision * make style * init sam2 base model * Fix imports * Linting * chore:sam to sam2 classes * Linting * Add sam2 to models.__init__ * chore:match prompt encoder with sam2 code * chore:prepare kwargs for mask decoder * Add image/video predictors * Add CUDA kernel * Add output classes * linting * Add logging info * tmp commit * docs for sam2 * enable image 
processing * check difference of original SAM2 - difference is the order of ToTensor() - please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize * enable promptencoder of sam2 * fix promprencoder * Confirmed that PromptEncoder is exactly same (Be aware of bfloat16 and float32 difference) * Confirmed that ImageEncoder is exactly same (Be aware the linting of init) * Confirmed that MaskDecoder is exactly same (TO DO: lint variable name) * SamModel is now available (Need more chore for name) * make fix-copies * make style * make CI happy * Refactor VisionEncoder and PostioinEmbedding * TO DO : fix the image_embeddings and sparse_embeddings part * pure image inference done * reusable features fix and make style * styling * refactor memoryattention * tmp * tmp * refactor memoryencoder TO DO : convert and inference the video pipeline * TO DO : fix the image_encoder shape * conversion finish TO DO: need to check video inference * make style * remove video model * lint * change * python utils/check_docstringspy --check_all * python utils/check_config_attributes.py * remove copies for sam2promptencoder due to configuration * change __init__.py * remove tensorflow version * fix that to not use direct comparison * make style * add missing import * fix image_embedding_size * refactor Sam2 Attention * add fully working video inference (refactoring todo) * clarify _prepare_memory_conditioned_features * simplify modeling code, remove unused paths * use one model * use auto_docstring * refactor rope embeddings * nit * not using multimask when several points given * add all sam2.1 * add video tmp * add Sam2VideoSessionState + fast image proc + video proc * remove init_states from model * fix batch inference * add image integration tests * uniformize modeling code with other sam models and use modular * pass vision tests an most model tests * All tests passing * add offloading inference state and video to cpu * fix inference from image embedding and existing mask * fix multi_boxes mask inference * Fix batch images + batch boxes inference * improve processing for image inference * add support for mask generation pipeline * add support for get_connected_components post processing in mask generation * add fast image processor sam, image processor tests and use modular for sam2 image processor * fix mistake in sam after #39120 * fix init weights * refactor convert * add integration tests for video + other improvements * add needed missing docstrings * Improve docstrings and * improve inference speed by avoiding cuda sync * add test * skip test for vision_model * minor fix for vision_model * fix vision_model by adding sam2model and change the torch dependencies * remove patch_size * remove image_embedding_size * fix patch_size * fix test * make style * Separate hieradet and vision encoder in sam2 * fixup * review changes part 1 * remove MemoryEncoderConfig and MemoryAttentionConfig * pass q_stride instead of q_pool module * add inference on streamed videos * explicitely process streamed frames * nit * Improve docstrings in Sam2Model * update sam2 modeling with better gestion of inference state and cache, and separate Sam2Model and Sam2VideoModel * improve video inference api * change inference_state to inference_session * use modular for Sam2Model * fix convert sam2 hf * modular * Update src/transformers/models/sam2/video_processing_sam2.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * fix minor config * fix attention loading error * update modeling tests to use 
hub checkpoints * Use CI A10 runner for integration tests values + higher tolerance for video integration tests * PR review part 1 * fix doc * nit improvements * enforce one input format for points, labels and boxes * nit * last few nits from PR review * fix style * fix the input type * fix docs * add sam2 model as conversion script * improve sam2 doc * nit fixes + optimization * split sam2 and sam2_video in two models * PR review part 1 * fix None for default slow processor of sam2 * remove unecessary code path in sam2_video * refactor/simplify RoPE * replace embedding module list with embedding matrix * fix tests * remove kernel * nit * use lru_cache for sine_pos_embeddings * reorder sam2_video methods * simplify sam2_video * PR review part 1 * simplify sam2 video a lot * more simplification * update integration tests with updated conftest * more explicit config for hieradet * do post_processing outside of sam2 video model * Improve Sam2VideoVisionRotaryEmbedding * fix tests * update docs and fix mask2former/oneformer * avoid unnecessary reshapes/permute * fix device concatenating points * small dtype fix * PR review * nit * fix style and finish up doc * fix style * fix docstrings * fix modular --------- Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com> Co-authored-by: Haitham Khedr <haithamkhedr@meta.com> Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * [docs] Fix ko toctree (#40138) Update _toctree.yml * Remove an old badly designed test (#40142) remove it * updated visualBERT modelcard (#40057) * updated visualBERT modelcard * fix: Review for VisualBERT card * 🌐 [i18n-KO] Translated `gemma3.md` to Korean (#39865) * docs: ko: gemma3.md * feat: nmt draft * fix: manual edits * fix: resolve suggestions Co-authored-by: Chaewon Song <chaewon1019@ewhain.net> * fix: resolve suggestions --------- Co-authored-by: Chaewon Song <chaewon1019@ewhain.net> * Fix quantized cache with only cache_implementation in generate (#40144) * fix args * comment * Add pytest marker: `torch_compile_test` and `torch_export_test` (#39950) * new marker * trigger CI * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * Update Dockerfiles to install packages inside a virtual environment (#39098) * Removed un-necessary virtual environment creation in Dockerfiles. * Updated Dockerfiles to install packages in a virtual environment. 
* use venv's python * update * build and trigger * trigger * build and trigger * build and trigger * build and trigger * build and trigger * build and trigger * build and trigger * update * update * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * Create self-scheduled-amd-mi355-caller.yml (#40134) * [Cohere2Vision] remove unused arg (#40103) * remove unused arg * remove the arg from test as well * [efficientloftr] fix bugs and follow original cross attn implementation strictly (#40141) * fix: changed is_causal to be False * fix: Added original cross attention bug * fix: fixed the way bordel removal is computed * fix: added missing normalization on coarse features * test: fixed integration tests --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * Fix CI: Use correct import in SAM for torchvision InterpolationMode (#40160) fix ci * [Continous Batching] set head_dim when config.head_dim is None (#40159) * set head_dim when config.head_dim is None * use model's actual TP setting * Replace `self.tokenizer` by `self.processing_class` (#40119) * [FA2] Fix it finally - revert fa kwargs preparation (#40161) revert * [bugfix] fix flash-attention2 unavailable error for Ascend NPU (#40151) * [bugfix] fix flash-attention2 unavailable error for Ascend NPU * remove redundant apply_rotary_emb usage * fix ruff check error * pad_input and unpad_input use same implementation as fa2 * rollback redundant codes * fix ruff check error * optimize fa2 judgement logic * Fix docs typo (#40167) * DINOv3 model * working version * linter revert * linter revert * linter revert * fix init * remove flex and add convert to hf script * DINOv3 convnext * working version of convnext * adding to auto * Dinov3 -> DINOv3 * PR feedback * complete convert checkpoint * fix assertion * bf16 -> fp32 * add fast image processor * fixup * change conversion script * Use Pixtral attention * minor renaming * simplify intermediates capturing * refactor DINOv3ViTPatchEmbeddings * Refactor DINOv3ViTEmbeddings * [WIP] rope: remove unused params * [WIP] rope: rename period -> inv_freq for consistency * [WIP] rope: move augs * change inv_freq init (not persistent anymore) * [WIP] rope: move coords to init * rope - done! 
* use default LayerScale * conversion: truncate expected outputs * remove commented code * Refactor MLP layers * nit * clean up config params * nit docs * simplify embeddings * simplify compile compat lru_cache * fixup * dynamic patch coords * move augmentation * Fix docs * fixup and type hints * fix output capturing * fix tests * fixup * fix auto mappings * Add draft docs * fix dtype cast issue * add push to hub * add image processor tests * fixup * add modular * update modular * convert and test convnext * update conversion script * update prefix * Update LayerNorm * refactor DINOv3ConvNextLayer * rename * refactor convnext model * fix doc check * fix docs * fix convnext config * tmp fix for check docstring * remove unused arg * fix tests * (nit) change init * standardize gated MLP * clear namings and sat493m * fix tensors on different devices * revert linter * pr * pr feedbak ruff format * missing headers * fix code snippet and collection link in docs * DINOv3 description * fix checkpoints in tests * not doc fixes in configs * output_hidden_states * x -> features * remove sequential --------- Co-authored-by: Cijo Jose <cijose@meta.com> * build: Add fast image processor tvp (#39529) * build: add TvpImageProcessorFast - Introduced TvpImageProcessorFast to enhance image processing capabilities. - Updated image processing auto registration to include the new fast processor. - Modified tests to accommodate both TvpImageProcessor and TvpImageProcessorFast, ensuring comprehensive coverage for both classes. * fix: TvpImageProcessorFast with new resize method and update processing logic * build: add TvpImageProcessorFast * refactor: clean up whitespace and formatting in TvpImageProcessorFast and related tests - Removed unnecessary whitespace and ensured consistent formatting in image_processing_tvp_fast.py. - Updated import order in test_image_processing_tvp.py for clarity. - Minor adjustments to maintain code readability and consistency. * fix: Enhance TvpFastImageProcessorKwargs and update documentation - Added TvpFastImageProcessorKwargs class to define valid kwargs for TvpImageProcessorFast. - Updated the documentation in tvp.md to include the new class and its parameters. - Refined the image processing logic in image_processing_tvp_fast.py for better handling of padding and resizing. - Improved test cases in test_image_processing_tvp.py to ensure compatibility with the new processing logic and tensor inputs. * fix: tested now with python 3.9 * fix: remove tvp kwargs from docs * simplify processing * remove import and fix tests --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> * Add GptOssForSequenceClassification for GPT-OSS models (#40043) * Add GptOssForSequenceClassification * Tiny fix * make fixup * trigger CI rerun * Check config type instead --------- Co-authored-by: Yuefeng Zhan <yuefzh@microsoft.com> * Standardize BARTpho model card: badges, new examples, fixed broken im… (#40051) * Standardize BARTpho model card: badges, new examples, fixed broken image section, and links (#36979)Update bartpho.md * Update bartpho.md Removed non-required/unsupported sections: Quantization, Attention visualizer, and Resources (plus stray tokenizer header). 
Added code snippets which were suggested * Update bartpho.md Updated with necessary tags * Update bartpho.md * Update bartpho.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Add dates to the model docs (#39320) * added dates to the models with a single hf papers link * added the dates for models with multiple papers * half of no_papers models done * rest of no_papers models also done, only the exceptions left * added copyright disclaimer to sam_hw, cohere, cohere2 + dates * some more fixes, hf links + typo * some new models + a rough script * the script looks robust, changed all paper links to hf * minor change to handle technical reports along with blogs * ran make fixup to remove the white space * refactor * Pin torch to 2.7.1 on CircleCI for now (#40174) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * Update dynamic attnt setter for multimodals (#39908) * update * fix the test for DepthPro * PR comments * wait, I didn't delete this in prev commit? * fix * better way --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * [MINOR:TYPO] Update base.py (#40169) * [MINOR:TYPO] Update base.py All other occurrences in the docs use lowercase. (https://github.com/search?q=repo%3Ahuggingface%2Ftransformers%20translation_XX_to_YY&type=code) Also, using uppercase doesn't work: tested with "translation_EN_to_FR" which doesn't work and instead returns: `ValueError: The task does not provide any default models for options ('EN', 'FR')` It might be a good idea to allow for uppercase, but that's for another issue. * [MINOR:TYPO] Update __init__.py * make model doc device agnostic (#40143) * make model doc device agnostic Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update align.md * Update aya_vision.md * Update byt5.md * refine Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update granitevision.md * Update src/transformers/pytorch_utils.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * add doc Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * 3 more Signed-off-by: Yao, Matrix <matrix.yao@intel.com> --------- Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix to avoid modifying a view in place (#40162) * fix to avoid modifying a view in place * add backward test in tensor parallel * add test to test_modelig_gpt_oss.py * linting * Fix fsdp for generic-task models (#40191) * remove abc inheritance * add fast test * Add repr to EncoderDecoderCache (#40195) * add repr * oups * Fix typos (#40175) Signed-off-by: cyy <cyyever@outlook.com> * Remove _prepare_flash_attention_from_position_ids (#40069) Signed-off-by: cyy <cyyever@outlook.com> * Avoid CUDA stream sync (#40060) Signed-off-by: cyy <cyyever@outlook.com> * Fix various Pylint warnings (#40107) Tidy code Signed-off-by: cyy <cyyever@outlook.com> * Update: add type hints to check_tokenizers.py (#40094) * Update check_tokenizers.py chore(typing): add type hints to check_tokenizers script - Annotate params/returns for helper functions - Keep tokenizer instances as `Any` to avoid runtime coupling - Make `check_LTR_mark` return `bool` explicitly (no behavior change) * Update check_tokenizers.py chore(typing): replace Any with PreTrainedTokenizerBase in check_tokenizers.py - Use transformers.tokenization_utils_base.PreTrainedTokenizerBase for `slow` and `fast` params - Covers both PreTrainedTokenizer and 
PreTrainedTokenizerFast - Exposes required methods (encode, decode, encode_plus, tokenize) - Removes generic Any typing while staying implementation-agnostic * Benchmarking improvements (#39768) * Start revamping benchmarking * Start refactoring benchmarking * Use Pandas for CSV * import fix * Remove benchmark files * Remove sample data * Address review comments * Add X-Codec model (#38248) * add working x-codec * nit * fix styling + copies * fix docstring * fix docstring and config attribute * Update args + config * update convertion script * update docs + cleanup * Ruff fix * fix doctrings * Fix GPT-OSS `swiglu_limit` not passed in for MXFP4 (#40197) Add swiglu_limit = 7.0 * docs: Update LayoutLM model card according to new standardized format (#40129) * docs: Update LayoutLM model card with standardized format * Apply suggestions from code review This commit incorporates all suggestions provided in the recent review. Further changes will be committed separately to address remaining comments. Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Address remaining review comments * Address few more review comments: 1. remove transformer-cli section 2. put resources after notes 3. change API refs to 2nd level header * Update layoutlm.md * Update layoutlm.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Revert "Pin torch to 2.7.1 on CircleCI for now" + Final fix for `too long with no output` (#40201) * Revert "Pin torch to 2.7.1 on CircleCI for now (#40174)" This reverts commit 31b6e6e1dac0d32f74ec5cd6b3c1868534ccd7b5. * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * Use correct `model_input_names` for PixtralImageProcessor (#40226) add image_sizes to model_input_names * fix error vocab_size at Qwen2_5_VLForConditionalGeneration loss_function (#40130) * fix error vocab_size at Qwen2_5_VLForConditionalGeneration loss_function Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com> * fix similar errer at qwen2_vl and do make fix-copies Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com> * pass in kwargs for loss_func at qwen2_vl and qwen2_5_vl Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com> * Apply style fixes --------- Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * [SAM 2] Change checkpoints in docs and tests (#40213) * change checkpoints in docs and tests * add notebook * Fix more typos (#40212) Signed-off-by: cyy <cyyever@outlook.com> * Fix ESM token_dropout crash when using inputs_embeds instead of input_ids (#40181) * fix: Error after calling ESM model with input embeddings not input ids * propagate changes to other models * AMD scheduled CI ref env file (#40243) * Reference env-file to be used in docker running the CI * Disable MI300 CI for now * Add Ovis2 model and processor implementation (#37088) * Add Ovis2 model and processor implementation * Apply style fixes * Add unit tests for Ovis2 image processing and processor * Refactor image processing functions for clarity and efficiency * Add Ovis2 ImageProcessorFast * Refactor Ovis2 code * Refactor Ovis2 model components and update processor functionality * Fix repo consistency issues for Ovis2: docstring, config cleanup * Update Ovis2 model integration tests * Update Ovis2 configuration and processing classes for improved documentation * Remove duplicate entry for 'ovis2' in VLM_CLASS_NAMES * Fix conflict * Fix import order * Update image processor class names * Update 
Ovis2 model structure * Refactor Ovis2 configuration * Fix typos * Refactor Ovis2 model classes and remove unused code * Fix typos * Refactor Ovis2 model initialization * Fiix typos * Remove Ovis2 model mapping from MODEL_MAPPING_NAMES in modeling_auto.py * Add license and update type hints * Refactor token function and update docstring handling * Add license * Add Ovis2 model support and update documentation * Refactor Ovis2 model structure and enhance multimodal capabilities * Update Ovis2 weight mapping for consistency and clarity in key patterns * Remove unused 'grids' parameter from Ovis2 model and Update processing logic to handle image grids more efficiently. * Refactor Ovis2 model test structure to include Ovis2Model * Add optional disable_grouping param to Ovis2ImageProcessorFast * Refactor type hints in Ovis2 modules * Add licensing information in Ovis2 modules and tests * Refactor Ovis2 model by removing unused methods * Refactor Ovis2 model tests by renaming test classes and removing skipped tests * Refactor Ovis2 model output classes * Refactor Ovis2 weight conversion and Update model embedding classes * Refactor Ovis2 model imports and remove unused functions * Enhance vision configuration extraction in Ovis2 weight conversion * Refactor Ovis2 model's forward method to remove interpolation option * Update Ovis2 model documentation * Refactor Ovis2 model input handling and tokenizer configuration * Update return type hints in Ovis2 model * Remove commented-out code * fix config for tests and remove key mappings * Update tokenizer configuration to use add_special_tokens method * skip torchscript * Fix image placeholder generation in Ovis2Processor * Refactor Ovis2 model to rename visual_table to visual_embeddings_table * Enhance Ovis2 model by adding vision_feature_select_strategy parameter * Refactor Ovis2 model weights conversion and architecture * Refactor Ovis2 model by removing vision_feature_select_strategy parameter * Update Ovis2 model examples * Refactor Ovis2 model * Update Ovis2 model * Update Ovis2 model configuration * Refactor Ovis2 model test setup * Refactor flash attention support * Refactor * Fix typo * Refactor * Refactor model classes * Update expected output in Ovis2 * Refactor docstrings * Fix * Fix * Fix * Update input in tests * Fix * Fix get_decoder method * Refactor * Refactor Ovis2 * Fix * Fix * Fix test * Add get_placeholder_mask * Refactor Ovis2 model tests * Fix * Refactor * Fix * Fix * Fix Ovis2 test --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Fix more pylint warnings (#40204) Fix pylint warnings Signed-off-by: cyy <cyyever@outlook.com> * 🚨 Always return Cache objects in modelings (to align with generate) (#39765) * watch the world burn * fix models, pipelines * make the error a warning * remove kwargs and return_legacy_cache * fix reformer * remove transpose_for_scores call in ESM-2 (#40210) * remove transpose_for_scores call Signed-off-by: Peter St. John <pstjohn@nvidia.com> * fix copied evolla code Signed-off-by: Peter St. John <pstjohn@nvidia.com> --------- Signed-off-by: Peter St. 
John <pstjohn@nvidia.com> * Add `chat_template` (`jinja2`) as an extra dependency (#40128) * add jinja2 as a dependency * Make jinja2 a core dependency in install_requires - Add jinja2 to install_requires list in setup.py for automatic installation - Add jinja2 to runtime version checks in dependency_versions_check.py - Resolves issue where pip install transformers doesn't install jinja2 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * Make jinja2 a core dependency in install_requires * Make jinja2 an extra dependency instead of adding a core dep --------- Co-authored-by: Claude <noreply@anthropic.com> * [typing] fix type annotation error in DepthPro model image processor (#40238) * fix type annotation error in DepthPro model image processor * fix * run make fix-copies * [serve] guard imports (#39825) guard imports * [`CI`] Fix repo consistency (#40249) * fix * doc --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Fixes for EncoderDecoderCache (#40008) * Add expectation to t5 for rocm 9.4 * Made EncoderDecoderCache compatible with nn.DataParallel * Fixed t5gemma EncoderDecoderCache * Added todos in autoformer * Ruff * Init is self-contained * Review compliance * Fixed kwargs init of EncoderDecoderCache * fix: Catch correct ConnectionError for additional_chat_templates (#39874) * fix: Catch correct ConnectionError for additional_chat_templates * fix: don't catch timeout * fix: formatting * Model card for NLLB (#40074) * initializing branch and draft PR * updated model card .md file * minor * minor * Update docs/source/en/model_doc/nllb.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * resolving comments + adding visuals * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md suggestion Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/nllb.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * NllbTokenizerFast and NllbTokenizer added * endline * minor * Update nllb.md --------- Co-authored-by: Sahil Kabir <sahilkabir@Sahils-MacBook-Pro.local> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * 
Correct typo and update notes in docs Readme (#40234) * Correct typo and update notes in docs readme * Update docs/README.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/README.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix benchmark workflow (#40254) Correct init_db.sql path Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com> * docs: Update OLMo model card (#40233) * Updated OLMo model card * Update OLMo description Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix typo Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix cli typo Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix cli example Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Add bitsandbytes info Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Skip broken tests (#40157) skip these tests * Remove MI300 CI (#40270) Remove MI300 CI (in history if we need it back) * set inputs_embeds to None while generate to avoid audio encoder forward in generation process (#40248) * set inputs_embeds to None while generate to avoid audio encoder forward in generation process * set input_features to none instead --------- Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com> * [detection] fix attention mask for RT-DETR-based models (#40269) * Fix get_contrastive_denoising_training_group attention * Add bool attention_mask conversion * Fix slow static cache export tests (#40261) * 🚨🚨 Switch default compilation to fullgraph=False (#40137) * switch default * docstring * docstring * rework tests and remove outdated restrictions * simplify * we need a check for static cache * fix * rename var * fix * revert * style * rename test * Fix setting attention for multimodal models (#39984) * fix * use non-explicit `None` * keep previously set attn if exists * [detection] fix correct `k_proj` weight and bias slicing in D-FINE (#40257) Fix: correct k_proj weight and bias conversion in D-FINE * Add Kosmos-2.5 (#31711) Add Microsoft Kosmos-2.5 --------- Co-authored-by: kirp@umich.edu <tic-top> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Skipping pytree registration in case fsdp is enabled (#40075) * Skipping pytree registration in case fsdp is enabled * Beauty changes * Beauty changes * Moved the is_fsdp_available function to import utils * Moved is_fsdp_available to integrations.fsdp * Skipping pytree registration in case fsdp is enabled * Beauty changes * Beauty changes * Moved the is_fsdp_available function to import utils * Moved is_fsdp_available to integrations.fsdp * Added pytree registration inside dynamic cache class * Making ci/cd lords happy * Adding a check if DynamicCache is already a leaf * Adding try/catch for multiple initializations of DynamicCache in test suites * Moving dynamic cache pytree registration to executorch * Adding try catch back * Update image_processing_perception_lm_fast.py to allow for proper override of vision_input_type (#40252) * Update image_processing_perception_lm_fast.py Allow for a proper override of vision_input_type in hf fast image processor, otherwise we 
need to resort to manually setting the attribute. * Update processing_perception_lm.py to match kwargs vision input type * Update image_processing_perception_lm_fast.py kwargs to signature args * fix which routing method (#40283) * Fix chat CLI GPU loading and request_id validation issues (#40230) (#40232) * Fix chat CLI GPU loading and request_id validation issues (#40230) This commit addresses two critical bugs in the transformers chat CLI: 1. **GPU Loading Issue**: Changed default device from "cpu" to "auto" in ChatArguments - Chat CLI now automatically uses GPU when available instead of defaulting to CPU - Matches the behavior of the underlying serving infrastructure 2. **Request ID Validation Error**: Added request_id field to TransformersCompletionCreateParamsStreaming schema - Fixes "Unexpected keys in the request: {'request_id'}" error on second message - Allows request_id to be properly sent and validated by the server Both fixes target the exact root causes identified in issue #40230: - Users will now get GPU acceleration by default when available - Chat sessions will no longer break after the second message * Remove unrelated request_id field from TransformersCompletionCreateParamsStreaming * docs(layoutlm): add missing `id=usage` to `<hfoptions>` tag in LayoutLM model card (#40273) docs(layoutlm): add missing 'id=usage' to <hfoptions> tag in LayoutLM model card * Standardize RAG model card (#40222) * Standardize RAG model card Update rag.md to follow the new Hugging Face model card template: - Added friendly overview in plain language - Added pipeline and AutoModel usage examples - Included quantization example with BitsAndBytesConfig - Added notes and resources sections - Removed abstract and FlashAttention badge * Standardize RAG model card Update rag.md to follow the new Hugging Face model card template: - Added friendly overview in plain language - Added AutoModel usage example - Included quantization example with BitsAndBytesConfig * docs: Update TrOCR model card to new format (#40240) * docs: Update TrOCR model card to new format * Updated Sugegestions * Update model card for gpt neox japanese (#39862) * Update GPT-NeoX-Japanese model card * Apply suggestions from code review * Update gpt_neox_japanese.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * SmolVLM and InternVL: Ensure pixel values are converted to the correct dtype for fp16/bf16 (#40121) * Ensure pixel values are converted to the correct dtype for fp16/bf16 * add to modular * Standardize BertGeneration model card (#40250) * Standardize BertGeneration model card: new format, usage examples, quantization * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/bert-generation.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Apply 
reviewer feedback: update code examples * Add missing code example --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Adjust ROCm test output expectations (#40279) Adjust ROCm output expectations * SmolVLM test fixes (#40275) * Fix SmolVLM tests * Add the proper CUDA expectations as well * Split 'A10 and A100 expectations * Ruff --------- Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com> * make model docs device agnostic (2) (#40256) * doc cont. Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * more models Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update docs/source/en/quicktour.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quicktour.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quicktour.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quicktour.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update mixtral.md --------- Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * [3/3] make docs device agnostic, all en docs for existing models done (#40298) docs to device agnostic cont. Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Add MetaCLIP 2 (#39826) * First draft * Make fixup * Use eos_token_id * Improve tests * Update clip * Make fixup * Fix processor tests * Add conversion script * Update docs * Update tokenization_auto * Make fixup * Use check_model_inputs * Rename to lowercase * Undo CLIP changes * Address comment * Convert all checkpoints * Update auto files * Rename checkpoints * Allow to be able to run `torch.compile` tests with `fullgraph=True` (#40164) * fix * address comment --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * [`FA`] Fix dtype in varlen with position ids (#40295) fix * [docs] delete more TF/Flax docs (#40289) * delete some TF docs * update documentation checks to ignore tf/flax * a few more removals * nit * Update utils/check_repo.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> --------- Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * Clean up X-Codec. (#40271) * Clean up xcodec addition. * Clean up config. * Switch to fixtures test. * Small stuff. 
* Remove OTel SDK dependencies (#40305) * Fix GOT-OCR2 and Cohere2Vision image processor patches caculation (#40312) fix got-ocr patches caculation Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> * [`fix`] Pass adamw optimizer parameters to StableAdamW (#40184) * fix: pass adamw optimizer parameters to StableAdamW * add test for stable_adamw initialization with trainer arguments * address copilot suggestion * fix: update weight_decay handling in stable_adamw kwargs --------- Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * chore: fix typo in `find_executable_batch_size` to match new 0.9 ratio (#40206) * :rotating_light: [`Flash Attention`] Fix sliding window size (#40163) * swa fix * add comment, make fix symmetrical * modify fa inference test to force swa correctness check * fixup comment * Remove unnecessary contiguous calls for modern torch (#40315) * Add support for Florence-2 (#38188) * init * add modular * fixup * update configuration * add processing file * update auto files * update * update modular * green setup_and_quality ci * it works * fix some tests * commit florence2 * update test * make test cases done - 16 left * style * fix few test cases * fix some tests * fix init test * update florence2 vision style * hope is green * fix init test * fix init * update modular * refactor vision module * fix: channel attention use dynamic scale * update modular * update * update attention mask * update * fix naming * Update src/transformers/models/florence2/processing_florence2.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * spatial block works * more beautiful * more more beautiful * merge main * merge main and fixup * fix typing hint * update modeling * fix eager matches sdpa * fix style * fix compile test - all green * remove florence2 language * remove Florence2LanguageModel things * fix style * update florence2 model * override prepare encoder_decoder for generation * add weight conversion script * rewrite channel attention to use sdpa * eleminate 1 tranpose op * support fa2 * fix quality check * chore: reformat `test_modeling_florence2.py` * some refactor for processor * some refactor for processor * update naming convention and remove BC * make it pass the test * fix: correct Embedding Cosine * update comments and docstring * support input_embeds * support input embeds ideally * fix style * fix style * fix style again :D * add test prcoessor * refactor processor and add test for processor * reformat test processor * make fixup * fix schema check * remove image_token * ensure image token in tokenizer and fix integration tests * fix processor test * add more integration tests for large model and rename test_processor to test_processing * test_assisted_decoding_sample should pass * update doc and make model work with image text to text pipeline * docs: add sdpa bagde * resolve cyril's comments * fix import torch error * add helper get_placeholder_mask * inherit from llava * florence2 may not _supports_attention_backend because of bart ... 
* move florence2 model card to multimodal * let base model always return_dict * fix style * tiny update doc * set _checkpoint_conversion_mapping = {} * fix code quality * support flex and compile graph and move external func to internal func * remove condition because it always true * remove window funcs * move post processor config out * fix ci * new intro to trigger test * remove `kernel_size` argument --------- Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * Qwen2.5-Omni test fixes (#40307) Updated expectations, and mp tests * Add back `_tp_plan` attribute (#39944) * Update modeling_utils.py * make sure we update with the module's plan * use public api * oups * update * fix failing test * Update src/transformers/integrations/tensor_parallel.py * Update src/transformers/integrations/tensor_parallel.py * fix * make the API more friendly! * fix tests * fix styling --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * byebye torch 2.1 (#40317) * Bump minimum torch version to 2.2 * Remove is_torch_greater_or_equal_than_2_2 * update versions table * Deprecate is_torch_sdpa_available (except for backward compat), remove require_torch_sdpa * No more `natten` (#40287) get rid off natten Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * [`GPT OSS`] Refactor the tests as it was not properly checking the outputs (#40288) * it was long due! * use the official kernel * more permissive * update the kernel as well * mmm should it be this? * up pu * fixup * Update test_modeling_gpt_oss.py * style * start with 20b * Update CI with nightly torch workflow file (#40306) * fix nightly ci * Apply suggestions from code review Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com> --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com> * Fix: Apply `get_placeholder_mask` in Ovis2 (#40280) * Refactor special image mask * Refactor get_placeholder_mask method * Revert "Refactor special image mask" This reverts commit 9eb1828ae930329656d6f323a510c5e6033e1f85. * Fix * Revert "Refactor get_placeholder_mask method" This reverts commit 07aad6484bb08d6351d5b605e9db574d28edcd15. 
* Update notification service amd_daily_ci_workflows definition (#40314) * One cache class to rule them all (#40276) * remove all classes * fix generate * start replacing everywhere * finish removing everywhere * typo * typo * fix * typo * remove num_layers=1 * CI * fix all docstrings * review * style * Fix chunked attention mask with left-padding (#40324) * add fix * add test * raise proper warning for older versions * fix * fix and add 2nd test * fix for flex and torch 2.5 * [docs] remove flax references from `/en/model_doc` (#40311) * 1st commit * all models up to D * all models up to G * all models up to M * all remaining models * Fix qwen-omni processor text only mode (#40336) * Fix qwen-omni processor text only mode * remove try except --------- Co-authored-by: yuekaiz <yuekaiz@mgmt1-login.cm.cluster> * Change Qwen2RMSNorm to RMSNorm from PyTorch (#40066) * Unify Qwen2RMSNorm definitions and use RMSNorm from PyTorch Signed-off-by: cyy <cyyever@outlook.com> * subclass RMSNorm Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com> * Add DeepseekV3ForSequenceClassification for Deepseek V3 models (#40200) * Add Sequence Classification Support for Deepseek v3 model DeepseekV3ForSequenceClassification * After run make fixup * Fix deprecation warning version (#40343) fix * Add missing arguments to class constructors (#40068) * Add missing arguments Signed-off-by: cyy <cyyever@outlook.com> * Fix typos Signed-off-by: cyy <cyyever@outlook.com> * More fixes Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com> * [docs] remove TF references from `/en/model_doc` (#40344) * models up to F * models up to M * all models * Fix: Only call Trainer.align_special_tokens if model has "config" attribute (#40322) * Only call Trainer.align_special_tokens if model has "config" attribute * Add efficient test for training a model without model.config * Reformat * add type hints (#40319) * add basic type hints to import module * run make fixup * remove optional * fixes --------- Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * Fix an infinite loop bug in recursive search of relative imports (#40326) Fix bug in recursive search of relative imports * Fix links in Glm4vMoe configuration classes to point to the correct H… (#40310) * Fix links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository * run fixup to update links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository * T5 test and target device fixes (#40313) * Fix cache setup related issues * Fix target-device-related issues * Ruff * Address review comments * Update `test_spm_converter_bytefallback_warning` (#40284) fff Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> * (small) fix conditional for input_ids and input_embeds in marian (#40045) * (small) fix conditional for input_ids and input_embeds in marian * address comment * Fix attention vizualizer (#40285) * make visualizer rely on create causal mask * format * fixup * fixup * read token * read token, duh * what is up with that token * small tests? * adjust * try with flush * normalize for ANSI * buffer shenanigans * [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification (#35991) * [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification * fix the modular conversion * Clean up XCodec and other codecs (#40348) * Clean up xcodec addition. * Clean up config. 
* Switch to fixtures test. * Small stuff. * Polish XCodec and standardize across codecs. * Update src/transformers/models/xcodec/modeling_xcodec.py Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * Format and fix test. * Update tol. --------- Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * [serve] add cors warnings (#40112) * add cors warnings * Update src/transformers/commands/serving.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/transformers/commands/serving.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Apply suggestions from code review * make fixup --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * [detection] use consistent dtype for Conditional and DAB DETR positional embeddings (#40300) fix: use consistent dtype for sine positional embeddings * Remove more PyTorch 2.2 compatible code (#40337) Signed-off-by: cyy <cyyever@outlook.com> * [`FA`] Fix some model tests (#40350) * fix * cleanup, revert aimv2 fa changes * fix aria * i searched a long time but the cross dependency is for the recent models so... * this was something... evolla * fix modernbert decoder + make fa test more robust * nit * Qwen2.5-VL test fixes for ROCm (#40308) * [generate] handle support for cache classes when num enc layers != num dec layers (#40277) * handle support for cache classes when num enc layers != num dec layers * handle overwrites * one more corner case * Update src/transformers/generation/utils.py * Update src/transformers/generation/utils.py * Apply suggestions from code review * handle corner case :o * [4/N]more docs to device agnostic (#40355) * more docs to device agnostic Signed-off-by: YAO Matrix <matrix.yao@intel.com> * more Signed-off-by: YAO Matrix <matrix.yao@intel.com> * 1 Signed-off-by: YAO Matrix <matrix.yao@intel.com> * 2 Signed-off-by: YAO Matrix <matrix.yao@intel.com> * Update vitpose.md * Update camembert.md * Update camembert.md --------- Signed-off-by: YAO Matrix <matrix.yao@intel.com> * DOCS: Clarification on the use of `label_names` as an argument to TrainingArguments (#40353) * Update trainer.md * Update trainer.md Removed the detail about label_names argument usage from the tip/ warning section * Update training_args.py Added the label_names usage clarification in the docstring * Update trainer.md --------- Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * HunYuan opensource (#39606) * merge opensource_hunyuan * add head_dim * fix assertion error * fix seen_tokens * ready_for_upstream (merge request !17) Squash merge branch 'ready_for_upstream' into 'main' * fix configuration type&docstring * fix style * ready_for_upstream (merge request !18) Squash merge branch 'ready_for_upstream' into 'main' * add doc * fix testcode * fix configuration type&docstring * rename base model * remove assert * update * remove tiktoken * update * fix moe and code style (#3) * update * fix format * update * revert makefile * fix moe config * fix numel() * remove prepare_inputs_for_generation * fix kv_seq_len * add docs/toctree * remove unused paramter&add licence * add licence * remove unused paramter * fix code * dense modular update import fix fix use mistralmodel fix qknorm add sliding_window make style fix dense done hunyuan moe fix import fix modular fixup fixup * update model path * fix mlp_bias * fix modular * Fix modeling 
(#5) * fix attention * use llamamodel * fix code * Fix qk (#6) * fix qk_norm * fix * fix modual * Fix moe (#7) * fix some moe code * fix einsum * try top1 * use top1 * Fix rotary (#8) * fix rotary * fix modeling * fix modular * fix testcode * remove A13B unit test * Fix moe v1 (#9) fix moe & gate * Fix gate norm (#10) * add norm_topk_prob * Fix testcase (#11) * fix&skip test * Fix testcase (#12) * skip testcase * Fix norm topk (#13) * hardcode norm_topk_prob * fix testcase --------- Co-authored-by: pridejcyang <pridejcyang@tencent.com> Co-authored-by: Mingji Han <mingjihan@tencent.com> * Fix idefics3 vision embeddings indices dtype (#40360) fix idefics3 vision embeddings Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> * wav2vec2 fixes (#40341) * Changed datasets to avoid a datasets error * Changed back split to test * Change multimodal data links to HF hub (#40309) change multimodal data links to HF hub * [pipelines] add support to `skip_special_tokens` in the main text generation pipelines (#40356) * add support to skip_special_tokens in pipelines * add test * rm redundant * ⚠️⚠️ Use `dtype` instead of `torch_dtype` everywhere! (#39782) * update everywhere * style * pipelines * switch it everywhere in tests * switch it everywhere in docs * switch in converters everywhere * update in examples * update in model docstrings * style * warnings * style * Update configuration_utils.py * fix * Update configuration_utils.py * fixes and add first test * add pipeline tests * Update test_pipelines_common.py * add config test * Update test_modeling_common.py * add new ones * post rebase * add new * post rebase adds * [processor] move commonalities to mixin (#40339) * move commonalities to mixin * revert - unrelated * fix copies * fix style * comments * [configuration] allow to overwrite kwargs from subconfigs (#40241) allow to overwrite kwargs from subconfigs * fix(example): align parameter names with the latest function definition for gdino (#40369) * Addiing ByteDance Seed Seed-OSS (#40272) add seed oss * Add GptOssForTokenClassification for GPT-OSS models (#40190) * Add GptOssForTokenClassification for GPT-OSS models * After run make fixup * Bug Fix: Dynamically set return_lse flag in FlexAttention (#40352) * bug fix - return_lse dynamically set * addressed compatibility with return type - flex_attention_forward * rename variables * revert changes to commits * Chat Template Doc Fixes (#40173) * draft commit * draft commit * Fixup chat_extras too * Update conversations.md * Update the toctree and titles * Update the writing guide! …
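Two of the entries in the log above describe their change concretely enough to sketch. The snippets below are illustrative only: they are not taken from the repository, and the exact symbols, call sites, and fallback behavior are assumptions.

First, the #40038 entry says the fix guards a previously unguarded `import torch.distributed.tensor` by checking that distributed support is available before importing the tensor submodule. A minimal sketch of that pattern, assuming `DTensor` is the symbol of interest and a `None` fallback is acceptable downstream:

```python
# Hedged sketch of the guarded-import pattern described in the #40038 entry.
# The symbol name and fallback are illustrative, not the actual code in
# model_debugging_utils.py.
import torch

if torch.distributed.is_available():
    from torch.distributed.tensor import DTensor  # only importable when torch ships distributed support
else:
    DTensor = None  # callers treat the distributed debugging feature as unavailable
```

Second, the #39782 entry renames the `torch_dtype` keyword to `dtype` across the library. A sketch of what the new call might look like, assuming the rename applies to `from_pretrained` as the entry and its doc updates suggest (the checkpoint name is only an example):

```python
# Hedged sketch of the torch_dtype -> dtype rename from #39782; the checkpoint
# is arbitrary, and the old keyword reportedly still works with a deprecation warning.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    dtype=torch.bfloat16,  # previously: torch_dtype=torch.bfloat16
)
```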
Fixing run_squad.py pre-processing bug. Various clean-ups: