
Conversation

LRL-ModelCloud commented:

GPTQModel needs to read checkpoint_format to determine whether a checkpoint is stored in the GPTQ or GPTQ_v2 format. If it is GPTQ, the weights need to be converted to GPTQ_v2 before use.
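A minimal sketch of that check-and-convert flow, with assumed helper and field names purely for illustration; the actual wiring in the commits below goes through gptqmodel's hf_convert_gptq_v1_to_v2_format / hf_convert_gptq_v2_to_v1_format helpers:

```python
# Self-contained sketch; names here are illustrative placeholders,
# not the actual gptqmodel/optimum API.

def convert_v1_to_v2(state_dict: dict) -> dict:
    """Stand-in for the real GPTQ v1 -> GPTQ_v2 weight conversion."""
    converted = dict(state_dict)
    converted["format"] = "gptq_v2"  # mark the converted checkpoint
    return converted

def prepare_checkpoint(state_dict: dict, quant_config: dict) -> dict:
    # checkpoint_format defaults to "gptq" (v1): older checkpoints were
    # saved before the field existed.
    fmt = quant_config.get("checkpoint_format", "gptq")
    if fmt == "gptq":
        # GPTQModel computes in v2, so v1 checkpoints are converted on load.
        state_dict = convert_v1_to_v2(state_dict)
    return state_dict

# Usage: a legacy quant config without checkpoint_format is treated as v1.
weights = prepare_checkpoint({"qweight": None}, {"bits": 4, "sym": True})
```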

LRL-ModelCloud changed the title from "need checkpoint_format" to "gptqmodel need use checkpoint_format" on Dec 2, 2024
jiqing-feng merged commit 27d2f2b into jiqing-feng:gptq on Dec 3, 2024
LRL-ModelCloud deleted the fix-gptq-v2-load branch on December 4, 2024 at 07:29
jiqing-feng added a commit that referenced this pull request Dec 23, 2024
* align gptq check to transformers for supporting cpu

* fix comment

* gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* compatible with auto-gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix compatible with auto-gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix compatible with auto-gptq linear

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert unrelated changes

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* gptqmodel need use checkpoint_format  (#1)

* need checkpoint_format

* default value of checkpoint_format is gptq

* fix quantize

* fix quantize

* fix quantize

* Update quantizer.py

* need convert to v1 before gptqmodel save

* back checkpoint_format to gptq after convert

* cleanup code

* sym=False is not supported with auto-gptq

* add comments

* cleanup code

* Update quantizer.py

* always convert v2 to v1 if checkpoint_format = "gptq"

* Update quantizer.py

---------

Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* Mod backend code (#2)

* keep gptq_v2 if sym is false

* use hf_convert_gptq_v1_to_v2_format, hf_convert_gptq_v2_to_v1_format, and hf_gptqmodel_post_init

* no need check backend

* use device_map

* cleanup

* Update quantizer.py

* move import

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* fix format and log

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix version check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable gptqmodel tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update check quant type

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix optimum compat (#3)

* add meta info

* cleanup

* cleanup

* The value of quantizer should be an array

* Update quantizer.py

* If is_auto_gptq_available() also writes "auto_gptq:version" to "quantizer"

* If is_auto_gptq_available() also writes "auto_gptq:version" to "quantizer"

* Update quantizer.py

* cleanup

* comment on meta

* hf_select_quant_linear pass checkpoint_format

* add todo fix

* move convert code to quantizer.save()

* Update quantizer.py

* Optimize hf_convert_gptq_v2_to_v1_format()

* Optimize hf_convert_gptq_v1_to_v2_format()

* fix GPTQTestCUDA

* hf_select_quant_linear() always set pack=True

* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2

* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2

* GPTQQuantizer add backend

* lower checkpoint_format and backend

* cleanup

* move backend to bottom

* no need to check gptqmodel version for ipex support

* Update import_utils.py

* Update quantizer.py

* fix UnboundLocalError: cannot access local variable 'version' where it is not associated with a value

* make version var short

* Update import_utils.py

* fix unittest

* use assertLessEqual

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: LRL <lrl@lbx.dev>

* fix format and convert v2 to v1

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* [Fix] all tensors not same device (#5)

* fix device error

* update gptqmodel version

* fix test

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add gptqmodel tests which contains cpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix all auto-gptq tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm gptqmodel yaml

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable real cpu tests by fp32

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix test model name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* keep the original device setting when using auto-gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update optimum/gptq/quantizer.py

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

* Update optimum/gptq/quantizer.py

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>