
Conversation

LRL-ModelCloud commented:

GPTQModel needs to read checkpoint_format to determine whether a checkpoint is stored in the GPTQ or GPTQ_v2 format. If it is GPTQ, the weights need to be converted to GPTQ_v2 before use.
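A minimal sketch of that check-and-convert flow, with assumed helper and field names purely for illustration; the actual wiring in the commits below goes through gptqmodel's hf_convert_gptq_v1_to_v2_format / hf_convert_gptq_v2_to_v1_format helpers:

```python
# Self-contained sketch; names here are illustrative placeholders,
# not the actual gptqmodel/optimum API.

def convert_v1_to_v2(state_dict: dict) -> dict:
    """Stand-in for the real GPTQ v1 -> GPTQ_v2 weight conversion."""
    converted = dict(state_dict)
    converted["format"] = "gptq_v2"  # mark the converted checkpoint
    return converted

def prepare_checkpoint(state_dict: dict, quant_config: dict) -> dict:
    # checkpoint_format defaults to "gptq" (v1): older checkpoints were
    # saved before the field existed.
    fmt = quant_config.get("checkpoint_format", "gptq")
    if fmt == "gptq":
        # GPTQModel computes in v2, so v1 checkpoints are converted on load.
        state_dict = convert_v1_to_v2(state_dict)
    return state_dict

# Usage: a legacy quant config without checkpoint_format is treated as v1.
weights = prepare_checkpoint({"qweight": None}, {"bits": 4, "sym": True})
```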

LRL-ModelCloud changed the title from "need checkpoint_format" to "gptqmodel need use checkpoint_format" on Dec 2, 2024
jiqing-feng merged commit 27d2f2b into jiqing-feng:gptq on Dec 3, 2024
LRL-ModelCloud deleted the fix-gptq-v2-load branch on December 4, 2024 at 07:29
jiqing-feng added a commit that referenced this pull request Dec 23, 2024
* align gptq check to transformers for supporting cpu

* fix comment

* gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* compatible with auto-gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix compatible with auto-gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix compatible with auto-gptq linear

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert unrelated changes

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* gptqmodel need use checkpoint_format  (#1)

* need checkpoint_format

* default value of checkpoint_format is gptq

* fix quantize

* fix quantize

* fix quantize

* Update quantizer.py

* need convert to v1 before gptqmodel save

* back checkpoint_format to gptq after convert

* cleanup code

* sym=False is not supported with auto-gptq

* add comments

* cleanup code

* Update quantizer.py

* always convert v2 to v1 if checkpoint_format = "gptq"

* Update quantizer.py

---------

Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* Mod backend code (#2)

* keep gptq_v2 if sym is false

* use hf_convert_gptq_v1_to_v2_format, hf_convert_gptq_v2_to_v1_format, and hf_gptqmodel_post_init

* no need check backend

* use device_map

* cleanup

* Update quantizer.py

* move import

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* fix format and log

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix version check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable gptqmodel tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update check quant type

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix optimum compat (#3)

* add meta info

* cleanup

* cleanup

* The value of quantizer should be an array

* Update quantizer.py

* If is_auto_gptq_available() also writes "auto_gptq:version" to "quantizer"

* If is_auto_gptq_available() also writes "auto_gptq:version" to "quantizer"

* Update quantizer.py

* cleanup

* comment on meta

* hf_select_quant_linear pass checkpoint_format

* add todo fix

* move convert code to quantizer.save()

* Update quantizer.py

* Optimize hf_convert_gptq_v2_to_v1_format()

* Optimize hf_convert_gptq_v1_to_v2_format()

* fix GPTQTestCUDA

* hf_select_quant_linear() always set pack=True

* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2

* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2

* GPTQQuantizer add backend

* lower checkpoint_format and backend

* cleanup

* move backend to bottom

* no need to check gptqmodel version for ipex support

* Update import_utils.py

* Update quantizer.py

* fix UnboundLocalError: cannot access local variable 'version' where it is not associated with a value

* make version var short

* Update import_utils.py

* fix unittest

* use assertLessEqual

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: LRL <lrl@lbx.dev>

* fix format and convert v2 to v1

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* [Fix] all tensors not same device (#5)

* fix device error

* update gptqmodel version

* fix test

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add gptqmodel tests which contains cpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix all auto-gptq tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm gptqmodel yaml

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable real cpu tests by fp32

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix test model name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* keep the original device setting when using auto-gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update optimum/gptq/quantizer.py

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

* Update optimum/gptq/quantizer.py

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>