Description
CatBoost Version: 1.2.2
Environment: corporate JupyterLab server
Problem Description
When creating a Pool from a pandas DataFrame, memory usage doubles (one copy for the DataFrame, another for the Pool). To mitigate this, we implemented the following workflow:
- Create Pool
- Save Pool
- Restart kernel to clear memory
- Reload Pool
- Train model
Issue: Training consistently fails at iteration 998/1000 when using a quantized Pool containing a high-cardinality categorical feature ("registration address").
Key Details About the Feature
Name: registration address
Type: String-based categorical
Unique values: ~1.15 million (~80% uniqueness across 1,443,378 samples)
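For reference, a quick way to check this cardinality figure in pandas (a minimal sketch; the column name and train_df are assumed from the reproduction code below):

# Hypothetical cardinality check; 'registration address' and train_df are taken from the report
n_unique = train_df['registration address'].nunique()
print(f"{n_unique} unique values, {n_unique / len(train_df):.0%} of {len(train_df)} samples")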
Reproduction Code
from catboost import Pool, CatBoostClassifier

# Steps 1-2: Create, quantize, and save the Pool
train_pool = Pool(
    data=train_df[all_features],
    label=train_df[TARGET],
    cat_features=cat_feats
)
train_pool.quantize()
train_pool.save('volumes/my_work/Tarasov/Model/Pool.bin')

# Step 3: Restart the kernel to clear memory (re-run the import above)

# Step 4: Reload the quantized Pool
train_pool = Pool('quantized://volumes/my_work/Tarasov/Model/Pool.bin')

# Step 5: Train
model = CatBoostClassifier(
    iterations=1000,
    loss_function='Logloss',
    eval_metric='AUC',
    random_state=888,
    thread_count=30
)
model.fit(train_pool, logging_level='Debug')  # Kernel dies here (fails at 998/1000 iterations)
Observations
The failure only occurs when using the quantized Pool with the "registration address" feature. It fails even if the Pool consists only of the "registration address" feature (see the single-feature sketch after these observations).
Removing this feature allows training to complete successfully.
Memory usage appears normal during training (no OOM errors observed).
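For clarity, the single-feature case mentioned above looks roughly like this (a sketch assuming the same train_df and TARGET as in the reproduction code; the same save/restart/reload/fit steps then trigger the crash):

# Hypothetical single-feature Pool that still reproduces the crash
from catboost import Pool

single_feature_pool = Pool(
    data=train_df[['registration address']],
    label=train_df[TARGET],
    cat_features=['registration address']
)
single_feature_pool.quantize()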
Questions
Pool Serialization: Is there a way to save a Pool with raw string categorical features without quantization?
(Current workaround forces quantization via quantize() to reduce memory usage)
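To illustrate what this first question is after: a hedged sketch of keeping the raw (non-quantized) data on disk and letting CatBoost read it directly via a column-description file, instead of holding a second copy in a DataFrame. The file paths and column layout below are assumptions for illustration:

# Hypothetical file-based loading; avoids the in-memory DataFrame copy entirely.
# 'train.cd' is a tab-separated column-description file, e.g.:
#   0<TAB>Label
#   5<TAB>Categ<TAB>registration address
from catboost import Pool

train_pool = Pool(
    data='volumes/my_work/Tarasov/Model/train.tsv',           # assumed raw TSV export
    column_description='volumes/my_work/Tarasov/Model/train.cd',
    delimiter='\t'
)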
Potential Bug: Could quantization of high-cardinality categorical features cause instability during training?
The consistent failure at iteration 998 suggests a possible edge-case bug in the quantization/training pipeline.
Unfortunately, I can't share the dataset here. The machine has 128 GB of RAM and 32 cores, and peak RAM usage was, as far as I could tell, no more than ~10 GB.
Update
The kernel crashes even on a small subsample of the example above.
I managed to reproduce this issue in version 1.2.7: https://colab.research.google.com/drive/1O27wEymA_jrcRdijSrxdoTTNz3PDSG7v#scrollTo=v5HS8y4gZCQN
Please note that the kernel crashed at iteration 998.