Quantization Fix Summary
Issues Identified
The quantization script was failing due to several compatibility issues:
Int8 Quantization Error:
- Error: The model is quantized with QuantizationMethod.TORCHAO and is not serializable
- Cause: Offloaded modules in the model cannot be quantized with torchao
- Solution: Added an alternative save method and a fallback to bitsandbytes

Int4 Quantization Error:
- Error: Could not run 'aten::_convert_weight_to_int4pack_for_cpu' with arguments from the 'CUDA' backend
- Cause: Int4 quantization requires the CPU backend but was being attempted on CUDA
- Solution: Added proper device selection logic

Monitoring Error:
- Error: 'SmolLM3Monitor' object has no attribute 'log_event'
- Cause: Incorrect monitoring API usage
- Solution: Added flexible monitoring method detection
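For context, the failing path loads the model with transformers' TorchAoConfig, roughly as sketched below. This is a sketch under assumptions, not the script's exact code; the model path and save directory are illustrative.

    from transformers import AutoModelForCausalLM, TorchAoConfig

    # Sketch of the torchao path that triggers the first two errors above
    quant_config = TorchAoConfig("int8_weight_only")
    model = AutoModelForCausalLM.from_pretrained(
        "/output-checkpoint",              # illustrative model path
        quantization_config=quant_config,
        device_map="auto",                 # "auto" can offload modules, which torchao cannot quantize
    )
    model.save_pretrained("./quantized")   # fails with the "not serializable" error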
Fixes Implemented
1. Enhanced Device Management (scripts/model_tonic/quantize_model.py)
    def get_optimal_device(self, quant_type: str) -> str:
        """Get optimal device for quantization type"""
        if quant_type == "int4_weight_only":
            # Int4 quantization works better on CPU
            return "cpu"
        elif quant_type == "int8_weight_only":
            # Int8 quantization works on GPU
            if torch.cuda.is_available():
                return "cuda"
            else:
                logger.warning("⚠️ CUDA not available, falling back to CPU for int8")
                return "cpu"
        else:
            return "auto"
2. Alternative Quantization Method
Added a quantize_model_alternative() method that uses bitsandbytes for better compatibility:

    def quantize_model_alternative(self, quant_type: str, device: str = "auto",
                                   group_size: int = 128, save_dir: Optional[str] = None) -> Optional[str]:
        """Alternative quantization using bitsandbytes for better compatibility"""
        # Uses BitsAndBytesConfig instead of TorchAoConfig
        # Handles serialization issues better
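A minimal sketch of what the bitsandbytes path can look like; the helper name and exact loading arguments below are assumptions, not the script's actual implementation:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    def load_quantized_with_bnb(model_path: str, quant_type: str):
        # Map the script's quant_type strings onto bitsandbytes load flags
        if quant_type == "int8_weight_only":
            bnb_config = BitsAndBytesConfig(load_in_8bit=True)
        else:  # int4_weight_only
            bnb_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_compute_dtype=torch.bfloat16,
            )
        return AutoModelForCausalLM.from_pretrained(
            model_path,
            quantization_config=bnb_config,
            device_map="auto",
        )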
3. Improved Error Handling
- Added fallback from torchao to bitsandbytes (see the sketch after this list)
- Enhanced save method with alternative approaches
- Better device mapping for different quantization types
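The fallback in the first bullet might be orchestrated roughly as follows; this wrapper is hypothetical, and the primary method name quantize_model is assumed (only quantize_model_alternative is named in this document):

    def quantize_with_fallback(self, quant_type: str, **kwargs) -> Optional[str]:
        # Hypothetical wrapper: try the torchao path first, then bitsandbytes
        try:
            return self.quantize_model(quant_type, **kwargs)
        except Exception as e:
            logger.warning(f"⚠️ torchao quantization failed ({e}), falling back to bitsandbytes")
            return self.quantize_model_alternative(quant_type, **kwargs)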
4. Fixed Monitoring Integration

    def log_to_trackio(self, action: str, details: Dict[str, Any]):
        """Log quantization events to Trackio"""
        if self.monitor:
            try:
                # Use the correct monitoring method
                if hasattr(self.monitor, 'log_event'):
                    self.monitor.log_event(action, details)
                elif hasattr(self.monitor, 'log_metric'):
                    self.monitor.log_metric(action, details.get('value', 1.0))
                elif hasattr(self.monitor, 'log'):
                    self.monitor.log(action, details)
                else:
                    logger.info(f"{action}: {details}")
            except Exception as e:
                logger.warning(f"⚠️ Failed to log to Trackio: {e}")
Usage Instructions
1. Install Dependencies

    pip install -r requirements_quantization.txt

2. Run Quantization

    python3 quantize_and_push.py

3. Test Fixes

    python3 test_quantization_fix.py
Expected Behavior
Successful Quantization
The script will now:
- Try torchao first for each quantization type
- Fall back to bitsandbytes if torchao fails
- Use appropriate devices (CPU for int4, GPU for int8)
- Handle serialization issues with alternative save methods (see the save sketch after this list)
- Log progress without monitoring errors
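A sketch of what an alternative save path can look like; safe_serialization is a real save_pretrained argument, but this helper is an assumption about the script's internals:

    def save_with_fallback(model, save_dir: str):
        # Hypothetical helper: torchao-quantized models often cannot be saved
        # as safetensors, so retry with safe_serialization disabled
        try:
            model.save_pretrained(save_dir, safe_serialization=True)
        except Exception as e:
            logger.warning(f"⚠️ Safetensors save failed ({e}), retrying without safe serialization")
            model.save_pretrained(save_dir, safe_serialization=False)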
Output

    ✅ Model files validated
    Processing quantization type: int8_weight_only
    Using device: cuda
    ✅ int8_weight_only quantization and push completed
    Processing quantization type: int4_weight_only
    Using device: cpu
    ✅ int4_weight_only quantization and push completed
    Quantization summary: 2/2 successful
    ✅ Quantization completed successfully!
Troubleshooting
If All Quantization Fails
1. Install bitsandbytes:

    pip install bitsandbytes

2. Check the model path:

    ls -la /output-checkpoint

3. Verify dependencies:

    python3 test_quantization_fix.py
Common Issues
- Memory Issues: Use CPU for int4 quantization (see the example after this list)
- Serialization Errors: The script now handles these automatically
- Device Conflicts: Automatic device selection based on quantization type
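If you need to pin int4 to CPU explicitly, the alternative method's device parameter accepts this directly; the quantizer instantiation and save directory below are illustrative:

    # Assumes a quantizer object exposing the methods shown above
    output_dir = quantizer.quantize_model_alternative(
        "int4_weight_only",
        device="cpu",                  # avoids the CUDA backend error for int4
        group_size=128,
        save_dir="./quantized-int4",   # illustrative save location
    )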
Files Modified
- scripts/model_tonic/quantize_model.py - Main quantization logic
- quantize_and_push.py - Main script with better error handling
- test_quantization_fix.py - Test script for verification
- requirements_quantization.txt - Dependencies file
Next Steps
- Run the test script to verify fixes
- Install bitsandbytes if not already installed
- Run the quantization script
- Check the Hugging Face repository for quantized models
The fixes ensure robust quantization with multiple fallback options and proper error handling.