Uniform INT8 Quantized DeepSeek-OCR

This model is a uniformly quantized version of deepseek-ai/DeepSeek-OCR.

Quantization Details

  • Method: Uniform INT8 quantization (sketched below)
  • Quantized Layers: 2342
  • Vision Layers: 96 @ 8-bit
  • Language Layers: 2197 @ 8-bit
  • Other Layers: 49 @ 8-bit (the remainder of the vision/language/other split in layer_analysis.json)
  • Average Bit-width: 8.00
  • Original Size: 6363.12 MB
  • Compressed Size: 3351.56 MB
  • Compression Ratio: 1.90x
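
For reference, here is a minimal sketch of symmetric per-tensor INT8 quantization, the simplest form of the uniform scheme described above; whether this checkpoint uses per-tensor or per-channel scales is not stated here, so treat quantization_info.json as the source of truth. Storing 8-bit integers in place of what are presumably 16-bit floats is what puts the compression ratio near the observed 1.90x.

import torch

def quantize_int8(weight: torch.Tensor):
    # Symmetric per-tensor scheme: map the largest magnitude to the INT8 limit.
    scale = weight.abs().max().clamp(min=1e-8) / 127.0
    w_q = torch.round(weight / scale).clamp(-128, 127).to(torch.int8)
    return w_q, scale

def dequantize_int8(w_q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Reconstruction: w ≈ w_q * scale.
    return w_q.float() * scale

w = torch.randn(512, 512)
w_q, scale = quantize_int8(w)
print((w - dequantize_int8(w_q, scale)).abs().max())  # small quantization error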

Model Files

  • quantized_weights.pt: Quantized model weights
  • quantization_info.json: Layer-wise quantization configuration (inspected in the snippet below)
  • layer_configs.json: Detailed layer configurations
  • compression_stats.json: Compression statistics
  • layer_analysis.json: Modality analysis (vision/language/other)
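
The JSON files are plain metadata and can be inspected directly. A quick look, using the filenames listed above (their internal key layout is not documented here, so the prints simply show whatever is stored):

import json

with open("quantization_info.json") as f:
    quant_info = json.load(f)   # layer-wise quantization configuration
with open("compression_stats.json") as f:
    stats = json.load(f)        # overall compression statistics

print(type(quant_info), len(quant_info))  # expect one entry per quantized layer
print(stats)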

Usage

import torch
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("SamMikaelson/deepseek-ocr-int8-quantized", trust_remote_code=True)

# Load quantized weights
state_dict = torch.load("quantized_weights.pt", map_location="cpu")
# Note: you'll need the matching QuantizedLinear class to rebuild a runnable
# model from these weights; a sketch of such a module follows below.
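
The QuantizedLinear class itself is not included in this repo, so the exact layout of quantized_weights.pt is an assumption here. A minimal sketch of what such a module might look like, assuming symmetric INT8 weights with one float scale per layer (check quantization_info.json for the actual scheme):

import torch
import torch.nn as nn

class QuantizedLinear(nn.Module):
    # Hypothetical INT8 linear layer: assumes the checkpoint stores, per layer,
    # an int8 weight tensor and a float scale with weight_fp ≈ weight_int8 * scale.
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.register_buffer("weight_int8", torch.zeros(out_features, in_features, dtype=torch.int8))
        self.register_buffer("scale", torch.ones(()))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x):
        # Dequantize on the fly, then run a regular floating-point matmul.
        weight = (self.weight_int8.float() * self.scale).to(x.dtype)
        return nn.functional.linear(x, weight, self.bias)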

Baseline Characteristics

This uniform quantization approach:

  • Applies the same 8-bit quantization to ALL layers
  • Does not distinguish between vision and language modalities
  • Serves as a baseline for comparison with modality-aware methods (contrasted in the sketch below)
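
In code, the difference is just the bit-width policy: the baseline returns the same value for every layer, while a modality-aware method keys off which part of the model a layer belongs to. A sketch with illustrative names (the substring "vision" stands in for whatever the real DeepSeek-OCR module paths are):

def uniform_bits(layer_name: str) -> int:
    # Baseline policy: identical 8-bit treatment for every layer.
    return 8

def modality_aware_bits(layer_name: str) -> int:
    # Contrast: pick bit-widths per modality, e.g. keep vision layers wider.
    # "vision" is an illustrative substring, not the actual layer naming.
    return 16 if "vision" in layer_name else 8

for name in ["vision.encoder.0.qkv", "language.layers.0.mlp"]:
    print(name, uniform_bits(name), modality_aware_bits(name))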

Citation

If you use this model, please cite the original model and mention the uniform quantization approach.
