Commit 925ac1b (verified) · Parent: 299b1dd

fbaldassarri committed: README.md update

Files changed (1): README.md (+129 −3)

---
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
license: llama3.2
library_name: transformers
tags:
- autoround
- intel
- gptq
- woq
- meta
- pytorch
- llama
- llama-3
model_name: Llama 3.2 11B Vision Instruct
base_model: meta-llama/Llama-3.2-11B-Vision-Instruct
inference: false
model_creator: meta-llama
pipeline_tag: text-generation
prompt_template: '{prompt}
'
quantized_by: fbaldassarri
---

## Model Information

Converted version of [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) to the [OpenVINO](https://github.com/openvinotoolkit/openvino) Intermediate Representation (IR) for inference on CPU devices.

The model consists of two parts:

- **Image Encoder** (openvino_vision_encoder.bin), which encodes input images into the LLM cross-attention states space;
- **Language Model** (openvino_language_model.bin), which generates answers based on the cross-attention states provided by the Image Encoder and on the input tokens.

Then, to reduce memory consumption, weight-compression optimization was applied using the [Neural Network Compression Framework (NNCF)](https://github.com/openvinotoolkit/nncf), which provides 4-bit/8-bit mixed weight quantization as a compression method primarily designed to optimize LLMs.

Note: the compressed language model can be found as llm_int4_asym_r10_gs64_max_activation_variance_awq_scale_all_layers.bin/.xml, compressed with:

- 4 bits (INT4)
- group size = 64
- asymmetric quantization
- AWQ method

Finally, an INT8-quantized version of the Image Encoder alone can be found as openvino_vision_encoder_int8.bin/.xml.
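
As a quick check that the published IR files load on CPU, they can be read and compiled directly with the OpenVINO runtime. This is a minimal sketch, assuming the repository files have been downloaded into the current directory:

```python
from pathlib import Path

import openvino as ov

model_dir = Path(".")  # assumption: repo files were downloaded here
core = ov.Core()

# Compile the INT8 image encoder and the INT4-compressed language model for CPU
vision_encoder = core.compile_model(model_dir / "openvino_vision_encoder_int8.xml", "CPU")
language_model = core.compile_model(
    model_dir / "llm_int4_asym_r10_gs64_max_activation_variance_awq_scale_all_layers.xml", "CPU"
)

# Inspect the inputs each compiled model expects
print([model_input.any_name for model_input in vision_encoder.inputs])
print([model_input.any_name for model_input in language_model.inputs])
```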

## Replication Recipe

### Step 1: Install Requirements

I suggest installing the requirements in a dedicated Python virtualenv or conda environment.

```bash
pip install -q "torch>=2.1" "torchvision" "Pillow" "tqdm" "datasets>=2.14.6" "gradio>=4.36" "nncf>=2.13.0" --extra-index-url https://download.pytorch.org/whl/cpu
pip install -q "transformers>=4.45" --extra-index-url https://download.pytorch.org/whl/cpu
pip install -Uq --pre "openvino>2024.4.0" --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
```
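
Before moving on, it is worth confirming that the environment picked up the expected versions; a minimal sketch:

```python
import nncf
import openvino as ov
import transformers

# openvino should report a build newer than 2024.4.0 for the nightly wheel above
print("openvino:", ov.get_version())
print("nncf:", nncf.__version__)
print("transformers:", transformers.__version__)
```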

### Step 2: Convert the Model to OpenVINO Intermediate Representation (IR)

Note: `ov_mllama_helper`, `ov_mllama_compression`, and `data_preprocessing` used below are helper scripts assumed to be available in the working directory.

```python
from pathlib import Path

from ov_mllama_helper import convert_mllama

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model_dir = Path(model_id.split("/")[-1]) / "OpenVino"

# Download the original model and export both parts to OpenVINO IR
convert_mllama(model_id, model_dir)
```
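
After conversion, the two IR parts described above should appear in `model_dir`; a quick sanity check:

```python
# openvino_vision_encoder.xml and openvino_language_model.xml are expected
for xml_file in sorted(model_dir.glob("*.xml")):
    print(xml_file.name)
```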

### Step 3: INT4 Compression

```python
from ov_mllama_compression import compress, compression_widgets_helper

# Choose the compression scenario interactively (Jupyter widgets)
compression_scenario, compress_args = compression_widgets_helper()
compression_scenario

# Collect the selected widget values and run the weight compression
compression_kwargs = {key: value.value for key, value in compress_args.items()}
language_model_path = compress(model_dir, **compression_kwargs)
```
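
The `compress` helper above is interactive. For a non-interactive run, the settings encoded in the compressed model's filename map onto NNCF's weight-compression API roughly as follows; this is a sketch, assuming a text calibration set `calibration_data` is available (AWQ and scale estimation are data-aware), and the filename-to-parameter mapping is my reading of the naming convention:

```python
import nncf
import openvino as ov

core = ov.Core()
ov_model = core.read_model(model_dir / "openvino_language_model.xml")

# INT4 asymmetric, group size 64, ratio 1.0, AWQ + scale estimation on all
# layers, max-activation-variance sensitivity metric
compressed_model = nncf.compress_weights(
    ov_model,
    mode=nncf.CompressWeightsMode.INT4_ASYM,
    group_size=64,
    ratio=1.0,
    awq=True,
    scale_estimation=True,
    all_layers=True,
    sensitivity_metric=nncf.SensitivityMetric.MAX_ACTIVATION_VARIANCE,
    dataset=nncf.Dataset(calibration_data),  # data-aware methods need samples
)
ov.save_model(
    compressed_model,
    model_dir / "llm_int4_asym_r10_gs64_max_activation_variance_awq_scale_all_layers.xml",
)
```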

### Step 4: INT8 Image Encoder Optimization

```python
import gc

import nncf
import openvino as ov
from transformers import AutoProcessor

from data_preprocessing import prepare_dataset_vision
from ov_mllama_compression import vision_encoder_selection_widget

# `device` is a device-selection widget (e.g. from the notebook utilities);
# its .value is the target device name such as "CPU"
vision_encoder_options = vision_encoder_selection_widget(device.value)
vision_encoder_options

processor = AutoProcessor.from_pretrained(model_dir)
core = ov.Core()

fp_vision_encoder_path = model_dir / "openvino_vision_encoder.xml"
int8_vision_encoder_path = model_dir / fp_vision_encoder_path.name.replace(".xml", "_int8.xml")
int8_wc_vision_encoder_path = model_dir / fp_vision_encoder_path.name.replace(".xml", "_int8_wc.xml")

# Prepare 100 calibration samples and quantize the vision encoder to INT8
calibration_data = prepare_dataset_vision(processor, 100)
ov_model = core.read_model(fp_vision_encoder_path)
calibration_dataset = nncf.Dataset(calibration_data)
quantized_model = nncf.quantize(
    model=ov_model,
    calibration_dataset=calibration_dataset,
    model_type=nncf.ModelType.TRANSFORMER,
    advanced_parameters=nncf.AdvancedQuantizationParameters(smooth_quant_alpha=0.6),
)
ov.save_model(quantized_model, int8_vision_encoder_path)

# Free memory held by the calibration data and intermediate models
del quantized_model
del ov_model
del calibration_dataset
del calibration_data
gc.collect()

vision_encoder_path = int8_vision_encoder_path
```
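
As a final smoke test, the quantized encoder can be compiled for CPU and its inputs inspected; a minimal sketch:

```python
# Compile the INT8 encoder and print its input names and shapes
compiled_encoder = core.compile_model(vision_encoder_path, "CPU")
for model_input in compiled_encoder.inputs:
    print(model_input.any_name, model_input.partial_shape)
```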

## License

[Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE)

## Disclaimer

This quantized model comes with no warranty. It has been developed only for research purposes.