---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- granite-4.0
base_model:
- ibm-granite/granite-4.0-tiny-base-preview
---

# granite-4.0-tiny-preview GGUF Models

## Model Generation Details

This model was generated using [llama.cpp](https://github.com/ggerganov/llama.cpp) at commit [`adef8178`](https://github.com/ggerganov/llama.cpp/commit/adef81781a15083f218eae6c488b95cdad781971).

---

## Quantization Beyond the IMatrix

I've been experimenting with a new quantization approach that selectively raises the precision of key layers beyond what the default IMatrix configuration provides.

In my testing, standard IMatrix quantization underperforms at lower bit depths, especially with Mixture of Experts (MoE) models. To address this, I'm using the `--tensor-type` option in `llama.cpp` to manually "bump" important layers to higher precision. You can see the implementation here:
👉 [Layer bumping with llama.cpp](https://github.com/Mungert69/GGUFModelBuilder/blob/main/model-converter/tensor_list_builder.py)

This increases model file size, but in my testing it significantly improves precision for a given quantization level.

### **I'd love your feedback: have you tried this? How does it perform for you?**

---

# Granite-4.0-Tiny-Preview

**Model Summary:**
Granite-4.0-Tiny-Preview is a 7B parameter fine-grained hybrid mixture-of-experts (MoE) instruct model finetuned from Granite-4.0-Tiny-Base-Preview using a combination of permissively licensed open-source instruction datasets and internally collected synthetic datasets tailored for solving long-context problems. The model was developed with a structured chat format using a diverse set of techniques, including supervised finetuning and model alignment through reinforcement learning.

- **Developers:** Granite Team, IBM
- **Website:** [Granite Docs](https://www.ibm.com/granite/docs/)
- **Release Date:** May 2nd, 2025
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

**Supported Languages:**
English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may also finetune this Granite model for languages beyond these 12.

**Intended Use:**
This model is designed to handle general instruction-following tasks and can be integrated into AI assistants across various domains, including business applications.

**Capabilities**
* Thinking
* Summarization
* Text classification
* Text extraction
* Question-answering
* Retrieval Augmented Generation (RAG)
* Code related tasks
* Function-calling tasks
* Multilingual dialog use cases
* Long-context tasks including long document/meeting summarization, long document QA, etc.

**Installation:**
You need to install `transformers` from source to use this checkpoint.

HuggingFace PR: https://github.com/huggingface/transformers/pull/37658

Install `transformers` from source: https://huggingface.co/docs/transformers/en/installation#install-from-source

**Generation:**
After installation, copy the code snippet below to run the example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
import torch

model_path = "ibm-granite/granite-4.0-tiny-preview"
device = "cuda"

# Load the model in bfloat16 and place it on the GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map=device,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

conv = [{"role": "user", "content": "You have 10 liters of a 30% acid solution. How many liters of a 70% acid solution must be added to achieve a 50% acid mixture?"}]

# thinking=True is forwarded to the chat template to enable the thinking mode.
# With return_dict=True the result holds input_ids and attention_mask.
inputs = tokenizer.apply_chat_template(
    conv,
    return_tensors="pt",
    thinking=True,
    return_dict=True,
    add_generation_prompt=True,
).to(device)

set_seed(42)
output = model.generate(
    **inputs,
    max_new_tokens=8192,
)

# Decode only the newly generated tokens, skipping the prompt.
prediction = tokenizer.decode(output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(prediction)
```

**Evaluation Results:**

Models | Arena-Hard | AlpacaEval-2.0 | MMLU | PopQA | TruthfulQA | BigBenchHard | DROP | GSM8K | HumanEval | HumanEval+ | IFEval | AttaQ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Granite-3.3-2B-Instruct | 28.86 | 43.45 | 55.88 | 18.4 | 58.97 | 52.51 | 35.98 | 72.48 | 80.51 | 75.68 | 65.8 | 87.47 |
Granite-3.3-8B-Instruct | 57.56 | 62.68 | 65.54 | 26.17 | 66.86 | 59.01 | 41.53 | 80.89 | 89.73 | 86.09 | 74.82 | 88.5 |
Granite-4.0-Tiny-Preview | 26.70 | 35.16 | 60.40 | 22.93 | 58.07 | 55.71 | 46.22 | 70.05 | 82.41 | 78.33 | 63.03 | 86.10 |
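
**Layer bumping sketch:**
The "Quantization Beyond the IMatrix" section above describes passing `--tensor-type` to `llama.cpp` to raise the precision of selected layers. The snippet below is a minimal sketch of how such a command line could be assembled for the `llama-quantize` tool. It is not the builder script linked above: the helper `build_quantize_command`, the tensor patterns, the chosen types, and the file names are all hypothetical and were not used to produce the files in this repository.

```python
import shlex

# Hypothetical mapping of tensor-name patterns to higher-precision types.
# Illustrative only; these are not the settings used for this repository.
BUMPED_TENSORS = {
    "attn_v": "q6_k",
    "ffn_down": "q5_k",
}


def build_quantize_command(src, dst, base_type, bumped, imatrix=None):
    """Return an argument list for llama.cpp's llama-quantize tool.

    Each entry in `bumped` becomes one `--tensor-type PATTERN=TYPE` option,
    overriding the base quantization type for matching tensors.
    """
    cmd = ["llama-quantize"]
    if imatrix is not None:
        cmd += ["--imatrix", imatrix]
    for pattern, qtype in bumped.items():
        cmd += ["--tensor-type", f"{pattern}={qtype}"]
    # Positional arguments: input model, output model, base quantization type.
    cmd += [src, dst, base_type]
    return cmd


if __name__ == "__main__":
    # File names below are placeholders.
    cmd = build_quantize_command(
        "model-bf16.gguf",
        "model-q4_k_m.gguf",
        "Q4_K_M",
        BUMPED_TENSORS,
        imatrix="imatrix.dat",
    )
    print(shlex.join(cmd))
```

The printed command can be inspected before running it. Which tensors are worth bumping depends on the model and the target bit depth, so any mapping like the one above should be validated against your own quality measurements.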