NuExtract-2-2B-GGUF Model Repository

This repository contains GGUF versions of the NuMind/NuExtract-2.0-2B model, ready for use with llama.cpp and other GGUF-compatible tools.

These files were generated using the latest convert_hf_to_gguf.py and llama-quantize tools from the llama.cpp repository.
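For reference, the conversion pipeline looks roughly like the sketch below. It is a minimal illustration rather than the exact commands used here: the local checkpoint directory and output file names are placeholders, and the flags should be checked against the current llama.cpp documentation.

```python
# Rough sketch of the GGUF conversion pipeline, driven from Python via subprocess.
# The checkpoint directory ("NuExtract-2.0-2B") and output names are placeholders.
import subprocess

# Step 1: convert the Hugging Face checkpoint to a full-precision (F16) GGUF file.
subprocess.run(
    [
        "python", "convert_hf_to_gguf.py", "NuExtract-2.0-2B",
        "--outfile", "NuExtract-2-2B-experimental-fp16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)

# Step 2: quantize the F16 file to a smaller type (repeat for each quant in the table below).
subprocess.run(
    [
        "./llama-quantize",
        "NuExtract-2-2B-experimental-fp16.gguf",
        "NuExtract-2-2B-Q4_K_M.gguf",
        "Q4_K_M",
    ],
    check=True,
)
```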

Original Model Information

  • Original HF Repo: NuMind/NuExtract-2.0-2B
  • Base Model: Qwen2-VL-2B-Instruct
  • Description: NuExtract 2.0 is a powerful, multilingual family of models specialized for structured information extraction from various sources, including images.

This GGUF conversion allows the model to run efficiently on a wide range of consumer hardware (CPU and GPU).

Provided Files & Quantization Details

This repository offers multiple quantization levels to suit different hardware and performance needs. Quantization reduces model size and memory usage, often with minimal impact on quality. The K-quants (files with _K in the name) are generally recommended over the older quant types.

| File Name | Quantization Method | Size | Notes |
| --- | --- | --- | --- |
| NuExtract-2-2B-Q4_K_M.gguf | Q4_K_M | 1.1 GB | Balanced Default. The best all-around choice for quality, speed, and size. |
| NuExtract-2-2B-Q5_K_M.gguf | Q5_K_M | 1.3 GB | High Quality. A great balance, noticeably better than 4-bit. Recommended if you have >2 GB VRAM. |
| NuExtract-2-2B-Q6_K.gguf | Q6_K | 1.5 GB | Very High Quality. Excellent quality with a significant size reduction over 8-bit. |
| NuExtract-2-2B-Q8_0.gguf | Q8_0 | 1.9 GB | Highest Quality. Nearly lossless. Use for benchmarks or if you want the best possible output. |
| NuExtract-2-2B-IQ3_S.gguf | IQ3_S | 848 MB | Good Compression. A smart 3-bit quant for memory-constrained systems. |
| NuExtract-2-2B-Q3_K_M.gguf | Q3_K_M | 920 MB | A good alternative 3-bit quant. |
| NuExtract-2-2B-Q2_K.gguf | Q2_K | 737 MB | Maximum Compression. Very small size, but expect a significant drop in quality. |
| NuExtract-2-2B-experimental-fp16.gguf | F16 | 3.6 GB | Unquantized. Full 16-bit precision. For developers who wish to perform their own quantization. |

Note: Older quant types (Q4_0, Q5_0, etc.) are also provided, but the _K and IQ versions are generally superior.

How to Use

You can use these models with any program that supports GGUF, such as llama.cpp, Ollama, LM Studio, and many others.
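As a minimal sketch, the snippet below loads the Q4_K_M file with the llama-cpp-python bindings (pip install llama-cpp-python). The prompt and the JSON template are illustrative only; consult the original NuMind/NuExtract-2.0-2B model card for the exact input format the model expects.

```python
# Minimal text-only usage sketch with llama-cpp-python; the prompt format is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="NuExtract-2-2B-Q4_K_M.gguf",  # any quant from the table above
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload all layers to GPU if available; set 0 for CPU only
)

prompt = (
    'Extract the following fields as JSON: {"name": "", "date": ""}\n'
    "Text: The meeting with Alice Smith is scheduled for 12 March 2025.\n"
    "Output:"
)

# Greedy decoding (temperature 0) keeps the structured output deterministic.
output = llm(prompt, max_tokens=128, temperature=0.0)
print(output["choices"][0]["text"])
```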
