II-Search-4B

Model Description

II-Search-4B is a 4B-parameter language model based on Qwen3-4B, fine-tuned specifically for information-seeking tasks and web-integrated reasoning. It excels at complex multi-hop information retrieval, fact verification, and comprehensive report generation.

Key Features

  • Enhanced tool usage for web search and webpage visits
  • Multi-hop reasoning capabilities with sophisticated planning
  • Verified information retrieval with cross-checking
  • Strong performance on factual QA benchmarks
  • Comprehensive report generation for research queries

Training Methodology

Our training process consisted of four key phases:

Phase 1: Tool Call Ability Stimulation

We used a distillation approach from a larger model (Qwen3-235B) to generate reasoning paths with function calling on multi-hop datasets. This established the model's base tool-use capabilities.
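As an illustration of this step only, the sketch below collects tool-augmented traces from a teacher served behind an OpenAI-compatible API. The endpoint, the served model name, and the run_tool executor are assumptions for the example, not the released pipeline:

# Hypothetical sketch: collect tool-augmented reasoning traces from a teacher
# model (e.g. Qwen3-235B) behind an OpenAI-compatible API (endpoint assumed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def run_tool(call):
    # Hypothetical executor: wire this to real web_search / web_visit backends.
    return "stub result"

def collect_trace(question, tools):
    """Let the teacher answer with tool calls; keep the full history as a trace."""
    messages = [{"role": "user", "content": question}]
    while True:
        reply = client.chat.completions.create(
            model="Qwen3-235B", messages=messages, tools=tools
        ).choices[0].message
        messages.append(reply)
        if not reply.tool_calls:   # final answer reached
            return messages        # the whole history becomes a training example
        for call in reply.tool_calls:
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": run_tool(call)})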

Phase 2: Reasoning Improvement

We addressed initial limitations by:

  • Creating synthetic problems that require more reasoning turns, inspired by the Random Walk algorithm (see the sketch after this list)
  • Improving reasoning thought patterns for more efficient and cleaner reasoning paths
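To make the Random Walk idea concrete, here is a toy sketch (with invented entities and relations, not the actual data pipeline): walk k edges of a knowledge graph and compose the visited relations into one question that can only be answered by chaining all k facts.

import random

# Toy knowledge graph: entity -> list of (relation, entity) edges.
GRAPH = {
    "Marie Curie": [("birthplace", "Warsaw")],
    "Warsaw": [("country", "Poland")],
    "Poland": [("currency", "zloty")],
}

def random_walk_question(start, hops):
    """Chain `hops` random edges from `start` into one multi-hop question."""
    entity, relations = start, []
    for _ in range(hops):
        edges = GRAPH.get(entity)
        if not edges:
            break
        relation, entity = random.choice(edges)
        relations.append(relation)
    question = f"What is the {' of the '.join(reversed(relations))} of {start}?"
    return question, entity  # the walk's endpoint is the gold answer

print(random_walk_question("Marie Curie", 3))
# ('What is the currency of the country of the birthplace of Marie Curie?', 'zloty')

Longer walks yield questions that force more reasoning turns, which is exactly what this phase targets.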

Phase 3: Rejection Sampling & Report Generation

We applied:

  • Filtering to keep only high-quality reasoning traces (correct answers reached through sound reasoning); a filtering sketch follows this list
  • STORM-inspired techniques to enhance comprehensive report generation
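A minimal sketch of such a rejection-sampling filter, assuming each candidate trace records a predicted answer, a gold answer, and its tool-call history (the field names are illustrative):

candidate_traces = [
    {"predicted": "Paris", "gold": "Paris", "tool_calls": ["web_search", "web_visit"]},
    {"predicted": "Lyon", "gold": "Paris", "tool_calls": ["web_search"]},
]

def keep_trace(trace):
    """Keep traces whose answer matches gold and that actually used tools."""
    answer_correct = trace["predicted"].strip().lower() == trace["gold"].strip().lower()
    return answer_correct and len(trace["tool_calls"]) > 0

filtered = [t for t in candidate_traces if keep_trace(t)]  # keeps only the first trace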

Phase 4: Reinforcement Learning

We trained the model using reinforcement learning:

  • Training dataset: dgslibisey/MuSiQue (see the loading sketch after this list)
  • Incorporated our in-house search database (containing Wiki data, Fineweb data, and ArXiv data)
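For reference, the MuSiQue data can be pulled with the datasets library. This is a loading sketch only (the split and field names assume the standard MuSiQue layout), not the RL training loop itself:

from datasets import load_dataset

# Multi-hop QA pairs used as RL prompts; split and field names assumed.
musique = load_dataset("dgslibisey/MuSiQue", split="train")
print(musique[0]["question"], "->", musique[0]["answer"])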

Performance

Benchmark         Qwen3-4B   Jan-4B   WebSailor-3B   II-Search-4B
OpenAI/SimpleQA   76.8       80.1     81.8           91.8
Google/Frames     30.7       24.8     34.0           67.5
Seal_0             6.31       2.7      1.8           22.5

Tool Usage Comparison

SimpleQA (SerpDev): average number of tool calls per query

Metric          Qwen3-4B   Jan-4B   WebSailor-3B   II-Search-4B
# Search        1.0        0.9      2.1            2.2
# Visit         0.1        1.9      6.4            3.5
# Total Tools   1.1        2.8      8.5            5.7

All benchmark traces from the evaluated models are available at: https://huggingface.co/datasets/II-Vietnam/Inspect-Search-Models-Benchmarking-Result

Intended Use

II-Search-4B is designed for:

  • Information seeking and factual question answering
  • Research assistance and comprehensive report generation
  • Fact verification and evidence-based reasoning
  • Educational and research applications requiring factual accuracy

Usage

To deploy and interact with the II-Search-4B model, choose one of the following options:

  1. Serve the model using vLLM or SGLang

Use the following command to serve the model with vLLM (adjust parameters as needed for your hardware setup):

vllm serve Intelligent-Internet/II-Search-4B \
    --served-model-name II-Search-4B \
    --tensor-parallel-size 8 \
    --enable-reasoning \
    --reasoning-parser deepseek_r1 \
    --rope-scaling '{"rope_type":"yarn","factor":1.5,"original_max_position_embeddings":98304}' \
    --max-model-len 131072

This configuration enables distributed tensor parallelism across 8 GPUs, reasoning capabilities, custom RoPE scaling for extended context, and a maximum context length of 131,072 tokens.

  2. Integrate web_search and web_visit tools

Equip the served model with web_search and web_visit tools to enable internet-aware functionality. Alternatively, use middleware such as MCP for tool integration; see this example repository: https://github.com/hoanganhpham1006/mcp-server-template.
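As a sketch of what the tool side could look like, here are the two tools written in the OpenAI function-calling schema; the exact parameter layout used during training is an assumption.

# Illustrative tool definitions; parameter layout assumed, not the training schema.
WEB_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return result snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string", "description": "Search query."}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "web_visit",
            "description": "Fetch and return the text content of a webpage.",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string", "description": "URL to visit."}},
                "required": ["url"],
            },
        },
    },
]

Pass these in the tools field of each chat completion request and execute the returned tool calls with your own search and fetch backends, or let an MCP server such as the template above handle that loop.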

  3. Host on macOS with MLX for local use

As an alternative for Apple Silicon users, host the quantized II-Search-4B-MLX version on your Mac. Then, interact with it via user-friendly interfaces like LM Studio or Ollama Desktop.
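If you prefer scripting over a GUI, the mlx-lm Python API can drive the model directly. This is a sketch; the repo id below is an assumption, so check the actual name of the MLX build on Hugging Face:

from mlx_lm import load, generate

# Repo id assumed for the MLX build; adjust to the published II-Search-4B-MLX repo.
model, tokenizer = load("Intelligent-Internet/II-Search-4B-MLX")
print(generate(model, tokenizer, prompt="What is the capital of Poland?", max_tokens=256))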

Recommended Generation Parameters

generate_cfg = {
    'top_k': 20,
    'top_p': 0.95,
    'temperature': 0.6,
    'repetition_penalty': 1.1,
    'max_tokens': 2048
}

  • For queries that need a short, exact answer, append the following phrase: "\n\nPlease reason step-by-step and put the final answer within \\boxed{}."
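Putting the pieces together, a request against the vLLM server from step 1 might look like the following sketch; it reuses the parameters above (top_k and repetition_penalty go through vLLM's extra_body) and appends the suggested phrase for short-answer queries.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

question = "Which river flows through the capital of Poland?"
prompt = question + "\n\nPlease reason step-by-step and put the final answer within \\boxed{}."

response = client.chat.completions.create(
    model="II-Search-4B",          # matches --served-model-name above
    messages=[{"role": "user", "content": prompt}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=2048,
    extra_body={"top_k": 20, "repetition_penalty": 1.1},  # vLLM-specific sampling params
)
print(response.choices[0].message.content)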

Citation

@misc{II-Search-4B,
  author = {Intelligent Internet},
  title = {II-Search-4B: Information Seeking and Web-Integrated Reasoning LLM},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/II-Vietnam/II-Search-4B}},
}