Maxwell Instruction Complexity Estimator (MICE)

A fast, efficient, and accurate instruction complexity scorer powered by ModernBERT-Large. MICE predicts normalized task difficulty scores (0–1) for English instructions, with an easy option to rescale to custom ranges.


🚀 Features

  • Lightweight & Fast: Leverages a compact backbone (ModernBERT-Large + LoRA) with only 14.4M trainable parameters.
  • Data-Driven: Trained on 66.5K English instruction–score pairs from the DEITA-Complexity dataset.
  • High Fidelity: Matches the performance of models 34× larger on standard complexity benchmarks.
  • Flexible Scoring: Outputs normalized scores (0–1) by default, with optional denormalization to any range (e.g., [1–6], [0–100]).

🔧 Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "thethinkmachine/Maxwell-Task-Complexity-Scorer-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference only; disables dropout

# 1. Get normalized complexity (0–1)
def get_normalized_score(text: str) -> float:
    # Truncate to the 512-token limit used during training
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits.squeeze()
    return float(logits)

# 2. Denormalize to [min_score, max_score]
def get_denormalized_score(text: str, min_score: float = 1, max_score: float = 6) -> float:
    norm = get_normalized_score(text)
    raw = norm * (max_score - min_score) + min_score
    return float(round(raw, 2))

# Example
query = "Is learning equivalent to decreasing local entropy?"
print("Normalized:", get_normalized_score(query))
print("Evol-Complexity [1–6]:", get_denormalized_score(query))

📖 Model Details

  • Architecture: ModernBERT-Large backbone with LoRA adapters (rank 32, alpha 64, dropout 0.1); 14.4M trainable out of roughly 396M total parameters (F32).
  • Task: Sequence classification with a single-score output.
  • Languages: English.
  • Training Data: 66,500 instruction–score pairs from BhabhaAI/DEITA-Complexity.
  • Normalization: Labels min–max scaled to [0, 1]; denormalize via score * (max - min) + min (see the sketch below).
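
For reference, the forward min–max mapping and its inverse, as a minimal sketch; it assumes raw Evol-Complexity labels in [1, 6], matching the pruned distribution below. The inverse is the same formula used by get_denormalized_score above.

# Min–max normalization of raw labels to [0, 1], and its inverse used at inference
def normalize(score: float, min_score: float = 1, max_score: float = 6) -> float:
    return (score - min_score) / (max_score - min_score)

def denormalize(norm: float, min_score: float = 1, max_score: float = 6) -> float:
    return norm * (max_score - min_score) + min_score

print(normalize(4.0))    # 0.6
print(denormalize(0.6))  # 4.0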

Data Distribution

Original Score    Count    Share
1                  8,729   13.3%
2                  5,399    8.2%
3                 10,937   16.7%
4                  9,801   15.0%
5                 24,485   37.4%
6                  6,123    9.3%

Outlier scores (0 and 7–9) were pruned, accounting for <1% of the data.
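
The pruning step could be reproduced roughly as below with the datasets library. The split name ("train") and the score field ("score") are assumptions, not confirmed by this card, so check the dataset card first.

from collections import Counter
from datasets import load_dataset

# Hypothetical reproduction: split ("train") and field ("score") are assumptions
ds = load_dataset("BhabhaAI/DEITA-Complexity", split="train")
kept = ds.filter(lambda ex: 1 <= int(ex["score"]) <= 6)  # drop outlier scores 0, 7-9
print(Counter(int(ex["score"]) for ex in kept))          # expect roughly the table above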


βš™οΈ Training Configuration

  • Optimizer: AdamW (lr=5e-5, weight decay=0.01)
  • Batch Size: 8
  • Epochs: 3
  • Max Seq. Length: 512
  • Warmup: 10% of total steps
  • Compute: 50.3M training tokens; tokens-per-trainable-parameter (TTP) ratio ≈ 3.5 (50.3M / 14.4M). A code translation of this setup is sketched below.
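
As a rough translation of the configuration above into a PEFT + Trainer setup; the backbone checkpoint, LoRA target modules, and dataset wiring are assumptions, not taken from this card:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification, TrainingArguments

# Backbone checkpoint is assumed to be the public ModernBERT-Large release
base = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-large", num_labels=1)  # single-score regression head

lora = LoraConfig(r=32, lora_alpha=64, lora_dropout=0.1,
                  target_modules=["Wqkv", "Wo"],  # assumed attention projections
                  task_type="SEQ_CLS")
model = get_peft_model(base, lora)

args = TrainingArguments(
    output_dir="mice-lora",
    learning_rate=5e-5,           # AdamW is the Trainer default optimizer
    weight_decay=0.01,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    warmup_ratio=0.1,             # 10% of total steps
)
# Trainer(model=model, args=args, train_dataset=...) would complete the loop.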

🌱 Environmental Impact

  • Compute Used: 16h on 1× NVIDIA L4 GPU (72W TDP) in GCP asia-south1.
  • CO₂ Emissions: 0.87 kg CO₂eq (fully offset).
  • Estimator: ML CO₂ Impact Calculator.

πŸ” Bias & Limitations

  • Domain Bias: Trained primarily on general English; may underperform on technical/coding/math instructions.
  • Language: English-only.
  • Scaling Caution: Denormalization preserves ordering, but absolute values depend on the chosen range; e.g., a normalized score of 0.5 maps to 3.5 on [1, 6] and to 50 on [0, 100].

📚 Citation

If you use MICE in your research, please cite:

Chaubey, S. (2024). Maxwell Instruction Complexity Estimator (MICE). https://huggingface.co/thethinkmachine/MICE
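
Or, as a BibTeX entry (fields derived from the reference above; the key is illustrative):

@misc{chaubey2024mice,
  author = {Chaubey, Shreyan},
  title  = {Maxwell Instruction Complexity Estimator (MICE)},
  year   = {2024},
  url    = {https://huggingface.co/thethinkmachine/MICE}
}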


πŸ™‹β€β™‚οΈ Author & Contact

Shreyan C (thethinkmachine)
Email: [email protected]

This project is licensed under the Apache 2.0 License.
