Monet: Mixture of Monosemantic Experts for Transformers

Model Summary

Monet introduces a novel approach for improving mechanistic interpretability in large language models (LLMs) using a Sparse Mixture-of-Experts (SMoE) architecture with 262,144 experts. By integrating sparse dictionary learning directly into end-to-end pretraining, Monet tackles the core issue of polysemanticity—where single neurons encode multiple unrelated concepts—while preserving overall model performance.

Resources and Technical Documentation

GitHub Repository: https://github.com/dmis-lab/Monet
Paper: https://arxiv.org/abs/2412.04139
Model Hub: https://huggingface.co/MonetLLM
Demo: https://huggingface.co/spaces/MonetLLM/monet-vd-1.4B-100BT-hf-viewer

Available Checkpoints

Base Models

Model	Dataset	#Params	#Tokens	Checkpoint	Demo
Monet-VD	FineWeb-Edu	850M	100BT	monet-vd-850M-100BT-hf	
Monet-VD	FineWeb-Edu	1.4B	100BT	monet-vd-1.4B-100BT-hf	Viewer
Monet-VD	FineWeb-Edu	4.1B	100BT	monet-vd-4.1B-100BT-hf	
Monet-VD	StarCoderData	1.4B	100BT	codemonet-vd-1.4B-100BT-hf	Viewer
Monet-HD	FineWeb-Edu	850M	100BT	monet-hd-850M-100BT-hf	
Monet-HD	FineWeb-Edu	1.4B	100BT	monet-hd-1.4B-100BT-hf	
Monet-HD	FineWeb-Edu	4.1B	100BT	monet-hd-4.1B-100BT-hf	

Instruction-Tuned Models

Model	Purpose	Recipe	#Params	Checkpoint
Monet-VD	Chat Completion	SmolLM	1.4B	monet-vd-1.4B-100BT-chat-hf
Monet-VD	Vision-Language Model	LLaVA	1.6B	visionmonet-vd-1.4B-100BT-hf

Evaluation

Open-Ended LLM Benchmarks

Model	MMLU	ARC	WG	PIQA	SIQA	OBQA	HS	CSQA	Avg.
0-shot
Monet-HD 850M	0.320	0.460	0.506	0.699	0.416	0.364	0.465	0.337	0.446
Monet-VD 850M	0.328	0.456	0.530	0.708	0.417	0.356	0.488	0.343	0.453
Monet-HD 1.4B	0.338	0.471	0.538	0.714	0.418	0.382	0.501	0.339	0.463
Monet-VD 1.4B	0.352	0.495	0.522	0.727	0.423	0.418	0.529	0.363	0.478
Monet-HD 4.1B	0.375	0.558	0.560	0.741	0.427	0.414	0.571	0.379	0.503
Monet-VD 4.1B	0.380	0.547	0.557	0.751	0.437	0.424	0.604	0.389	0.511
5-shot
Monet-HD 850M	0.332	0.537	0.510	0.697	0.409	0.346	0.479	0.420	0.466
Monet-VD 850M	0.341	0.548	0.520	0.709	0.437	0.368	0.504	0.454	0.485
Monet-HD 1.4B	0.352	0.544	0.530	0.720	0.432	0.360	0.518	0.441	0.487
Monet-VD 1.4B	0.360	0.547	0.526	0.730	0.441	0.422	0.551	0.501	0.510
Monet-HD 4.1B	0.385	0.603	0.545	0.742	0.463	0.412	0.588	0.545	0.535
Monet-VD 4.1B	0.398	0.625	0.564	0.761	0.470	0.438	0.619	0.525	0.550
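
The Avg. column appears to be the unweighted mean of the eight benchmark scores in each row. A minimal Python check, using the Monet-HD 850M 0-shot row copied from the table above, is sketched below. Because the table reports rounded scores, recomputing other rows this way can differ from the listed average in the last digit.

```python
# Scores for Monet-HD 850M (0-shot), copied from the table above.
scores = {
    "MMLU": 0.320, "ARC": 0.460, "WG": 0.506, "PIQA": 0.699,
    "SIQA": 0.416, "OBQA": 0.364, "HS": 0.465, "CSQA": 0.337,
}

# Unweighted mean over the eight benchmarks (assumed averaging scheme).
average = sum(scores.values()) / len(scores)
print(f"{average:.3f}")  # prints 0.446, matching the Avg. column for this row
```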

Detoxification

Detoxification performance is evaluated on the Monet-VD 1.4B model.

RealToxicityPrompts

Masking Threshold	Masking Ratio	Exp. Max. Toxicity (Toxic)	Exp. Max. Toxicity (Non-Toxic)	Toxicity Prob. (Toxic)	Toxicity Prob. (Non-Toxic)	Avg. Perf.
–	–	0.795	0.269	0.926	0.08	0.478
0.2	1.0%	0.767	0.268	0.909	0.07	0.479
0.1	4.1%	0.657	0.270	0.768	0.08	0.478
0.05	14.4%	0.552	0.256	0.564	0.05	0.467
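
The relationship between masking threshold and masking ratio can be sketched as follows. This is an illustration only: it assumes each expert carries a scalar toxicity score (for example, a correlation between its routing weight and a toxicity label) and that experts scoring above the threshold are masked. The scores are random placeholders, and `masking_ratio` is a hypothetical helper, not part of the Monet codebase.

```python
import random


def masking_ratio(scores, threshold):
    """Fraction of experts whose score exceeds the threshold (hypothetical helper)."""
    masked = [s > threshold for s in scores]
    return sum(masked) / len(scores)


# Placeholder scores for illustration; real scores would come from the model.
rng = random.Random(0)
scores = [rng.gauss(0.0, 0.05) for _ in range(10_000)]

for threshold in (0.2, 0.1, 0.05):
    ratio = masking_ratio(scores, threshold)
    print(f"threshold={threshold}: {ratio:.1%} of experts masked")
```

Lowering the threshold masks more experts, which matches the trend in the table: the masking ratio rises from 1.0% at a threshold of 0.2 to 14.4% at 0.05.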

ToxiGen

Masking Threshold	Masking Ratio	RoBERTa Score (Hate)	RoBERTa Score (Neutral)	Avg. Perf.
–	–	0.642	0.035	0.478
0.2	1.4%	0.643	0.033	0.478
0.1	5.4%	0.504	0.028	0.473
0.05	15.0%	0.430	0.027	0.455

Examples

Text Generation

import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
print(pipe("The key to life is", max_new_tokens=20, do_sample=True)[0]["generated_text"])

Code Generation

import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/codemonet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

text = '''
def print_len(x: str):
    """For a given string x, print the length of x."""
'''
print(pipe(text, max_new_tokens=10)[0]["generated_text"].split("\n\n")[0])

Chat Completion

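import torch
from transformers import AutoTokenizer, pipeline

# Checkpoint name as listed under Instruction-Tuned Models.
model_name = "MonetLLM/monet-vd-1.4B-100BT-chat-hf"
# Load the tokenizer once so it can also build the chat prompt below.
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)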

text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi! How are you?"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(pipe(text, max_new_tokens=30, do_sample=True)[0]["generated_text"])

Using vLLM

A custom vLLM implementation of the Monet architecture (modeling_monet_vllm) is provided in the GitHub repository.

from vllm import LLM, ModelRegistry, SamplingParams
from modeling_monet_vllm import MonetForCausalLM

# Register Monet architecture with vLLM
ModelRegistry.register_model("MonetForCausalLM", MonetForCausalLM)

model = LLM(
    "MonetLLM/monet-vd-1.4B-100BT-hf",
    trust_remote_code=True,
    dtype="bfloat16",
    gpu_memory_utilization=0.8
)
sampling_params = SamplingParams(max_tokens=20, temperature=1.0)
print(model.generate("The key to life is", sampling_params)[0].outputs[0].text)

Training

Model

Architecture: Monet
Pretraining tokens: 100B
Precision: bfloat16

Hardware

TPUs: TPU-v4-64 Pod Slice (supported by TRC Program)

Software

Training Framework: Jax, Flax

Intended Use

Primary Intended Uses

This model is designed to advance research on language models and serve as a foundational component for generative AI-driven functionalities. Its primary applications, mostly in English, include:

Mechanistic interpretability research for language models
Text generation with enhanced interpretability
Code generation (CodeMonet variant)
Chat completion (instruction-tuned variant)
Vision-language tasks (VisionMonet variant)

Out-of-Scope Uses

This model has not been explicitly developed or tested for all potential downstream applications. Therefore:

Limitations & Mitigations: Developers should be mindful of common language model limitations, and thoroughly evaluate and mitigate risks regarding accuracy, safety, and fairness—especially in high-stakes or high-risk scenarios.
Legal & Regulatory Compliance: Developers must comply with any applicable laws and regulations (e.g., privacy, trade compliance), taking into account the model’s English-focused training (refer to FineWeb-Edu).
No License Modification: Nothing in this Model Card modifies or restricts the license under which this model is released.
Unsupported Programming Languages: Programming in languages not covered by StarCoderData (CodeMonet variant) is outside the model’s intended scope.

Model Architecture

Monet introduces a novel Mixture-of-Experts (MoE) architecture with several key innovations:

Parameter-efficient expert decomposition: overall parameter count grows in proportion to the square root of the number of experts
Fine-grained expert specialization: offers clear insight into model behavior
Precise manipulation of knowledge: enables control over domain knowledge, programming language capabilities, and toxicity level
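
The square-root scaling in the first point can be illustrated with simple arithmetic. The sketch below assumes, for illustration only, that N experts are composed from pairs drawn from two groups of √N sub-experts each, so only 2√N sub-experts need to be stored. The per-unit parameter count `params_per_unit` is a made-up placeholder, not a Monet hyperparameter.

```python
import math

num_experts = 262_144                  # expert count stated in the Model Summary
group_size = math.isqrt(num_experts)   # 512, since 512 * 512 == 262,144
assert group_size * group_size == num_experts

params_per_unit = 1_000  # hypothetical parameter count per (sub-)expert

# Naive layout: every expert stores its own parameters, so cost grows with N.
naive_params = num_experts * params_per_unit
# Decomposed layout (assumed): two groups of sqrt(N) sub-experts, so cost grows with sqrt(N).
decomposed_params = 2 * group_size * params_per_unit

print(f"naive:      {naive_params:,}")
print(f"decomposed: {decomposed_params:,}")
print(f"reduction:  {naive_params // decomposed_params}x")
```

Under these assumptions, quadrupling the number of experts only doubles the stored parameters in the decomposed layout.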

Ethical Considerations

Transparency

Designed specifically for enhanced interpretability
Enables understanding of internal model behavior
Allows tracking of knowledge attribution

Control

Supports toxicity mitigation
Enables domain-specific knowledge control
Maintains performance while adjusting behavior

License and Usage

Monet is licensed under the Apache 2.0 license. The model is primarily intended for research and educational use. Important licensing notes:

Instruction-tuned models were fine-tuned on a dataset mix that includes outputs generated by third-party models
Research and educational use is encouraged
Commercial use is subject to Apache 2.0 license terms

Citation

@article{park2024monet,
      title={{Monet: Mixture of Monosemantic Experts for Transformers}}, 
      author={Jungwoo Park and Young Jin Ahn and Kee-Eung Kim and Jaewoo Kang},
      journal={arXiv preprint arXiv:2412.04139},
      year={2024}
}