---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
- meta-llama/Llama-3.2-3B-Instruct
- meta-llama/Llama-3.2-1B-Instruct
- meta-llama/Meta-Llama-3-8B-Instruct
- mistralai/Mistral-7B-Instruct-v0.1
- mistralai/Mistral-7B-Instruct-v0.2
- mistralai/Mistral-7B-Instruct-v0.3
datasets:
- PKU-Alignment/PKU-SafeRLHF
- HuggingFaceH4/ultrafeedback_binarized
- Anthropic/hh-rlhf
- PKU-Alignment/BeaverTails-Evaluation
- declare-lab/HarmfulQA
language:
- en
---
Figure: Performance improvements of MARA across the PKU-SafeRLHF, BeaverTails, and HarmfulQA datasets. Each entry shows the percentage improvement in preference rate achieved by applying MARA compared to using the original LLM alone.
Figure: Compatibility analysis of MARA, an alignment model trained with one LLM and then aggregated with a different inference LLM. The value of each cell is the percentage improvement in preference rate of our algorithm over the upstream (inference) model.
Figure: Performance comparison of MARA against RLHF, DPO, and Aligner, measured by the percentage improvement in preference rate.
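The compatibility figure describes MARA as an alignment model that is trained alongside one LLM and then aggregated with a separate inference LLM. This card does not include inference code, so the snippet below is only a minimal sketch of decoding-time aggregation under assumed details: the checkpoint names, the `beta` weight, and the rule of summing the two models' next-token log-probabilities are illustrative assumptions, not MARA's published algorithm, and the sketch further assumes both models share a tokenizer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed names: an upstream "inference" model and a smaller model standing in
# for the MARA alignment checkpoint. Both must share a tokenizer/vocabulary for
# the log-probability sum below to be meaningful.
inference_name = "meta-llama/Llama-3.1-8B-Instruct"
aligner_name = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder, not the actual MARA weights

tok = AutoTokenizer.from_pretrained(inference_name)
inference_model = AutoModelForCausalLM.from_pretrained(inference_name, torch_dtype=torch.bfloat16)
aligner_model = AutoModelForCausalLM.from_pretrained(aligner_name, torch_dtype=torch.bfloat16)

@torch.no_grad()
def aggregated_generate(prompt: str, max_new_tokens: int = 64, beta: float = 1.0) -> str:
    """Greedy decoding that mixes next-token log-probs from both models.

    `beta` weights the alignment model's contribution; the additive mixing rule
    is an illustrative assumption, not the method described in the MARA paper.
    """
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        base_logits = inference_model(ids).logits[:, -1, :]
        align_logits = aligner_model(ids).logits[:, -1, :]
        combined = (
            torch.log_softmax(base_logits, dim=-1)
            + beta * torch.log_softmax(align_logits, dim=-1)
        )
        next_id = combined.argmax(dim=-1, keepdim=True)
        if next_id.item() == tok.eos_token_id:
            break
        ids = torch.cat([ids, next_id], dim=-1)
    return tok.decode(ids[0], skip_special_tokens=True)

print(aggregated_generate("How do I respond to an angry customer email?"))
```

If the two models used different tokenizers, the logits would have to be mapped into a common vocabulary before mixing; consult the accompanying paper or repository for MARA's actual aggregation rule.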