---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
- meta-llama/Llama-3.2-3B-Instruct
- meta-llama/Llama-3.2-1B-Instruct
- meta-llama/Meta-Llama-3-8B-Instruct
- mistralai/Mistral-7B-Instruct-v0.1
- mistralai/Mistral-7B-Instruct-v0.2
- mistralai/Mistral-7B-Instruct-v0.3
datasets:
- PKU-Alignment/PKU-SafeRLHF
- HuggingFaceH4/ultrafeedback_binarized
- Anthropic/hh-rlhf
- PKU-Alignment/BeaverTails-Evaluation
- declare-lab/HarmfulQA
language:
- en
---
<p align="center">
<img src="icons.png" alt="MARA Icon" width="50" height="50"/>
</p>
<h1 align="center">
MARA AGENTS
</h1>
<div style="display: flex; justify-content: center; gap: 10px;">
<a href="https://github.com/IAAR-Shanghai/MARA">
<img src="https://img.shields.io/badge/GitHub-Repository-blue?logo=github" alt="GitHub"/>
</a>
<a href="https://huggingface.co/IAAR-Shanghai/MARA_AGENTS">
<img src="https://img.shields.io/badge/🤗%20Hugging%20Face-MARA_AGENTS-yellow" alt="Hugging Face"/>
</a>
<a href="https://arxiv.org/abs/2505.19743">
<img src="https://img.shields.io/badge/arXiv-Paper-8B0000?style=flat-square&logo=arxiv&logoColor=white">
</a>
</div>
**MARA** (Micro token-level Accept-Reject Alignment) simplifies the alignment process by breaking down sentence-level preference learning into fine-grained token-level binary classification. The MARA agent—a lightweight multi-layer perceptron (MLP)—operates as an alignment model that evaluates and classifies each candidate token as either *Accepted* or *Rejected* during LLM text generation.
<figure>
<img src="mara_architecture.png" alt="mara_architecture" style="display: block; margin: 0 auto;" />
<figcaption style="text-align: center;">Architecture of MARA: The alignment model performs token selection through accept-reject decisions.</figcaption>
</figure>
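To make the accept-reject mechanism concrete, here is a minimal, illustrative sketch of token-level accept-reject decoding with Hugging Face `transformers`. This is **not** the released implementation: `AcceptRejectAgent`, `accept_reject_generate`, the use of the last hidden state as the agent's input, and the `top_k` candidate list are all assumptions made for illustration. See the example in the next section and the GitHub repository for the actual `MARAGenerator` interface.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer


class AcceptRejectAgent(nn.Module):
    """Hypothetical MLP scoring a (context state, candidate embedding) pair as accept/reject."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 1),  # single logit; > 0 means "accept"
        )

    def forward(self, state: torch.Tensor, cand_emb: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([state, cand_emb], dim=-1)).squeeze(-1)


@torch.no_grad()
def accept_reject_generate(model, tokenizer, agent, prompt, max_new_tokens=64, top_k=10):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    embed = model.get_input_embeddings()
    for _ in range(max_new_tokens):
        out = model(ids, output_hidden_states=True)
        state = out.hidden_states[-1][:, -1, :]            # last-layer state at the last position
        probs = out.logits[:, -1, :].softmax(dim=-1)
        candidates = probs.topk(top_k, dim=-1).indices[0]  # candidate tokens, most likely first
        chosen = candidates[0]                             # fallback if every candidate is rejected
        for tok in candidates:
            if agent(state, embed(tok).unsqueeze(0)).item() > 0:
                chosen = tok                               # first accepted candidate is emitted
                break
        ids = torch.cat([ids, chosen.view(1, 1)], dim=-1)
        if chosen.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)


# Illustration only (untrained agent, placeholder model choice):
# model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
# tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
# agent = AcceptRejectAgent(model.config.hidden_size)
# print(accept_reject_generate(model, tokenizer, agent, "Please introduce yourself."))
```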
### 💫 Get MARA Agent Alignment Results
```python
from mara_generator import MARAGenerator

# Trained MARA agent checkpoint and the base LLM it was trained to align
agent_path = "mistral_v3_2_1_actor.pth"
base_model_path = "path2model/Mistral-7B-Instruct-v0.3"
mara_generator = MARAGenerator(agent_path, base_model_path)

instruction = "Please introduce yourself."

# Greedy generation with the base model alone (no alignment agent)
raw_result = mara_generator.get_raw_output(instruction, do_sample=False)
print("base model answer: ")
print(raw_result["answer"])

# Generation with the MARA agent accepting or rejecting each candidate token
align_result = mara_generator.get_proxy_output(instruction)
print("mara agent align answer: ")
print(align_result["answer"])
```
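The same two calls can be looped over several prompts to inspect base and aligned answers side by side. This is only a usage sketch reusing the `get_raw_output` and `get_proxy_output` methods shown above; the prompts are placeholders.

```python
prompts = [
    "Please introduce yourself.",
    "How can I stay safe when cycling at night?",
]

for prompt in prompts:
    raw = mara_generator.get_raw_output(prompt, do_sample=False)["answer"]
    aligned = mara_generator.get_proxy_output(prompt)["answer"]
    print(f"### {prompt}\n[base]    {raw}\n[aligned] {aligned}\n")
```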
### 🔨 Train Your MARA Agent
The source code and implementation details are open-sourced in the [MARA](https://github.com/IAAR-Shanghai/MARA) repository; you can train a custom alignment model by following the instructions there.
### 📊 Experiment Results
<table class="center">
<tr>
<td width=100% style="border: none">
<img src="table1.png" style="width:50%; max-width:100%;">
<div style="text-align: left; margin-top: 8px;">Performance improvements of MARA across the PKU-SafeRLHF, BeaverTails, and HarmfulQA datasets. Each entry shows the percentage improvement in preference rate achieved by applying MARA compared to using the original LLM alone.</div>
</td>
</tr>
</table>
<table class="center">
<tr>
<td width=100% style="border: none">
<img src="table3.png" style="width:50%; max-width:100%;">
<div style="text-align: left; margin-top: 8px;">Compatibility analysis of MARA: an alignment model trained with one LLM is paired with a different inference LLM. Each cell shows the percentage improvement in preference rate of our algorithm over the upstream (inference) model alone.</div>
</td>
</tr>
</table>
<table class="center">
<tr>
<td width=100% style="border: none">
<img src="table2.png" style="width:100%">
<div style="text-align: left; margin-top: 8px;">Performance comparison of MARA against RLHF, DPO, and Aligner measured by percentage improvements of preference rate.</div>
</td>
</tr>
</table>
More details and analyses about experimental results can be found in our [paper](https://arxiv.org/abs/2505.19743).
### ✍️ Citation
If our code or paper has been useful in your research, please cite our work:
```bibtex
@article{zhang2025tokenlevelacceptrejectmicro,
title={Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models},
author={Yang Zhang and Yu Yu and Bo Tang and Yu Zhu and Chuxiong Sun and Wenqiang Wei and Jie Hu and Zipeng Xie and Zhiyu Li and Feiyu Xiong and Edward Chung},
journal={arXiv preprint arXiv:2505.19743},
year={2025}
}
```