---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
- meta-llama/Llama-3.2-3B-Instruct
- meta-llama/Llama-3.2-1B-Instruct
- meta-llama/Meta-Llama-3-8B-Instruct
- mistralai/Mistral-7B-Instruct-v0.1
- mistralai/Mistral-7B-Instruct-v0.2
- mistralai/Mistral-7B-Instruct-v0.3
datasets:
- PKU-Alignment/PKU-SafeRLHF
- HuggingFaceH4/ultrafeedback_binarized
- Anthropic/hh-rlhf
- PKU-Alignment/BeaverTails-Evaluation
- declare-lab/HarmfulQA
language:
- en
---
<p align="center">
<img src="icons.png" alt="MARA Icon" width="50" height="50"/>
</p>
<h1 align="center">
MARA AGENTS
</h1>
<div style="display: flex; justify-content: center; gap: 10px;">
<a href="https://github.com/IAAR-Shanghai/MARA">
<img src="https://img.shields.io/badge/GitHub-Repository-blue?logo=github" alt="GitHub"/>
</a>
<a href="https://huggingface.co/IAAR-Shanghai/MARA_AGENTS">
<img src="https://img.shields.io/badge/🤗%20Hugging%20Face-MARA_AGENTS-yellow" alt="Hugging Face"/>
</a>
<a href="https://arxiv.org/abs/2505.19743">
<img src="https://img.shields.io/badge/arXiv-Paper-8B0000?style=flat-square&logo=arxiv&logoColor=white">
</a>
</div>
**MARA** (Micro token-level Accept-Reject Alignment) simplifies the alignment process by breaking down sentence-level preference learning into fine-grained token-level binary classification. The MARA agent—a lightweight multi-layer perceptron (MLP)—operates as an alignment model that evaluates and classifies each candidate token as either *Accepted* or *Rejected* during LLM text generation.
<figure>
<img src="mara_architecture.png" alt="mara_architecture" style="display: block; margin: 0 auto;" />
<figcaption style="text-align: center;">Architecture of MARA: The alignment model performs token selection through accept-reject decisions.</figcaption>
</figure>
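To make the accept-reject mechanism concrete, here is a minimal, illustrative sketch of token-level accept-reject decoding with Hugging Face `transformers`. This is **not** the released implementation: `AcceptRejectAgent`, `accept_reject_generate`, the use of the last hidden state as the agent's input, and the `top_k` candidate list are all assumptions made for illustration. See the example in the next section and the GitHub repository for the actual `MARAGenerator` interface.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer


class AcceptRejectAgent(nn.Module):
    """Hypothetical MLP scoring a (context state, candidate embedding) pair as accept/reject."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 1),  # single logit; > 0 means "accept"
        )

    def forward(self, state: torch.Tensor, cand_emb: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([state, cand_emb], dim=-1)).squeeze(-1)


@torch.no_grad()
def accept_reject_generate(model, tokenizer, agent, prompt, max_new_tokens=64, top_k=10):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    embed = model.get_input_embeddings()
    for _ in range(max_new_tokens):
        out = model(ids, output_hidden_states=True)
        state = out.hidden_states[-1][:, -1, :]            # last-layer state at the last position
        probs = out.logits[:, -1, :].softmax(dim=-1)
        candidates = probs.topk(top_k, dim=-1).indices[0]  # candidate tokens, most likely first
        chosen = candidates[0]                             # fallback if every candidate is rejected
        for tok in candidates:
            if agent(state, embed(tok).unsqueeze(0)).item() > 0:
                chosen = tok                               # first accepted candidate is emitted
                break
        ids = torch.cat([ids, chosen.view(1, 1)], dim=-1)
        if chosen.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)


# Illustration only (untrained agent, placeholder model choice):
# model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
# tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
# agent = AcceptRejectAgent(model.config.hidden_size)
# print(accept_reject_generate(model, tokenizer, agent, "Please introduce yourself."))
```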
### 💫 Get MARA Agent Alignment Results
```python
from mara_generator import MARAGenerator

# Trained MARA agent checkpoint and the base LLM it was trained to align
agent_path = "mistral_v3_2_1_actor.pth"
base_model_path = "path2model/Mistral-7B-Instruct-v0.3"
mara_generator = MARAGenerator(agent_path, base_model_path)

instruction = "Please introduce yourself."

# Greedy generation with the base model alone (no alignment agent)
raw_result = mara_generator.get_raw_output(instruction, do_sample=False)
print("base model answer: ")
print(raw_result["answer"])

# Generation with the MARA agent accepting or rejecting each candidate token
align_result = mara_generator.get_proxy_output(instruction)
print("mara agent align answer: ")
print(align_result["answer"])
```
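The same two calls can be looped over several prompts to inspect base and aligned answers side by side. This is only a usage sketch reusing the `get_raw_output` and `get_proxy_output` methods shown above; the prompts are placeholders.

```python
prompts = [
    "Please introduce yourself.",
    "How can I stay safe when cycling at night?",
]

for prompt in prompts:
    raw = mara_generator.get_raw_output(prompt, do_sample=False)["answer"]
    aligned = mara_generator.get_proxy_output(prompt)["answer"]
    print(f"### {prompt}\n[base]    {raw}\n[aligned] {aligned}\n")
```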
### 🔨 Train Your MARA Agent
The source code and implementation details are open-sourced in the [MARA](https://github.com/IAAR-Shanghai/MARA) repository; you can train a custom alignment model by following the instructions there.
### 📊 Experiment Results
<table class="center">
<tr>
<td width=100% style="border: none">
<img src="table1.png" style="width:50%; max-width:100%;">
<div style="text-align: left; margin-top: 8px;">Performance improvements of MARA across the PKU-SafeRLHF, BeaverTails, and HarmfulQA datasets. Each entry shows the percentage improvement in preference rate achieved by applying MARA compared to using the original LLM alone.</div>
</td>
</tr>
</table>
<table class="center">
<tr>
<td width=100% style="border: none">
<img src="table3.png" style="width:50%; max-width:100%;">
<div style="text-align: left; margin-top: 8px;">Compatibility analysis of MARA: an alignment model trained with one LLM is paired with a different inference LLM. Each cell shows the percentage improvement in preference rate of our algorithm over the upstream (inference) model alone.</div>
</td>
</tr>
</table>
<table class="center">
<tr>
<td width=100% style="border: none">
<img src="table2.png" style="width:100%">
<div style="text-align: left; margin-top: 8px;">Performance comparison of MARA against RLHF, DPO, and Aligner measured by percentage improvements of preference rate.</div>
</td>
</tr>
</table>
More details and analyses about experimental results can be found in our [paper](https://arxiv.org/abs/2505.19743).
### ✍️ Citation
If our code or paper has been useful in your research, please cite our work:
```bibtex
@article{zhang2025tokenlevelacceptrejectmicro,
title={Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models},
author={Yang Zhang and Yu Yu and Bo Tang and Yu Zhu and Chuxiong Sun and Wenqiang Wei and Jie Hu and Zipeng Xie and Zhiyu Li and Feiyu Xiong and Edward Chung},
journal={arXiv preprint arXiv:2505.19743},
year={2025}
}
```