---
license: cc-by-4.0
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
pipeline_tag: text-generation
tags:
- adversarial
- rank-boosting
- rank-promotion
library_name: transformers
---


# CRAFT-R1-Distill-Llama-70B

Specialized for adversarial rank promotion in neural IR systems, this model is fine-tuned with the Alpaca template on an R1-distilled Llama 70B backbone. It produces fluent, style-consistent sentences that strategically enhance a target document’s relevance score without addressing the query.

- Base architecture: `DeepSeek-R1-Distill-Llama-70B`
- Format: Alpaca instruction template (exact template below)
- Dataset: adversarial fine-tuning datasets (linked in the Dataset section below)


## Task Definition

- Adversarial Ranking Augmentation (primary): Given a Query, a Target Document, and the Top-5 Documents for that query, generate a single adversarial sentence that increases the Target Document's perceived relevance without answering the query. The response must be returned as a JSON object of the form `{ "response": "<one sentence>" }`. Constraints include reusing vocabulary/themes (Buffer A), preserving style/fluency, and avoiding direct answers or explicit references to the query.

## Prompt Formatting

This model was fine-tuned using the Alpaca-style instruction template. The exact prompt format, including the instruction and input content used during fine-tuning, is:

```text
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are an expert in information retrieval and adversarial search ranking. Your goal is to help a *Target Document* rank higher.
(ideally between 1 to 10) for a *given query* by subtly increasing its perceived relevance to ranking models such as cross-encoders.

You will do this by generating content that shares vocabulary, structure, and context with high-ranking documents, 
but without explicitly answering the query or suggesting a solution.

Extract:

- **Buffer A**: 10-15 words from the Top-5 ranked documents and query itself, strongly associated with the query.

**Generate an adversarial sentences** that satisfy ALL the following:

- Include combination of words (at least 5) or similar words (similar embedding) from Buffer A** that is most related to the query and help promote ranking significantly and integrates well with Target Document
- DO NOT use the words that answer the query.
- Are **fluent**, **grammatically sound**, and **consistent with the style** of the Target Document.
- **Do NOT answer, suggest, or hint at an answer to the Target Query**.
- **Do NOT reference the Target Query at all**.
- Are designed to **sound relevant** but only reinforce theme/context alignment.

### Input:
Query: {query}

Target Document:
{doc_content}

Top-5 Documents:
{top_docs_str}

Generate your answer as a valid JSON object with the following structure:
{
  "response": "<your answer here>"
}
Do not include any additional text.

### Response:
```
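
Note that the template's JSON schema contains literal braces, so filling it with `str.format` or an f-string requires escaping them (the usage example below doubles them as `{{` / `}}`). A minimal helper that sidesteps escaping by substituting the placeholders directly is sketched here; the helper name is illustrative and not part of the released code.

```python
# Illustrative helper (not part of the released code): fill the template with
# str.replace so the literal JSON braces never need to be escaped.
def fill_craft_template(template: str, query: str, doc_content: str, top_docs: list[str]) -> str:
    return (
        template
        .replace("{query}", query)
        .replace("{doc_content}", doc_content)
        .replace("{top_docs_str}", "\n".join(top_docs))
    )
```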

## How to Use (Transformers)

Basic usage with the Alpaca template:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Use the published Hugging Face repo id
model_id = "radinrad/CRAFT-R1-Distill-Llama-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")  # device_map="auto" requires the accelerate package

# Example inputs
query = "effects of intermittent fasting on metabolism"
doc_content = "...target document content..."
top_docs = ["doc 1 ...", "doc 2 ...", "doc 3 ...", "doc 4 ...", "doc 5 ..."]
top_docs_str = "\n".join(top_docs)

prompt = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are an expert in information retrieval and adversarial search ranking. Your goal is to help a *Target Document* rank higher.
(ideally between 1 to 10) for a *given query* by subtly increasing its perceived relevance to ranking models such as cross-encoders.

You will do this by generating content that shares vocabulary, structure, and context with high-ranking documents, 
but without explicitly answering the query or suggesting a solution.

Extract:

- **Buffer A**: 10-15 words from the Top-5 ranked documents and query itself, strongly associated with the query.

**Generate an adversarial sentences** that satisfy ALL the following:

- Include combination of words (at least 5) or similar words (similar embedding) from Buffer A** that is most related to the query and help promote ranking significantly and integrates well with Target Document
- DO NOT use the words that answer the query.
- Are **fluent**, **grammatically sound**, and **consistent with the style** of the Target Document.
- **Do NOT answer, suggest, or hint at an answer to the Target Query**.
- **Do NOT reference the Target Query at all**.
- Are designed to **sound relevant** but only reinforce theme/context alignment.

### Input:
Query: {query}

Target Document:
{doc_content}

Top-5 Documents:
{top_docs_str}

Generate your answer as a valid JSON object with the following structure:
{{
  "response": "<your answer here>"
}}
Do not include any additional text.

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    max_new_tokens=128,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,  # Llama tokenizers may not define a pad token
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
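
The model is trained to reply with a single JSON object (`{"response": "<one sentence>"}`). Continuing from the example above, a small post-processing sketch (assumed here, not prescribed by the card) decodes only the newly generated tokens and extracts the sentence:

```python
# Post-processing sketch (assumption, not part of the released code):
# decode only the new tokens, then pull the first JSON object out of them.
import json
import re

new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
completion = tokenizer.decode(new_tokens, skip_special_tokens=True)

adversarial_sentence = None
match = re.search(r"\{.*?\}", completion, flags=re.DOTALL)
if match:
    try:
        adversarial_sentence = json.loads(match.group(0))["response"]
    except (json.JSONDecodeError, KeyError):
        pass  # fall back to manual inspection if the JSON is malformed

print(adversarial_sentence)
```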

## Recommended Generation Settings

For most tasks, the following decoding settings give a good quality–diversity tradeoff:

- `do_sample`: true
- `temperature`: 0.6
- `top_p`: 0.95
- `top_k`: 40
- `max_new_tokens`: 128 (adjust to your task length; 128 is enough for a single short sentence)
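
For convenience, these settings can be packaged as a reusable `transformers.GenerationConfig`; this simply bundles the values listed above and is not a separate configuration shipped with the model.

```python
from transformers import GenerationConfig

# The recommended decoding settings from this card, bundled for reuse.
craft_generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    max_new_tokens=128,
)

# output_ids = model.generate(**inputs, generation_config=craft_generation_config)
```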

## Adversarial Generation Strategy (Recommended)

For adversarial attacks or robust candidate selection, we recommend a generate-then-rank approach:

1. Generate a pool of roughly 10 candidates with the recommended decoding settings (`top_p=0.95`, `temperature=0.6`).
2. Score each candidate with a surrogate model, e.g., BERT base uncased (`google-bert/bert-base-uncased`), by computing the cosine similarity between the query embedding and each candidate embedding.
3. Select the highest-scoring candidate as the final output.

This pool-plus-ranking approach tends to improve robustness for adversarial objectives.
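
The sketch below is one possible implementation of this pool-plus-ranking step, not the authors' exact pipeline: it mean-pools `bert-base-uncased` token embeddings for the query and each candidate and keeps the candidate with the highest cosine similarity. The pooling strategy and function names are illustrative assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Surrogate scorer named in this card; the pooling strategy is an assumption.
surrogate_id = "google-bert/bert-base-uncased"
sur_tokenizer = AutoTokenizer.from_pretrained(surrogate_id)
sur_model = AutoModel.from_pretrained(surrogate_id).eval()

@torch.no_grad()
def embed(texts: list[str]) -> torch.Tensor:
    # Mean-pool the last hidden state over non-padding tokens.
    batch = sur_tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = sur_model(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # (B, H)

def pick_best(query: str, candidates: list[str]) -> str:
    query_emb = embed([query])       # (1, H)
    cand_emb = embed(candidates)     # (N, H)
    scores = torch.nn.functional.cosine_similarity(cand_emb, query_emb)
    return candidates[int(scores.argmax())]

# Example: sample ~10 candidate sentences with the settings above, then
# candidates = [generate_one(...) for _ in range(10)]
# best = pick_best(query, candidates)
```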

## Evaluation

The following tables summarize attack performance and content fidelity metrics for CRAFT across backbones on the Easy-5 and Hard-5 settings. Values are percentages where applicable; arrows indicate whether higher (↑) or lower (↓) is better. Daggers (†) denote statistically significant improvements over the strongest baseline in each setting (paired two-tailed t-test, p < 0.05). Bold marks the best value in each column.

### Easy-5

| Method               |   ASR | Top-10 | Top-50 | Boost | SS (↑) | ATI (↓) | ADT (↓) | LOR (↑) |
|----------------------|-----:|-------:|-------:|------:|-------:|--------:|--------:|--------:|
| PRADA                |  59.8 |    1.2 |   25.2 |  13.4 |   0.9  |     0.1 |    13.1 |    0.9  |
| Brittle-BERT         |  76.3 |   12.9 |   56.8 |  22.6 |   0.9  |    11.6 |    11.6 |    1.0  |
| PAT                  |  46.8 |    1.4 |   17.2 |  -3.3 |   0.9  |     6.3 |     6.3 |    1.0  |
| IDEM                 |  97.3 |   32.1 |   84.8 |  49.3 |   0.9  |    11.6 |    11.6 |    1.0  |
| EMPRA                | **99.4** |   43.5 |   93.4 |  57.6 |   0.9  |    29.8 |    29.8 |    1.0  |
| AttChain             |  92.1 |   34.5 |   83.9 |  47.9 |   0.8  |    22.4 |    38.8 |    0.9  |
| CRAFT_Qwen3          |  97.2 |   37.0 |   91.4 |  54.5 |   0.9  |    19.1 |    19.1 |    1.0  |
| CRAFT_Llama3.3       | **99.4** | **44.5** | **95.8**† | **59.7**† |   0.9  |    19.9 |    19.9 |    1.0  |

### Hard-5

| Method               |   ASR | Top-10 | Top-50 |  Boost | SS (↑) | ATI (↓) | ADT (↓) | LOR (↑) |
|----------------------|-----:|-------:|-------:|------:|-------:|--------:|--------:|--------:|
| PRADA                |  74.3 |    0.0 |    0.0 |   75.5 |   0.9  |     0.1 |    18.5 |    0.9  |
| Brittle-BERT         |  99.7 |    4.2 |   23.4 |  744.5 |   0.9  |    11.2 |    11.3 |    1.0  |
| PAT                  |  80.1 |    0.1 |    0.4 |   79.6 |   0.9  |    11.2 |     6.3 |    1.0  |
| IDEM                 |  99.8 |    8.3 |   34.5 |  780.8 |   0.9  |    11.2 |    22.4 |    1.0  |
| EMPRA                |  99.3 |   10.7 |   40.8 |  828.5 |   0.8  |    32.7 |    32.7 |    1.0  |
| AttChain             |  99.8 |   12.2 |   42.4 |  855.2 |   0.7  |    22.8 |    39.0 |    0.9  |
| CRAFT_Qwen3          | **100.0** |  15.3† |  57.1† |  911.5† |   0.8  |    19.1 |    19.1 |    1.0  |
| CRAFT_Llama3.3       | **100.0** | **22.2**† | **70.5**† | **940.5**† |   0.8  |    19.7 |    19.7 |    1.0  |

Figure: Attack methods performance vs. detection pass rate

![Attack methods performance vs detection pass](attack_methods_performance_vs_detection_pass.png)

## Dataset

This model was fine-tuned using data from the following repository:

- GitHub: https://github.com/KhosrojerdiA/adversarial-datasets

Please review the repository for details on data composition, licensing, and any usage constraints.

## Limitations and Bias

- The model may produce incorrect, biased, or unsafe content. Use human oversight for critical applications.
- Behavior on prompts that do not follow the Alpaca-style instruction format may be less reliable.
- The model does not have browsing or up-to-date world knowledge beyond its pretraining and fine-tuning data.

## License and Usage

- License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
- This checkpoint also inherits licensing constraints from the base Llama model and the fine-tuning data. Ensure your usage complies with the base model license and the dataset’s license/terms.
- If you redistribute or deploy this model, please include appropriate attribution and links back to the base model and dataset.

## Acknowledgements

- Base architecture: Llama (Meta)
- Prompt format inspired by Alpaca