This is a Phi3.5 mini 3.82B model trained with Trust-Align, demonstrating improved LLM trustworthiness as measured by Trust-Score. Trust-Aligned models are able to provide answers grounded in the documents provided and refuse when none of the documents can support the answer.
Usage
Here, we demonstrate how to download our model from HuggingFace and perform inference using pre-provided passages. Before proceeding, ensure you have installed all dependencies listed in requirements.txt. To run our full inference pipeline, please use our code.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
prompt = "Instruction: Write an accurate, engaging, and concise answer for the given question using only the provided search results (some of which might be irrelevant) and cite them properly. Use an unbiased and journalistic tone. Always cite for any factual claim. When citing several search results, use [1][2][3]. Cite at least one document and at most three documents in each sentence. If multiple documents support the sentence, only cite a minimum sufficient subset of the documents. If none of the provided documents contain the answer, only respond with \"I apologize, but I couldn't find an answer to your question in the search results.\".\n\nQuestion: How many state parks are there in virginia?\n\nDocument [1](Title: Virginia): crabs, clams, oysters, and rockfish (also known as striped bass). Virginia has 30 National Park Service units, such as Great Falls Park and the Appalachian Trail, and one national park, the Shenandoah National Park. Shenandoah was established in 1935 and encompasses the scenic Skyline Drive. Almost 40% of the park's area () has been designated as wilderness under the National Wilderness Preservation System. Additionally, there are 34 Virginia state parks and 17 state forests, run by the Department of Conservation and Recreation and the Department of Forestry. The Chesapeake Bay, while not a national park, is protected by both state\nDocument [2](Title: Environment of Virginia): the park's area (79,579 acres/322 km) has been designated as wilderness under the National Wilderness Preservation System. Parkways, such as the George Washington Memorial Parkway and the Blue Ridge Parkway, which encompasses the scenic Skyline Drive, are among the most visited national park service sites nationwide. Additionally, there are 34 Virginia state parks and 17 state forests, run by the Department of Conservation and Recreation and the Department of Forestry. The Chesapeake Bay, while not a national park, is protected by both state and federal legislation, and the jointly run Chesapeake Bay Program which conducts restoration on the bay and\nDocument [3](Title: Virginia Department of Conservation and Recreation): Battlefield Park after being given to the National Park Service because during the Great Depression the Commonwealth lacked funds to develop and maintain those lands and structures. Carson also created a Division of History and Archaeology within the State Commission of Conservation and Development and started a historical marker program. Virginia's first six state parks were created in June 1936 despite the opposition of Virginia's Senators Carter Glass and Harry F. Byrd to many other aspects of President Franklin Delano Roosevelt's administration. The first state parks were: Westmoreland State Park, Seashore State Park (which later became First Landing State Park),\nDocument [4](Title: Virginia Department of Conservation and Recreation): Fairy Stone State Park, Staunton River State Park, Douthat State Park and Hungry Mother State Park. In these and other projects, the CCC employed 107,210 in Virginia at one time or another, including 64,762 young Virginians who planted 15.2 million trees, built 986 bridges, reduced fire hazards over 152,000 acres, strung 2,128 miles of telephone line and stocked 1.3 million fish. Virginia received the fifth largest state expenditure in the country, totaling $109 million during the agency's nine-year existence. The agency's name changed in 1938 to the Virginia Conservation Commission, which was led by N. Clarence Smith (1939-1942), and William\nDocument [5](Title: Washington State Park System): Washington State Park System The Washington State Park System is a set of state parks owned by the state government of Washington, USA. They are managed by the Washington State Parks and Recreation Commission. As of 2012, the parks are primarily funded through usage fees. There are over 100 parks throughout the state, including 19 marine parks and 11 Historical Parks. The park system was established in 1913 by the creation of the Washington State Board of Park Commissioners. The first two parks were formed from donated land in 1915, and by 1929 the state had seven parks. In 1947\n\nAnswer:"
tokenizer = AutoTokenizer.from_pretrained("declare-lab/trustalign_phi3.5_mini")
model = AutoModelForCausalLM.from_pretrained("declare-lab/trustalign_phi3.5_mini")
model.to(device)
inputs = tokenizer.apply_chat_template(
[{"role": "user", "content": prompt}],
add_generation_prompt=True,
return_dict=True,
return_tensors="pt"
)
inputs = {key: value.to(device) for key, value in inputs.items()}
outputs = model.generate(
**inputs,
do_sample=True,
temperature=0.1,
top_p=0.95,
num_return_sequences=1,
max_new_tokens=300,
eos_token_id=model.config.eos_token_id,
)
generation = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
print(generation)
"""
Output:
system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
user
Instruction: Write an accurate, engaging, and concise answer for the given question using only the provided search results (some of which might be irrelevant) and cite them properly. Use an unbiased and journalistic tone. Always cite for any factual claim. When citing several search results, use [1][2][3]. Cite at least one document and at most three documents in each sentence. If multiple documents support the sentence, only cite a minimum sufficient subset of the documents. If none of the provided documents contain the answer, only respond with "I apologize, but I couldn't find an answer to your question in the search results.".
Question: Who has the highest goals in world football?
Document [1](Title: Argentina–Brazil football rivalry): "Football Player of the Century", by IFFHS International Federation of Football History and Statistics, 1999, "South America Football Player of the Century", by IFFHS International Federation of Football History and Statistics. Pelé's 1281 goals are recognized by FIFA as the highest total achieved by a professional footballer, although the Soccer Statistic Foundation (rssf) recognizes only 767 goals in official mode, occupying the third place after Josef Bican (805) and Romario (772). For his part, Maradona has been named the best soccer player in World Cup history both by The Times and FourFourTwo, publication that also rewarded him as the "Best
Document [2](Title: Godfrey Chitalu): have beaten Gerd Müller's record of 85 goals in a year, the Football Association of Zambia claimed that the world record actually pertained to Godfrey Chitalu who had scored 116 goals (possibly 117) during the 1972 calendar year and 107 during the 1972 season. The difference of goals is due to first 9 goals being scored before the season officially started. The Football Association of Zambia presented the evidence to FIFA but a spokesperson responded that they would ratify neither Lionel Messi's nor Chitalu's records as they do not keep statistical track of domestic competitions. Nonetheless, it could constitute the
Document [3](Title: Godfrey Chitalu): highest official tally claimed by a national football association. Chitalu made his international debut on 29 June 1968 in a friendly match against Uganda in Lusaka which Zambia won 2–1. He scored his first goal in a 2–2 draw against the same team five days later. Chitalu played a prominent role during the World Cup qualification matches against Sudan with Zambia being eliminated on a strange rule which was peculiar to Africa and favoured the team that won the second leg. Despite the aggregate score being tied at 6–6 after Zambia won the first leg 4–2 and lost the return
Document [4](Title: Wartan Ghazarian): goals (4 in World Cup qualifiers, 3 in Asian Cup qualifiers, 12 in friendlies). His record was later broken by Roda Antar, after Roda scored his 20th goal in 2018 FIFA World Cup qualification match against Laos. On 16 November 2008, during Round 6 of the Lebanese Football League, at the age of 39 years, Vartan scored his 130th goal in the Lebanese first division against Tadamon Tyre, becoming officially the highest all-time scorer in the history of Lebanese football. Some officials do not recognize the 12 goals he scored in the 2000–2001 season which was canceled. However, his remaining
Document [5](Title: Josef Bican): for Christmas, but died less than 2 weeks before that, at the age of 88. Josef Bican Josef "Pepi" Bican (25 September 1913 – 12 December 2001) was a Czech-Austrian professional footballer who played as a striker. Rec.Sport.Soccer Statistics Foundation (RSSSF) estimates that he scored at least 805 goals in all competitive matches, which would make him the most prolific scorer of all time. Having scored a record 600 league goals and at least 1468 goals overall, the International Federation of Football History & Statistics (IFFHS) awarded Bican the "Golden Ball" as the greatest goalscorer of the last century. He
Answer:
assistant
Josef Bican is estimated to have scored at least 805 goals in all competitive matches, which would make him the most prolific scorer of all time according to the International Federation of Football History & Statistics (IFFHS [5]. However, it is important to note that this figure is based on the RSSSF's estimate and may not be confirmed by FIFA or other official sources [1].
"""
Details on constructing the prompt
We use an abbreviated data sample from ASQA to demonstrate how the prompt can be constructed programmatically. First, you need to define prompt_format
and doc_prompt_format
. Next, define the number of documents (ndoc
) and the instruction
. Now, when you run format_prompt
, it will return you a prompt based on the outline provided in prompt_format
and doc_prompt_format
.
item = [{
"question": "Who has the highest goals in world football?",
"docs": [
{
"title": "Argentina\u2013Brazil football rivalry",
"text": "\"Football Player of the Century\", by IFFHS International Federation of Football History and Statistics, 1999, \"South America Football Player of the Century\", by IFFHS International Federation of Football History and Statistics. Pel\u00e9's 1281 goals are recognized by FIFA as the highest total achieved by a professional footballer, although the Soccer Statistic Foundation (rssf) recognizes only 767 goals in official mode, occupying the third place after Josef Bican (805) and Romario (772). For his part, Maradona has been named the best soccer player in World Cup history both by The Times and FourFourTwo, publication that also rewarded him as the \"Best",
},
{
"title": "Godfrey Chitalu",
"text": "have beaten Gerd M\u00fcller's record of 85 goals in a year, the Football Association of Zambia claimed that the world record actually pertained to Godfrey Chitalu who had scored 116 goals (possibly 117) during the 1972 calendar year and 107 during the 1972 season. The difference of goals is due to first 9 goals being scored before the season officially started. The Football Association of Zambia presented the evidence to FIFA but a spokesperson responded that they would ratify neither Lionel Messi's nor Chitalu's records as they do not keep statistical track of domestic competitions. Nonetheless, it could constitute the",
},
{
"title": "Godfrey Chitalu",
"text": "highest official tally claimed by a national football association. Chitalu made his international debut on 29 June 1968 in a friendly match against Uganda in Lusaka which Zambia won 2\u20131. He scored his first goal in a 2\u20132 draw against the same team five days later. Chitalu played a prominent role during the World Cup qualification matches against Sudan with Zambia being eliminated on a strange rule which was peculiar to Africa and favoured the team that won the second leg. Despite the aggregate score being tied at 6\u20136 after Zambia won the first leg 4\u20132 and lost the return",
},
{
"title": "Wartan Ghazarian",
"text": "goals (4 in World Cup qualifiers, 3 in Asian Cup qualifiers, 12 in friendlies). His record was later broken by Roda Antar, after Roda scored his 20th goal in 2018 FIFA World Cup qualification match against Laos. On 16 November 2008, during Round 6 of the Lebanese Football League, at the age of 39 years, Vartan scored his 130th goal in the Lebanese first division against Tadamon Tyre, becoming officially the highest all-time scorer in the history of Lebanese football. Some officials do not recognize the 12 goals he scored in the 2000\u20132001 season which was canceled. However, his remaining",
},
{
"title": "Josef Bican",
"text": "for Christmas, but died less than 2 weeks before that, at the age of 88. Josef Bican Josef \"Pepi\" Bican (25 September 1913 \u2013 12 December 2001) was a Czech-Austrian professional footballer who played as a striker. Rec.Sport.Soccer Statistics Foundation (RSSSF) estimates that he scored at least 805 goals in all competitive matches, which would make him the most prolific scorer of all time. Having scored a record 600 league goals and at least 1468 goals overall, the International Federation of Football History & Statistics (IFFHS) awarded Bican the \"Golden Ball\" as the greatest goalscorer of the last century. He",
}
],
}]
instruction = "Instruction: Write an accurate, engaging, and concise answer for the given question using only the provided search results (some of which might be irrelevant) and cite them properly. Use an unbiased and journalistic tone. Always cite for any factual claim. When citing several search results, use [1][2][3]. Cite at least one document and at most three documents in each sentence. If multiple documents support the sentence, only cite a minimum sufficient subset of the documents. If none of the provided documents contain the answer, only respond with \"I apologize, but I couldn't find an answer to your question in the search results.\"."
def format_prompt(item, prompt_format, ndoc=None, doc_prompt_format=None, instruction=None, use_shorter=None):
# - {INST}: the instruction; {D}: the documents; {Q}: the question; {A}: the answers; ndoc: number of documents to put in context; use_shorter: None, "summary", or "extraction"
prompt_format = prompt_format.replace("{INST}", instruction).replace("{Q}", item['question'])
if "{D}" in prompt_format:
doc_list = item["docs"][:ndoc]
text = "".join([make_doc_prompt(doc, doc_id, doc_prompt_format, use_shorter=use_shorter) for doc_id, doc in enumerate(doc_list)])
prompt_format = prompt_format.replace("{D}", text)
prompt_format = prompt_format.replace("{A}", "").rstrip()
return prompt_format
def make_doc_prompt(doc, doc_id, doc_prompt_format, use_shorter=None):
# {ID}: doc id (starting from 1); {T}: title; {P}: text; use_shorter: None, "summary", or "extraction"
text = doc['text']
if use_shorter is not None:
text = doc[use_shorter]
return doc_prompt_format.replace("{T}", doc["title"]).replace("{P}", text).replace("{ID}", str(doc_id+1))
prompt = format_prompt(
item[0],
prompt_format="{INST}\n\nQuestion: {Q}\n\n{D}\nAnswer: {A}",
doc_prompt_format="Document [{ID}](Title: {T}): {P}\n",
instruction=instruction
)
Training details
During the Trust-Align process, our models are trained using a combination of two datasets: our instruction-following corpus with a standard next-token prediction objective and our preference dataset using Direct Preference Optimization. The training data is publicly available on Hugging Face under declare-lab/trust_data. For detailed information about the training process, please visit our official repository. The models were trained on two NVIDIA A100 GPUs (40GB each).
- Downloads last month
- 6