Model Card: FinLLM-1B-SFT-v1
Model Details
- Model Name: FinLLM-1B-SFT-v1
- Version: 1
- Type: Supervised Fine-tuned Language Model
- Method: Supervised fine-tuning with Low-Rank Adaptation (LoRA)
- Size: 1.5 billion parameters
- Context Length: 4K
- Original Base Model: Qwen2.5-1.5B
- Developed by: Aveni
- Contact:
- Date: March 2025
Intended Use
This 1.5-billion-parameter Supervised Fine-tuned (SFT) model is tailored for interactive financial applications requiring instruction following and task execution within the UK financial services sector. Trained on the FIM (Finance, Instruction, Math) data mix, it is designed to understand user inputs and respond in the format a task specifies, making it suitable for financial question answering, mathematical reasoning in a financial context, and scenarios that need structured outputs where computational efficiency is important. Its LoRA-based fine-tuning aims to elicit and enhance these instruction-following capabilities from the base model.
Training Data
The SFT stage used the FIM data mix of Finance, Instruction, and Math data, consisting of 50,000 examples (approximately 20M tokens). The proportions are as follows (a worked breakdown is sketched after the list):
- Finance (34%): Comprises training sets from a subset of AVENIBENCH data: Banking77, ConvFinQA, ECTSum, TAT-QA, and TAT-HQA.
- Instruction Following (24%): Sourced from the tulu3 SFT mix.
- Math (42%): Sourced from the tulu3 SFT mix.
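For reference, the stated proportions imply roughly the following split of the 50,000 examples. This is a back-of-the-envelope breakdown, not a set of figures reported by the card:

```python
# Approximate per-category example counts implied by the stated SFT mix.
total_examples = 50_000
mix = {"Finance": 0.34, "Instruction Following": 0.24, "Math": 0.42}

for category, share in mix.items():
    print(f"{category}: ~{int(total_examples * share):,} examples")
# Finance: ~17,000 examples
# Instruction Following: ~12,000 examples
# Math: ~21,000 examples
```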
Training Procedure
- Framework: The open-instruct framework was used for SFT.
- Method: Low-Rank Adaptation (LoRA) was used, where a small number of additional parameters are trained while the base model weights remain frozen. The final layer of the LLM (the "LM head") was also updated during training (a configuration sketch follows the hyperparameter list below).
- Hyperparameters:
  - Batch size: 64
  - Learning rate: 1e-4 (with linear warmup and decay)
  - Epochs: 2
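A minimal configuration sketch of this setup is shown below, using the Hugging Face peft library and Trainer-style arguments rather than the open-instruct framework named above. The LoRA rank, alpha, target modules, and warmup ratio are illustrative assumptions; the card does not report them.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Load the frozen base model (Qwen2.5-1.5B, per the Model Details above).
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")

# LoRA configuration. Rank, alpha, and target modules are illustrative
# assumptions; the card does not publish the exact values used.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["lm_head"],  # the LM head is also updated, per the card
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only adapters + LM head are trainable

# Hyperparameters from the card; warmup_ratio is an illustrative assumption.
training_args = TrainingArguments(
    output_dir="finllm-1b-sft",
    per_device_train_batch_size=64,
    learning_rate=1e-4,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    num_train_epochs=2,
)
```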
Evaluation
- Evaluation Framework:
  - AVENIBENCH, which includes finance-domain, general-domain, and safety/ethics datasets.
  - TABLEBENCH, a general-domain tabular question-answering dataset with questions in three categories: fact checking, data analysis, and numerical reasoning.
Finance Benchmarks
| Dataset (NLP Task(s)) | Metric | FinLLM-1B-SFT-v1 | Qwen2.5-1.5B-Instruct |
| --- | --- | --- | --- |
| Banking77 (Text Classification) | Accuracy | 98.67 | 56.62 |
| NLU++ EASY (Text Classification) | Accuracy | 79.84 | 66.53 |
| NLU++ HARD (Text Classification) | Accuracy | 14.92 | 12.10 |
| FinQA (Tabular Data, Question Answering) | Equal value | 6.60 | 7.36 |
| ConvFinQA (Tabular Data, Question Answering, Multi-turn Conversation) | Equal value | 56.78 | 35.54 |
| ECTSum (Text Summarisation, Long Context Modelling) | ROUGE-L | 30.41 | 22.29 |
| MultiHiertt EASY (Tabular Data, Long Context Modelling) | Equal value | 12.67 | 10.00 |
| MultiHiertt HARD (Tabular Data, Long Context Modelling) | Equal value | 3.67 | 10.03 |
| TAT-QA (Tabular Data, Question Answering) | List match | 41.55 | 23.03 |
| TAT-HQA (Tabular Data, Question Answering) | List match | 22.94 | 2.79 |
| Financial Planning Single (Question Answering) | Accuracy | 43.86 | 41.59 |
| Financial Planning Multi (Question Answering) | List match | 25.94 | 32.55 |
Input & Output Guidelines
The model's chat functionality can be accessed as follows:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Specify the model name
model_name = "aveni-ai/FinLLM-1B-SFT-v1"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create the prompt and messages
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are FinLLM, created by Aveni AI. You are a helpful financial assistant."},
    {"role": "user", "content": prompt}
]

# Apply the chat template and tokenize the messages.
# The chat template formats the messages into the correct structure for the
# model; add_generation_prompt appends the tokens that cue the assistant turn.
model_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate a response
generated_ids = model.generate(
    model_inputs,
    max_new_tokens=512,
    do_sample=True  # use sampling for more varied responses
)

# Decode the generated tokens, excluding the prompt
response = tokenizer.batch_decode(
    generated_ids[:, model_inputs.shape[1]:], skip_special_tokens=True
)[0]
print(response)
```
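For interactive use, the same call can stream tokens to the console as they are generated. This is an optional variant using the transformers TextStreamer utility, not something the card prescribes:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated, skipping the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(model_inputs, max_new_tokens=512, streamer=streamer)
```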
Limitations
- Like all LLMs, FinLLM may produce hallucinations, reflect biases present in its training data, and exhibit other common LLM failure modes. These risks are acknowledged and addressed through the mitigation framework described in the sections below.
- Optimal SFT data mixes may need to be tuned for each base CPT model and may need to be increased in size for larger models undergoing full parameter fine-tuning.
Handling Model Bias & Hallucinations
- FinLLM has multiple guardrails and testing procedures in place to mitigate harmful model outputs. We run bias and toxicity classifiers on both the input prompt and the model output to ensure all generated text remains safe and non-discriminatory (a sketch of this pattern follows this list). We have trained FinLLM on high-quality data specific to the UK finance sector.
- We recommend that users cross-check generated content against verified sources, as is good practice when using any AI system. Where suitable, implement human-in-the-loop validation.
- Aveni assumes no responsibility for any loss of revenue or other damages resulting from the use of FinLLM's outputs. Users are advised to implement appropriate policies and security measures to safeguard their operations.
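As an illustration of the input/output screening pattern described in the first bullet above, the sketch below wraps a generation call with a safety check on both sides. check_safety is a toy stand-in; Aveni's actual bias and toxicity classifiers are proprietary and not published in this card:

```python
BLOCKLIST = {"example_slur", "example_threat"}  # toy stand-in vocabulary

def check_safety(text: str) -> bool:
    # Toy stand-in for the proprietary bias/toxicity classifiers;
    # a real deployment would call trained classifier models here.
    return not any(term in text.lower() for term in BLOCKLIST)

def guarded_generate(prompt: str, generate_fn) -> str:
    """Screen both the prompt and the model output, refusing unsafe text."""
    if not check_safety(prompt):
        return "Sorry, this request cannot be processed."
    response = generate_fn(prompt)  # e.g. the chat call shown earlier
    if not check_safety(response):
        return "Sorry, a safe response could not be generated."
    return response
```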
Ethical Considerations
FinLLM implements safety mitigation approaches and guardrails at various stages of the model lifecycle. FinLLM has been trained and evaluated on a set of diverse datasets, including those specifically addressing safety concerns such as bias, toxicity, misalignment, truthfulness, IP infringement, and hallucination.
Sensitive Data Handling: FinLLM was trained on pseudonymised data and has been through rigorous data privacy and compliance auditing. This includes adherence to internal data collection, data privacy, security, and data protection impact policies. We have additionally sought specialised legal advice to ensure FinLLM is trained according to legal requirements. We include input and output guardrails in FinLLM to avoid personal data leakage.
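As an illustration of the kind of input/output pseudonymisation guardrail described above, a simple regex-based pass might mask common personal identifiers. The patterns below are toy examples; the rules FinLLM actually applies are not published in this card:

```python
import re

# Toy pseudonymisation pass: mask e-mail addresses and UK-style phone
# numbers. A production pipeline would use far more comprehensive rules.
PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "[PHONE]": re.compile(r"\b0\d{2,4}[\s-]?\d{3,4}[\s-]?\d{3,4}\b"),
}

def pseudonymise(text: str) -> str:
    for placeholder, pattern in PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

print(pseudonymise("Call 020 7946 0958 or email jane.doe@example.com"))
# -> Call [PHONE] or email [EMAIL]
```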
Safety Mitigation: FinLLM includes classifiers to check for bias (against race/origin, gender/sex, ability and religion), and toxicity (violent, threatening, obscene, or sexually explicit language) on training data and model inputs and outputs. The models are aligned to the UK finance sector using curated proprietary datasets.
Compliance: FinLLM adheres to UK GDPR, UK Copyright Law, and is aligned to the EU AI Act under the general purpose AI model category. We maintain detailed audit logs and controls to facilitate future compliance tracking.
Sustainability: FinLLM was developed with sustainability in mind. Where possible, we used sustainable practices for model training, storage, and evaluation, and ran workloads on lower-impact processors, to minimise our CO2 emissions.
Transparency: We strongly advise users to clearly indicate AI-generated content in any customer-facing or external applications of FinLLM to adhere to the EU AI Act and general transparency best-practice.
Ethically Sourced Data: We adhere to robots.txt directives on websites that disallow crawling, to ensure the data we collect is ethically sourced. We apply careful data-cleaning techniques such as pseudonymisation and toxicity and bias detection, and we calculate over 50 metrics to assess the quality of all data used for training, model inputs, and outputs. FinLLM models are not trained on image, video, or audio files, and our automated data collection tools are configured not to collect any such media.
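A robots.txt check of the kind described above can be implemented with Python's standard library alone; a minimal sketch (the URLs are placeholders, not Aveni's actual sources):

```python
from urllib.robotparser import RobotFileParser

# Before crawling a page, honour the site's robots.txt directives.
robots = RobotFileParser("https://example.com/robots.txt")
robots.read()

url = "https://example.com/some/article"
if robots.can_fetch("*", url):
    print(f"Allowed to crawl {url}")
else:
    print(f"Skipping {url}: disallowed by robots.txt")
```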