Model Card: FinLLM-1B-v1
Model Details
- Model Name: FinLLM-1B-v1
- Version: 1
- Type: Foundational Language Model (Merged Checkpoint)
- Size: 1 Billion parameters
- Context Length: 4K
- Original Base Model: Qwen2.5-1.5B
- Developed by: Aveni
- Contact:
- Date: March 2025
Intended Use
This 1-billion-parameter foundational model is designed for financial NLP applications within the UK financial services industry. It demonstrated strong capabilities in multi-turn conversation, question answering, long-context modelling, and tabular data tasks relative to other 1B CPT checkpoints. It serves as an efficient yet capable base for further fine-tuning on specific financial tasks, or for deployment where the balance between performance and computational cost is a priority.
Training Data
The model is a merge of two checkpoints, each trained on a different Continual Pre-training (CPT) data mix:
1. Data for finllm-1b-qwen25-cpt-1 (4B training tokens):
- Finance:
- Scraped Data (10.93%)
- HPLT (11.08%)
- Math:
- Dolmino Math (5.00%)
- General:
- Fineweb-edu (60.25%)
- Code:
- Stack Exchange (1.50%)
- Dialogue:
- MultiDoGo (0.35%)
- MultiWoz (0.05%)
- DSTC8 (0.11%)
- DSTC11 (0.17%)
- MAEC (0.13%)
- Instructions:
- Flan (10.50%)
2. Data for finllm-1b-qwen25-cpt-2 (10B training tokens):
- Finance:
- Scraped Data (5.34%)
- HPLT (24.70%)
- Math:
- Dolmino Math (5.00%)
- General:
- Fineweb-edu (59.90%)
- Code:
- Stack Exchange (0.60%)
- Dialogue:
- MultiDoGo (0.14%)
- MultiWoz (0.02%)
- DSTC8 (0.04%)
- DSTC11 (0.07%)
- MAEC (0.05%)
- Instructions:
- Flan (4.20%)
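Given the percentages above, the approximate token budget per source can be computed directly. This is a minimal sketch, assuming the cpt-1 mix and its 4B-token budget; the dictionary keys are illustrative labels, and the percentages in the table sum to slightly over 100% due to rounding.

```python
# Approximate per-source token budgets for the cpt-1 mix (4B tokens total).
# Percentages are taken from the data-mix table above; key names are
# illustrative labels, not official dataset identifiers.
cpt1_total_tokens = 4_000_000_000
cpt1_mix = {
    "scraped_finance": 10.93, "hplt_finance": 11.08,
    "dolmino_math": 5.00, "fineweb_edu": 60.25,
    "stack_exchange": 1.50, "multidogo": 0.35,
    "multiwoz": 0.05, "dstc8": 0.11, "dstc11": 0.17,
    "maec": 0.13, "flan": 10.50,
}
tokens_per_source = {
    name: round(cpt1_total_tokens * pct / 100)
    for name, pct in cpt1_mix.items()
}
```

For example, the Fineweb-edu share (60.25%) works out to roughly 2.41B of the 4B tokens; the listed percentages sum to 100.07%, so the per-source figures are approximate.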
Training Procedure
- Base Model: Qwen2.5-1.5B.
- Continual Pre-training (CPT): Two separate CPT runs were performed:
  - finllm-1b-qwen25-cpt-1: trained on 4B tokens.
  - finllm-1b-qwen25-cpt-2: trained on 10B tokens.
- Model Merging: The two checkpoints were merged to create FinLLM-1B-v1.
  - Method: Spherical linear interpolation (slerp) using MergeKit.
  - Interpolation Factor: The final model comprised 75% weights from finllm-1b-qwen25-cpt-1 (base model for the merge) and 25% from finllm-1b-qwen25-cpt-2 (secondary model for the merge).
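The slerp merge can be sketched directly on weight tensors. This is a minimal illustration, assuming NumPy arrays stand in for corresponding parameter tensors of the two checkpoints; MergeKit's actual implementation interpolates per layer (often with per-layer interpolation schedules) and handles edge cases beyond this sketch. An interpolation factor of t = 0.25 toward the secondary model corresponds to the 75%/25% weighting described above.

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight tensors."""
    a = v0.ravel().astype(np.float64)
    b = v1.ravel().astype(np.float64)
    # Angle between the two weight vectors.
    cos_omega = np.dot(a, b) / max(np.linalg.norm(a) * np.linalg.norm(b), eps)
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if omega < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return ((1.0 - t) * a + t * b).reshape(v0.shape)
    s = np.sin(omega)
    out = (np.sin((1.0 - t) * omega) / s) * a + (np.sin(t * omega) / s) * b
    return out.reshape(v0.shape)

# Stand-ins for one weight tensor from each CPT checkpoint.
w_cpt1 = np.array([[1.0, 0.0], [0.5, 0.5]])
w_cpt2 = np.array([[0.0, 1.0], [0.5, -0.5]])

# t = 0.25: 75% weight toward the cpt-1 (base) checkpoint.
merged = slerp(0.25, w_cpt1, w_cpt2)
```

At t = 0 slerp returns the base tensor and at t = 1 the secondary tensor, with intermediate values following the great-circle arc between them rather than the straight chord used by plain averaging.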
Evaluation
- Evaluation Framework:
- AVENIBENCH, which includes finance-domain, general-domain, and safety/ethics datasets.
Finance Benchmarks
Dataset (NLP Task(s)) | Metric | FinLLM 1B v1 | Qwen 2.5 1.5B |
---|---|---|---|
Banking 77 (Text Classification) | Accuracy | 71.87 | 82.07 |
NLU++ EASY (Text Classification) | Accuracy | 72.53 | 76.88 |
NLU++ HARD (Text Classification) | Accuracy | 9.00 | 11.51 |
FinQA (Tabular Data, Question Answering) | Equal value | 1.30 | 0.19 |
ConvFinQA (Tabular Data, Question Answering, Multi-turn Conversation) | Equal value | 32.00 | 29.00 |
ECTSum (Text Summarisation, Long Context Modelling) | Rouge L | 24.50 | 21.71 |
MultiHiertt EASY (Tabular Data, Long Context Modelling) | Equal value | 10.00 | 6.67 |
MultiHiertt HARD (Tabular Data, Long Context Modelling) | Equal value | 5.80 | 2.38 |
TATQA (Tabular Data, Question Answering) | List match | 14.00 | 13.41 |
TATHQA (Tabular Data, Question Answering) | List match | 5.00 | 4.73 |
Financial Planning Single (Question Answering) | Accuracy | 25.60 | 23.71 |
Financial Planning Multi (Question Answering) | List match | 27.40 | 28.77 |
Performance of the FinLLM-1B-v1 model across the targeted NLP capabilities, with the Qwen2.5-1.5B base model listed as a baseline for comparison.
Limitations
- Like all LLMs, FinLLM may be prone to hallucinations, biases present in the training data, and other common LLM failure modes. The project acknowledges these risks and has a framework for mitigation.
- Optimal SFT data mixes may need to be tuned for each base CPT model and may need to be increased in size for larger models undergoing full parameter fine-tuning.
Handling Model Bias & Hallucinations
- FinLLM has multiple guardrails and testing procedures in place to mitigate against harmful model outputs. We implement bias and toxicity classifiers on the input prompt and the model output to ensure all generated text remains safe and non-discriminatory. We have trained FinLLM on high-quality data specific to the UK finance sector.
- We recommend that users cross-check generated content against verified sources, as is good practice when using any AI system. Where suitable, implement human-in-the-loop validation.
- Aveni assumes no responsibility for any loss of revenue or other damages resulting from the use of FinLLM's outputs. Users are advised to implement appropriate policies and security measures to safeguard their operations.
Ethical Considerations
FinLLM implements safety mitigation approaches and guardrails at various stages of the model lifecycle. FinLLM has been trained and evaluated on a set of diverse datasets, including those specifically addressing safety concerns such as bias, toxicity, misalignment, truthfulness, IP infringement, and hallucination.
Sensitive Data Handling: FinLLM was trained on pseudonymised data and has been through rigorous data privacy and compliance auditing. This includes adherence to internal data collection, data privacy, security, and data protection impact policies. We have additionally sought specialised legal advice to ensure FinLLM is trained according to legal requirements. We include input and output guardrails in FinLLM to avoid personal data leakage.
Safety Mitigation: FinLLM includes classifiers to check for bias (against race/origin, gender/sex, ability and religion), and toxicity (violent, threatening, obscene, or sexually explicit language) on training data and model inputs and outputs. The models are aligned to the UK finance sector using curated proprietary datasets.
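The input/output screening described above can be illustrated schematically. The sketch below is purely hypothetical: `bias_score`, `toxicity_score`, the keyword lexicons, and the rejection messages are stand-ins for illustration only, not FinLLM's actual classifiers, which this card does not specify.

```python
# Hypothetical guardrail wrapper: screen the prompt before generation and
# the model output before returning it. The scoring functions below are
# trivial keyword placeholders, not real bias/toxicity classifiers.
def bias_score(text: str) -> float:
    flagged = {"biased-term"}  # placeholder lexicon for the sketch
    return 1.0 if any(w in text.lower() for w in flagged) else 0.0

def toxicity_score(text: str) -> float:
    flagged = {"toxic-term"}  # placeholder lexicon for the sketch
    return 1.0 if any(w in text.lower() for w in flagged) else 0.0

def guarded_generate(prompt: str, generate, threshold: float = 0.5) -> str:
    # Input guardrail: screen the prompt before it reaches the model.
    if max(bias_score(prompt), toxicity_score(prompt)) >= threshold:
        return "[input rejected by guardrail]"
    output = generate(prompt)
    # Output guardrail: screen the generation before returning it.
    if max(bias_score(output), toxicity_score(output)) >= threshold:
        return "[output withheld by guardrail]"
    return output
```

In practice the scoring functions would be trained classifiers and the threshold a tuned policy parameter; the structure shown (screen input, generate, screen output) mirrors the two-sided check the card describes.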
Compliance: FinLLM adheres to UK GDPR, UK Copyright Law, and is aligned to the EU AI Act under the general purpose AI model category. We maintain detailed audit logs and controls to facilitate future compliance tracking.
Sustainability: FinLLM was developed with sustainability in mind. We conducted model training, storage, and evaluation sustainably where possible to minimise our CO2 emissions, and used lower-impact processors.
Transparency: We strongly advise users to clearly indicate AI-generated content in any customer-facing or external applications of FinLLM to adhere to the EU AI Act and general transparency best-practice.
Ethically Sourced Data: We adhere to robots.txt files on websites that disallow crawling to ensure the data we collect is ethically sourced. We implement careful data cleaning techniques such as pseudonymisation and toxicity and bias detection, and we calculate over 50 metrics to assess the quality of all data used for training, model inputs, and outputs. FinLLM models are not trained on image, video, or audio files, and our automated data collection tools are configured not to collect any such media.