---
license: cc-by-nc-sa-4.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- finance
- legal
---

# Model Card for RegLLM

<!-- Provide a quick summary of what the model is/does. -->

RegLLM is a large language model (LLM) for regulatory compliance. It has been domain-adapted through unsupervised pretraining and then instruction fine-tuned for regulatory compliance tasks.
This release focuses on Indian banking rules and regulations.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [dataeaze systems pvt ltd](https://www.dataeaze.io/)
- **Funded by:** [dataeaze systems pvt ltd](https://www.dataeaze.io/)
- **Shared by:** [dataeaze systems pvt ltd](https://www.dataeaze.io/)
- **Model type:** MistralForCausalLM
- **Language(s) (NLP):** English
- **License:** [cc-by-nc-sa-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en). The model is made available for non-commercial research use only. For commercial use, please contact [email protected]
- **Finetuned from model:** [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

The model has been built to provide precise and insightful answers to a wide range of queries related to Indian banking regulations.

### Downstream Use

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

This model can be used as a core component of RegTech applications.

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

The model has been fine-tuned for the specific task of answering questions related to Indian regulatory compliance.
Accuracy is not guaranteed for any use beyond this.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

- **Bias:** Currently trained for the English language only.
- **Risk:** Guardrails rely on those of the base models (Mistral/Zephyr). Fine-tuning may have affected this behaviour.
- **Limitations:** Currently intended as a small model optimised for Indian regulations.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

* This model is intended to be used as an assistive AI technology. Consult and verify against the source documents before making decisions.
* This model should be used with grounding on a set of regulatory documents.

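The grounding recommendation above can be illustrated with a minimal sketch. It assumes a naive word-overlap ranking in place of a real retriever model, and the helper names (`score_passage`, `build_grounded_messages`), the system prompt wording, and the placeholder excerpts are all hypothetical and not part of this release.

```python
import re
from typing import Dict, List


def score_passage(question: str, passage: str) -> int:
    """Count the distinct words shared by the question and the passage."""
    question_words = set(re.findall(r"\w+", question.lower()))
    passage_words = set(re.findall(r"\w+", passage.lower()))
    return len(question_words & passage_words)


def build_grounded_messages(
    question: str, passages: List[str], top_k: int = 2
) -> List[Dict[str, str]]:
    """Build chat messages that put the best-matching excerpts before the question.

    Word overlap is only a stand-in (an assumption of this sketch) for a
    proper retriever over regulatory documents.
    """
    ranked = sorted(passages, key=lambda p: score_passage(question, p), reverse=True)
    context = "\n\n".join(ranked[:top_k])
    return [
        {
            "role": "system",
            "content": (
                "You are a compliance assistant who answers in a formal manner. "
                "Answer using only the excerpts provided by the user."
            ),
        },
        {"role": "user", "content": f"Excerpts:\n{context}\n\nQuestion: {question}"},
    ]


# Placeholder excerpts; in practice these would come from the source documents.
passages = [
    "Placeholder excerpt A: text about IRRBB policies and how often they are reviewed.",
    "Placeholder excerpt B: text about an unrelated topic.",
]
messages = build_grounded_messages(
    "How often should IRRBB policies be reviewed?", passages, top_k=1
)
```

The resulting `messages` list has the same shape as the chat messages used in the getting-started example, so it can be passed to the tokenizer's `apply_chat_template` in the same way.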
## How to Get Started with the Model

```
import torch
from transformers import pipeline

# Load the model as a text-generation pipeline; device_map="auto" places it on the available device(s).
pipe = pipeline("text-generation",
                model="dataeaze/dataeaze-RegLLM-zephyr_7b_beta-dzcompli",
                torch_dtype=torch.bfloat16,
                device_map="auto")

# Chat-style input: a system instruction followed by the user's question.
messages = [
    {
        "role": "system",
        "content": "You are a compliance assistant who answers in a formal manner",
    },
    {"role": "user", "content": "How often should IRRBB policies be reviewed?"},
]

# Render the messages into a single prompt string using the model's chat template.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Sample with a low temperature so the answer stays close to the most likely tokens.
outputs = pipe(prompt, max_new_tokens=120, do_sample=True, temperature=0.1, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

## Sample Output

### Question

How often should IRRBB policies be reviewed?

### RegLLM response

```
<|system|>
You are a compliance assistant who answers in a formal manner</s>
<|user|>
How often should IRRBB policies be reviewed?</s>
<|assistant|>
At least annually.
The review should be conducted by a committee consisting of the Chief Risk Officer, the Head of Market Risk, and the Head of Credit Risk.
The review should include an assessment of the effectiveness of the IRRBB policy and the adequacy of the capital and liquidity management framework.
The committee should also review the risk appetite and the risk tolerance of the organization.
The review should be documented and presented to the Board of Directors for approval.
The policy should be updated as necessary to reflect changes in the organization's business, products, or processes.
```

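As the sample shows, the generated text repeats the full prompt, including the `<|system|>`, `<|user|>`, and `<|assistant|>` markers. A minimal sketch for keeping only the model's answer follows; the helper name `extract_assistant_reply` is hypothetical, and the sketch assumes the marker format shown in the sample output.

```python
ASSISTANT_MARKER = "<|assistant|>"
END_OF_TURN = "</s>"


def extract_assistant_reply(generated_text: str) -> str:
    """Return the text after the last assistant marker.

    If no marker is present, the input is returned unchanged apart from
    surrounding whitespace.
    """
    _, marker, reply = generated_text.rpartition(ASSISTANT_MARKER)
    if not marker:
        return generated_text.strip()
    # Drop any end-of-turn token that the model emitted after its answer.
    return reply.replace(END_OF_TURN, "").strip()


sample = (
    "<|system|>\n"
    "You are a compliance assistant who answers in a formal manner</s>\n"
    "<|user|>\n"
    "How often should IRRBB policies be reviewed?</s>\n"
    "<|assistant|>\n"
    "At least annually."
)
reply = extract_assistant_reply(sample)
print(reply)
```

Alternatively, the `transformers` text-generation pipeline accepts `return_full_text=False`, which returns only the newly generated text and makes this post-processing unnecessary.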
### GPT-4 response



### Reference

To evaluate the truthfulness of this response and check it for hallucination, refer to RBI notification
[RBI/2022-23/180 DOR.MRG.REC.102/00-00-009/2022-23](https://rbidocs.rbi.org.in/rdocs/notification/PDFs/NOTI180CF30A8446A704C11BD8267A8D0BB2AC2.PDF) (page 8).

A screenshot of the relevant passage is shown below.

<img src="rbi_reference.png" alt="drawing" width="500"/>

As the comparison shows, RegLLM has identified the review frequency for IRRBB policies, while GPT-4 provides a more general response.
Note that RegLLM's response is not backed by any external knowledge.
When coupled with a retriever model, RegLLM can provide fairly precise responses to user queries related to regulatory compliance.

Keep watching this space for more updates on the model and evaluations.

## Model Card Authors

* Atharva Inamdar
* Niranjan Kakade
* Tony Tom
* Nayan Chheda
* Sourabh Daptardar

## Model Card Contact

"dataeaze systems" <[email protected]>