	---
	language:
	- en
	license: llama3.1
	library_name: ollama
	tags:
	- legal
	- singapore
	- law
	- assistant
	- llama
	- quantized
	pipeline_tag: text-generation
	base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
	base_model_relation: quantized
	model-index:
	- name: LexSG
	  results: []
	---

	# LexSG - Singapore Legal Assistant Model

	A specialized AI assistant trained on Singapore statutes and subsidiary legislation, built on the Llama 3.1 8B Instruct architecture and optimized for legal text generation.

	## Model Details

	### Model Description

	LexSG is a fine-tuned and quantized language model designed specifically to assist with Singapore legal matters. It provides accurate, contextual responses about Singapore's legal framework and helps users understand complex legal provisions.

	- Developed by: Chang Sau Sheong
	- Model type: Causal Language Model
	- Language(s) (NLP): English
	- License: Llama 3.1 License
	- Finetuned from model: meta-llama/Meta-Llama-3.1-8B-Instruct

	### Model Sources

	- Repository: [sausheong/lexsg](https://huggingface.co/sausheong/lexsg)
	- Base Model: [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)

	## Uses

	### Direct Use

	This model is intended for educational and informational purposes to help users understand Singapore legal provisions and statutes. It can be used to:

	- Explain legal sections and provisions from Singapore acts
	- Answer questions about Singapore's legal framework
	- Provide context for legal documents
	- Help interpret legal language and terminology
	- Assist with understanding regulatory requirements

	### Downstream Use

	The model can be integrated into legal research tools, educational platforms, or chatbot applications focused on Singapore law.

	### Out-of-Scope Use

	- Not for legal advice: This model should not be used as a substitute for professional legal counsel
	- Not for other jurisdictions: Specifically trained on Singapore law and may not be accurate for other legal systems
	- Not for critical decisions: Should not be used for making important legal or business decisions without professional verification

	## Bias, Risks, and Limitations

	- Training data limitations: Responses are based on training data and may not reflect the most recent legal changes
	- Legislation only: Training data is Singapore statutes and subsidiary legislation only, without any Singapore legal cases
	- Legal complexity: Legal interpretations can be highly context-dependent and nuanced
	- Professional consultation required: Complex legal matters require consultation with qualified legal professionals
	- Potential biases: May reflect biases present in legal training data

	### Recommendations

	Users should be made aware of the risks, biases and limitations of the model. Always consult with qualified legal professionals for specific legal matters.

	## How to Get Started with the Model

	### llama.cpp/Ollama

	The model file `llama-3.1-8b-lexsg-q4_k_m.gguf` is in GGUF format and can be used with any llama.cpp-compatible library or application.
	It has been tested with [Ollama](https://ollama.com/), using the provided Modelfile.
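
For use outside Ollama, the following is a minimal sketch of running the GGUF file directly with llama.cpp's `llama-cli`. It assumes `llama-cli` is on your `PATH` and that the model file is in the current directory. The context, generation, and sampling values are copied from the figures listed in this model card, and the prompt is only an illustration. If the binary or model file is missing, the script prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch: run the GGUF file directly with llama.cpp (assumes llama-cli is installed).
MODEL="llama-3.1-8b-lexsg-q4_k_m.gguf"   # file name from this repository
PROMPT="What does Section 73 of the Companies Act cover?"

# Values below are taken from this model card.
LLAMA_ARGS=(
  -m "$MODEL"
  -c 4096               # context length
  -n 1024               # maximum tokens to generate
  --temp 0.3
  --top-p 0.9
  --top-k 40
  --repeat-penalty 1.1
  -p "$PROMPT"
)

if command -v llama-cli >/dev/null 2>&1 && [ -f "$MODEL" ]; then
  llama-cli "${LLAMA_ARGS[@]}"
else
  # Dry run: print the command that would have been executed.
  printf 'llama-cli'; printf ' %q' "${LLAMA_ARGS[@]}"; printf '\n'
fi
```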

	### Running the Model

	To use this with Ollama:

	1. Build the model from the Modelfile:
	```bash
	ollama create lexsg -f Modelfile
	```

	Alternatively, run the provided setup script:
	```bash
	./setup_ollama_model.sh
	```

	2. Run the model:
	```bash
	ollama run lexsg
	```

	3. Start asking questions about Singapore law:
	```
	> What does Section 73 of the Companies Act cover?
	> Explain the requirements for setting up a private limited company in Singapore
	> What are the penalties for non-compliance with PDPA?
	```
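
Beyond the interactive prompt, the model can also be queried over Ollama's local REST API, which is one way to integrate it into other tools. The sketch below assumes the Ollama server is running on its default port (11434) and that the model was created under the name `lexsg` as in step 1. The `OLLAMA_URL` variable is a name introduced here for illustration.

```bash
#!/usr/bin/env bash
# Sketch: query the model through Ollama's local REST API.
# Assumes the Ollama server is running and the model is named "lexsg".
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Request body for POST /api/generate; "stream": false returns a single JSON object.
PAYLOAD='{
  "model": "lexsg",
  "prompt": "What are the penalties for non-compliance with PDPA?",
  "stream": false
}'

# Send the request; print a message instead of aborting if the server is not reachable.
curl -sS --max-time 120 "$OLLAMA_URL/api/generate" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "Ollama server not reachable at $OLLAMA_URL" >&2
```

The generated text is returned in the `response` field of the JSON reply. If `jq` is installed, piping the output through `jq -r .response` prints just the answer.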

	## Training Details

	### Training Data

	The model was fine-tuned on Singapore statutes and subsidiary legislation, including but not limited to:
	- Singapore Acts and Statutes
	- Legal provisions and regulations
	- Regulatory guidelines

	### Training Procedure

	#### Training Hyperparameters

	- Training regime: Fine-tuned from Llama 3.1 8B Instruct
	- Quantization: Q4_K_M (4-bit quantized for efficient inference)

	#### Speeds, Sizes, Times

	- Model size: ~4.8GB (quantized)
	- Context length: 4,096 tokens
	- Max generation: 1,024 tokens


	## Technical Specifications

	### Model Architecture and Objective

	- Architecture: Llama 3.1 transformer architecture
	- Training objective: Causal language modeling

	### Hardware

	- Memory requirements: ~6GB RAM recommended for inference
	- Platform support: Cross-platform via Ollama

	### Inference Parameters

	The Modelfile sets the following inference parameters. You can change them as needed.

	- Temperature: 0.3 (low, for more conservative and consistent responses)
	- Top-p: 0.9 (nucleus sampling over the top 90% of probability mass)
	- Top-k: 40 (samples only from the 40 most likely tokens)
	- Repeat penalty: 1.1 (reduces repetition)
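
As an illustration of how these values can be expressed, the sketch below writes a Modelfile with the same settings to a separate, hypothetical file name (`Modelfile.example`) so that it does not overwrite the Modelfile shipped in the repository. The `FROM` path assumes the GGUF file is in the current directory, and the `num_ctx` and `num_predict` lines are an assumption based on the context length and max generation figures listed earlier. The actual Modelfile may differ, for example by also defining a system prompt or template.

```bash
# Sketch: write an example Modelfile carrying the parameters listed in this card.
cat > Modelfile.example <<'EOF'
FROM ./llama-3.1-8b-lexsg-q4_k_m.gguf

PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 4096
PARAMETER num_predict 1024
EOF

# To try modified settings under a separate (hypothetical) model name:
#   ollama create lexsg-custom -f Modelfile.example
```

Within an `ollama run` session, a parameter can also be changed temporarily, for example with `/set parameter temperature 0.1`.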

	## Model Card Authors

	Chang Sau Sheong

	## More Information

	For more details about Singapore legislation, refer to [Singapore Statutes Online](https://sso.agc.gov.sg/)

	---

	Legal Disclaimer: This model is designed to provide general information about Singapore law and should not be considered as legal advice. For specific legal matters, always consult with a qualified legal professional licensed to practice in Singapore.