LexSG - Singapore Legal Assistant Model

A specialized AI assistant trained on Singapore statutes and subsidiary legislation, built on the Llama 3.1 8B Instruct architecture and optimized for legal text generation.

Model Details

Model Description

LexSG is a fine-tuned and quantized language model designed specifically to assist with Singapore legal matters. It aims to provide accurate, contextual responses about Singapore's legal framework and to help users understand complex legal provisions.

Developed by: Chang Sau Sheong
Model type: Causal Language Model
Language(s) (NLP): English
License: Llama 3.1 License
Finetuned from model: meta-llama/Meta-Llama-3.1-8B-Instruct

Model Sources

Repository: https://huggingface.co/sausheong/lexsg
Base Model: meta-llama/Meta-Llama-3.1-8B-Instruct

Uses

Direct Use

This model is intended for educational and informational purposes to help users understand Singapore legal provisions and statutes. It can be used to:

Explain legal sections and provisions from Singapore acts
Answer questions about Singapore's legal framework
Provide context for legal documents
Help interpret legal language and terminology
Assist with understanding regulatory requirements

Downstream Use

The model can be integrated into legal research tools, educational platforms, or chatbot applications focused on Singapore law.

Out-of-Scope Use

Not for legal advice: This model should not be used as a substitute for professional legal counsel
Not for other jurisdictions: Specifically trained on Singapore law and may not be accurate for other legal systems
Not for critical decisions: Should not be used for making important legal or business decisions without professional verification

Bias, Risks, and Limitations

Training data limitations: Responses are based on training data and may not reflect the most recent legal changes
Legislation only: The training data consists of Singapore statutes and subsidiary legislation only, and does not include Singapore case law
Legal complexity: Legal interpretations can be highly context-dependent and nuanced
Professional consultation required: Complex legal matters require consultation with qualified legal professionals
Potential biases: May reflect biases present in legal training data

Recommendations

Users should be made aware of the risks, biases and limitations of the model. Always consult with qualified legal professionals for specific legal matters.

How to Get Started with the Model

llama.cpp/Ollama

The model file llama-3.1-8b-lexsg-q4_k_m.gguf is in GGUF format and can be used with any llama.cpp-compatible library or application. It has been tested specifically with Ollama, using the provided Modelfile.
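Because the file is plain GGUF, it should also load directly in llama.cpp without Ollama. The sketch below assumes a recent llama.cpp build that provides the `llama-cli` binary and that the GGUF file sits in the current directory. The flag values are copied from the context length, generation limit, and sampling parameters listed in this card; this is not a configuration the author reports having tested.

```shell
#!/usr/bin/env bash
# Sketch: chat with the GGUF file through llama.cpp's llama-cli (assumed to be
# installed and on PATH). Values mirror the inference parameters in this card.
MODEL_FILE="./llama-3.1-8b-lexsg-q4_k_m.gguf"

LLAMA_ARGS=(
  -m "$MODEL_FILE"      # path to the quantized model
  -c 4096               # context length
  -n 1024               # maximum tokens to generate per response
  --temp 0.3            # conservative sampling temperature
  --top-p 0.9           # nucleus sampling threshold
  --top-k 40            # sample only from the 40 most likely tokens
  --repeat-penalty 1.1  # discourage repeated tokens
  -cnv                  # interactive conversation mode using the chat template
)

if command -v llama-cli >/dev/null 2>&1; then
  llama-cli "${LLAMA_ARGS[@]}"
else
  echo "llama-cli not found; build or install llama.cpp first" >&2
fi
```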

Running the Model

To use this with Ollama:

Build the model from the Modelfile:
```
ollama create lexsg -f Modelfile
```
Alternatively, run the provided setup script:
```
./setup_ollama_model.sh
```
Run the model:
```
ollama run lexsg
```
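The contents of `setup_ollama_model.sh` are not shown in this card. As a hypothetical sketch only, a script like it might verify its prerequisites and then run the same `ollama create` command. The function name `setup_lexsg` and the checks it performs are illustrative assumptions, not the repository's actual script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of what a setup script such as setup_ollama_model.sh
# might do; the real script in the repository may differ.
setup_lexsg() {
  local gguf="llama-3.1-8b-lexsg-q4_k_m.gguf"
  local f

  # Ollama must be installed and on PATH
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama not found; install it from https://ollama.com" >&2
    return 1
  fi

  # The model weights and the Modelfile must both be in the current directory
  for f in "$gguf" Modelfile; do
    if [ ! -f "$f" ]; then
      echo "missing $f in $(pwd)" >&2
      return 1
    fi
  done

  # Register the model with Ollama under the name "lexsg"
  ollama create lexsg -f Modelfile
}
```

After sourcing the snippet, call `setup_lexsg` from the directory that holds both files; a non-zero return status means a prerequisite is missing.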

Start asking questions about Singapore law:

> What does Section 73 of the Companies Act cover?
> Explain the requirements for setting up a private limited company in Singapore
> What are the penalties for non-compliance with PDPA?
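For scripted use, questions can also be sent without the interactive prompt. `ollama run lexsg "What does Section 73 of the Companies Act cover?"` answers a single prompt and exits. The sketch below calls Ollama's local REST endpoint `/api/generate` instead. It assumes the Ollama server is running on its default port 11434 and that `jq` is installed. The function `ask_lexsg` and the `LEXSG_OLLAMA_URL` override are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: send one question to a local Ollama server and print the answer.
# Assumes Ollama is serving the "lexsg" model and jq is installed.
# LEXSG_OLLAMA_URL is a hypothetical override; Ollama's default port is 11434.
ask_lexsg() {
  local url="${LEXSG_OLLAMA_URL:-http://localhost:11434}"
  local body resp

  # Build the JSON request with jq so quotes in the question are escaped
  body=$(jq -n --arg p "$1" '{model: "lexsg", prompt: $p, stream: false}') || return 1

  # Non-streaming generate call; --fail turns HTTP errors into a non-zero status
  resp=$(curl --fail --silent --show-error "$url/api/generate" \
    -H 'Content-Type: application/json' -d "$body") || return 1

  # The generated text is returned in the "response" field
  jq -r '.response' <<<"$resp"
}
```

Example: `ask_lexsg "What are the penalties for non-compliance with PDPA?"`.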

Training Details

Training Data

The model was fine-tuned on Singapore legal documents and statutes, including but not limited to:

Singapore Acts and Statutes
Legal provisions and regulations
Case law references
Regulatory guidelines

Training Procedure

Training Hyperparameters

Training regime: Fine-tuned from Llama 3.1 8B Instruct
Quantization: Q4_K_M (4-bit quantized for efficient inference)

Speeds, Sizes, Times

Model size: ~4.8GB (quantized)
Context length: 4,096 tokens
Max generation: 1,024 tokens

Technical Specifications

Model Architecture and Objective

Architecture: Llama 3.1 transformer architecture
Training objective: Causal language modeling
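For reference, the standard causal language modeling objective minimizes the negative log-likelihood of each token given the tokens before it, where x_1, ..., x_T is a training sequence and θ are the model parameters. The card does not say how the loss was applied during fine-tuning (for example, whether prompt tokens were masked out), so this is only the generic form:

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```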

Hardware

Memory requirements: ~6GB RAM recommended for inference
Platform support: Cross-platform via Ollama
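The ~6GB recommendation is consistent with a rough estimate of the quantized weights plus the key/value (KV) cache for a full 4,096-token context. This assumes the published Llama 3.1 8B configuration (32 layers, 8 key/value heads, head dimension 128) and a 16-bit (2-byte) KV cache; the leading factor of 2 counts keys and values. Compute buffers and other runtime overhead are not included and likely make up the remaining headroom.

```latex
\begin{aligned}
\text{KV bytes per token} &= 2 \times n_{\text{layers}} \times n_{\text{kv heads}} \times d_{\text{head}} \times 2\ \text{bytes} \\
&= 2 \times 32 \times 8 \times 128 \times 2 = 131{,}072\ \text{bytes} = 128\ \text{KiB} \\
\text{KV cache at 4,096 tokens} &= 4{,}096 \times 128\ \text{KiB} = 512\ \text{MiB} \approx 0.54\ \text{GB} \\
\text{Weights} + \text{KV cache} &\approx 4.8\ \text{GB} + 0.54\ \text{GB} \approx 5.3\ \text{GB}
\end{aligned}
```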

Inference parameters

The following inference parameters are set in the Modelfile. You can adjust them as needed.

Temperature: 0.3 (conservative, factual responses)
Top-p: 0.9 (nucleus sampling for quality)
Top-k: 40 (controlled vocabulary selection)
Repeat penalty: 1.1 (reduces repetition)
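These values correspond to `PARAMETER` lines in an Ollama Modelfile. The sketch below writes a minimal Modelfile with them. It is an assumption-based illustration, not the Modelfile shipped with LexSG, which may also define a `TEMPLATE` or `SYSTEM` prompt. Mapping the card's 4,096-token context length and 1,024-token generation limit to `num_ctx` and `num_predict` is likewise an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: write a minimal Ollama Modelfile using the parameter
# values listed in this card. The Modelfile shipped with LexSG may differ.
cat > Modelfile <<'EOF'
# Base weights: the quantized GGUF file, assumed to be in the current directory
FROM ./llama-3.1-8b-lexsg-q4_k_m.gguf

# Sampling parameters from the model card
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1

# Context window and generation limit (assumed mapping of 4,096 / 1,024)
PARAMETER num_ctx 4096
PARAMETER num_predict 1024
EOF

echo "Wrote Modelfile; rebuild with: ollama create lexsg -f Modelfile"
```

After changing a value, rebuild the model with `ollama create lexsg -f Modelfile`. For a temporary change inside an `ollama run lexsg` session, `/set parameter temperature 0.2` adjusts a parameter without rebuilding.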

Model Card Authors

Chang Sau Sheong

More Information

For more details about Singapore legislation, refer to Singapore Statutes Online.

Legal Disclaimer: This model is designed to provide general information about Singapore law and should not be considered as legal advice. For specific legal matters, always consult with a qualified legal professional licensed to practice in Singapore.
