Update Readme
README.md
CHANGED
@@ -1,199 +1,167 @@
(Removed: the previous README, which was the default, unfilled Hugging Face model card template with "[More Information Needed]" placeholders throughout.)

---
license: llama3
language:
- en
library_name: transformers
tags:
- llama-3
- llama-3.2
- bitcoin
- finance
- instruction-following
- fine-tuning
- merged
- lora
base_model: meta-llama/Llama-3.2-3B-Instruct
datasets:
- tahamajs/bitcoin-llm-finetuning-dataset
pipeline_tag: text-generation
---

# Model Card for Llama-3.2-3B Instruct - Advanced Bitcoin Analyst

This repository contains a specialized version of `meta-llama/Llama-3.2-3B-Instruct`, fine-tuned to function as a **Bitcoin and cryptocurrency market analyst**. The model is the result of a multi-stage "continuation training" process, in which an already specialized model was further refined on a targeted dataset.

## Model Details

### Model Description

This model is a Causal Language Model (CLM) based on the Llama 3.2 3B Instruct architecture. It was developed through a sequential fine-tuning process to enhance its knowledge and instruction-following capabilities for topics related to Bitcoin, blockchain technology, and financial markets.

The training procedure involved three key stages:

1. **Initial Specialization (Adapter Merge):** The process began by merging a pre-existing, high-performing LoRA adapter into the base `meta-llama/Llama-3.2-3B-Instruct` model to provide a strong foundation of domain-specific knowledge.
2. **Continuation Fine-Tuning (New LoRA):** A new LoRA adapter was then trained on top of this already-merged model using the `tahamajs/bitcoin-llm-finetuning-dataset`.
3. **Final Merge:** The final step was to merge this second LoRA adapter. This repository hosts the **fully merged, standalone model**, which contains the cumulative knowledge from all stages (see the merge sketch below).
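
The adapter repositories used in stages 1 and 2 are not named in this card, so the snippet below is only a minimal sketch of such a two-step merge with PEFT; the adapter IDs and output path are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
stage1_adapter = "your-username/stage1-bitcoin-lora"  # hypothetical first adapter
stage2_adapter = "your-username/stage2-bitcoin-lora"  # hypothetical continuation adapter

# Stage 1: merge the first LoRA adapter into the base model.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
stage1_model = PeftModel.from_pretrained(base, stage1_adapter).merge_and_unload()

# Stage 2 (training the second adapter on top of stage1_model) happens separately.

# Stage 3: merge the second adapter to obtain the standalone model hosted here.
final_model = PeftModel.from_pretrained(stage1_model, stage2_adapter).merge_and_unload()
final_model.save_pretrained("llama-3.2-3b-instruct-bitcoin-analyst-final")
AutoTokenizer.from_pretrained(base_id).save_pretrained("llama-3.2-3b-instruct-bitcoin-analyst-final")
```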

- **Developed by:** tahamajs
- **Model type:** Causal Language Model (Instruction-Tuned)
- **Language(s) (NLP):** English
- **License:** Llama 3 Community License Agreement
- **Finetuned from model:** `meta-llama/Llama-3.2-3B-Instruct`

### Model Sources

- **Repository:** `tahamajs/llama-3.2-3b-instruct-bitcoin-analyst-final` (example name)

## Uses

### Direct Use

This model is intended for direct use as an instruction-following chatbot for topics related to Bitcoin and cryptocurrency. It can be used for question answering, analysis, and explanation of complex financial and technical concepts. For best results, prompts should be formatted using the Llama 3 chat template.
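
For reference, you can render the chat template to a plain string before generation (a minimal sketch; the repository ID below is a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-username/your-final-model-name")  # placeholder ID

messages = [{"role": "user", "content": "Explain Bitcoin's halving schedule."}]

# tokenize=False returns the formatted prompt string, so you can inspect the
# Llama 3 header/footer tokens (e.g. <|start_header_id|>, <|eot_id|>) directly.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```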

### Downstream Use

This model can serve as a strong base for further fine-tuning on more specific financial tasks, such as sentiment analysis of crypto news, generating market summaries, or building a domain-specific RAG system.
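
As an illustration of that downstream path, a fresh LoRA adapter could be attached on top of this merged checkpoint with PEFT; the rank, dropout, and target modules below are placeholder choices, not recommendations from the authors:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("your-username/your-final-model-name")  # placeholder ID

# Attach a new, trainable LoRA adapter; the merged base weights stay frozen.
lora_config = LoraConfig(
    r=16,                 # placeholder rank for the downstream task
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter parameters are trainable
```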

### Out-of-Scope Use

This model is **not a financial advisor** and should not be used to make real-world investment decisions. Its knowledge is limited to its training data and may not be fully up to date. It is not designed for general-purpose conversation outside of its specialized domain and may perform poorly on such tasks.

## Bias, Risks, and Limitations

This model inherits the limitations of the base Llama 3.2 model and the biases present in its training data (which includes cryptocurrency-related discourse). In the financial domain, there is a significant risk of generating overly confident, optimistic, or pessimistic statements that could be misinterpreted as financial advice. The model may "hallucinate" facts or data points.

### Recommendations

Users should critically evaluate all outputs from this model, especially when they pertain to financial metrics or price predictions. We recommend clearly stating to any end users that the text is generated by an AI and is not a substitute for professional financial advice.

## How to Get Started with the Model

Use the code below to load the fully merged model and generate text.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace with the ID of your final model repository
model_id = "your-username/your-final-model-name"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Use the Llama 3 chat template for instruction-following
messages = [
    {"role": "user", "content": "What is the role of the 'difficulty adjustment' in Bitcoin's protocol and how does it maintain a consistent block time?"},
]

# Apply the chat template and tokenize
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate a response
outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode and print the output
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
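
If GPU memory is limited, the merged model can also be loaded in 4-bit via bitsandbytes. This is a sketch under the assumption that `bitsandbytes` is installed; the NF4 settings shown are common QLoRA defaults, not values published with this model:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "your-username/your-final-model-name",  # placeholder ID, as above
    quantization_config=bnb_config,
    device_map="auto",
)
```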

## Training Details

### Training Data

The second stage of fine-tuning was performed on the [tahamajs/bitcoin-llm-finetuning-dataset](https://huggingface.co/datasets/tahamajs/bitcoin-llm-finetuning-dataset). This dataset contains instruction-response pairs related to Bitcoin, market analysis, and blockchain technology.
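
The dataset can be inspected directly with the `datasets` library (a minimal sketch; split and column names depend on the dataset itself):

```python
from datasets import load_dataset

ds = load_dataset("tahamajs/bitcoin-llm-finetuning-dataset")
print(ds)               # available splits and columns
print(ds["train"][0])   # first example, assuming a "train" split exists
```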

### Training Procedure

#### Preprocessing

The training data was formatted into the Llama 3 chat template using a `format_chat` function. A custom `RobustCompletionCollator` was used to mask the prompt and user-input tokens from the loss calculation, ensuring the model was trained only to predict the assistant's responses.
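
The exact `format_chat` and `RobustCompletionCollator` implementations are not included in this card; the sketch below only illustrates the general idea of completion-only loss masking, where prompt tokens get the label `-100` so they are ignored by the cross-entropy loss:

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

def build_example(instruction: str, response: str) -> dict:
    """Tokenize one chat example and mask the prompt tokens out of the loss."""
    # Prompt: the user turn plus the assistant header, rendered with the Llama 3 chat template.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": instruction}],
        tokenize=False,
        add_generation_prompt=True,
    )
    full = prompt + response + tokenizer.eos_token

    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    full_ids = tokenizer(full, add_special_tokens=False)["input_ids"]

    labels = list(full_ids)
    labels[: len(prompt_ids)] = [-100] * len(prompt_ids)  # -100 is ignored by the loss
    return {"input_ids": torch.tensor(full_ids), "labels": torch.tensor(labels)}
```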

#### Training Hyperparameters

The continuation training was performed using the QLoRA method for memory efficiency.

- **Training regime:** bf16 mixed precision

| Hyperparameter | Value |
| :--- | :--- |
| `lora_r` | 32 |
| `lora_alpha` | 64 |
| `lora_dropout` | 0.1 |
| `target_modules` | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| `learning_rate` | 1e-4 |
| `lr_scheduler_type` | cosine |
| `num_train_epochs` | 1 |
| `optimizer` | `paged_adamw_32bit` |
| `batch_size` (per device) | 1 |
| `gradient_accumulation` | 8 |
| `total_batch_size` | 8 |
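
The table maps onto a PEFT/Transformers configuration roughly as follows. This is a sketch rather than the exact training script: the 4-bit quantization settings and output directory are assumptions, while the LoRA and optimizer values are taken from the table above (effective batch size = 1 x 8 = 8):

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit quantization of the base model (QLoRA); these settings are assumed, not published.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA settings from the hyperparameter table.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Optimizer and schedule settings from the table.
training_args = TrainingArguments(
    output_dir="llama-3.2-3b-bitcoin-continuation",  # placeholder
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    optim="paged_adamw_32bit",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
)
```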

## Evaluation

Quantitative evaluation has not been performed on this model version.

## Technical Specifications

### Model Architecture and Objective

This is a decoder-only transformer based on the Llama 3.2 architecture. It was fine-tuned using a Causal Language Modeling objective.
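
Concretely, the causal language modeling objective minimizes the negative log-likelihood of each token given its preceding context (with prompt tokens masked from the loss, as described under Preprocessing):

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)$$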

### Compute Infrastructure

#### Software

- [PyTorch](https://pytorch.org/)
- [Transformers](https://github.com/huggingface/transformers)
- [PEFT](https://github.com/huggingface/peft)
- [TRL](https://github.com/huggingface/trl)
- [BitsAndBytes](https://github.com/TimDettmers/bitsandbytes) for QLoRA

## Model Card Authors

tahamajs