# Model Card for Shorsey-T2000

## Model Details

### Model Description

The Shorsey-T2000 is a custom hybrid model that combines a transformer-based architecture with a recurrent neural network (RNN). Specifically, it integrates Transformer-XL- and T5-style self-attention with an LSTM layer to improve the model's handling of complex sequence learning and long-range dependencies in text. The model is designed for text generation, causal language modeling, and question answering.

- **Developed by:** Morgan Griffin, WongrifferousAI
- **Funded by [optional]:** WongrifferousAI
- **Shared by [optional]:** WongrifferousAI
- **Model type:** Hybrid Transformer-RNN (TransformerXL-T5 with LSTM)
- **Language(s) (NLP):** English (en)
- **Finetuned from model [optional]:** Custom architecture

### Direct Use

This model can be used directly for:

- **Text Generation:** Generating coherent and contextually relevant text sequences.
- **Causal Language Modeling:** Predicting the next word in a sequence, which can be applied to various NLP tasks like auto-completion or story generation.
- **Question Answering:** Providing answers to questions based on a given context.

### Downstream Use [optional]

The model can be fine-tuned for specific tasks such as:

- **Sentiment Analysis:** Fine-tuning on datasets like IMDB for classifying sentiment in text.
- **Summarization:** Adapting the model for generating concise summaries of longer text documents.

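As an illustration, a minimal sentiment fine-tuning sketch on stanfordnlp/imdb with the Hugging Face `Trainer` could look like the following. It assumes the checkpoint can be loaded with a two-label classification head via `AutoModelForSequenceClassification` and `trust_remote_code=True`; because Shorsey-T2000 is a custom architecture, the actual loading path may differ.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    BertTokenizerFast,
    Trainer,
    TrainingArguments,
)

model_id = "Wonder-Griffin/Shorsey-T2000"
tokenizer = BertTokenizerFast.from_pretrained(model_id)
# Assumption: the repo can provide (or be wrapped with) a 2-label classification head.
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=2, trust_remote_code=True
)

# IMDB movie reviews labeled positive (1) / negative (0).
dataset = load_dataset("stanfordnlp/imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="shorsey-t2000-imdb",
    learning_rate=3e-5,            # matches the card's reported learning rate
    per_device_train_batch_size=8,
    num_train_epochs=3,            # illustrative choice for fine-tuning
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"])
trainer.train()
```
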
### Out-of-Scope Use

This model is not designed for:

- **Real-time Conversational AI:** Due to the hybrid architecture and the complexity of the model, it may not be optimal for real-time, low-latency applications.
- **Tasks requiring multilingual support:** The model is currently trained and optimized for English language processing only.

## Bias, Risks, and Limitations

As with any AI model, the Shorsey-T2000 may carry biases from its training data, and these can surface in its outputs. It is important to recognize:

- **Bias in Training Data:** The model may reflect biases present in the datasets it was trained on, such as stereotypes or unbalanced representations of certain groups.
- **Limited Context Understanding:** Despite the RNN integration, the model might struggle with highly nuanced context or very long-term dependencies beyond its training data.

### Recommendations

- **Human-in-the-Loop:** For applications where fairness and bias are critical, it's recommended to have a human review outputs generated by the model.
- **Bias Mitigation:** Consider using additional data preprocessing techniques or post-processing steps to mitigate biases in the model's predictions.

## How to Get Started with the Model

You can start using the Shorsey-T2000 model with the snippet below. Because this is a custom architecture, loading it through `AutoModelForCausalLM` with `trust_remote_code=True` is assumed here; adjust the loading call if the repository ships its own model class.

```python
from transformers import AutoModelForCausalLM, BertTokenizerFast

model_id = "Wonder-Griffin/Shorsey-T2000"
tokenizer = BertTokenizerFast.from_pretrained(model_id)
# trust_remote_code is assumed to be required for this custom hybrid architecture.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

input_text = "Once upon a time"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate up to 100 tokens and decode the result
output = model.generate(input_ids, max_length=100)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```

## Training Data

The model was trained on the stanfordnlp/imdb dataset, which contains movie reviews labeled with sentiment. Additional datasets may have been used for other tasks like question answering and language modeling.

## Preprocessing [optional]

Text data was tokenized using the model's tokenizer (BertTokenizerFast, as in the usage example above), with additional preprocessing steps to ensure consistent input formatting across the different tasks.

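For illustration, the snippet below shows what that preprocessing could look like; the fixed 512-token padding/truncation length is an assumption based on the model's 512-position positional encoding, not a documented value.

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("Wonder-Griffin/Shorsey-T2000")

# Pad/truncate to a fixed length so batches have a consistent shape across tasks.
encoded = tokenizer(
    ["This movie was surprisingly good.", "Terrible pacing and a weak plot."],
    padding="max_length",
    truncation=True,
    max_length=512,  # assumption: matches the positional-encoding size
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([2, 512])
```
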
## Training Hyperparameters

- **Training regime:** fp32 precision, AdamW optimizer, learning rate of 3e-5, batch size of 8
- **Max epochs:** 10
- **Learning rate schedule:** Linear decay with warmup steps

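For orientation, these settings map onto a Hugging Face `TrainingArguments` configuration roughly as follows; the warmup step count and weight decay are illustrative assumptions, not reported values.

```python
from transformers import TrainingArguments

# Mirrors the reported regime: AdamW, lr 3e-5, batch size 8, 10 epochs,
# linear decay with warmup, full fp32 precision (mixed precision left disabled).
training_args = TrainingArguments(
    output_dir="shorsey-t2000-checkpoints",
    optim="adamw_torch",
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    num_train_epochs=10,
    lr_scheduler_type="linear",
    warmup_steps=500,   # assumption: the actual warmup length is not reported
    weight_decay=0.01,  # assumption: not reported
    fp16=False,
)
```
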
## Speeds, Sizes, Times [optional]

- **Training time:** Approximately 36 hours on a single NVIDIA V100 GPU
- **Model size:** ~171M parameters (170,799,301 FP32 parameters in the safetensors checkpoint)
- **Checkpoint size:** ~683 MB, stored as a single unsharded `model.safetensors` file

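The size figures above can be read directly from the repository's safetensors metadata with `huggingface_hub`:

```python
from huggingface_hub import get_safetensors_metadata

# Fetches tensor names, shapes, and parameter counts without downloading the weights.
meta = get_safetensors_metadata("Wonder-Griffin/Shorsey-T2000")
print(meta.sharded)          # False: a single model.safetensors file
print(meta.parameter_count)  # {'F32': 170799301}
```
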
## Testing Data

The model was tested on a held-out portion of the stanfordnlp/imdb dataset to evaluate its performance on sentiment classification and text generation tasks.

## Factors

- **Domain:** Movie reviews, general text generation
- **Subpopulations:** Different sentiment categories (positive, negative)

## Metrics

**Precision**, the fraction of positive predictions that are correct, is used to evaluate the model's accuracy in generating correct text and answering questions.

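For concreteness, here is a minimal example of computing precision on a handful of hypothetical sentiment predictions (illustrative labels, not actual evaluation data):

```python
from sklearn.metrics import precision_score

# Hypothetical ground-truth and predicted sentiment labels (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# Precision = true positives / (true positives + false positives) = 3 / 4
print(precision_score(y_true, y_pred))  # 0.75
```
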
## Results

The model demonstrated strong performance on text generation tasks, particularly in generating coherent and contextually appropriate responses. However, it shows a slight tendency toward generating overly positive or negative responses based on the context provided.

### Summary

The Shorsey-T2000 is a versatile model for a range of NLP tasks, especially text generation and language modeling. Its hybrid architecture makes it particularly effective at capturing both short-term and long-term dependencies in text.

## Technical Specifications [optional]

### Model Architecture and Objective

The Shorsey-T2000 combines Transformer-XL- and T5-style components with an LSTM layer to enhance sequence learning. Based on the checkpoint's tensor shapes, it uses a 60,000-token embedding table with a model dimension of 512, positional encodings over 512 positions, four transformer blocks (each with multi-head self-attention, a 2048-dimensional feed-forward layer, and two layer-normalization layers), a bidirectional LSTM, and separate output heads for causal language modeling, general classification, and question answering.

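The layout below is a PyTorch sketch reconstructed from the checkpoint's tensor names and shapes (embedding, pos_encoding.pe, four transformer_blocks, a bidirectional rnn, and the casual_lm_head / general_head / qa_head heads). The number of attention heads, the activation function, and the forward-pass wiring are assumptions; this is not the author's reference implementation.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.multihead_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.multihead_attn(x, x, x)
        return out

class FeedForward(nn.Module):
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):  # n_heads is an assumption
        super().__init__()
        self.attention = SelfAttention(d_model, n_heads)
        self.feed_forward = FeedForward(d_model, d_ff)
        self.layernorm1 = nn.LayerNorm(d_model)
        self.layernorm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.layernorm1(x + self.attention(x))
        return self.layernorm2(x + self.feed_forward(x))

class PositionalEncoding(nn.Module):
    def __init__(self, d_model=512, max_len=512):
        super().__init__()
        # Stored as a buffer; shape matches the checkpoint's pos_encoding.pe tensor.
        self.register_buffer("pe", torch.zeros(1, max_len, d_model))

    def forward(self, x):
        return x + self.pe[:, : x.size(1)]

class ShorseyT2000(nn.Module):
    def __init__(self, vocab_size=60_000, d_model=512, n_layers=4, n_qa_labels=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.pos_encoding = PositionalEncoding(d_model)
        self.transformer_blocks = nn.ModuleList(
            TransformerBlock(d_model) for _ in range(n_layers)
        )
        # The bidirectional LSTM doubles the feature size feeding the heads (512 -> 1024).
        self.rnn = nn.LSTM(d_model, d_model, batch_first=True, bidirectional=True)
        self.casual_lm_head = nn.Linear(2 * d_model, vocab_size)  # name as in the checkpoint
        self.general_head = nn.Linear(2 * d_model, vocab_size)
        self.qa_head = nn.Linear(2 * d_model, n_qa_labels)

    def forward(self, input_ids):
        x = self.pos_encoding(self.embedding(input_ids))
        for block in self.transformer_blocks:
            x = block(x)
        x, _ = self.rnn(x)
        return self.casual_lm_head(x)  # LM logits; the other heads serve classification / QA
```
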
## Model Card Authors [optional]

Morgan Griffin, WongrifferousAI

## Model Card Contact

Contact: Morgan Griffin, WongrifferousAI

### Summary of Key Information:

- **Model Name:** Shorsey-T2000
- **Model Type:** Hybrid Transformer-RNN (TransformerXL-T5 with LSTM)
- **Developed by:** Morgan Griffin, WongrifferousAI
- **Primary Tasks:** Text generation, causal language modeling, question answering
- **Language:** English
- **Key Metrics:** Precision