Wonder-Griffin committed (verified)
Commit 2c72ec3 · 1 Parent(s): d717ff6

Update README.md

Files changed (1):
  1. README.md +123 -1

README.md CHANGED
@@ -15,4 +15,126 @@ pipeline_tag: text-generation
  metrics:
  - precision
  ---
- SafetensorsRepoMetadata(metadata=None, sharded=False, weight_map={'casual_lm_head.bias': 'model.safetensors', 'casual_lm_head.weight': 'model.safetensors', 'embedding.weight': 'model.safetensors', 'general_head.bias': 'model.safetensors', 'general_head.weight': 'model.safetensors', 'pos_encoding.pe': 'model.safetensors', 'qa_head.bias': 'model.safetensors', 'qa_head.weight': 'model.safetensors', 'rnn.bias_hh_l0': 'model.safetensors', 'rnn.bias_hh_l0_reverse': 'model.safetensors', 'rnn.bias_ih_l0': 'model.safetensors', 'rnn.bias_ih_l0_reverse': 'model.safetensors', 'rnn.weight_hh_l0': 'model.safetensors', 'rnn.weight_hh_l0_reverse': 'model.safetensors', 'rnn.weight_ih_l0': 'model.safetensors', 'rnn.weight_ih_l0_reverse': 'model.safetensors', 'transformer_blocks.0.attention.multihead_attn.in_proj_bias': 'model.safetensors', 'transformer_blocks.0.attention.multihead_attn.in_proj_weight': 'model.safetensors', 'transformer_blocks.0.attention.multihead_attn.out_proj.bias': 'model.safetensors', 'transformer_blocks.0.attention.multihead_attn.out_proj.weight': 'model.safetensors', 'transformer_blocks.0.feed_forward.fc1.bias': 'model.safetensors', 'transformer_blocks.0.feed_forward.fc1.weight': 'model.safetensors', 'transformer_blocks.0.feed_forward.fc2.bias': 'model.safetensors', 'transformer_blocks.0.feed_forward.fc2.weight': 'model.safetensors', 'transformer_blocks.0.layernorm1.bias': 'model.safetensors', 'transformer_blocks.0.layernorm1.weight': 'model.safetensors', 'transformer_blocks.0.layernorm2.bias': 'model.safetensors', 'transformer_blocks.0.layernorm2.weight': 'model.safetensors', 'transformer_blocks.1.attention.multihead_attn.in_proj_bias': 'model.safetensors', 'transformer_blocks.1.attention.multihead_attn.in_proj_weight': 'model.safetensors', 'transformer_blocks.1.attention.multihead_attn.out_proj.bias': 'model.safetensors', 'transformer_blocks.1.attention.multihead_attn.out_proj.weight': 'model.safetensors', 'transformer_blocks.1.feed_forward.fc1.bias': 'model.safetensors', 'transformer_blocks.1.feed_forward.fc1.weight': 'model.safetensors', 'transformer_blocks.1.feed_forward.fc2.bias': 'model.safetensors', 'transformer_blocks.1.feed_forward.fc2.weight': 'model.safetensors', 'transformer_blocks.1.layernorm1.bias': 'model.safetensors', 'transformer_blocks.1.layernorm1.weight': 'model.safetensors', 'transformer_blocks.1.layernorm2.bias': 'model.safetensors', 'transformer_blocks.1.layernorm2.weight': 'model.safetensors', 'transformer_blocks.2.attention.multihead_attn.in_proj_bias': 'model.safetensors', 'transformer_blocks.2.attention.multihead_attn.in_proj_weight': 'model.safetensors', 'transformer_blocks.2.attention.multihead_attn.out_proj.bias': 'model.safetensors', 'transformer_blocks.2.attention.multihead_attn.out_proj.weight': 'model.safetensors', 'transformer_blocks.2.feed_forward.fc1.bias': 'model.safetensors', 'transformer_blocks.2.feed_forward.fc1.weight': 'model.safetensors', 'transformer_blocks.2.feed_forward.fc2.bias': 'model.safetensors', 'transformer_blocks.2.feed_forward.fc2.weight': 'model.safetensors', 'transformer_blocks.2.layernorm1.bias': 'model.safetensors', 'transformer_blocks.2.layernorm1.weight': 'model.safetensors', 'transformer_blocks.2.layernorm2.bias': 'model.safetensors', 'transformer_blocks.2.layernorm2.weight': 'model.safetensors', 'transformer_blocks.3.attention.multihead_attn.in_proj_bias': 'model.safetensors', 'transformer_blocks.3.attention.multihead_attn.in_proj_weight': 'model.safetensors', 
'transformer_blocks.3.attention.multihead_attn.out_proj.bias': 'model.safetensors', 'transformer_blocks.3.attention.multihead_attn.out_proj.weight': 'model.safetensors', 'transformer_blocks.3.feed_forward.fc1.bias': 'model.safetensors', 'transformer_blocks.3.feed_forward.fc1.weight': 'model.safetensors', 'transformer_blocks.3.feed_forward.fc2.bias': 'model.safetensors', 'transformer_blocks.3.feed_forward.fc2.weight': 'model.safetensors', 'transformer_blocks.3.layernorm1.bias': 'model.safetensors', 'transformer_blocks.3.layernorm1.weight': 'model.safetensors', 'transformer_blocks.3.layernorm2.bias': 'model.safetensors', 'transformer_blocks.3.layernorm2.weight': 'model.safetensors'}, files_metadata={'model.safetensors': SafetensorsFileMetadata(metadata={'format': 'pt'}, tensors={'casual_lm_head.bias': TensorInfo(dtype='F32', shape=[60000], data_offsets=(0, 240000), parameter_count=60000), 'casual_lm_head.weight': TensorInfo(dtype='F32', shape=[60000, 1024], data_offsets=(240000, 246000000), parameter_count=61440000), 'embedding.weight': TensorInfo(dtype='F32', shape=[60000, 512], data_offsets=(246000000, 368880000), parameter_count=30720000), 'general_head.bias': TensorInfo(dtype='F32', shape=[60000], data_offsets=(368880000, 369120000), parameter_count=60000), 'general_head.weight': TensorInfo(dtype='F32', shape=[60000, 1024], data_offsets=(369120000, 614880000), parameter_count=61440000), 'pos_encoding.pe': TensorInfo(dtype='F32', shape=[1, 512, 512], data_offsets=(614880000, 615928576), parameter_count=262144), 'qa_head.bias': TensorInfo(dtype='F32', shape=[5], data_offsets=(615928576, 615928596), parameter_count=5), 'qa_head.weight': TensorInfo(dtype='F32', shape=[5, 1024], data_offsets=(615928596, 615949076), parameter_count=5120), 'rnn.bias_hh_l0': TensorInfo(dtype='F32', shape=[2048], data_offsets=(615949076, 615957268), parameter_count=2048), 'rnn.bias_hh_l0_reverse': TensorInfo(dtype='F32', shape=[2048], data_offsets=(615957268, 615965460), parameter_count=2048), 'rnn.bias_ih_l0': TensorInfo(dtype='F32', shape=[2048], data_offsets=(615965460, 615973652), parameter_count=2048), 'rnn.bias_ih_l0_reverse': TensorInfo(dtype='F32', shape=[2048], data_offsets=(615973652, 615981844), parameter_count=2048), 'rnn.weight_hh_l0': TensorInfo(dtype='F32', shape=[2048, 512], data_offsets=(615981844, 620176148), parameter_count=1048576), 'rnn.weight_hh_l0_reverse': TensorInfo(dtype='F32', shape=[2048, 512], data_offsets=(620176148, 624370452), parameter_count=1048576), 'rnn.weight_ih_l0': TensorInfo(dtype='F32', shape=[2048, 512], data_offsets=(624370452, 628564756), parameter_count=1048576), 'rnn.weight_ih_l0_reverse': TensorInfo(dtype='F32', shape=[2048, 512], data_offsets=(628564756, 632759060), parameter_count=1048576), 'transformer_blocks.0.attention.multihead_attn.in_proj_bias': TensorInfo(dtype='F32', shape=[1536], data_offsets=(632759060, 632765204), parameter_count=1536), 'transformer_blocks.0.attention.multihead_attn.in_proj_weight': TensorInfo(dtype='F32', shape=[1536, 512], data_offsets=(632765204, 635910932), parameter_count=786432), 'transformer_blocks.0.attention.multihead_attn.out_proj.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(635910932, 635912980), parameter_count=512), 'transformer_blocks.0.attention.multihead_attn.out_proj.weight': TensorInfo(dtype='F32', shape=[512, 512], data_offsets=(635912980, 636961556), parameter_count=262144), 'transformer_blocks.0.feed_forward.fc1.bias': TensorInfo(dtype='F32', shape=[2048], data_offsets=(636961556, 636969748), 
parameter_count=2048), 'transformer_blocks.0.feed_forward.fc1.weight': TensorInfo(dtype='F32', shape=[2048, 512], data_offsets=(636969748, 641164052), parameter_count=1048576), 'transformer_blocks.0.feed_forward.fc2.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(641164052, 641166100), parameter_count=512), 'transformer_blocks.0.feed_forward.fc2.weight': TensorInfo(dtype='F32', shape=[512, 2048], data_offsets=(641166100, 645360404), parameter_count=1048576), 'transformer_blocks.0.layernorm1.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(645360404, 645362452), parameter_count=512), 'transformer_blocks.0.layernorm1.weight': TensorInfo(dtype='F32', shape=[512], data_offsets=(645362452, 645364500), parameter_count=512), 'transformer_blocks.0.layernorm2.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(645364500, 645366548), parameter_count=512), 'transformer_blocks.0.layernorm2.weight': TensorInfo(dtype='F32', shape=[512], data_offsets=(645366548, 645368596), parameter_count=512), 'transformer_blocks.1.attention.multihead_attn.in_proj_bias': TensorInfo(dtype='F32', shape=[1536], data_offsets=(645368596, 645374740), parameter_count=1536), 'transformer_blocks.1.attention.multihead_attn.in_proj_weight': TensorInfo(dtype='F32', shape=[1536, 512], data_offsets=(645374740, 648520468), parameter_count=786432), 'transformer_blocks.1.attention.multihead_attn.out_proj.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(648520468, 648522516), parameter_count=512), 'transformer_blocks.1.attention.multihead_attn.out_proj.weight': TensorInfo(dtype='F32', shape=[512, 512], data_offsets=(648522516, 649571092), parameter_count=262144), 'transformer_blocks.1.feed_forward.fc1.bias': TensorInfo(dtype='F32', shape=[2048], data_offsets=(649571092, 649579284), parameter_count=2048), 'transformer_blocks.1.feed_forward.fc1.weight': TensorInfo(dtype='F32', shape=[2048, 512], data_offsets=(649579284, 653773588), parameter_count=1048576), 'transformer_blocks.1.feed_forward.fc2.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(653773588, 653775636), parameter_count=512), 'transformer_blocks.1.feed_forward.fc2.weight': TensorInfo(dtype='F32', shape=[512, 2048], data_offsets=(653775636, 657969940), parameter_count=1048576), 'transformer_blocks.1.layernorm1.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(657969940, 657971988), parameter_count=512), 'transformer_blocks.1.layernorm1.weight': TensorInfo(dtype='F32', shape=[512], data_offsets=(657971988, 657974036), parameter_count=512), 'transformer_blocks.1.layernorm2.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(657974036, 657976084), parameter_count=512), 'transformer_blocks.1.layernorm2.weight': TensorInfo(dtype='F32', shape=[512], data_offsets=(657976084, 657978132), parameter_count=512), 'transformer_blocks.2.attention.multihead_attn.in_proj_bias': TensorInfo(dtype='F32', shape=[1536], data_offsets=(657978132, 657984276), parameter_count=1536), 'transformer_blocks.2.attention.multihead_attn.in_proj_weight': TensorInfo(dtype='F32', shape=[1536, 512], data_offsets=(657984276, 661130004), parameter_count=786432), 'transformer_blocks.2.attention.multihead_attn.out_proj.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(661130004, 661132052), parameter_count=512), 'transformer_blocks.2.attention.multihead_attn.out_proj.weight': TensorInfo(dtype='F32', shape=[512, 512], data_offsets=(661132052, 662180628), parameter_count=262144), 'transformer_blocks.2.feed_forward.fc1.bias': TensorInfo(dtype='F32', 
shape=[2048], data_offsets=(662180628, 662188820), parameter_count=2048), 'transformer_blocks.2.feed_forward.fc1.weight': TensorInfo(dtype='F32', shape=[2048, 512], data_offsets=(662188820, 666383124), parameter_count=1048576), 'transformer_blocks.2.feed_forward.fc2.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(666383124, 666385172), parameter_count=512), 'transformer_blocks.2.feed_forward.fc2.weight': TensorInfo(dtype='F32', shape=[512, 2048], data_offsets=(666385172, 670579476), parameter_count=1048576), 'transformer_blocks.2.layernorm1.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(670579476, 670581524), parameter_count=512), 'transformer_blocks.2.layernorm1.weight': TensorInfo(dtype='F32', shape=[512], data_offsets=(670581524, 670583572), parameter_count=512), 'transformer_blocks.2.layernorm2.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(670583572, 670585620), parameter_count=512), 'transformer_blocks.2.layernorm2.weight': TensorInfo(dtype='F32', shape=[512], data_offsets=(670585620, 670587668), parameter_count=512), 'transformer_blocks.3.attention.multihead_attn.in_proj_bias': TensorInfo(dtype='F32', shape=[1536], data_offsets=(670587668, 670593812), parameter_count=1536), 'transformer_blocks.3.attention.multihead_attn.in_proj_weight': TensorInfo(dtype='F32', shape=[1536, 512], data_offsets=(670593812, 673739540), parameter_count=786432), 'transformer_blocks.3.attention.multihead_attn.out_proj.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(673739540, 673741588), parameter_count=512), 'transformer_blocks.3.attention.multihead_attn.out_proj.weight': TensorInfo(dtype='F32', shape=[512, 512], data_offsets=(673741588, 674790164), parameter_count=262144), 'transformer_blocks.3.feed_forward.fc1.bias': TensorInfo(dtype='F32', shape=[2048], data_offsets=(674790164, 674798356), parameter_count=2048), 'transformer_blocks.3.feed_forward.fc1.weight': TensorInfo(dtype='F32', shape=[2048, 512], data_offsets=(674798356, 678992660), parameter_count=1048576), 'transformer_blocks.3.feed_forward.fc2.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(678992660, 678994708), parameter_count=512), 'transformer_blocks.3.feed_forward.fc2.weight': TensorInfo(dtype='F32', shape=[512, 2048], data_offsets=(678994708, 683189012), parameter_count=1048576), 'transformer_blocks.3.layernorm1.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(683189012, 683191060), parameter_count=512), 'transformer_blocks.3.layernorm1.weight': TensorInfo(dtype='F32', shape=[512], data_offsets=(683191060, 683193108), parameter_count=512), 'transformer_blocks.3.layernorm2.bias': TensorInfo(dtype='F32', shape=[512], data_offsets=(683193108, 683195156), parameter_count=512), 'transformer_blocks.3.layernorm2.weight': TensorInfo(dtype='F32', shape=[512], data_offsets=(683195156, 683197204), parameter_count=512)}, parameter_count={'F32': 170799301})}, parameter_count={'F32': 170799301})
+
+ # Model Card for Shorsey-T2000
+
+ ## Model Details
+
+ ### Model Description
+
+ The Shorsey-T2000 is a custom hybrid model that combines a transformer architecture with a recurrent neural network (RNN). Specifically, it pairs Transformer-XL/T5-style multi-head self-attention blocks with a bidirectional LSTM layer to improve handling of complex sequence learning and long-range dependencies in text. The model targets text generation, causal language modeling, and question answering.
+
+ - **Developed by:** Morgan Griffin, WongrifferousAI
+ - **Funded by [optional]:** WongrifferousAI
+ - **Shared by [optional]:** WongrifferousAI
+ - **Model type:** Hybrid Transformer-RNN (TransformerXL-T5 with LSTM)
+ - **Language(s) (NLP):** English (en)
+ - **Finetuned from model [optional]:** Custom architecture
+
+ ### Direct Use
+
+ This model can be used directly for:
+ - **Text Generation:** Generating coherent and contextually relevant text sequences.
+ - **Causal Language Modeling:** Predicting the next word in a sequence, which can be applied to various NLP tasks like auto-completion or story generation.
+ - **Question Answering:** Providing answers to questions based on a given context.
+
+ ### Downstream Use [optional]
+
+ The model can be fine-tuned for specific tasks such as:
+ - **Sentiment Analysis:** Fine-tuning on datasets like IMDB for classifying sentiment in text.
+ - **Summarization:** Adapting the model for generating concise summaries of longer text documents.
+
+ ### Out-of-Scope Use
+
+ This model is not designed for:
+ - **Real-time Conversational AI:** Due to the hybrid architecture and the complexity of the model, it may not be optimal for real-time, low-latency applications.
+ - **Tasks requiring multilingual support:** The model is currently trained and optimized for English language processing only.
+
+ ## Bias, Risks, and Limitations
+
+ As with any AI model, the Shorsey-T2000 may carry over biases from its training data, and these can manifest in its outputs. It is important to recognize:
+ - **Bias in Training Data:** The model may reflect biases present in the datasets it was trained on, such as stereotypes or unbalanced representations of certain groups.
+ - **Limited Context Understanding:** Despite the RNN integration, the model might struggle with highly nuanced context or very long-term dependencies beyond its training data.
+
+ ### Recommendations
+
+ - **Human-in-the-Loop:** For applications where fairness and bias are critical, have a human review outputs generated by the model.
+ - **Bias Mitigation:** Consider additional data preprocessing or post-processing steps to mitigate biases in the model's predictions.
+
+ ## How to Get Started with the Model
+
+ You can start using the Shorsey-T2000 model with the following code snippet:
+
+ ```python
+ from transformers import BertTokenizerFast, AutoModelForCausalLM
+
+ tokenizer = BertTokenizerFast.from_pretrained("Wonder-Griffin/Shorsey-T2000")
+ # The architecture is custom, so loading may require the repository's own modeling
+ # code (trust_remote_code=True); a causal-LM class is needed for .generate().
+ model = AutoModelForCausalLM.from_pretrained(
+     "Wonder-Griffin/Shorsey-T2000", trust_remote_code=True
+ )
+
+ input_text = "Once upon a time"
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids
+
+ # Generate up to 100 tokens of text
+ output = model.generate(input_ids, max_length=100)
+ generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
+ print(generated_text)
+ ```
+
+ ## Training Data
+
+ The model was trained on the stanfordnlp/imdb dataset, which contains movie reviews labeled with sentiment. Additional datasets may have been used for other tasks such as question answering and language modeling.
+
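+ As a quick sanity check, the training corpus can be pulled straight from the Hub with the `datasets` library (a minimal sketch; the split and field names follow the public stanfordnlp/imdb dataset card):
+
+ ```python
+ from datasets import load_dataset
+
+ # Load the IMDB movie-review dataset referenced above
+ imdb = load_dataset("stanfordnlp/imdb")
+ print(imdb)                                   # splits: train / test / unsupervised
+ print(imdb["train"][0]["label"], imdb["train"][0]["text"][:200])
+ ```
+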
+ ## Preprocessing [optional]
+
+ Text data was tokenized with the repository's tokenizer (BertTokenizerFast in the example above; the checkpoint's embedding implies a 60,000-token vocabulary), with additional preprocessing steps to ensure consistent input formatting across tasks.
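+
+ Concretely, the formatting step could look like the sketch below. This is an illustrative assumption, not the original preprocessing script: it reuses the repository tokenizer from the snippet above and truncates to 512 tokens, matching the length of the checkpoint's positional-encoding buffer.
+
+ ```python
+ from transformers import BertTokenizerFast
+
+ tokenizer = BertTokenizerFast.from_pretrained("Wonder-Griffin/Shorsey-T2000")
+
+ def preprocess(batch):
+     # Pad/truncate every example to a fixed 512-token window
+     return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)
+
+ encoded = imdb.map(preprocess, batched=True)  # `imdb` from the dataset snippet above
+ ```
+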
+ ## Training Hyperparameters
+
+ - Training regime: fp32 precision, AdamW optimizer, learning rate 3e-5, batch size 8.
+ - Max epochs: 10
+ - Learning rate schedule: linear decay with warmup steps.
+
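+ The original training script is not part of this repository; the sketch below is one plausible way to express the stated regime with the Hugging Face `Trainer` (the warmup length and output directory are assumptions, and `model`/`encoded` come from the earlier snippets):
+
+ ```python
+ from transformers import Trainer, TrainingArguments
+
+ # Mirrors the documented regime: fp32, AdamW, lr 3e-5, batch size 8,
+ # 10 epochs, linear decay with warmup.
+ args = TrainingArguments(
+     output_dir="shorsey-t2000-imdb",   # assumed path
+     learning_rate=3e-5,
+     per_device_train_batch_size=8,
+     num_train_epochs=10,
+     lr_scheduler_type="linear",
+     warmup_steps=500,                  # warmup length not stated in the card
+     optim="adamw_torch",
+     fp16=False,                        # fp32 training, per the card
+ )
+
+ trainer = Trainer(model=model, args=args, train_dataset=encoded["train"])
+ trainer.train()
+ ```
+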
+ ## Speeds, Sizes, Times [optional]
+
+ - Training time: approximately 36 hours on a single NVIDIA V100 GPU.
+ - Model size: ~171M parameters (170,799,301 F32 parameters, per the safetensors metadata above).
+ - Checkpoint size: ~683 MB (a single F32 model.safetensors file).
+
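+ These figures can be cross-checked against the safetensors metadata shown in the removed line at the top of this diff. A small sketch, assuming a recent `huggingface_hub` release that provides `get_safetensors_metadata`:
+
+ ```python
+ from huggingface_hub import HfApi
+
+ # Same structure as the SafetensorsRepoMetadata dump earlier in this diff
+ meta = HfApi().get_safetensors_metadata("Wonder-Griffin/Shorsey-T2000")
+ print(meta.parameter_count)            # {'F32': 170799301} -> ~171M parameters
+
+ tensor_bytes = sum(
+     info.data_offsets[1] - info.data_offsets[0]
+     for file_meta in meta.files_metadata.values()
+     for info in file_meta.tensors.values()
+ )
+ print(f"~{tensor_bytes / 1e6:.0f} MB of tensor data")   # ~683 MB
+ ```
+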
+ ## Testing Data
+
+ The model was tested on a held-out portion of the stanfordnlp/imdb dataset to evaluate its performance on sentiment classification and text generation tasks.
+
+ ## Factors
+
+ - Domain: Movie reviews, general text generation.
+ - Subpopulations: Different sentiment categories (positive, negative).
+
+ ## Metrics
+
+ - Precision: used to evaluate the model's accuracy in generating correct text and answering questions.
+
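+ For the sentiment-classification evaluation, precision can be computed with the `evaluate` library; the sketch below uses placeholder label arrays purely for illustration (0 = negative, 1 = positive):
+
+ ```python
+ import evaluate
+
+ precision = evaluate.load("precision")
+
+ references  = [0, 1, 1, 0, 1]   # placeholder gold labels
+ predictions = [0, 1, 0, 0, 1]   # placeholder model outputs
+
+ print(precision.compute(predictions=predictions, references=references))
+ # {'precision': 1.0} -- no false positives in this toy example
+ ```
+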
+ ## Results
+
+ The model demonstrated strong performance on text generation tasks, particularly in generating coherent and contextually appropriate responses. However, it shows a slight tendency towards generating overly positive or negative responses based on the context provided.
+
+ ### Summary
+
+ The Shorsey-T2000 is a versatile and powerful model for various NLP tasks, especially in text generation and language modeling. Its hybrid architecture makes it particularly effective in capturing both short-term and long-term dependencies in text.
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ The Shorsey-T2000 is a hybrid model combining Transformer-XL and T5 architectures with an LSTM layer to enhance sequence learning capabilities. It uses multi-head self-attention, positional encodings, and a single bidirectional LSTM layer to process and generate text.
+
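+ The tensor names and shapes in the safetensors metadata above imply the layer layout sketched below. This is a reconstruction for illustration only, not the repository's actual modeling code: vocabulary 60,000, d_model 512, four attention/feed-forward blocks (FFN width 2,048), a bidirectional LSTM with hidden size 512, and task heads over the resulting 1,024-dim features; the head count (8) is an assumption, since it cannot be read from the shapes.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class ShorseyT2000Sketch(nn.Module):
+     """Layer layout inferred from the checkpoint's tensor names and shapes."""
+     def __init__(self, vocab_size=60000, d_model=512, n_blocks=4,
+                  n_heads=8, ffn_dim=2048, max_len=512, qa_labels=5):
+         super().__init__()
+         self.embedding = nn.Embedding(vocab_size, d_model)            # embedding.weight [60000, 512]
+         self.register_buffer("pe", torch.zeros(1, max_len, d_model))  # pos_encoding.pe [1, 512, 512]
+         # Four post-norm blocks: multi-head self-attention + two-layer feed-forward
+         self.transformer_blocks = nn.ModuleList([
+             nn.ModuleDict({
+                 "attn": nn.MultiheadAttention(d_model, n_heads, batch_first=True),
+                 "fc1": nn.Linear(d_model, ffn_dim),
+                 "fc2": nn.Linear(ffn_dim, d_model),
+                 "ln1": nn.LayerNorm(d_model),
+                 "ln2": nn.LayerNorm(d_model),
+             })
+             for _ in range(n_blocks)
+         ])
+         # Bidirectional LSTM: gate weights [2048, 512] = [4 * hidden, input]
+         self.rnn = nn.LSTM(d_model, d_model, batch_first=True, bidirectional=True)
+         # Task heads consume the 1024-dim (2 * 512) bidirectional output
+         self.casual_lm_head = nn.Linear(2 * d_model, vocab_size)  # checkpoint spells it "casual"
+         self.general_head = nn.Linear(2 * d_model, vocab_size)
+         self.qa_head = nn.Linear(2 * d_model, qa_labels)
+
+     def forward(self, input_ids):
+         x = self.embedding(input_ids) + self.pe[:, : input_ids.size(1)]
+         for blk in self.transformer_blocks:
+             attn_out, _ = blk["attn"](x, x, x)
+             x = blk["ln1"](x + attn_out)
+             x = blk["ln2"](x + blk["fc2"](torch.relu(blk["fc1"](x))))
+         x, _ = self.rnn(x)
+         return self.casual_lm_head(x)  # next-token logits for causal LM
+ ```
+
+ For example, `ShorseyT2000Sketch()(torch.randint(0, 60000, (1, 16)))` returns logits of shape `(1, 16, 60000)`, matching the causal-LM head in the checkpoint.
+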
+ ## Model Card Authors [optional]
+
+ Morgan Griffin, WongrifferousAI
+
+ ## Model Card Contact
+
+ Contact: Morgan Griffin, WongrifferousAI
+
+ ### Summary of Key Information:
+ - **Model Name:** Shorsey-T2000
+ - **Model Type:** Hybrid Transformer-RNN (TransformerXL-T5 with LSTM)
+ - **Developed by:** Morgan Griffin, WongrifferousAI
+ - **Primary Tasks:** Text generation, causal language modeling, question answering
+ - **Language:** English
+ - **Key Metrics:** Precision