Update README.md
README.md
@@ -2,8 +2,11 @@
 CryptoBERT is a pre-trained NLP model to analyse the language and sentiments of cryptocurrency-related social media posts and messages. It is built by further training the [cardiffnlp's Twitter-roBERTa-base](https://huggingface.co/cardiffnlp/twitter-roberta-base) language model on the cryptocurrency domain, using a corpus of over 3.2M unique cryptocurrency-related social media posts.
 
 
+## Classification Training
+CryptoBERT's sentiment classification head was fine-tuned on
+
 ## Training Corpus
-CryptoBERT was trained on 3.2M social media posts
+CryptoBERT was trained on 3.2M social media posts regarding various cryptocurrencies. Only non-duplicate posts of length above 4 words were considered. The following communities were used as sources for our corpora:
 
 
 (1) StockTwits - 1.875M posts about the top 100 cryptos by trading volume. Posts were collected from the 1st of November 2021 to the 16th of June 2022.
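This commit adds a corpus filtering rule: only non-duplicate posts longer than four words were kept. The sketch below is a minimal Python illustration of that rule. It is not the authors' preprocessing code. The function name `filter_corpus` is hypothetical, and so are two of its behaviours. First, it counts a "word" as a whitespace-separated token. Second, it treats two posts as duplicates only if they are identical after whitespace normalisation. The README does not specify either detail.

```python
from typing import Iterable, List


def filter_corpus(posts: Iterable[str], word_threshold: int = 4) -> List[str]:
    """Hypothetical sketch: keep unique posts with more than `word_threshold` words.

    Assumptions (not stated in the README):
    - a "word" is a whitespace-separated token;
    - duplicates are posts that are identical after collapsing whitespace;
    - the first occurrence of a duplicated post is the one that is kept.
    """
    seen = set()
    kept = []
    for post in posts:
        # Collapse runs of whitespace so "a  b" and "a b" compare as equal.
        normalized = " ".join(post.split())
        # Drop posts with `word_threshold` words or fewer (i.e. 4 or fewer by default).
        if len(normalized.split()) <= word_threshold:
            continue
        # Drop posts already seen; keep only the first occurrence.
        if normalized in seen:
            continue
        seen.add(normalized)
        kept.append(post)
    return kept


if __name__ == "__main__":
    posts = [
        "BTC to the moon today for sure",
        "BTC to the moon  today for sure",  # duplicate after whitespace normalisation
        "HODL ETH",                         # only 2 words, so it is dropped
        "Buying more SOL on this dip now",
    ]
    print(filter_corpus(posts))
```

Exact-match deduplication is the simplest reading of "non-duplicate". A real pipeline might also lowercase text or strip URLs and mentions before comparing, but the README does not say so.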