AfterRain007
committed on
Update README.md
README.md
@@ -15,14 +15,28 @@ tags:
---

# CryptoBERTRefined

CryptoBERTRefined is a fine-tuned version of [CryptoBERT by ElKulako](https://huggingface.co/ElKulako/cryptobert). See the base model card for its training description.

# Classification Example

```
Import your code here!
```
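Until the author's own example is filled in, here is a minimal sketch of classifying texts with the Hugging Face `transformers` text-classification pipeline. It assumes the model is published on the Hub under `AfterRain007/cryptobertRefined`, which this README does not confirm (`MODEL_ID` is hypothetical), and that the output labels follow the base CryptoBERT scheme. `transformers` is imported lazily so `classify` can be exercised with any callable that behaves like a pipeline.

```python
from typing import Callable, Dict, List

# Hypothetical Hub id (assumption): replace with the model's actual repository id.
MODEL_ID = "AfterRain007/cryptobertRefined"

Prediction = Dict[str, object]
Classifier = Callable[..., List[Prediction]]


def load_classifier(model_id: str = MODEL_ID) -> Classifier:
    """Build a Hugging Face text-classification pipeline for the given model id."""
    # Imported here so the rest of the module works without transformers installed.
    from transformers import pipeline  # pip install transformers torch

    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], classifier: Classifier) -> List[Dict[str, object]]:
    """Return one {"text", "label", "score"} record per input text."""
    # truncation=True is forwarded to the tokenizer so long posts fit the model's max length.
    predictions = classifier(texts, truncation=True)
    return [
        {"text": text, "label": pred["label"], "score": float(pred["score"])}
        for text, pred in zip(texts, predictions)
    ]


# Usage (downloads the model from the Hub on first call):
#   clf = load_classifier()
#   classify(["BTC is going to the moon!", "Sold everything, this market is dead."], clf)
```

Keeping model loading separate from `classify` lets the same pipeline object be reused across batches instead of being rebuilt per call.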

# Training Corpus

A total of 3,803 texts were labelled manually to fine-tune the model. Only non-duplicate texts with at least 4 words after cleaning were kept. The following sources were used for the training dataset:

1. Bitcoin tweets dataset from [Kaggle Datasets](https://www.kaggle.com/datasets/kaushiksuresh147/bitcoin-tweets) (randomly sampled).
2. Labelled crypto sentiment dataset from [SurgeAI](https://www.surgehq.ai/datasets/crypto-sentiment-dataset).
3. Reddit r/Bitcoin "Daily Discussion" threads (randomly sampled).
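
The two selection criteria above (no duplicates, at least 4 words after cleaning) can be sketched as a single filtering pass. The README does not say which cleaning rules were applied, so the ones below (lowercasing, stripping URLs, @mentions and punctuation) are assumptions, and `clean` and `filter_corpus` are illustrative names rather than functions from the project's repository.

```python
import re
from typing import Iterable, List

# Assumed cleaning rules (not specified in the README).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9'\s]")


def clean(text: str) -> str:
    """Lowercase, drop URLs/mentions/punctuation, and collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(text.split())


def filter_corpus(texts: Iterable[str], min_words: int = 4) -> List[str]:
    """Keep cleaned texts that are unique and contain at least `min_words` words."""
    seen = set()
    kept = []
    for raw in texts:
        cleaned = clean(raw)
        # Skip texts that are too short or duplicate an already kept text.
        if len(cleaned.split()) < min_words or cleaned in seen:
            continue
        seen.add(cleaned)
        kept.append(cleaned)
    return kept
```

Deduplicating on the cleaned form (rather than the raw text) also catches retweets and copies that differ only in casing, links or punctuation.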

Data augmentation was applied to enrich the dataset: back-translation with the Google Translate API through 10 pivot languages (Italian `it`, French `fr`, Swedish `sv`, Danish `da`, Portuguese `pt`, Indonesian `id`, Polish `pl`, Croatian `hr`, Bulgarian `bg`, Finnish `fi`).
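
A minimal sketch of round-trip back-translation through the listed pivot languages follows. The README does not include the augmentation code, so `translate(text, src, dest)` is a hypothetical callable standing in for whichever Google Translate client was used (no real client API is shown here), English is assumed as the source language, and it is assumed that each paraphrase inherits the label of its source text. Round trips that come back empty or identical to an existing text are dropped.

```python
from typing import Callable, List, Sequence

# The 10 pivot languages listed above (ISO 639-1 codes).
PIVOT_LANGS = ["it", "fr", "sv", "da", "pt", "id", "pl", "hr", "bg", "fi"]

# Hypothetical signature: translate(text, src, dest) -> translated text.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, translate: Translator,
                   pivots: Sequence[str] = PIVOT_LANGS, src: str = "en") -> List[str]:
    """Return unique paraphrases of `text` produced by src -> pivot -> src round trips."""
    seen = {text.strip().lower()}
    variants = []
    for lang in pivots:
        pivot_text = translate(text, src, lang)             # e.g. English -> Italian
        round_trip = translate(pivot_text, lang, src).strip()  # Italian -> English
        key = round_trip.lower()
        # Keep only non-empty paraphrases that differ from everything seen so far.
        if round_trip and key not in seen:
            seen.add(key)
            variants.append(round_trip)
    return variants
```

Each kept variant would then be added to the training set with the same sentiment label as the text it came from.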

# Source Code

See [GitHub](https://github.com/AfterRain007/cryptobertRefined) for the source code used to fine-tune the CryptoBERT model into CryptoBERTRefined.

# Credit

Credit where credit's due, thank you all!

1. Muhaza Liebenlito, M.Si., and Prof. Dr. Nur Inayah, M.Si., as my academic advisors.
2. Risky Amalia Marhariyadi for helping label the dataset.
3. SurgeAI for the dataset.
4. Mikolaj Kulakowski and Flavius Frasincar for the original CryptoBERT model.
5. Kaushik Suresh for the Bitcoin tweets dataset.