AfterRain007
committed on
Update README.md
README.md
CHANGED
@@ -44,14 +44,15 @@ Output:
|
|
44 |
A total of 3,803 texts were labelled manually to fine-tune the model, keeping only non-duplicate texts with a minimum of 4 words after cleaning. The following websites were used for our training dataset:
|
45 |
1. Bitcoin tweet dataset from [Kaggle Datasets](https://www.kaggle.com/datasets/kaushiksuresh147/bitcoin-tweets) (Randomly picked).
|
46 |
2. Labelled crypto sentiment dataset from [SurgeAI](https://www.surgehq.ai/datasets/crypto-sentiment-dataset).
|
47 |
-
3. Reddit thread r/Bitcoin with the topic "Daily Discussion" (Randomly picked)
|
48 |
-
|
|
|
49 |
|
50 |
# Source Code
|
51 |
See [GitHub](https://github.com/AfterRain007/cryptobertRefined) for the source code used to fine-tune the cryptoBERT model into cryptoBERTRefined.
|
52 |
|
53 |
# Credit
|
54 |
-
Credit where credit
|
55 |
|
56 |
1. Muhaza Liebenlito, M.Si and Prof. Dr. Nur Inayah, M.Si. as my academic advisors.
|
57 |
2. Risky Amalia Marhariyadi for helping to label the dataset.
|
|
|
44 |
A total of 3,803 texts were labelled manually to fine-tune the model, keeping only non-duplicate texts with a minimum of 4 words after cleaning. The following websites were used for our training dataset:
|
45 |
1. Bitcoin tweet dataset from [Kaggle Datasets](https://www.kaggle.com/datasets/kaushiksuresh147/bitcoin-tweets) (Randomly picked).
|
46 |
2. Labelled crypto sentiment dataset from [SurgeAI](https://www.surgehq.ai/datasets/crypto-sentiment-dataset).
|
47 |
+
3. Reddit thread r/Bitcoin with the topic "Daily Discussion" (Randomly picked)
|
48 |
+
|
49 |
+
Data augmentation was also performed to enrich the dataset: back-translation with the Google Translate API through 10 languages ('it', 'fr', 'sv', 'da', 'pt', 'id', 'pl', 'hr', 'bg', 'fi').
|
50 |
|
51 |
# Source Code
|
52 |
See [GitHub](https://github.com/AfterRain007/cryptobertRefined) for the source code used to fine-tune the cryptoBERT model into cryptoBERTRefined.
|
53 |
|
54 |
# Credit
|
55 |
+
Credit where credit is due, thank you all!
|
56 |
|
57 |
1. Muhaza Liebenlito, M.Si and Prof. Dr. Nur Inayah, M.Si. as my academic advisors.
|
58 |
2. Risky Amalia Marhariyadi for helping to label the dataset.
|
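The back-translation step added to the README in this commit could look roughly like the sketch below. It is not the code from the linked repository: the `back_translate` helper and the `translate(text, src, dest)` callable are hypothetical names, and the source language is assumed to be English. Only the list of 10 pivot languages comes from the README. The translator is passed in as a plain function so that any Google Translate client can be wrapped without this sketch depending on a specific library.

```python
from typing import Callable, Iterable, List

# Pivot languages listed in the README change.
PIVOT_LANGS = ["it", "fr", "sv", "da", "pt", "id", "pl", "hr", "bg", "fi"]

# Hypothetical signature: translate(text, src, dest) -> translated text.
Translator = Callable[[str, str, str], str]


def back_translate(
    text: str,
    translate: Translator,
    pivots: Iterable[str] = PIVOT_LANGS,
    src: str = "en",  # assumption: the original texts are in English
) -> List[str]:
    """Return distinct paraphrases of `text`, one round trip per pivot language."""
    seen = {text.strip().lower()}  # skip round trips identical to the input
    paraphrases = []
    for lang in pivots:
        pivot_text = translate(text, src, lang)        # e.g. en -> it
        round_trip = translate(pivot_text, lang, src)  # e.g. it -> en
        key = round_trip.strip().lower()
        if key and key not in seen:
            seen.add(key)
            paraphrases.append(round_trip)
    return paraphrases
```

Each paraphrase would keep the label of the text it was generated from. Dropping round trips that match the input or an earlier paraphrase keeps the augmented set in line with the README's non-duplicate rule.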