AfterRain007 committed on
Commit 1fd144d · verified · 1 Parent(s): 0e663aa

Update README.md

Files changed (1)
  1. README.md +20 -6
README.md CHANGED
@@ -15,14 +15,28 @@ tags:
  ---
 
  # CryptoBERTRefined
- CryptoBERTRefined is a fine-tuned version of [CryptoBERT by ElKulako](https://huggingface.co/ElKulako/cryptobert) (see the base model for its training corpus).
+ CryptoBERTRefined is a fine-tuned version of [CryptoBERT by ElKulako](https://huggingface.co/ElKulako/cryptobert) (see the base model for a description of its training).
 
- # Training Process
- A total of 3,803 texts were labelled manually to fine-tune the model, and the data was augmented by back-translation through the Google Translate API using 10 languages ('it', 'fr', 'sv', 'da', 'pt', 'id', 'pl', 'hr', 'bg', 'fi').
+ # Classification Example
+ ```
+ Import your code here!
+ ```
 
  # Training Corpus
- Randomly picked texts from [Kaggle datasets](https://www.kaggle.com/datasets/kaushiksuresh147/bitcoin-tweets)
- Labelled sentiment texts from [SurgeAI](https://www.surgehq.ai/datasets/crypto-sentiment-dataset)
+ A total of 3,803 texts were labelled manually to fine-tune the model; each text had to be non-duplicate and contain at least 4 words after cleaning. The texts were drawn from the following sources:
+ 1. Bitcoin tweet dataset from [Kaggle datasets](https://www.kaggle.com/datasets/kaushiksuresh147/bitcoin-tweets) (randomly picked).
+ 2. Labelled crypto sentiment dataset from [SurgeAI](https://www.surgehq.ai/datasets/crypto-sentiment-dataset).
+ 3. "Daily Discussion" threads on the r/Bitcoin subreddit (randomly picked).
+ To enrich the dataset, it was augmented by back-translation through the Google Translate API using 10 languages ('it', 'fr', 'sv', 'da', 'pt', 'id', 'pl', 'hr', 'bg', 'fi').
 
  # Source Code
- See [GitHub](https://github.com/AfterRain007/cryptobertRefined) for the source code used to fine-tune CryptoBERT into CryptoBERTRefined.
+ See [GitHub](https://github.com/AfterRain007/cryptobertRefined) for the source code used to fine-tune CryptoBERT into CryptoBERTRefined.
+
+ # Credit
+ Credit where credit is due; thank you all!
+
+ 1. Muhaza Liebenlito, M.Si. and Prof. Dr. Nur Inayah, M.Si. as my academic advisors.
+ 2. Risky Amalia Marhariyadi for helping label the dataset.
+ 3. SurgeAI for the dataset.
+ 4. Mikolaj Kulakowski and Flavius Frasincar for the original CryptoBERT model.
+ 5. Kaushik Suresh for the Bitcoin tweets dataset.
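The updated Training Corpus section describes two preprocessing steps. First, texts are kept only if they are non-duplicate and have at least 4 words after cleaning. Second, the data is augmented by back-translation through 10 pivot languages. The sketch below illustrates both steps under stated assumptions and is not the author's code. The cleaning rule is assumed: lowercase, then drop URLs, @mentions and symbols. `translate(text, src, dest)` is a hypothetical callable standing in for a Google Translate API client, so the sketch runs without network access. Labels are placeholders.

```python
import re
from typing import Callable, Iterable, List, Sequence, Tuple

# Pivot languages listed in the README for back-translation.
LANGS = ("it", "fr", "sv", "da", "pt", "id", "pl", "hr", "bg", "fi")

# Hypothetical translator signature: translate(text, src, dest) -> str.
# A real implementation would wrap a Google Translate API client.
Translator = Callable[[str, str, str], str]


def clean(text: str) -> str:
    """Assumed cleaning rule: lowercase, drop URLs and @mentions, strip symbols."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^\w\s]", " ", text)       # remove punctuation and emoji
    return " ".join(text.split())              # collapse whitespace


def filter_texts(texts: Iterable[str], min_words: int = 4) -> List[str]:
    """Keep cleaned texts with at least `min_words` words, skipping duplicates."""
    seen = set()
    kept = []
    for raw in texts:
        cleaned = clean(raw)
        if len(cleaned.split()) < min_words or cleaned in seen:
            continue
        seen.add(cleaned)
        kept.append(cleaned)
    return kept


def back_translate(
    samples: List[Tuple[str, str]],
    translate: Translator,
    langs: Sequence[str] = LANGS,
    src: str = "en",
) -> List[Tuple[str, str]]:
    """Round-trip each (text, label) pair through every pivot language.

    Each paraphrase inherits the label of its source text and is kept only
    if it is not already present in the augmented dataset.
    """
    seen = {text for text, _ in samples}
    augmented = list(samples)
    for text, label in samples:
        for lang in langs:
            pivot = translate(text, src, lang)               # en -> pivot
            paraphrase = clean(translate(pivot, lang, src))  # pivot -> en
            if paraphrase and paraphrase not in seen:
                seen.add(paraphrase)
                augmented.append((paraphrase, label))
    return augmented


if __name__ == "__main__":
    # Toy translator for illustration only: the outbound leg appends the
    # pivot language code, and the return leg passes text through unchanged.
    def toy_translate(text: str, src: str, dest: str) -> str:
        return text if dest == "en" else f"{text} {dest}"

    raw = [
        "BTC is going to the moon!! https://example.com",
        "btc is going to the moon",               # duplicate after cleaning
        "hodl",                                   # under 4 words, dropped
        "@whale sold everything today, bearish",
    ]
    texts = filter_texts(raw)
    print(texts)  # ['btc is going to the moon', 'sold everything today bearish']
    data = back_translate([(texts[0], "label_a"), (texts[1], "label_b")], toy_translate)
    print(len(data))  # 22: 2 originals + 10 paraphrases each
```

Deduplication runs again after back-translation because a round trip often returns the original sentence unchanged. Without this second pass, augmentation could reintroduce the duplicates that the first filter removed.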