AfterRain007/cryptobertRefined

Text Classification

Sentiment Analysis

AfterRain007 committed on Feb 23, 2024

Commit 1fd144d · verified · 1 Parent(s): 0e663aa

Update README.md

Files changed (1)

README.md +20 -6

@@ -15,14 +15,28 @@ tags:
 ---
 # CryptoBERTRefined
-CryptoBERTRefined is a fine tuned model from [CryptoBERT by Elkulako](https://huggingface.co/ElKulako/cryptobert) model (See the base model to see it's training corpus).
-# Training Process
-Total of 3.803 text have been labelled manually to fine tune the model, and data augmentation is done with Back-Translation using Google Translate API with 10 language ('it', 'fr', "sv", "da", 'pt', 'id', 'pl', 'hr', "bg", "fi").
 # Training Corpus
-Randomly picked text from [kaggle datasets](https://www.kaggle.com/datasets/kaushiksuresh147/bitcoin-tweets)
-Labelled sentiment text from [surgeAI](https://www.surgehq.ai/datasets/crypto-sentiment-dataset)
 # Source Code
-See [Github](https://github.com/AfterRain007/cryptobertRefined) for the source code to finetune cryptoBERT model into cryptoBERTRefined.

 ---
 # CryptoBERTRefined
+CryptoBERTRefined is a model fine-tuned from [CryptoBERT by ElKulako](https://huggingface.co/ElKulako/cryptobert) (see the base model card for its training description).
+# Classification Example
+```
+Import your code here!
+```
 # Training Corpus
+A total of 3,803 texts were labelled manually to fine-tune the model, keeping only non-duplicate texts with at least 4 words after cleaning. The following sources were used for our training dataset:
+1. Bitcoin tweet dataset from [Kaggle datasets](https://www.kaggle.com/datasets/kaushiksuresh147/bitcoin-tweets) (randomly picked).
+2. Labelled crypto sentiment dataset from [SurgeAI](https://www.surgehq.ai/datasets/crypto-sentiment-dataset).
+3. Reddit r/Bitcoin threads on the topic "Daily Discussion" (randomly picked).
+Data augmentation was applied to enrich the dataset: back-translation was performed with the Google Translate API using 10 languages ('it', 'fr', 'sv', 'da', 'pt', 'id', 'pl', 'hr', 'bg', 'fi').
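The filtering and back-translation steps described in the Training Corpus section could look roughly like the sketch below. It is not the author's code (the linked GitHub repository holds that). The `clean` rule, the case-insensitive duplicate check, and the `translate(text, src, dest)` callable are hypothetical stand-ins, the last one for whichever Google Translate client is used. Only the 4-word minimum and the list of 10 languages come from the README.

```python
import re
from typing import Callable, Iterable, List

# Pivot languages listed in the README.
PIVOT_LANGS = ["it", "fr", "sv", "da", "pt", "id", "pl", "hr", "bg", "fi"]


def clean(text: str) -> str:
    """Hypothetical cleaning rule: drop URLs and collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    return " ".join(text.split())


def filter_texts(texts: Iterable[str], min_words: int = 4) -> List[str]:
    """Keep cleaned texts that have at least `min_words` words and are not duplicates."""
    seen = set()
    kept = []
    for raw in texts:
        cleaned = clean(raw)
        if len(cleaned.split()) < min_words:
            continue
        key = cleaned.lower()  # assumption: duplicates are compared case-insensitively
        if key in seen:
            continue
        seen.add(key)
        kept.append(cleaned)
    return kept


def back_translate(
    text: str,
    translate: Callable[[str, str, str], str],
    langs: Iterable[str] = tuple(PIVOT_LANGS),
    source: str = "en",
) -> List[str]:
    """Translate `text` into each pivot language and back, returning new variants.

    `translate(text, src, dest)` is a hypothetical wrapper around a translation
    service; it is injected so the sketch does not depend on a specific client.
    """
    variants = []
    for lang in langs:
        pivot = translate(text, source, lang)
        restored = translate(pivot, lang, source)
        # Skip round trips that return the original text or an existing variant.
        if restored != text and restored not in variants:
            variants.append(restored)
    return variants
```

Under these assumptions each labelled text can yield up to 10 paraphrases, all of which would inherit the label of the original.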
 # Source Code
+See [Github](https://github.com/AfterRain007/cryptobertRefined) for the source code to finetune cryptoBERT model into cryptoBERTRefined.
+# Credit
+Credit where credit's due, thank you all!
+1. Muhaza Liebenlito, M.Si and Prof. Dr. Nur Inayah, M.Si. as my academic advisors.
+2. Risky Amalia Marhariyadi for helping to label the dataset.
+3. SurgeAI for the dataset.
+4. Mikolaj Kulakowski and Flavius Frasincar for the original CryptoBERT model.
+5. Kaushik Suresh for the Bitcoin tweets dataset.
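
The Classification Example section added in this commit still contains only a placeholder. A minimal sketch of how the model might be called through the Hugging Face `transformers` `pipeline` API follows, assuming the repository id `AfterRain007/cryptobertRefined` loads as a standard text-classification checkpoint. The helper names are hypothetical, and the label names are whatever the model's config defines; they are not listed in this README.

```python
from typing import Callable, List, Tuple

# Repository id as shown on the model page.
MODEL_ID = "AfterRain007/cryptobertRefined"


def load_classifier():
    """Build a text-classification pipeline for the model.

    transformers is imported lazily so the rest of the sketch can be
    used without it; the first call downloads the model weights.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def classify(texts: List[str], classifier: Callable) -> List[Tuple[str, str, float]]:
    """Return (text, label, score) for each input text.

    `classifier` is expected to behave like a transformers
    text-classification pipeline: given a list of strings, it returns
    one {"label": ..., "score": ...} dict per string.
    """
    if not texts:
        return []
    results = classifier(texts)
    return [(text, res["label"], res["score"]) for text, res in zip(texts, results)]
```

Calling `classify(["Bitcoin looks strong today"], load_classifier())` would download the weights on first use. Passing the classifier in as an argument keeps `classify` easy to exercise with a stub.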