---
tags:
- indonesian
- sentiment-analysis
- finance
- financial-sentiment
- indo-roberta
- transformers
- fine-tuned
- huggingface
- ihsg
- stock-market
- nlp
- indonesian-roberta-base-financial-sentiment-classifier
language:
- id
datasets:
- intanm/indonesian-financial-sentiment-analysis
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: indo-roBERTa-financial-sentiment
  results:
  - task:
      type: text-classification
      name: Text Classification (Sentiment Analysis)
    dataset:
      name: indonesian-financial-sentiment
      type: intanm/indonesian-financial-sentiment-analysis
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9749
    - name: F1
      type: f1
      value: 0.9749
    - name: Precision
      type: precision
      value: 0.9749
    - name: Recall
      type: recall
      value: 0.9749
license: mit
library_name: transformers
---

# 🇮🇩 IndoRoBERTa for Indonesian Financial Sentiment Classification

This is a fine-tuned version of [`w11wo/indonesian-roberta-base-sentiment-classifier`](https://huggingface.co/w11wo/indonesian-roberta-base-sentiment-classifier), specialized for **Indonesian financial news sentiment classification**. I couldn't find any financial sentiment model for the Indonesian market, so I decided to build one myself.

### 🧠 Model Summary

| Field | Value |
|---------------------|-------|
| **Model Name** | `ihsan31415/indo-roBERTa-financial-sentiment` |
| **Base Model** | [`w11wo/indonesian-roberta-base-sentiment-classifier`](https://huggingface.co/w11wo/indonesian-roberta-base-sentiment-classifier) |
| **Language** | Indonesian (`id`) |
| **Task** | Sentiment Analysis (Financial) |
| **Labels** | `0`: Positive, `1`: Neutral, `2`: Negative *(⚠️ flipped label order)* |
| **Dataset** | [`intanm/indonesian-financial-sentiment-analysis`](https://huggingface.co/datasets/intanm/indonesian-financial-sentiment-analysis) + synthetic and augmented samples |
| **Fine-tuned by** | [`ihsan31415`](https://huggingface.co/ihsan31415) |
| **Training Epochs** | 5 (early stopping triggered at epoch 5; best checkpoint from epoch 3) |
| **Eval Accuracy** | `97.49%` |

---

## 🧠 Model Objective

This model classifies Indonesian financial news articles into:

* `0` → **Positive**
* `1` → **Neutral**
* `2` → **Negative**

⚠️ **Important: Label Mapping is Flipped**

This label order follows the base model's unexpected configuration. During training and evaluation, the dataset was relabeled accordingly.

> ⚠️ Always interpret model output using this mapping:
>
> * `0`: Positive
> * `1`: Neutral
> * `2`: Negative

---

## 📊 Dataset & Preprocessing Pipeline

### 🔹 Source Dataset

* [`intanm/indonesian-financial-sentiment-analysis`](https://huggingface.co/datasets/intanm/indonesian-financial-sentiment-analysis)
* Labeled financial news (imbalanced and limited)

### 📈 Data Augmentation & Balancing

#### 1. 🧪 Gemini Synthetic Generation

* Generated structured financial news samples using `gemini-2.0-flash-lite`
* Targeted generation for underrepresented classes

#### 2. ✍️ GPT-2 Prompt Completion

* Used [`indonesian-nlp/gpt2-medium-indonesian`](https://huggingface.co/indonesian-nlp/gpt2-medium-indonesian)
* Prompt templates were varied and kept strictly separate between the train and test sets

#### 3. 🧩 RoBERTa-Based Masked Augmentation

* Strategic masking/filling while protecting key financial terms
* Iterative masking to increase diversity and context coverage

#### 📊 Final Label Distribution

**Train Set**:

```
2 (Negative): 22906
1 (Neutral):  23374
0 (Positive): 23423
```

**Test Set**:

```
2 (Negative): 9817
1 (Neutral):  10018
0 (Positive): 10039
```

---

## 🏋️ Training Details

### 🔁 Label Flipping

> The base model uses **non-standard labels**:
>
> * `0`: Positive
> * `1`: Neutral
> * `2`: Negative
>
> Training data was relabeled accordingly.

### 🔧 TrainingArguments

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results-roberta",
    eval_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    num_train_epochs=15,  # upper bound; early stopping ended training sooner
    learning_rate=2e-5,
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    save_total_limit=4,
)
```

* Early stopping (`patience=2`)
* Training stopped at **epoch 5** of the 15 configured epochs; the best model is from **epoch 3**

### 📊 Training Progress

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 Score |
|-------|---------------|-----------------|----------|-----------|----------|----------|
| 1 | 0.104500 | 0.085562 | 0.969402 | 0.969715 | 0.969402 | 0.969356 |
| 2 | 0.029100 | 0.088392 | 0.974859 | 0.974914 | 0.974859 | 0.974860 |
| 3 | 0.012700 | 0.102305 | 0.974926 | 0.974949 | 0.974926 | 0.974933 |
| 4 | 0.008900 | 0.125707 | 0.972816 | 0.972959 | 0.972816 | 0.972846 |
| 5 | 0.004400 | 0.157659 | 0.966690 | 0.966902 | 0.966690 | 0.966676 |

### ✅ Evaluation Results

```bash
eval_loss               = 0.10230540484189987
eval_accuracy           = 0.9749255130394028
eval_precision          = 0.9749490510899772
eval_recall             = 0.9749255130394028
eval_f1                 = 0.9749326327197978
eval_runtime            = 71.9098
eval_samples_per_second = 415.395
eval_steps_per_second   = 1.627
epoch                   = 5.0
```

---

## 🔎 Usage

#### Using Pipeline

```python
from transformers import pipeline

pretrained_name = "ihsan31415/indo-roBERTa-financial-sentiment"

nlp = pipeline(
    "sentiment-analysis",
    model=pretrained_name,
    tokenizer=pretrained_name
)

nlp("IHSG diprediksi melemah karena sentimen global negatif")
```

#### Using the Raw Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("ihsan31415/indo-roBERTa-financial-sentiment")
tokenizer = AutoTokenizer.from_pretrained("ihsan31415/indo-roBERTa-financial-sentiment")

# Example input
text = "IHSG diprediksi melemah karena sentimen global negatif"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Get predicted class
predicted_label = torch.argmax(outputs.logits, dim=1).item()

# Interpret using flipped label mapping
label_map = {
    0: "Positive",
    1: "Neutral",
    2: "Negative"
}

print(f"Predicted sentiment: {label_map[predicted_label]}")
```

## Author

This Indonesian RoBERTa-base financial sentiment classifier was trained and evaluated by Khoirul Ihsan on a Google Colab T4 GPU.

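The raw usage example takes the `argmax` of the logits, which hides how confident the prediction is. As a minimal sketch in plain Python (no model download needed), the helper below turns a list of three logits into per-label probabilities using the label mapping documented in this card. The names `LABEL_MAP`, `softmax`, and `interpret_logits` are illustrative assumptions, not part of the model or of `transformers`, and the logits in the last lines are made-up numbers rather than real model output.

```python
import math

# Label mapping documented in this model card (index -> sentiment).
LABEL_MAP = {0: "Positive", 1: "Neutral", 2: "Negative"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret_logits(logits):
    """Return (predicted_label, {label: probability}) for one example."""
    if len(logits) != len(LABEL_MAP):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    scores = {LABEL_MAP[i]: p for i, p in enumerate(probs)}
    predicted = max(scores, key=scores.get)
    return predicted, scores


# Hypothetical logits for illustration only (not real model output).
label, scores = interpret_logits([-1.2, 0.3, 2.5])
print(label, {name: round(p, 3) for name, p in scores.items()})
```

With the raw example, the input to `interpret_logits` would be `outputs.logits[0].tolist()`.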
---

## 📌 Citation

```bibtex
@misc{khoirul_ihsan_2025,
  title        = {IndoRoBERTa for Indonesian Financial Sentiment Classification},
  author       = {Khoirul Ihsan},
  howpublished = {\url{https://huggingface.co/ihsan31415/indo-roBERTa-financial-sentiment}},
  year         = {2025},
  note         = {Fine-tuned from w11wo/indonesian-roberta-base-sentiment-classifier using augmented financial news data from intanm/indonesian-financial-sentiment-analysis and various synthetic generation methods (Gemini, GPT-2, Roberta masking).},
  publisher    = {Hugging Face}
}
```

---

## 📬 Contact

Created with love and tears by ihsan:\
[![HuggingFace](https://img.shields.io/badge/HuggingFace-orange?style=flat&logo=huggingface&logoColor=white)](https://huggingface.co/ihsan31415) [![GitHub](https://img.shields.io/badge/GitHub-black?style=flat&logo=github&logoColor=white)](https://github.com/ihsan31415) [![LinkedIn](https://img.shields.io/badge/LinkedIn-blue?style=flat&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/khoirul-ihsan-387115288/)\
For collaborations or questions, feel free to reach out via Hugging Face or GitHub.