---
license: apache-2.0
language:
  - pl
base_model:
  - sdadas/polish-roberta-large-v2
pipeline_tag: text-classification
library_name: transformers
tags:
  - news
---

## Description

polarity-3c is a classification model specialized in determining the polarity of texts from news portals. It was trained mostly on Polish texts.

Annotations from plWordNet were used as the basis for the data. A model pre-trained on these annotations served in a human-in-the-loop setup to support annotation of the data used to train the final model. The final model was trained on web content that was collected and annotated manually.

sdadas/polish-roberta-large-v2 was used as the base model, extended with a classification head. More about the model's construction can be found on our blog.
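The classification head has three outputs, corresponding to the labels shown in the usage example below. One way to inspect the label mapping is via the checkpoint's config (a minimal sketch; the exact ordering of `id2label` entries is an assumption here):

```python
from transformers import AutoConfig

# Inspect the label mapping shipped with the checkpoint
config = AutoConfig.from_pretrained("radlab/polarity-3c")
print(config.id2label)  # three polarity classes: negative / ambivalent / positive
```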

## Architecture

```
RobertaForSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(128001, 1024, padding_idx=1)
      (position_embeddings): Embedding(514, 1024, padding_idx=1)
      (token_type_embeddings): Embedding(1, 1024)
      (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-23): 24 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSdpaSelfAttention(
              (query): Linear(in_features=1024, out_features=1024, bias=True)
              (key): Linear(in_features=1024, out_features=1024, bias=True)
              (value): Linear(in_features=1024, out_features=1024, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=1024, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): RobertaIntermediate(
            (dense): Linear(in_features=1024, out_features=4096, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): RobertaOutput(
            (dense): Linear(in_features=4096, out_features=1024, bias=True)
            (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
  )
  (classifier): RobertaClassificationHead(
    (dense): Linear(in_features=1024, out_features=1024, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
    (out_proj): Linear(in_features=1024, out_features=3, bias=True)
  )
)
```
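The printout above is the model's module tree; it can be reproduced by loading the checkpoint and printing it (a minimal sketch):

```python
from transformers import AutoModelForSequenceClassification

# Loading the checkpoint and printing it yields the module tree shown above
model = AutoModelForSequenceClassification.from_pretrained("radlab/polarity-3c")
print(model)
```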

## Usage

Example of use with the `transformers` pipeline:

```python
from transformers import pipeline

classifier = pipeline(model="radlab/polarity-3c", task="text-classification")

classifier("Text to classification")

With sample data (a Polish news snippet about treasure hunting in post-Assad Syria) and `top_k=3`:

```python
classifier("""
  Po upadku reżimu Asada w Syrii, mieszkańcy, borykający się z ubóstwem,
  zaczęli tłumnie poszukiwać skarbów, zachęceni legendami o zakopanych
  bogactwach i dostępnością wykrywaczy metali, które stały się popularnym
  towarem. Mimo, że działalność ta jest nielegalna, rząd przymyka oko,
  a sprzedawcy oferują urządzenia nawet dla dzieci. Poszukiwacze skupiają
  się na obszarach historycznych, wierząc w legendy o skarbach ukrytych
  przez starożytne cywilizacje i wojska osmańskie, choć eksperci ostrzegają
  przed fałszywymi monetami i kradzieżą artefaktów z muzeów.""",
  top_k=3
)
```

The output is:

```python
[{'label': 'ambivalent', 'score': 0.9995126724243164},
 {'label': 'negative', 'score': 0.00024663121439516544},
 {'label': 'positive', 'score': 0.00024063512682914734}]
```
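The pipeline's scores can also be reproduced with the raw model and tokenizer by applying a softmax over the three output logits. A minimal sketch (variable names are illustrative):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("radlab/polarity-3c")
model = AutoModelForSequenceClassification.from_pretrained("radlab/polarity-3c")
model.eval()

inputs = tokenizer("Text to classify", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 3)

# Softmax turns the three logits into the scores reported by the pipeline
probs = logits.softmax(dim=-1).squeeze(0)
for idx in probs.argsort(descending=True):
    print(model.config.id2label[idx.item()], probs[idx].item())
```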