---
license: apache-2.0
language:
  - pl
base_model:
  - sdadas/polish-roberta-large-v2
pipeline_tag: text-classification
library_name: transformers
tags:
  - news
---

## Description

polarity-3c is a classification model specialized in determining the polarity of texts from news portals. It was trained mostly on Polish texts.

Annotations from plWordNet were used as the basis for the data. A model pre-trained on these annotations served in a human-in-the-loop setup to support annotation of the data used to train the final model. The final model was trained on web content that was collected and annotated manually.

sdadas/polish-roberta-large-v2 was used as the base model, extended with a classification head. More about the model's construction can be found on our blog.
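The classification head has three outputs, corresponding to the labels shown in the usage example below. One way to inspect the label mapping is via the checkpoint's config (a minimal sketch; the exact ordering of `id2label` entries is an assumption here):

```python
from transformers import AutoConfig

# Inspect the label mapping shipped with the checkpoint
config = AutoConfig.from_pretrained("radlab/polarity-3c")
print(config.id2label)  # three polarity classes: negative / ambivalent / positive
```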

## Architecture

```
RobertaForSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(128001, 1024, padding_idx=1)
      (position_embeddings): Embedding(514, 1024, padding_idx=1)
      (token_type_embeddings): Embedding(1, 1024)
      (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-23): 24 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSdpaSelfAttention(
              (query): Linear(in_features=1024, out_features=1024, bias=True)
              (key): Linear(in_features=1024, out_features=1024, bias=True)
              (value): Linear(in_features=1024, out_features=1024, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=1024, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): RobertaIntermediate(
            (dense): Linear(in_features=1024, out_features=4096, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): RobertaOutput(
            (dense): Linear(in_features=4096, out_features=1024, bias=True)
            (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
  )
  (classifier): RobertaClassificationHead(
    (dense): Linear(in_features=1024, out_features=1024, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
    (out_proj): Linear(in_features=1024, out_features=3, bias=True)
  )
)
```
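The printout above is the model's module tree; it can be reproduced by loading the checkpoint and printing it (a minimal sketch):

```python
from transformers import AutoModelForSequenceClassification

# Loading the checkpoint and printing it yields the module tree shown above
model = AutoModelForSequenceClassification.from_pretrained("radlab/polarity-3c")
print(model)
```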

## Usage

Example of use with the `transformers` pipeline:

```python
from transformers import pipeline

classifier = pipeline(model="radlab/polarity-3c", task="text-classification")

classifier("Text to classification")

With sample data (a Polish news snippet about treasure hunting in post-Assad Syria) and `top_k=3`:

```python
classifier("""
  Po upadku reżimu Asada w Syrii, mieszkańcy, borykający się z ubóstwem,
  zaczęli tłumnie poszukiwać skarbów, zachęceni legendami o zakopanych
  bogactwach i dostępnością wykrywaczy metali, które stały się popularnym
  towarem. Mimo, że działalność ta jest nielegalna, rząd przymyka oko,
  a sprzedawcy oferują urządzenia nawet dla dzieci. Poszukiwacze skupiają
  się na obszarach historycznych, wierząc w legendy o skarbach ukrytych
  przez starożytne cywilizacje i wojska osmańskie, choć eksperci ostrzegają
  przed fałszywymi monetami i kradzieżą artefaktów z muzeów.""",
  top_k=3
)
```

The output is:

```python
[{'label': 'ambivalent', 'score': 0.9995126724243164},
 {'label': 'negative', 'score': 0.00024663121439516544},
 {'label': 'positive', 'score': 0.00024063512682914734}]
```
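The pipeline's scores can also be reproduced with the raw model and tokenizer by applying a softmax over the three output logits. A minimal sketch (variable names are illustrative):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("radlab/polarity-3c")
model = AutoModelForSequenceClassification.from_pretrained("radlab/polarity-3c")
model.eval()

inputs = tokenizer("Text to classify", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 3)

# Softmax turns the three logits into the scores reported by the pipeline
probs = logits.softmax(dim=-1).squeeze(0)
for idx in probs.argsort(descending=True):
    print(model.config.id2label[idx.item()], probs[idx].item())
```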