uk_ner_web_trf_13class

Model description

uk_ner_web_trf_13class is a fine-tuned RoBERTa Large Ukrainian model that is ready to use for Named Entity Recognition (NER) and achieves new state-of-the-art performance on the NER task for the Ukrainian language.

It has been trained to recognize thirteen types of entities:

  • ORG — a name of a company, brand, agency, organization, institution (including religious, informal, non-profit), party, people's association, or specific project like a conference, a music band, a TV program, etc. Example: UNESCO.
  • PERS — a person's name, where "person" may refer to a human, a book character, or a humanoid creature such as a vampire, ghost, or mermaid. Example: Marquis de Sade.
  • LOC — a geographical name, including names of districts, villages, cities, states, counties, countries, continents, rivers, lakes, seas, oceans, mountains, etc. Example: Ukraine.
  • MON — a sum of money including the currency. Examples: $40, 1 mln hryvnias.
  • PCT — a percent value including the percent sign or the word "percent". Example: 10%.
  • DATE — a full or incomplete calendar date that may include a century, a year, a month, a day. Examples: last week, 10.12.1999.
  • TIME — a textual or numerical timestamp. Examples: half past six, 18:30.
  • PERIOD — a time period, which may consist of two dates. Examples: a few months, 2014-2015.
  • JOB — a job title. Examples: member of parliament, ophthalmologist.
  • DOC — a unique name of a document, including names of contracts, orders, bills, purchases. Example: procurement contract CW2244226.
  • QUANT — a quantity with the unit of measurement, such as weight, distance, size. Examples: 3 kilograms, a hundred miles.
  • ART (artifact) — a name of a human-made product, like a book, a song, a car, or a sandwich. Examples: Mona Lisa, iPhone.
  • MISC — any other entity not covered in the list above, like names of holidays, websites, battles, wars, sports events, hurricanes, etc. Example: Black Friday.

The model was fine-tuned on the NER-UK 2.0 dataset, released by the lang-uk project.

Another transformer-based model for spaCy, trained on 4 classes, is available here.
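A minimal usage sketch follows. It assumes the model is installed locally as a spaCy pipeline package under the name uk_ner_web_trf_13class; the card does not describe installation, so treat that as an assumption. The helper names extract_entities and group_by_label are hypothetical and exist only for illustration. The label set is taken from the list of thirteen entity types above.

```python
from collections import defaultdict

# The thirteen entity labels described in the list above.
LABELS = frozenset({
    "ORG", "PERS", "LOC", "MON", "PCT", "DATE", "TIME",
    "PERIOD", "JOB", "DOC", "QUANT", "ART", "MISC",
})


def extract_entities(text, model_name="uk_ner_web_trf_13class"):
    """Return (entity text, label) pairs found in `text`.

    Assumption: the model is installed locally as a spaCy pipeline
    package named `model_name`; the model card does not confirm this.
    """
    import spacy  # imported lazily so group_by_label works without spaCy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def group_by_label(pairs):
    """Group (entity text, label) pairs by label, skipping unknown labels."""
    grouped = defaultdict(list)
    for entity_text, label in pairs:
        if label in LABELS:
            grouped[label].append(entity_text)
    return dict(grouped)
```

With the model installed, group_by_label(extract_entities(text)) would return a dictionary that maps each label found in the text to the list of matching entity strings. The grouping helper can also be used on its own with any list of (text, label) pairs.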

Citation

TBA

Copyright: Dmytro Chaplynskyi, Mariana Romanyshyn, lang-uk project, 2024

Evaluation results