AI & ML interests
Natural Language Processing, Signal Processing
Latxa: An Open Language Model and Evaluation Suite for Basque
We present GoLLIE, a Large Language Model trained to follow annotation guidelines that outperforms previous approaches on zero-shot IE.
Datasets and models for metaphor detection and interpretation via NLI in Spanish and English
- Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection
  Paper • 2210.10358 • Published
- HiTZ/cometa
  Viewer • Updated • 3.63k • 14
- HiTZ/xlm-roberta-large-metaphor-detection-es
  Token Classification • Updated • 2
- HiTZ/mdeberta-base-metaphor-detection-es
  Token Classification • Updated • 2
Does Corpus Quality Really Matter for Low-Resource Languages?
Alpaca LoRA MT models and dataset
Basque Pretraining Datasets
Basque Instruction Datasets
OPT reward models
An open-source text-to-text multilingual model for the medical domain.
A Bilingual Corpus of Basque Parliamentary Transcriptions
Basque Speech to Text models
- Demo Basque ASR 🎤 • 3
- HiTZ/stt_eu_conformer_ctc_large
  Automatic Speech Recognition • Updated • 6 • 2
- HiTZ/stt_eu_conformer_transducer_large
  Automatic Speech Recognition • Updated • 63 • 2
- Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages
  Paper • 2503.23542 • Published • 10
Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque
IXA Submission for the 2024 ODESIA Challenge
Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque
- Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque
  Paper • 2506.07597 • Published
- HiTZ/Latxa-Llama-3.1-8B-Instruct
  Text Generation • 8B • Updated • 9.67k • 10
- HiTZ/Latxa-Llama-3.1-70B-Instruct
  Text Generation • 71B • Updated • 829 • 4
- HiTZ/Latxa-Llama-3.1-70B-Instruct-FP8
  Text Generation • 71B • Updated • 826 • 1
Truth Knows No Language: Evaluating Truthfulness Beyond English
Ask2Transformers models
- Ask2Transformers: Zero-Shot Domain labelling with Pre-trained Language Models
  Paper • 2101.02661 • Published
- Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction
  Paper • 2109.03659 • Published
- Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning
  Paper • 2205.01376 • Published
- ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations
  Paper • 2203.13602 • Published • 1
Vision-Language Models Struggle to Align Entities across Modalities
On the Role of Morphological Information for Contextual Lemmatization
- On the Role of Morphological Information for Contextual Lemmatization
  Paper • 2302.00407 • Published
- HiTZ/xlm-roberta-large-lemma-eu
  Token Classification • Updated • 55
- HiTZ/xlm-roberta-large-lemma-en
  Token Classification • Updated • 2 • 1
- HiTZ/xlm-roberta-large-lemma-tr
  Token Classification • Updated • 2
Basque Evaluation Datasets
Basque Encoder Language Models
- ixa-ehu/roberta-eus-euscrawl-large-cased
  Fill-Mask • 0.4B • Updated • 16 • 3
- ixa-ehu/roberta-eus-euscrawl-base-cased
  Fill-Mask • Updated • 20 • 2
- ixa-ehu/roberta-eus-cc100-base-cased
  Fill-Mask • 0.2B • Updated • 5 • 1
- ixa-ehu/roberta-eus-mc4-base-cased
  Fill-Mask • Updated • 6 • 1
State-of-the-art encoder-only models for Spanish. From the paper "Lessons learned from the evaluation of Spanish Language Models"
A Large Negation Benchmark to Challenge Large Language Models
Counternarrative Generation in Basque and Spanish
Give your Text Representation Models some Love: the Case for Basque
Data and models generated within the Antidote Project (https://univ-cotedazur.eu/antidote)
- HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine
  Paper • 2306.06029 • Published
- Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain
  Paper • 2404.07613 • Published
- HiTZ/casimedicos-exp
  Viewer • Updated • 2.49k • 94 • 3
- HiTZ/casimedicos-squad
  Preview • Updated • 16 • 1
XNLIeu: a dataset for cross-lingual NLI in Basque