multilingual version of CatastroBERT
CatastroBERT: a model for extreme weather event detection in French text
This model helps detect paragraphs or articles relevant to extreme weather events in French text. It is based on the camembert-base model and was fine-tuned on manually annotated data (article summaries) from the Gazette de Lausanne archives collected by impresso.
Model Description
Developed by: Lucas Nicolas
Language(s) (NLP): French
Finetuned from model: camembert-base (RoBERTa checkpoint)
Repository: Check the CatastroBERT GitHub page for more usage examples and information.
Usage
In Transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "epfl-dhlab/CatastroBERT"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load the fine-tuned weights and place the model on the same device as the inputs
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
model.eval()

def predict(text):
    # Prepare the text data
    inputs = tokenizer(
        text,
        add_special_tokens=True,
        padding=True,
        max_length=512,
        truncation=True,
        return_tensors='pt'
    )
    ids = inputs['input_ids'].to(device)
    mask = inputs['attention_mask'].to(device)
    # Get predictions
    with torch.no_grad():
        outputs = model(ids, attention_mask=mask)
    logits = outputs.logits
    # Apply sigmoid function to get probabilities
    probs = torch.sigmoid(logits).cpu().numpy()
    # Return the probability of the class (1)
    return probs[0][0]

# Example usage
text = "Un violent ouragan du sud-ouest est passé cette nuit sur Lausanne."
print(f"Prediction: {predict(text)}")
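The prediction above is a probability, so a downstream pipeline usually still needs a yes/no decision. The following is a minimal sketch of that step, written in plain Python so it runs without downloading the model. The helper names `sigmoid` and `to_label`, the example logits, and the 0.5 threshold are all illustrative assumptions; the model card does not document a recommended cut-off.

```python
import math

def sigmoid(logit: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def to_label(prob: float, threshold: float = 0.5) -> int:
    """Return 1 (relevant to extreme weather) if prob >= threshold, else 0.

    The 0.5 default is an assumption for illustration, not a documented setting.
    """
    return int(prob >= threshold)

# Hypothetical logits standing in for real model outputs
example_logits = [2.0, -1.5, 0.0]
labels = [to_label(sigmoid(z)) for z in example_logits]
print(labels)
```

In practice the threshold would be tuned on held-out data, trading recall (catching more relevant articles) against precision.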
Training Data
This model was trained on a manually annotated dataset (article summaries) curated from the Gazette de Lausanne archives collected by the impresso project. The dataset comprises 4,500 article summaries, of which 3,500 were used for training and 1,000 for validation.
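To make the split sizes concrete, here is a minimal sketch of a 3,500 / 1,000 partition of 4,500 items. It only illustrates the proportions: the actual procedure used to split the data is not described here, and the `split_dataset` helper, the fixed seed, and the placeholder summaries are hypothetical.

```python
import random

def split_dataset(items, n_train, seed=0):
    """Shuffle a copy of items and return (train, validation) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder strings standing in for the annotated article summaries
summaries = [f"summary_{i}" for i in range(4500)]
train, val = split_dataset(summaries, n_train=3500)
print(len(train), len(val))
```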
Environmental Impact
Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: RTX 3090
- Hours used: 26
- Carbon Emitted: 0.07 kg CO2