|
--- |
|
library_name: transformers |
|
license: mit |
|
datasets: |
|
- ahmedheakl/resume-atlas |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
- f1 |
|
- recall |
|
- precision |
|
pipeline_tag: text-classification |
|
--- |
|
|
|
# How to use |
|
|
|
In this example, we run inference on a sample from our dataset (_ResumeAtlas_). You can increase `max_length` (up to the model's 512-token limit) for more accurate predictions.
|
|
|
```python |
|
!pip install datasets

import numpy as np
import torch
from transformers import BertForSequenceClassification, BertTokenizer
from datasets import load_dataset
from sklearn import preprocessing

dataset_id = 'ahmedheakl/resume-atlas'
model_id = 'ahmedheakl/bert-resume-classification'
label_column = 'Category'
num_labels = 43
output_attentions = False
output_hidden_states = False
do_lower_case = True
add_special_tokens = True
max_length = 512
padding = 'max_length'  # replaces the deprecated pad_to_max_length flag
return_attention_mask = True
truncation = True

ds = load_dataset(dataset_id, trust_remote_code=True)

# Fit a label encoder on the training categories so predicted class ids
# can be mapped back to category names.
le = preprocessing.LabelEncoder()
le.fit(ds['train'][label_column])

tokenizer = BertTokenizer.from_pretrained(model_id, do_lower_case=do_lower_case)
model = BertForSequenceClassification.from_pretrained(
    model_id,
    num_labels=num_labels,
    output_attentions=output_attentions,
    output_hidden_states=output_hidden_states,
)

# Use a GPU if one is available, otherwise fall back to CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device).eval()
sent = ds['train'][0]['Text']

# Tokenize one resume, padding/truncating it to max_length tokens.
encoded_dict = tokenizer.encode_plus(
    sent,
    add_special_tokens=add_special_tokens,
    max_length=max_length,
    padding=padding,
    return_attention_mask=return_attention_mask,
    return_tensors='pt',
    truncation=truncation,
)
input_ids = encoded_dict['input_ids'].to(device)
attention_mask = encoded_dict['attention_mask'].to(device)

with torch.no_grad():
    outputs = model(
        input_ids,
        token_type_ids=None,
        attention_mask=attention_mask,
    )

# Pick the highest-scoring class and map it back to its category name.
label_id = np.argmax(outputs['logits'].cpu().numpy(), axis=1)
print(f'Predicted: {le.inverse_transform(label_id)[0]} | Ground: {ds["train"][0][label_column]}')
|
``` |
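The card lists accuracy, F1, recall, and precision as metrics. As a minimal sketch of how you might compute them yourself, the loop below runs the same preprocessing in batches over the test split. It assumes the dataset exposes a `test` split with the same `Text` and `Category` columns; the batch size is illustrative.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

texts = ds['test']['Text']
true_ids = le.transform(ds['test'][label_column])  # assumes all test labels appear in train

pred_ids = []
batch_size = 32  # illustrative; tune to your GPU memory
for start in range(0, len(texts), batch_size):
    batch = tokenizer(
        texts[start:start + batch_size],
        max_length=max_length,
        padding='max_length',
        truncation=True,
        return_tensors='pt',
    ).to(device)
    with torch.no_grad():
        logits = model(**batch).logits
    pred_ids.extend(logits.argmax(dim=1).cpu().tolist())

precision, recall, f1, _ = precision_recall_fscore_support(
    true_ids, pred_ids, average='weighted'
)
print(f'Accuracy: {accuracy_score(true_ids, pred_ids):.4f} | '
      f'Precision: {precision:.4f} | Recall: {recall:.4f} | F1: {f1:.4f}')
```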
|
|
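Note that mapping class ids back to names above requires downloading the whole dataset just to refit the label encoder. A minimal sketch of one way around this, assuming you can run the fitting step once (the file name is illustrative), is to persist the fitted encoder with joblib:

```python
import joblib

# Save the encoder once, right after fitting it on the training labels.
joblib.dump(le, 'resume_label_encoder.joblib')  # illustrative path

# At serving time, reload it without touching the dataset.
le = joblib.load('resume_label_encoder.joblib')
print(le.inverse_transform([0])[0])  # category name for class id 0
```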
|
# Model Card for bert-resume-classification
|
|
|
**Please see the paper and code for more information:**
|
- https://github.com/noran-mohamed/Resume-Classification-Dataset |
|
- https://arxiv.org/abs/2406.18125 |
|
|
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
``` |
|
@article{heakl2024resumeatlas, |
|
title={ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models}, |
|
author={Heakl, Ahmed and Mohamed, Youssef and Mohamed, Noran and Sharkaway, Ali and Zaky, Ahmed}, |
|
journal={arXiv preprint arXiv:2406.18125}, |
|
year={2024} |
|
} |
|
``` |
|
|
|
**APA:** |
|
``` |
|
Heakl, A., Mohamed, Y., Mohamed, N., Sharkaway, A., & Zaky, A. (2024). ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.18125 |
|
``` |
|
|
|
|
|
## Model Card Authors
|
|
|
Email: [email protected] |
|
LinkedIn: https://linkedin.com/in/ahmed-heakl