code-vs-nl

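This model is a fine-tuned version of distilbert-base-uncased, trained on bookcorpus for natural-language text and codeparrot/github-code for code. On the evaluation set it reaches an accuracy of 0.9951 and an F1 score of 0.9950 (validation loss 0.5180) at the last logged step; the per-step results are in the table below.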

Model description

As a fine-tuned model, its architecture is the same as distilbert-base-uncased for sequence classification: a DistilBERT encoder with a classification head on top.
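As a rough consistency check of the architecture statement, the sketch below counts the parameters of a DistilBERT-base sequence classifier from the standard distilbert-base-uncased dimensions (30,522-token vocabulary, 512 positions, hidden size 768, feed-forward size 3,072, 6 layers). The head layout (a `pre_classifier` dense layer followed by a `classifier` layer) mirrors the Hugging Face `DistilBertForSequenceClassification` class; `NUM_LABELS = 2` is an assumption based on the text-vs-code task, and all helper names are hypothetical.

```python
# Hypothetical parameter count for a DistilBERT-base sequence classifier.
# Dimensions are the published distilbert-base-uncased values;
# NUM_LABELS = 2 is an assumption (text vs. code).
VOCAB_SIZE = 30522
MAX_POSITIONS = 512
DIM = 768          # hidden size
HIDDEN_DIM = 3072  # feed-forward inner size
N_LAYERS = 6
NUM_LABELS = 2


def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


def distilbert_base_params() -> int:
    """Encoder only: word + position embeddings, then N_LAYERS transformer blocks."""
    embeddings = VOCAB_SIZE * DIM + MAX_POSITIONS * DIM + layer_norm(DIM)
    per_layer = (
        4 * linear(DIM, DIM)        # q, k, v and output projections
        + layer_norm(DIM)           # post-attention LayerNorm
        + linear(DIM, HIDDEN_DIM)   # FFN up-projection
        + linear(HIDDEN_DIM, DIM)   # FFN down-projection
        + layer_norm(DIM)           # post-FFN LayerNorm
    )
    return embeddings + N_LAYERS * per_layer


def classification_head_params() -> int:
    """pre_classifier (DIM -> DIM) followed by classifier (DIM -> NUM_LABELS)."""
    return linear(DIM, DIM) + linear(DIM, NUM_LABELS)


base = distilbert_base_params()
total = base + classification_head_params()
print(f"encoder: {base:,}  total: {total:,}  (~{total / 1e6:.0f}M)")
```

Under these assumptions the total comes to about 67M parameters, in line with the model size listed in the metadata; at F32 (4 bytes per parameter) that is roughly 268 MB of weights.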

The model can be used to classify documents as either natural-language text or code.

The training and evaluation data are a mix of the two datasets above, randomly sampled in equal proportions.
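The card does not show how the two sources were combined. A minimal sketch of equal-proportion random sampling, assuming the documents are already loaded as Python lists, could look like the following; the function name `balanced_mix` and the 0/1 label assignment are hypothetical, since the card does not say which label id means text and which means code.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical label ids: the model card does not state the actual mapping.
TEXT_LABEL = 0
CODE_LABEL = 1


def balanced_mix(
    text_docs: Sequence[str],
    code_docs: Sequence[str],
    n_per_class: int,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Draw n_per_class documents from each source without replacement,
    attach a class label to each, and shuffle the combined list."""
    rng = random.Random(seed)  # local RNG so the mix is reproducible
    text_sample = rng.sample(list(text_docs), n_per_class)
    code_sample = rng.sample(list(code_docs), n_per_class)
    mixed = [(doc, TEXT_LABEL) for doc in text_sample]
    mixed += [(doc, CODE_LABEL) for doc in code_sample]
    rng.shuffle(mixed)
    return mixed


texts = [f"sentence number {i}." for i in range(10)]
code = [f"x_{i} = {i} * 2" for i in range(10)]
dataset = balanced_mix(texts, code, n_per_class=4, seed=1)
```

Sampling the same number of documents from each source keeps the two classes balanced, so accuracy is not inflated by a majority class.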

The following hyperparameters were used during training:

Training results

Training Loss  Epoch  Step  Validation Loss  Accuracy  F1 Score
0.5732         0.07   500   0.5658           0.9934    0.9934
0.5254         0.14   1000  0.5180           0.9951    0.9950
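The card does not say how Accuracy and F1 Score were computed. The sketch below shows the standard definitions, assuming F1 is the binary F1 of one class (here label 1, a hypothetical choice); if the run averaged F1 over classes instead (for example macro or weighted averaging), the reported value could differ slightly.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred), binary_f1(y_true, y_pred))
```

With balanced classes and few errors, accuracy and F1 tend to land close together, as they do in the table.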

Safetensors
Model size: 67M params
Tensor type: F32
