grammarBERT

grammarBERT is a fine-tuned version of CodeBERT, trained with a Masked Language Modeling (MLM) objective on derivation sequences for Python 3.8. Because the derivation sequences are built from Python's Abstract Syntax Tree (AST), the model pairs CodeBERT's handling of natural language and code tokens with explicit syntactic structure. It is intended for grammar-based programming tasks that depend on syntactic understanding, such as parsing and context-aware code generation or transformation.

Model Overview

Base Model: CodeBERT (microsoft/codebert-base)
Task: Masked Language Modeling on derivation sequences
Supported Language: Python 3.8
Applications: Parsing, code transformation, syntactic analysis, grammar-based programming
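
To give a feel for what a derivation sequence looks like, the sketch below walks a Python AST in pre-order and records each node together with its child node types. This is only an illustration built on the standard `ast` module: `derivation_sketch` is a hypothetical helper, and the output format of the repository's own `ast2seq` function may well differ.

```python
import ast
from typing import List


def derivation_sketch(source: str) -> List[str]:
    """Return a pre-order list of 'Parent -> child types' steps for `source`.

    Hypothetical illustration only; not the `ast2seq` function from the
    grammarBERT repository.
    """
    steps: List[str] = []

    def visit(node: ast.AST) -> None:
        children = list(ast.iter_child_nodes(node))
        child_names = " ".join(type(child).__name__ for child in children)
        # Nodes without AST children are marked as leaves
        steps.append(f"{type(node).__name__} -> {child_names or '<leaf>'}")
        for child in children:
            visit(child)

    visit(ast.parse(source))
    return steps


if __name__ == "__main__":
    for step in derivation_sketch("x = 1"):
        print(step)
```

Each printed line corresponds to one expansion step, so the whole list reads as a top-down derivation of the program's tree.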

Model Usage

To use the grammarBERT model with Python 3.8-specific derivation sequences, load the model and tokenizer as shown below:

import torch
from transformers import RobertaForMaskedLM, RobertaTokenizer

# `ast2seq` is not part of transformers; import it from the grammarBERT repository:
# https://github.com/NathanaelBeau/grammarBERT/asdl/

# Load the pre-trained grammarBERT model and tokenizer
model = RobertaForMaskedLM.from_pretrained("Nbeau/grammarBERT")
tokenizer = RobertaTokenizer.from_pretrained("Nbeau/grammarBERT")
model.eval()  # disable dropout for inference

# Use a complete statement: a bare `def ...:` header has no body and cannot be parsed into an AST
code_snippet = "def enumerate_items(items):\n    return list(enumerate(items))"
# Convert code to a derivation sequence
derivation_sequence = ast2seq(code_snippet)
input_ids = tokenizer.encode(derivation_sequence, return_tensors="pt")

# Run a forward pass; outputs.logits holds the per-token vocabulary scores
with torch.no_grad():
    outputs = model(input_ids)
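
For masked token prediction, one possible approach is to hide a single step of the derivation sequence and ask the model for its most likely replacements. The sketch below assumes that a derivation sequence can be represented as a list of strings joined by spaces, which the model card does not confirm. `mask_one_step` and `top_k_for_mask` are hypothetical helper names, not part of the repository.

```python
from typing import List


def mask_one_step(steps: List[str], index: int, mask_token: str = "<mask>") -> str:
    """Replace one step with the mask token and join the steps with spaces.

    Assumption: the derivation sequence is a list of space-joinable strings.
    """
    masked = list(steps)
    masked[index] = mask_token
    return " ".join(masked)


def top_k_for_mask(model, tokenizer, masked_text: str, k: int = 5) -> List[str]:
    """Return the k highest-scoring vocabulary tokens for the first mask position."""
    import torch  # imported lazily so mask_one_step has no heavy dependency

    inputs = tokenizer(masked_text, return_tensors="pt")
    # Locate the mask token(s) in the encoded input
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits
    top_ids = torch.topk(logits[0, mask_positions[0]], k).indices.tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)
```

With the model and tokenizer loaded as above, a call could look like `top_k_for_mask(model, tokenizer, mask_one_step(steps, 2, tokenizer.mask_token))`. Note that the tokenizer may split one derivation step into several subword tokens, in which case a single mask covers only one piece of that step.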

Training and Fine-Tuning

To train your own grammarBERT on a custom dataset or adapt it for different Python versions, follow the setup instructions in the grammarBERT GitHub repository. The repository provides detailed guidance for:

Preparing Python Abstract Syntax Tree (AST) sequences.
Configuring tokenization for derivation sequences.
Running training scripts for Masked Language Modeling (MLM) fine-tuning.

This setup allows for targeted fine-tuning on derivation sequences tailored to your specific grammar requirements.
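
As a rough picture of what MLM fine-tuning does to each training example, the sketch below applies the standard BERT masking recipe to a token sequence: a fraction of positions is selected, and each selected position is replaced by the mask token 80% of the time, by a random vocabulary token 10% of the time, or left unchanged 10% of the time. The function name `mlm_mask` and the 15% default are illustrative assumptions; the repository's training scripts may use different settings. In the `transformers` library, `DataCollatorForLanguageModeling` provides this kind of masking through its `mlm_probability` argument.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mlm_mask(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_token: str = "<mask>",
    mlm_probability: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (masked inputs, labels) following the BERT 80/10/10 scheme.

    Labels are None at positions the model is not asked to predict.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mlm_probability:
            continue  # position not selected for prediction
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_token  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(list(vocab))  # 10%: random token
        # remaining 10%: keep the original token
    return inputs, labels


if __name__ == "__main__":
    sequence = ["Module", "Assign", "Name", "Store", "Constant"]
    print(mlm_mask(sequence, vocab=sequence, seed=0))
```

The loss is then computed only at positions whose label is not None, so the model learns to recover the hidden parts of the derivation from their context.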

Model Size: 0.1B parameters (Safetensors)
Tensor Type: F32
