Antibody Humanization Model for Variable Light Chain

This is a RoBERTa model trained from scratch for antibody humanization of Variable Light (VL) chain sequences using Masked Language Modeling (MLM).

Model Description

This model is trained on antibody light chain variable region sequences for humanization tasks. It can be used for antibody sequence analysis, humanization, and exploring VL chain sequence patterns.

Usage

from transformers import RobertaTokenizer, RobertaForMaskedLM

# Load tokenizer and model from Hugging Face
tokenizer = RobertaTokenizer.from_pretrained("hemantn/roberta-base-humAb-vl")
model = RobertaForMaskedLM.from_pretrained("hemantn/roberta-base-humAb-vl")
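
Continuing from the snippet above, here is a minimal sketch of mask filling with the raw model (it assumes the tokenizer maps each residue to a single token and recognizes its mask token inside the input string; the sequence is the VL example from Quick Usage below, with '*' swapped for the mask token):

import torch

vl = "DIQMTQSPSSLSASV*DRVTITCRASQSISSYLNWYQQKPGKAPKLLIYSASTLASGVPSRFSGSGSGTDF*LTISSLQPEDFATYYCQQSYSTPRTFGQGTKVEIK"
masked = vl.replace("*", tokenizer.mask_token)

inputs = tokenizer(masked, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Report the highest-scoring residue at each masked position
mask_positions = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
for pos in mask_positions:
    predicted_id = int(logits[0, pos].argmax())
    print(int(pos), tokenizer.convert_ids_to_tokens(predicted_id))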

Using AnthroAb Python Package

For easier antibody humanization, you can use the AnthroAb Python package, which provides a high-level interface for humanization tasks. The package is available on PyPI and includes both VH and VL chain models.

Installation

pip install anthroab

Quick Usage

import anthroab

# Humanize a heavy chain sequence (VH); '*' marks masked positions
# to be filled with human-like residues
vh_sequence = "**QLV*SGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS"
humanized_vh = anthroab.predict_masked(vh_sequence, 'H')
print(f"Humanized VH: {humanized_vh}")

# Humanize a light chain sequence (VL)
vl_sequence = "DIQMTQSPSSLSASV*DRVTITCRASQSISSYLNWYQQKPGKAPKLLIYSASTLASGVPSRFSGSGSGTDF*LTISSLQPEDFATYYCQQSYSTPRTFGQGTKVEIK"
humanized_vl = anthroab.predict_masked(vl_sequence, 'L')
print(f"Humanized VL: {humanized_vl}")

Features

  • Easy Installation: Install directly from PyPI with pip install anthroab
  • High-Level API: Simple functions for antibody humanization
  • Dual Chain Support: Separate models for VH and VL chains
  • Sequence Infilling: Fill masked positions with human-like residues
  • Mutation Suggestions: Get humanizing mutations for frameworks and CDRs (see the sketch after this list)
  • Embedding Generation: Create vector representations of antibody sequences
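
As a rough illustration of the mutation-suggestion idea (a sketch against the raw transformers model, not the AnthroAb API, whose internals may differ): run a complete sequence through the masked language model and flag positions where the highest-scoring residue disagrees with the input. The example sequence is the Quick Usage VL sequence with its masked positions filled in for illustration.

import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("hemantn/roberta-base-humAb-vl")
model = RobertaForMaskedLM.from_pretrained("hemantn/roberta-base-humAb-vl")

vl = "DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIYSASTLASGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSTPRTFGQGTKVEIK"
inputs = tokenizer(vl, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0]

ids = inputs.input_ids[0]
for i in range(1, len(ids) - 1):  # skip the <s> and </s> special tokens
    predicted_id = int(logits[i].argmax())
    if predicted_id != int(ids[i]):
        current = tokenizer.convert_ids_to_tokens(int(ids[i]))
        suggested = tokenizer.convert_ids_to_tokens(predicted_id)
        print(f"position {i}: {current} -> {suggested}")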

The AnthroAb package uses this RoBERTa model (hemantn/roberta-base-humAb-vl) for VL chain humanization, along with a companion VH model for heavy chain processing.
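
The embedding feature can likewise be approximated with plain transformers, since the checkpoint is a standard RoBERTa encoder. A minimal sketch (mean pooling over the final hidden states is an assumed choice here, not necessarily what AnthroAb does internally):

import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("hemantn/roberta-base-humAb-vl")
encoder = RobertaModel.from_pretrained("hemantn/roberta-base-humAb-vl")  # encoder only, LM head dropped

vl = "DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIYSASTLASGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSTPRTFGQGTKVEIK"
inputs = tokenizer(vl, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # shape: (1, seq_len, 768)

# Mean-pool over positions to get one fixed-size vector per sequence
embedding = hidden.mean(dim=1).squeeze(0)
print(embedding.shape)  # torch.Size([768])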

Model Details

Architecture

  • Model: RoBERTa (trained from scratch)
  • Architecture: RobertaForMaskedLM
  • Model Type: Masked Language Model for antibody sequences

Specifications

  • Hidden Size: 768
  • Number of Layers: 12
  • Number of Attention Heads: 12
  • Intermediate Size: 3072
  • Max Position Embeddings: 145
  • Vocabulary Size: 25 tokens
  • Model Size: ~164 MB
  • Parameters: 85.8M
  • Tensor Type: F16 (safetensors)
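
If you want to confirm these values, they can be read from the checkpoint's configuration (field names are the standard RoBERTa config attributes):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("hemantn/roberta-base-humAb-vl")
print(config.hidden_size)              # 768
print(config.num_hidden_layers)        # 12
print(config.num_attention_heads)      # 12
print(config.intermediate_size)        # 3072
print(config.max_position_embeddings)  # 145
print(config.vocab_size)               # 25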