Antibody Humanization Model for Variable Light Chain

This is a RoBERTa model trained from scratch for antibody humanization of Variable Light (VL) chain sequences using Masked Language Modeling (MLM).

Model Description

This model is trained on antibody light chain variable region sequences for humanization tasks. It can be used for antibody sequence analysis, humanization, and exploring VL chain sequence patterns.

Usage

from transformers import RobertaTokenizer, RobertaForMaskedLM

# Load tokenizer and model from Hugging Face
tokenizer = RobertaTokenizer.from_pretrained("hemantn/roberta-base-humAb-vl")
model = RobertaForMaskedLM.from_pretrained("hemantn/roberta-base-humAb-vl")
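
Continuing from the snippet above, here is a minimal sketch of mask filling with the raw model (it assumes the tokenizer maps each residue to a single token and recognizes its mask token inside the input string; the sequence is the VL example from Quick Usage below, with '*' swapped for the mask token):

import torch

vl = "DIQMTQSPSSLSASV*DRVTITCRASQSISSYLNWYQQKPGKAPKLLIYSASTLASGVPSRFSGSGSGTDF*LTISSLQPEDFATYYCQQSYSTPRTFGQGTKVEIK"
masked = vl.replace("*", tokenizer.mask_token)

inputs = tokenizer(masked, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Report the highest-scoring residue at each masked position
mask_positions = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
for pos in mask_positions:
    predicted_id = int(logits[0, pos].argmax())
    print(int(pos), tokenizer.convert_ids_to_tokens(predicted_id))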

Using AnthroAb Python Package

For easier antibody humanization, you can use the AnthroAb Python package, which provides a high-level interface for humanization tasks. The package is available on PyPI and includes both VH and VL chain models.

Installation

pip install anthroab

Quick Usage

import anthroab

# Humanize a heavy chain sequence (VH); '*' marks masked positions
# to be filled with human-like residues
vh_sequence = "**QLV*SGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS"
humanized_vh = anthroab.predict_masked(vh_sequence, 'H')
print(f"Humanized VH: {humanized_vh}")

# Humanize a light chain sequence (VL)
vl_sequence = "DIQMTQSPSSLSASV*DRVTITCRASQSISSYLNWYQQKPGKAPKLLIYSASTLASGVPSRFSGSGSGTDF*LTISSLQPEDFATYYCQQSYSTPRTFGQGTKVEIK"
humanized_vl = anthroab.predict_masked(vl_sequence, 'L')
print(f"Humanized VL: {humanized_vl}")

Features

  • Easy Installation: Install directly from PyPI with pip install anthroab
  • High-Level API: Simple functions for antibody humanization
  • Dual Chain Support: Separate models for VH and VL chains
  • Sequence Infilling: Fill masked positions with human-like residues
  • Mutation Suggestions: Get humanizing mutations for frameworks and CDRs (see the sketch after this list)
  • Embedding Generation: Create vector representations of antibody sequences
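
As a rough illustration of the mutation-suggestion idea (a sketch against the raw transformers model, not the AnthroAb API, whose internals may differ): run a complete sequence through the masked language model and flag positions where the highest-scoring residue disagrees with the input. The example sequence is the Quick Usage VL sequence with its masked positions filled in for illustration.

import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("hemantn/roberta-base-humAb-vl")
model = RobertaForMaskedLM.from_pretrained("hemantn/roberta-base-humAb-vl")

vl = "DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIYSASTLASGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSTPRTFGQGTKVEIK"
inputs = tokenizer(vl, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0]

ids = inputs.input_ids[0]
for i in range(1, len(ids) - 1):  # skip the <s> and </s> special tokens
    predicted_id = int(logits[i].argmax())
    if predicted_id != int(ids[i]):
        current = tokenizer.convert_ids_to_tokens(int(ids[i]))
        suggested = tokenizer.convert_ids_to_tokens(predicted_id)
        print(f"position {i}: {current} -> {suggested}")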

The AnthroAb package uses this RoBERTa model (hemantn/roberta-base-humAb-vl) for VL chain humanization, along with a companion VH model for heavy chain processing.
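
The embedding feature can likewise be approximated with plain transformers, since the checkpoint is a standard RoBERTa encoder. A minimal sketch (mean pooling over the final hidden states is an assumed choice here, not necessarily what AnthroAb does internally):

import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("hemantn/roberta-base-humAb-vl")
encoder = RobertaModel.from_pretrained("hemantn/roberta-base-humAb-vl")  # encoder only, LM head dropped

vl = "DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIYSASTLASGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSTPRTFGQGTKVEIK"
inputs = tokenizer(vl, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # shape: (1, seq_len, 768)

# Mean-pool over positions to get one fixed-size vector per sequence
embedding = hidden.mean(dim=1).squeeze(0)
print(embedding.shape)  # torch.Size([768])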

Model Details

Architecture

  • Model: RoBERTa (trained from scratch)
  • Architecture: RobertaForMaskedLM
  • Model Type: Masked Language Model for antibody sequences

Specifications

  • Hidden Size: 768
  • Number of Layers: 12
  • Number of Attention Heads: 12
  • Intermediate Size: 3072
  • Max Position Embeddings: 145
  • Vocabulary Size: 25 tokens
  • Model Size: ~164 MB
  • Parameters: 85.8M
  • Tensor Type: F16 (safetensors)
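
If you want to confirm these values, they can be read from the checkpoint's configuration (field names are the standard RoBERTa config attributes):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("hemantn/roberta-base-humAb-vl")
print(config.hidden_size)              # 768
print(config.num_hidden_layers)        # 12
print(config.num_attention_heads)      # 12
print(config.intermediate_size)        # 3072
print(config.max_position_embeddings)  # 145
print(config.vocab_size)               # 25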