japanese-mulan-base
This is a Japanese MuLan (Music-Language pretraining) model developed by LY Corporation. It was trained on about 20k internal music-text pairs and can be applied to music tasks such as zero-shot music classification, text-to-music retrieval, and music-to-text retrieval.
How to use
- Install packages
pip install "transformers[torch]" torchaudio sentence-transformers sentencepiece
- Run
import torch
import torch.nn.functional as F
import torchaudio
from transformers import AutoModel, AutoProcessor
HF_MODEL_PATH = "line-corporation/japanese-mulan-base"
model = AutoModel.from_pretrained(HF_MODEL_PATH, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(HF_MODEL_PATH, trust_remote_code=True)
url = "https://cdn.bensound.com/bensound-happyrock.mp3" # music by Bensound.com
waveform, sample_rate = torchaudio.load(url)
# stereo to mono + unbatched to batched
waveform = waveform.mean(dim=0, keepdim=True)
# candidate genres: rock, hip hop, jazz, classical
labels = ["ロック", "ヒップホップ", "ジャズ", "クラシック"]
processor.eval()
model.eval()
with torch.no_grad():
    music_feature = processor.get_music_feature(waveform, sample_rate=sample_rate)
    text_feature = processor.get_text_feature(labels)
    music_embedding = model.get_music_features(**music_feature)
    text_embedding = model.get_text_features(**text_feature)
# batched to unbatched
music_embedding = music_embedding.squeeze(dim=0)
# NOTE: music_embedding is not normalized by L2 norm.
similarity = F.cosine_similarity(music_embedding, text_embedding, dim=-1)
label_index = torch.argmax(similarity, dim=-1)
label = labels[label_index.item()]
print("Estimated label:", label)
# Estimated label: ロック
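The same embeddings can also drive the text-to-music retrieval mentioned in the description. The following is a minimal sketch under stated assumptions, not part of the model's API: `rank_by_cosine` is a hypothetical helper, the random tensors stand in for outputs of `model.get_music_features` and `model.get_text_features`, and the embedding size of 8 is arbitrary. Because the embeddings are not L2-normalized, the helper normalizes both sides before taking dot products.

```python
import torch
import torch.nn.functional as F


def rank_by_cosine(query: torch.Tensor, candidates: torch.Tensor, top_k: int = 3):
    """Rank candidate embeddings by cosine similarity to a single query.

    query: (dim,) embedding, e.g. one text embedding.
    candidates: (num_items, dim) embeddings, e.g. a library of music embeddings.
    Returns (scores, indices) of the top_k best matches, best first.
    """
    # L2-normalize so that a dot product equals cosine similarity.
    query = F.normalize(query, dim=-1)
    candidates = F.normalize(candidates, dim=-1)
    scores = candidates @ query  # shape: (num_items,)
    top_k = min(top_k, candidates.shape[0])
    return torch.topk(scores, k=top_k)


# Stand-in data (assumption): in practice these would be model outputs.
torch.manual_seed(0)
music_library = torch.randn(5, 8)     # 5 tracks, hypothetical dimension of 8
text_query = music_library[2] * 10.0  # same direction as track 2, different norm

scores, indices = rank_by_cosine(text_query, music_library, top_k=3)
print(indices.tolist())  # track 2 is ranked first
print(scores.tolist())
```

Music-to-text retrieval works the same way with the roles swapped: pass one music embedding as `query` and the text embeddings as `candidates`.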
Model architecture
The model uses an Audio Spectrogram Transformer (AST) as the music encoder and GLuCoSE as the text encoder. The music encoder was initialized from the official AST checkpoint pretrained on AudioSet. The text encoder was initialized from pkshatech/GLuCoSE-base-ja.
Licenses
The Apache License, Version 2.0
Citation
@misc{clip-japanese-base,
title = {Japanese MuLan Base},
author = {Takuya Hasumi and Yusuke Fujita},
url = {https://huggingface.co/line-corporation/japanese-mulan-base},
}