MS MARCO Word2Vec Embedding Model

This repository contains a Continuous Bag of Words (CBOW) Word2Vec model trained on Microsoft's MS MARCO dataset.

Model Details

  • Architecture: CBOW (Continuous Bag of Words)
  • Embedding Dimension: 128
  • Context Window Size: 4
  • Vocabulary Size: 50,001
  • Training Pairs: 6,618,785
  • Parameters: 12,800,256
  • Training Device: cuda
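
The parameter count is consistent with an embedding table plus a bias-free linear output layer of the same size: 2 × 50,001 × 128 = 12,800,256.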

Usage

import torch

# Load the trained weights (the CBOW class is sketched below)
vocab_size = 50001
embed_dim = 128
model = CBOW(vocab_size=vocab_size, embed_dim=embed_dim)
model.load_state_dict(torch.load("cbow_model.pth", map_location="cpu"))
model.eval()

# Get embeddings for words
embeddings = model.embeddings.weight  # Shape: [vocab_size, embed_dim]
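
The checkpoint is a state dict, so a matching CBOW module must be defined before loading. Below is a minimal sketch consistent with the details above (mean-pooled context embeddings followed by a bias-free linear projection, which matches the stated parameter count); the exact class used during training may differ.

import torch
import torch.nn as nn

class CBOW(nn.Module):
    """Minimal CBOW: average the context embeddings, project to vocabulary logits."""
    def __init__(self, vocab_size: int, embed_dim: int):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_dim)
        # Bias-free output layer: 2 * 50,001 * 128 = 12,800,256 parameters in total.
        self.linear = nn.Linear(embed_dim, vocab_size, bias=False)

    def forward(self, context):
        # context: [batch, 2 * window] indices of surrounding words
        embedded = self.embeddings(context).mean(dim=1)  # [batch, embed_dim]
        return self.linear(embedded)                     # [batch, vocab_size]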

Training

This model was trained for 5 epochs with a batch size of 256 and a learning rate of 0.003.
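
The card only states the epoch count, batch size, and learning rate; the loop below is a sketch under the assumption of an Adam optimizer and cross-entropy loss, which are typical for CBOW but not confirmed here. contexts and targets are placeholder tensors of training pairs.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CBOW(vocab_size=50001, embed_dim=128).to(device)

# Assumed optimizer and loss; only the hyperparameter values come from the card.
optimizer = torch.optim.Adam(model.parameters(), lr=0.003)
criterion = nn.CrossEntropyLoss()

# Placeholders: contexts is a LongTensor [N, 2 * window], targets is a LongTensor [N].
loader = DataLoader(TensorDataset(contexts, targets), batch_size=256, shuffle=True)

for epoch in range(5):
    for context_batch, target_batch in loader:
        context_batch, target_batch = context_batch.to(device), target_batch.to(device)
        optimizer.zero_grad()
        loss = criterion(model(context_batch), target_batch)
        loss.backward()
        optimizer.step()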

License

MIT


Dataset

MS MARCO (the dataset used to train Kogero/ms-marco-word2vec)