LazarusNLP/all-indobert-base

Tags: Sentence Similarity, sentence-transformers, feature-extraction, text-embeddings-inference

w11wo committed on Jan 25, 2024

Commit 9d0e4ed · verified · 1 parent: e4bb9e0

Update README.md

Files changed (1): README.md (+8 -6)

@@ -15,9 +15,11 @@ datasets:
 - SEACrowd/indolem_ntp
 - khalidalt/tydiqa-goldp
 - SEACrowd/facqa
+language:
+- ind
 ---
-# LazarusNLP/all-indobert-base-v2
+# LazarusNLP/all-indobert-base
 This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
@@ -37,7 +39,7 @@ Then you can use the model like this:
 from sentence_transformers import SentenceTransformer
 sentences = ["This is an example sentence", "Each sentence is converted"]
-model = SentenceTransformer('LazarusNLP/all-indobert-base-v2')
+model = SentenceTransformer('LazarusNLP/all-indobert-base')
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
@@ -63,8 +65,8 @@ def mean_pooling(model_output, attention_mask):
 sentences = ['This is an example sentence', 'Each sentence is converted']
 # Load model from HuggingFace Hub
-tokenizer = AutoTokenizer.from_pretrained('LazarusNLP/all-indobert-base-v2')
-model = AutoModel.from_pretrained('LazarusNLP/all-indobert-base-v2')
+tokenizer = AutoTokenizer.from_pretrained('LazarusNLP/all-indobert-base')
+model = AutoModel.from_pretrained('LazarusNLP/all-indobert-base')
 # Tokenize sentences
 encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
@@ -86,7 +88,7 @@ print(sentence_embeddings)
 <!--- Describe how your model was evaluated -->
-For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=LazarusNLP/all-indobert-base-v2)
+For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=LazarusNLP/all-indobert-base)
 ## Training
@@ -96,7 +98,7 @@ The model was trained with the parameters:
 `MultiDatasetDataLoader.MultiDatasetDataLoader` of length 352 with parameters:
 ```
-{'batch_size': 'unknown'}
+{'batch_size_pairs': 384, 'batch_size_triplets': 256}
 ```
 **Loss**:
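
The third hunk's header names a `mean_pooling(model_output, attention_mask)` helper whose body is not shown in this diff. As a rough illustration only, attention-masked mean pooling can be sketched in plain Python as below. This assumes the helper follows the usual pattern of averaging token vectors over non-padded positions; the list-based signature and the toy data are illustrative assumptions, not the README's code, which operates on tensors.

```python
def mean_pooling(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: list of sentences, each a list of token vectors.
    attention_mask:   list of sentences, each a list of 0/1 flags.
    (Hypothetical list-based signature, chosen for illustration.)
    """
    pooled = []
    for vectors, mask in zip(token_embeddings, attention_mask):
        total = [0.0] * len(vectors[0])
        count = 0
        for vec, keep in zip(vectors, mask):
            if keep:
                for i, value in enumerate(vec):
                    total[i] += value
                count += 1
        # Guard against an all-padding row to avoid division by zero.
        denom = max(count, 1)
        pooled.append([t / denom for t in total])
    return pooled


# Toy batch: 2 sentences, 3 token slots, 2-dimensional embeddings.
embeddings = [
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # last slot is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
]
mask = [[1, 1, 0], [1, 1, 1]]
result = mean_pooling(embeddings, mask)
print(result)
```

The padded slot in the first sentence is excluded from the average, so padding length does not affect the resulting sentence vector.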