---
language: ko
tags:
- roberta
- sentence-transformers
datasets:
- klue
---

# KLUE RoBERTa base model for Sentence Embeddings

This is the `sentence-klue-roberta-base` model. The [sentence-transformers](https://github.com/UKPLab/sentence-transformers) library allows you to train and use Transformer models for generating sentence and text embeddings.

The model is described in the paper [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084).

## Usage (Sentence-Transformers)

Using this model becomes more convenient when you have [sentence-transformers](https://github.com/UKPLab/sentence-transformers) installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
import torch
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Huffon/sentence-klue-roberta-base")

# Korean documents about the footballer Son Heung-min
docs = [
    "1992년 7월 8일 손흥민은 강원도 춘천시 후평동에서 아버지 손웅정과 어머니 길은자의 차남으로 태어나 그곳에서 자랐다.",
    "형은 손흥윤이다.",
    "춘천 부안초등학교를 졸업했고, 춘천 후평중학교에 입학한 후 2학년때 원주 육민관중학교 축구부에 들어가기 위해 전학하여 졸업하였으며, 2008년 당시 FC 서울의 U-18팀이었던 동북고등학교 축구부에서 선수 활동 중 대한축구협회 우수선수 해외유학 프로젝트에 선발되어 2008년 8월 독일 분데스리가 함부르크 유소년팀에 입단하였다.",
    "함부르크 유스팀 주전 공격수로 2008년 6월 네덜란드에서 열린 4개국 경기에서 4게임에 출전, 3골을 터뜨렸다.",
    "1년간의 유학 후 2009년 8월 한국으로 돌아온 후 10월에 개막한 FIFA U-17 월드컵에 출전하여 3골을 터트리며 한국을 8강으로 이끌었다.",
    "그해 11월 함부르크의 정식 유소년팀 선수 계약을 체결하였으며 독일 U-19 리그 4경기 2골을 넣고 2군 리그에 출전을 시작했다.",
    "독일 U-19 리그에서 손흥민은 11경기 6골, 2부 리그에서는 6경기 1골을 넣으며 재능을 인정받아 2010년 6월 17세의 나이로 함부르크의 1군 팀 훈련에 참가, 프리시즌 활약으로 함부르크와 정식 계약을 한 후 10월 18세에 함부르크 1군 소속으로 독일 분데스리가에 데뷔하였다.",
]
document_embeddings = model.encode(docs)

# Query: "Son Heung-min moved to Europe at a young age."
query = "손흥민은 어린 나이에 유럽에 진출하였다."
query_embedding = model.encode(query)

# Score every document against the query with cosine similarity
# and keep the top-k results
top_k = min(5, len(docs))
cos_scores = util.pytorch_cos_sim(query_embedding, document_embeddings)[0]
top_results = torch.topk(cos_scores, k=top_k)

print(f"Input sentence: {query}")
print(f"<Top {top_k} sentences most similar to the input sentence>")

for i, (score, idx) in enumerate(zip(top_results[0], top_results[1])):
    print(f"{i + 1}: {docs[idx]} (similarity: {score:.4f})")
```
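
## Usage (HuggingFace Transformers)

If you prefer not to depend on sentence-transformers, the embeddings can also be computed with the plain `transformers` library. The following is a minimal sketch that assumes mean pooling over token embeddings, the pooling that sentence-transformers models commonly use; check this model's pooling configuration to confirm it matches before relying on the results.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Huffon/sentence-klue-roberta-base")
model = AutoModel.from_pretrained("Huffon/sentence-klue-roberta-base")

# Query: "Son Heung-min moved to Europe at a young age."
sentences = ["손흥민은 어린 나이에 유럽에 진출하였다."]
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    model_output = model(**encoded_input)

# Mean pooling: average the token embeddings, ignoring padding positions.
# Assumption: this checkpoint uses mean pooling (the common
# sentence-transformers default); verify against its pooling config.
mask = encoded_input["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (model_output.last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

print(sentence_embeddings.shape)  # (batch_size, hidden_size)
```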