---
language: ko
tags:
- sentence-similarity
- roberta
datasets:
- klue
---

# KLUE RoBERTa base model for Sentence Embeddings

This is the `sentence-klue-roberta-base` model. The sentence-transformers library lets you train and use Transformer models for generating sentence and text embeddings.

The model is described in the paper [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084).
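
The paper's core idea is to add a pooling layer on top of a pretrained Transformer encoder, so a variable-length sequence of token embeddings collapses into one fixed-size sentence vector. As an illustration only (this model's actual pooling configuration is not verified here), the sketch below computes an embedding with plain `transformers`, assuming mean pooling over token embeddings, which is the Sentence-BERT default:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Huffon/sentence-klue-roberta-base")
model = AutoModel.from_pretrained("Huffon/sentence-klue-roberta-base")

sentences = ["형은 손흥윤이다."]  # "His older brother is Son Heung-yun."
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    # (batch, seq_len, hidden) token embeddings from the encoder
    token_embeddings = model(**encoded).last_hidden_state

# Assumption: mean pooling, i.e. average token vectors while ignoring padding.
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(sentence_embeddings.shape)  # e.g. torch.Size([1, 768]) for a base-size encoder
```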

## Usage (Sentence-Transformers)

Using this model becomes more convenient when you have [sentence-transformers](https://github.com/UKPLab/sentence-transformers) installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
import torch
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Huffon/sentence-klue-roberta-base")

# Korean Wikipedia-style sentences about the footballer Son Heung-min
docs = [
    "1992년 7월 8일 손흥민은 강원도 춘천시 후평동에서 아버지 손웅정과 어머니 길은자의 차남으로 태어나 그곳에서 자랐다.",
    "형은 손흥윤이다.",
    "춘천 부안초등학교를 졸업했고, 춘천 후평중학교에 입학한 후 2학년때 원주 육민관중학교 축구부에 들어가기 위해 전학하여 졸업하였으며, 2008년 당시 FC 서울의 U-18팀이었던 동북고등학교 축구부에서 선수 활동 중 대한축구협회 우수선수 해외유학 프로젝트에 선발되어 2008년 8월 독일 분데스리가 함부르크 유소년팀에 입단하였다.",
    "함부르크 유스팀 주전 공격수로 2008년 6월 네덜란드에서 열린 4개국 경기에서 4게임에 출전, 3골을 터뜨렸다.",
    "1년간의 유학 후 2009년 8월 한국으로 돌아온 후 10월에 개막한 FIFA U-17 월드컵에 출전하여 3골을 터트리며 한국을 8강으로 이끌었다.",
    "그해 11월 함부르크의 정식 유소년팀 선수 계약을 체결하였으며 독일 U-19 리그 4경기 2골을 넣고 2군 리그에 출전을 시작했다.",
    "독일 U-19 리그에서 손흥민은 11경기 6골, 2부 리그에서는 6경기 1골을 넣으며 재능을 인정받아 2010년 6월 17세의 나이로 함부르크의 1군 팀 훈련에 참가, 프리시즌 활약으로 함부르크와 정식 계약을 한 후 10월 18세에 함부르크 1군 소속으로 독일 분데스리가에 데뷔하였다.",
]
document_embeddings = model.encode(docs)

# "Son Heung-min moved to Europe at a young age."
query = "손흥민은 어린 나이에 유럽에 진출하였다."
query_embedding = model.encode(query)

top_k = min(5, len(docs))

# Cosine similarity between the query and every document
cos_scores = util.pytorch_cos_sim(query_embedding, document_embeddings)[0]
top_results = torch.topk(cos_scores, k=top_k)

print(f"입력 문장: {query}")  # "Input sentence: ..."
print(f"\n<입력 문장과 유사한 {top_k} 개의 문장>\n")  # "<Top {top_k} sentences similar to the input>"

for i, (score, idx) in enumerate(zip(top_results[0], top_results[1])):
    print(f"{i+1}: {docs[idx]} (유사도: {score:.4f})\n")  # (similarity: ...)
```
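
The same ranking can also be done with the library's built-in helper instead of manual cosine scores plus `torch.topk`. A minimal follow-up sketch, reusing `query_embedding`, `document_embeddings`, and `docs` from the block above (`util.semantic_search` returns, per query, a list of `{'corpus_id', 'score'}` dicts sorted by score):

```python
hits = util.semantic_search(query_embedding, document_embeddings, top_k=top_k)

for rank, hit in enumerate(hits[0], start=1):  # hits[0]: results for the first (and only) query
    print(f"{rank}: {docs[hit['corpus_id']]} (유사도: {hit['score']:.4f})")
```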