metadata
language: ko
tags:
  - roberta
  - sentence-transformers
datasets:
  - klue

KLUE RoBERTa base model for Sentence Embeddings

This is the sentence-klue-roberta-base model. The sentence-transformers repository lets you train and use Transformer models to generate sentence and text embeddings.

The model is described in the paper "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks".

Usage (Sentence-Transformers)

This model is easiest to use with sentence-transformers installed:

pip install -U sentence-transformers

Then you can use the model like this:

import torch
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Huffon/sentence-klue-roberta-base")

# Korean sentences about footballer Son Heung-min's early life and career.
# They stay in Korean because the model embeds Korean text.
docs = [
    "1992년 7월 8일 손흥민은 강원도 춘천시 후평동에서 아버지 손웅정과 어머니 길은자의 차남으로 태어나 그곳에서 자랐다.",
    "형은 손흥윤이다.",
    "춘천 부안초등학교를 졸업했고, 춘천 후평중학교에 입학한 후 2학년때 원주 육민관중학교 축구부에 들어가기 위해 전학하여 졸업하였으며, 2008년 당시 FC 서울의 U-18팀이었던 동북고등학교 축구부에서 선수 활동 중 대한축구협회 우수선수 해외유학 프로젝트에 선발되어 2008년 8월 독일 분데스리가의 함부르크 유소년팀에 입단하였다.",
    "함부르크 유스팀 주전 공격수로 2008년 6월 네덜란드에서 열린 4개국 경기에서 4게임에 출전, 3골을 터뜨렸다.",
    "1년간의 유학 후 2009년 8월 한국으로 돌아온 후 10월에 개막한 FIFA U-17 월드컵에 출전하여 3골을 터트리며 한국을 8강으로 이끌었다.",
    "그해 11월 함부르크의 정식 유소년팀 선수 계약을 체결하였으며 독일 U-19 리그 4경기 2골을 넣고 2군 리그에 출전을 시작했다.",
    "독일 U-19 리그에서 손흥민은 11경기 6골, 2부 리그에서는 6경기 1골을 넣으며 재능을 인정받아 2010년 6월 17세의 나이로 함부르크의 1군 팀 훈련에 참가, 프리시즌 활약으로 함부르크와 정식 계약을 한 후 10월 18세에 함부르크 1군 소속으로 독일 분데스리가에 데뷔하였다.",
]
document_embeddings = model.encode(docs)

# Query: "Son Heung-min moved to Europe at a young age."
query = "손흥민은 어린 나이에 유럽에 진출하였다."
query_embedding = model.encode(query)

top_k = min(5, len(docs))
cos_scores = util.pytorch_cos_sim(query_embedding, document_embeddings)[0]
top_results = torch.topk(cos_scores, k=top_k)

print(f"Input sentence: {query}")
print(f"<Top {top_k} sentences most similar to the input>")

# top_results[0] holds the similarity scores, top_results[1] the matching indices into docs.
for i, (score, idx) in enumerate(zip(top_results[0], top_results[1])):
    print(f"{i+1}: {docs[idx]} (similarity: {score:.4f})")
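
To make the ranking step above concrete, here is a minimal pure-Python sketch of what a cosine-similarity search followed by a top-k selection computes. It does not call the model or sentence-transformers. The helper names `cosine_similarity` and `top_k_similar` and the toy 3-dimensional vectors are hypothetical stand-ins for real embeddings, and the sketch assumes every vector is non-zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, doc_vecs, k):
    """Return up to k (score, index) pairs, most similar document first."""
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(doc_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors standing in for sentence embeddings (hypothetical values).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],  # same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]
toy_query_vec = [1.0, 0.0, 0.0]

ranking = top_k_similar(toy_query_vec, toy_doc_vecs, k=2)
for rank, (score, idx) in enumerate(ranking, start=1):
    print(f"{rank}: document {idx} (similarity: {score:.4f})")
```

With real embeddings the same idea applies, only on vectors with many more dimensions. The library utilities used in the example above do this on tensors in a vectorized way, which is preferable for anything beyond a handful of sentences.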