	---
	language: ja
	widget:
	- text: Ｘが部屋でゲームするxEffect
	---

	# COMET-GPT2 ja

	This model is GPT-2 fine-tuned on [ATOMIC ja](https://github.com/nlp-waseda/comet-atomic-ja) with a causal language modeling (CLM) objective.
	It was introduced in [this paper](https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/B2-5.pdf).

	### How to use

	You can use this model directly with a pipeline for text generation.
	Since the generation relies on some randomness, we set a seed for reproducibility:

	```python
	>>> from transformers import pipeline, set_seed
	>>> generator = pipeline('text-generation', model='nlp-waseda/comet-gpt2-small-japanese')
	>>> set_seed(42)
	>>> generator('Ｘが大学で勉強するxEffect', max_length=30, num_return_sequences=5, do_sample=True)

	[{'generated_text': 'Ｘが大学で勉強するxEffect X が単位を取る'},
	{'generated_text': 'Ｘが大学で勉強するxEffect X が就職する'},
	{'generated_text': 'Ｘが大学で勉強するxEffect X がテストで良い点をとる'},
	{'generated_text': 'Ｘが大学で勉強するxEffect X が単位を落とす'},
	{'generated_text': 'Ｘが大学で勉強するxEffect X が資格を取る'}]
	```
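
Each generated string repeats the prompt (head event plus relation tag) before the inferred tail. The following is a minimal sketch, not part of the original model card, of how one might build such a prompt and strip it from the results. The helper names `build_prompt` and `extract_tails` are hypothetical, and the sample data copies two entries from the output above, with one repeated to show de-duplication.

```python
from typing import Dict, List


def build_prompt(head: str, relation: str = "xEffect") -> str:
    # Assumption: the head event and the relation tag are concatenated
    # with no separator, as in the example prompt above.
    return f"{head}{relation}"


def extract_tails(outputs: List[Dict[str, str]], prompt: str) -> List[str]:
    # Remove the echoed prompt from each result and drop duplicate tails,
    # keeping the order in which they first appeared.
    tails: List[str] = []
    for item in outputs:
        text = item["generated_text"]
        if text.startswith(prompt):
            text = text[len(prompt):]
        tail = text.strip()
        if tail and tail not in tails:
            tails.append(tail)
    return tails


prompt = build_prompt("Ｘが大学で勉強する")

# Sample data in the shape returned by the text-generation pipeline.
sample_outputs = [
    {"generated_text": "Ｘが大学で勉強するxEffect X が単位を取る"},
    {"generated_text": "Ｘが大学で勉強するxEffect X が就職する"},
    {"generated_text": "Ｘが大学で勉強するxEffect X が単位を取る"},
]

print(extract_tails(sample_outputs, prompt))
```

With a live pipeline, the list returned by `generator(prompt, ...)` could be passed in place of `sample_outputs`.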

	### Preprocessing

	The texts are segmented into words using Juman++ and tokenized using SentencePiece.

	## Evaluation results

	The model achieves the following results:

	| BLEU | BERTScore |
	|:-----:|:---------:|
	| 43.61 | 87.56 |

	### BibTeX entry and citation info

	```bibtex
	@InProceedings{ide_nlp2023_event,
	author = "井手竜也 and 村田栄樹 and 堀尾海斗 and 河原大輔 and 山崎天 and 李聖哲 and 新里顕大 and 佐藤敏紀",
	title = "人間と言語モデルに対するプロンプトを用いたゼロからのイベント常識知識グラフ構築",
	booktitle = "言語処理学会第29回年次大会",
	year = "2023",
	url = "https://www.anlp.jp/proceedings/annual_meeting/2023/pdf_dir/B2-5.pdf",
	note = "in Japanese"
	}
	```