---
license: mit
language:
- ja
metrics:
- f1
base_model:
- tohoku-nlp/bert-large-japanese-v2
pipeline_tag: text-classification
tags:
- japanese
- grammar
- classification
---
# Model Card for Japanese Grammar Point Classification
<!-- Provide a quick summary of what the model is/does. -->
This model is a fine-tuned version of tohoku-nlp/bert-large-japanese-v2 designed to perform multi-class classification of Japanese grammar points.
It was trained on labeled data sourced from the 日本語文型辞典 (grammar dictionary) and augmented with synthetic examples generated by a large language model.
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
This model takes a Japanese sentence as input and predicts the most likely grammar point(s) used in that sentence. It can be integrated into language-learning applications, grammar checkers, or reading-assistant tools.
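Below is a minimal inference sketch with the 🤗 Transformers pipeline. The repository id is a placeholder and the returned label names depend on this model's config; the Japanese tokenizer additionally requires the `fugashi` and `unidic-lite` packages.

```python
# Hedged sketch, not the card author's code. Requires: transformers, fugashi, unidic-lite.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="arvine111/japanese-grammar-classifier",  # hypothetical repo id; substitute the real one
    top_k=2,  # the card reports a Top-2 score, so inspecting the two best labels is natural
)

# Returns the two highest-scoring grammar-point labels with their scores.
print(classifier("雨が降らないかぎり、試合は中止になりません。"))
```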
### Out-of-Scope Use
- Machine translation or text generation tasks.
- Understanding semantics beyond grammar point identification.
## Fine-tuning Details
### Fine-tuning Data
- Source: 日本語文型辞典 (grammar dictionary), covering ~2,400 grammar points.
- Augmentation: synthetic sentences generated with a large language model to balance low-frequency grammar points (minimum 20 examples per point); a balancing-check sketch follows.
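As an illustration of that balancing step, the sketch below counts training examples per grammar point and flags any that still fall short of the 20-example floor; the JSONL file name and the `label` field are assumptions, not the actual data format.

```python
# Illustrative check for the 20-examples-per-point floor described above.
import json
from collections import Counter

with open("train.jsonl", encoding="utf-8") as f:  # assumed file name
    counts = Counter(json.loads(line)["label"] for line in f)  # assumed field name

under_min = {label: n for label, n in counts.items() if n < 20}
print(f"{len(counts)} grammar points, {len(under_min)} still below 20 examples")
```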
### Fine-tuning Procedure
- Preprocessing: Tokenization with MeCab + Unidic lite; WordPiece subword encoding.
- Batch size: 64
- Max sequence length: 128 tokens
- Optimizer: AdamW (learning rate = 3e-5, weight decay = 0.05)
- Scheduler: Linear warmup of 20% steps, then linear decay
- Epochs: 10
- Mixed precision: Enabled (fp16)
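The settings above map roughly onto the following Hugging Face Trainer configuration. This is a reconstruction under the listed hyperparameters, not the original training script; the output path and the `sentence` field name are illustrative.

```python
# Approximate reconstruction of the listed fine-tuning settings; not the original script.
from transformers import AutoTokenizer, TrainingArguments

# The base model's tokenizer performs MeCab (+ unidic-lite) word splitting
# followed by WordPiece subword encoding.
tokenizer = AutoTokenizer.from_pretrained("tohoku-nlp/bert-large-japanese-v2")

def preprocess(example):
    # Truncate to the 128-token maximum sequence length used for fine-tuning.
    return tokenizer(example["sentence"], truncation=True, max_length=128)

training_args = TrainingArguments(
    output_dir="grammar-point-classifier",  # illustrative path
    per_device_train_batch_size=64,
    learning_rate=3e-5,
    weight_decay=0.05,
    warmup_ratio=0.2,              # linear warmup over 20% of steps...
    lr_scheduler_type="linear",    # ...then linear decay
    num_train_epochs=10,
    fp16=True,                     # mixed precision
)
```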
## Evaluation
- Test set: held-out sentences from the dictionary and synthetic data (10% of the total).
- Metrics:
  - F1 score (macro): 83.51%
  - Top-2 F1 score (macro): 94.96%
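
The macro F1 is the standard scikit-learn score. The sketch below reads "Top-2 F1" as counting a prediction correct when the true label appears among the model's two highest-scoring classes; that reading is an assumption, not something the card states.

```python
# Hedged sketch of the evaluation metrics; the Top-2 interpretation is an assumption.
import numpy as np
from sklearn.metrics import f1_score

def top2_predictions(logits: np.ndarray, y_true: np.ndarray) -> np.ndarray:
    """Keep the argmax prediction unless the true label is in the top 2, then credit it."""
    top2 = np.argsort(logits, axis=-1)[:, -2:]
    pred = logits.argmax(axis=-1)
    hit = (top2 == y_true[:, None]).any(axis=1)
    return np.where(hit, y_true, pred)

# macro_f1      = f1_score(y_true, logits.argmax(-1), average="macro")
# top2_macro_f1 = f1_score(y_true, top2_predictions(logits, y_true), average="macro")
```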