	---
	library_name: transformers
	base_model: bert-base-chinese
	tags:
	- generated_from_trainer
	datasets:
	- real-jiakai/chinese-squadv2
	model-index:
	- name: chinese_squadv2
	  results: []
	---

	# bert-base-chinese-finetuned-squadv2

	This model is a fine-tuned version of [bert-base-chinese](https://huggingface.co/bert-base-chinese) on the [Chinese SQuAD v2.0 dataset](https://huggingface.co/datasets/real-jiakai/chinese-squadv2).

	## Model Description

	This model targets Chinese extractive question answering: given a question and a context paragraph, it returns the span of the context that answers the question, or reports that the question cannot be answered from that context. It handles both answerable and unanswerable questions, following the SQuAD v2.0 format.

	Key features:
	- Based on BERT-base Chinese architecture
	- Supports both answerable and unanswerable questions
	- Trained on Chinese question-answer pairs
	- Optimized for extractive question answering

	## Intended Uses & Limitations

	### Intended Uses
	- Chinese extractive question answering
	- Reading comprehension tasks
	- Information extraction from Chinese text
	- Automated question answering systems

	### Limitations
	- Performance is much better on unanswerable questions (76.65% exact match) than on answerable questions (36.41% exact match)
	- Limited to extractive QA (cannot generate new answers)
	- May not perform well on domain-specific questions outside the training data
	- Designed for modern Chinese text; it may not work well with classical Chinese or dialectal variations

	## Training and Evaluation Data

	The model was trained on the Chinese SQuAD v2.0 dataset, which contains:

	Training Set:
	- Total examples: 90,027
	- Answerable questions: 46,529
	- Unanswerable questions: 43,498

	Validation Set:
	- Total examples: 9,936
	- Answerable questions: 3,991
	- Unanswerable questions: 5,945
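
The class balance differs between the two splits, which matters when reading the evaluation numbers. The short calculation below uses only the counts listed above. Under SQuAD v2.0-style scoring, an empty prediction counts as an exact match on an unanswerable question, so the validation share is also the exact-match score of a trivial predictor that always answers "no answer".

```python
# Counts copied from the lists above.
splits = {
    "train": {"answerable": 46_529, "unanswerable": 43_498},
    "validation": {"answerable": 3_991, "unanswerable": 5_945},
}


def unanswerable_share(counts):
    """Fraction of questions in a split that have no answer in the context."""
    total = counts["answerable"] + counts["unanswerable"]
    return counts["unanswerable"] / total


for name, counts in splits.items():
    print(f"{name}: {unanswerable_share(counts):.2%} unanswerable")
# train: 48.32% unanswerable
# validation: 59.83% unanswerable
```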

	## Training Procedure

	### Training Hyperparameters

	- Learning rate: 3e-05
	- Batch size: 12
	- Evaluation batch size: 8
	- Number of epochs: 5
	- Optimizer: AdamW (β1=0.9, β2=0.999, ε=1e-08)
	- Learning rate scheduler: Linear
	- Maximum sequence length: 384
	- Document stride: 128
	- Training device: CUDA-enabled GPU
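
To illustrate how the maximum sequence length and document stride interact, here is a minimal sketch of how a long context might be split into overlapping windows. It assumes the Hugging Face tokenizer convention in which the stride is the number of tokens shared by consecutive windows, and BERT-style pair inputs with three special tokens (`[CLS] question [SEP] context [SEP]`). The function name `window_spans` and the token counts in the example are hypothetical, not taken from the training script.

```python
def window_spans(n_context, n_question, max_length=384, stride=128, n_special=3):
    """Return (start, end) context-token offsets for each window; end is exclusive.

    Assumption: `stride` is the overlap between consecutive windows.
    """
    capacity = max_length - n_question - n_special  # context tokens per window
    if capacity <= stride:
        raise ValueError("question leaves too little room for the context")
    step = capacity - stride  # how far each new window advances
    spans, start = [], 0
    while True:
        end = min(start + capacity, n_context)
        spans.append((start, end))
        if end == n_context:
            return spans
        start += step


# Hypothetical example: a 21-token question and a 1,000-token context.
spans = window_spans(n_context=1000, n_question=21)
print(spans)  # [(0, 360), (232, 592), (464, 824), (696, 1000)]
```

Each window holds at most 360 context tokens here, and neighbouring windows share 128 tokens, so an answer that straddles a window boundary is likely to appear whole in at least one window.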

	### Training Results

	Final evaluation metrics:
	- Overall Exact Match: 60.49%
	- Overall F1 Score: 60.54%
	- Answerable Questions:
	  - Exact Match: 36.41%
	  - F1 Score: 36.53%
	- Unanswerable Questions:
	  - Exact Match: 76.65%
	  - F1 Score: 76.65%
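
As a consistency check, the overall scores should be the average of the per-class scores weighted by the number of validation questions in each class. The sketch below uses the validation counts from the data section; the helper name `overall` is illustrative.

```python
# Validation-set sizes from the "Training and Evaluation Data" section.
n_answerable, n_unanswerable = 3_991, 5_945


def overall(answerable_score, unanswerable_score):
    """Average of the two per-class scores, weighted by class size."""
    total = n_answerable + n_unanswerable
    weighted = n_answerable * answerable_score + n_unanswerable * unanswerable_score
    return weighted / total


print(f"EM: {overall(36.41, 76.65):.2f}")  # EM: 60.49
print(f"F1: {overall(36.53, 76.65):.2f}")  # F1: 60.53
```

The exact-match figure reproduces the reported 60.49%. The F1 figure lands within 0.01 of the reported 60.54%, a gap consistent with the per-class inputs being rounded to two decimals.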

	### Framework Versions
	- Transformers: 4.47.0.dev0
	- PyTorch: 2.5.1+cu124
	- Datasets: 3.1.0
	- Tokenizers: 0.20.3

	## Usage

	```python
	from transformers import AutoModelForQuestionAnswering, AutoTokenizer
	import torch

	# Load model and tokenizer
	model_name = "real-jiakai/bert-base-chinese-finetuned-squadv2"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForQuestionAnswering.from_pretrained(model_name)

	def get_answer(question, context, threshold=0.0):
	    # Tokenize input with maximum sequence length of 384
	    inputs = tokenizer(
	        question,
	        context,
	        return_tensors="pt",
	        max_length=384,
	        truncation=True,
	    )

	    with torch.no_grad():
	        outputs = model(**inputs)
	    start_logits = outputs.start_logits[0]
	    end_logits = outputs.end_logits[0]

	    # Calculate null score (score for predicting no answer)
	    null_score = start_logits[0].item() + end_logits[0].item()

	    # Find the best non-null answer, excluding [CLS] position
	    # Set logits at [CLS] position to negative infinity
	    start_logits[0] = float('-inf')
	    end_logits[0] = float('-inf')

	    # Start and end are chosen independently; convert to plain ints for slicing
	    start_idx = torch.argmax(start_logits).item()
	    end_idx = torch.argmax(end_logits).item()

	    # Ensure end_idx is not less than start_idx
	    if end_idx < start_idx:
	        end_idx = start_idx

	    answer_score = start_logits[start_idx].item() + end_logits[end_idx].item()

	    # If null score is higher (beyond threshold), return "no answer"
	    if null_score - answer_score > threshold:
	        return "Question cannot be answered based on the given context."

	    # Otherwise, return the extracted answer
	    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
	    answer = tokenizer.convert_tokens_to_string(tokens[start_idx:end_idx + 1])

	    # Check if answer is empty or contains only special tokens
	    if not answer.strip() or answer.strip() in ['[CLS]', '[SEP]']:
	        return "Question cannot be answered based on the given context."

	    return answer.strip()

	questions = [
	    # "What are the highlights and main exhibits of this 15th Zhuhai Airshow?"
	    "本届第十五届珠海航展的亮点和主要展示内容是什么？",
	    # "Where did the Zhuhai homicide case take place?" (not answerable from the context)
	    "珠海杀人案发生地点？",
	]

	# Context: a paragraph about the 15th China International Aviation & Aerospace
	# Exhibition (Zhuhai Airshow), held 12-17 November 2024.
	context = '第十五届中国国际航空航天博览会（珠海航展）于2024年11月12日至17日在珠海国际航展中心举行。本届航展吸引了来自47个国家和地区的超过890家企业参展，展示了涵盖"陆、海、空、天、电、网"全领域的高精尖展品。其中，备受瞩目的中国空军"八一"飞行表演队和"红鹰"飞行表演队，以及俄罗斯"勇士"飞行表演队同台献技，为观众呈现了精彩的飞行表演。此外，本届航展还首次开辟了无人机、无人船演示区，展示了多款前沿科技产品。'

	for question in questions:
	    answer = get_answer(question, context)
	    print(f"Question: {question}")
	    print(f"Answer: {answer}")
	    print("-" * 50)
	```
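
`get_answer` picks the start and end positions independently, so the two can disagree (for example, an end before the start, or a position inside the question). A common alternative in SQuAD-style post-processing is to score start/end pairs jointly and keep only valid spans inside the context. The sketch below shows that idea on plain Python lists. The function name `best_span`, the defaults `n_best=20` and `max_answer_len=30`, and the toy logits are illustrative assumptions, not part of this model's release. With a fast tokenizer, the context range can typically be read from `inputs.sequence_ids(0)`, where context tokens are marked `1`.

```python
def best_span(start_logits, end_logits, context_start, context_end,
              n_best=20, max_answer_len=30):
    """Return (score, start, end) for the best valid span, or None if there is none.

    Only positions in [context_start, context_end) are considered, the end may
    not precede the start, and spans longer than max_answer_len are skipped.
    """
    def top_indices(logits):
        positions = range(context_start, context_end)
        return sorted(positions, key=lambda i: logits[i], reverse=True)[:n_best]

    best = None
    for s in top_indices(start_logits):
        for e in top_indices(end_logits):
            if e < s or e - s + 1 > max_answer_len:
                continue
            score = start_logits[s] + end_logits[e]
            if best is None or score > best[0]:
                best = (score, s, e)
    return best


# Toy logits: index 0 is [CLS], indices 1-2 are the question, 3-5 the context.
start_logits = [5.0, 0.1, 0.2, 4.0, 0.3, 1.0, 0.0]
end_logits = [5.0, 0.0, 3.5, 0.1, 0.2, 3.0, 0.0]

print(best_span(start_logits, end_logits, context_start=3, context_end=6))
# (7.0, 3, 5)
```

In this toy case the highest end logit outside `[CLS]` sits at index 2, inside the question, so independent selection would fall back to a one-token answer, while the joint search returns the span from index 3 to 5.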

	## Limitations and Bias

	The model shows significant performance disparity between answerable and unanswerable questions, which might indicate:
	1. Dataset quality issues
	2. Potential translation artifacts in the Chinese version of SQuAD
	3. Imbalanced handling of answerable vs. unanswerable questions
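
The third point connects to the `threshold` argument of `get_answer` in the Usage section: raising it makes the model answer more often, lowering it makes "no answer" more likely. One way to explore this trade-off is to sweep the threshold over held-out examples and keep the value with the best exact match. The sketch below assumes each example has been reduced to a tuple of (null score minus best span score, whether the question is answerable, whether the predicted span is correct). The data shown is hypothetical and for illustration only, not measured from this model.

```python
def exact_match_at(threshold, examples):
    """Exact match when 'no answer' is predicted whenever score_diff > threshold."""
    hits = 0
    for score_diff, has_answer, span_correct in examples:
        predicts_null = score_diff > threshold
        if predicts_null:
            hits += 0 if has_answer else 1
        else:
            hits += 1 if (has_answer and span_correct) else 0
    return hits / len(examples)


def best_threshold(examples):
    """Try every observed score difference, plus one value below them all."""
    diffs = sorted({diff for diff, _, _ in examples})
    candidates = [diffs[0] - 1.0] + diffs  # first candidate means "always no answer"
    return max(candidates, key=lambda t: exact_match_at(t, examples))


# Hypothetical held-out examples: (score_diff, has_answer, span_correct).
examples = [
    (3.0, False, False),
    (1.5, False, False),
    (0.8, False, False),
    (0.5, True, True),
    (-1.0, True, True),
    (-2.0, True, False),
]

print(exact_match_at(0.0, examples))
print(best_threshold(examples))  # 0.5
```

On this toy data the default threshold of 0.0 rejects one answerable question whose span was correct, and moving the threshold to 0.5 recovers it without admitting any unanswerable question.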

	## Ethics & Responsible AI

	Users should be aware that:
	- The model may reflect biases present in the training data
	- Performance varies significantly based on question type
	- Results should be validated for critical applications
	- The model should not be used as the sole decision-maker in critical systems