	# Troubleshooting Guide

	## Common Problems and Solutions

	### 1. Installation

	#### Problem: ONNX Runtime installation error
	```bash
	ERROR: Could not find a version that satisfies the requirement onnxruntime
	```

	Solution:
	```bash
	pip install --upgrade pip
	pip install onnxruntime==1.16.3 # pin a specific version
	```

	### 2. Quantization

	#### Problem: "indices element out of data bounds"
	```
	IndexError: indices element out of data bounds, idx=44914 must be within the inclusive range [-32001,32000]
	```

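	Solution:
	- Reduce the number of calibration samples: `max_samples=20`
	- Limit the text length: `text[:100]`
	- Check the vocabulary size and reconfigure the tokenizer. The message shows a token ID (44914) outside the index range the model accepts (up to 32000), which suggests the tokenizer's vocabulary is larger than the model's embedding table.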
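
Before running calibration, it can help to scan the tokenized samples for IDs the model cannot index. The following is a minimal sketch, assuming the samples are already tokenized into lists of integer IDs; `find_out_of_range_ids` is a hypothetical helper, and the vocabulary size of 32001 is inferred from the range in the error message above.

```python
def find_out_of_range_ids(batches, vocab_size):
    """Return (sample_index, position, token_id) for every ID outside [0, vocab_size)."""
    bad = []
    for i, ids in enumerate(batches):
        for j, tok in enumerate(ids):
            if not 0 <= tok < vocab_size:
                bad.append((i, j, tok))
    return bad

# Toy token ID sequences; the second one contains the ID from the error message
samples = [[1, 5, 32000], [7, 44914, 2]]
print(find_out_of_range_ids(samples, vocab_size=32001))
```

Any sample reported here would trigger the error during calibration, so it points to a tokenizer and model mismatch rather than to the sample count.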

	#### Problem: "Got invalid dimensions for input"
	```
	Invalid Input: Got invalid dimensions for input: input_ids Expected: 1
	```

	Solution:
	- Use the original, non-quantized ONNX model
	- For a decoder_with_past model, feed only one token per step
	- Check the model's input specification
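
The points above can be sketched as a shape check. This is an illustration only: `shape_matches` and `shape_of` are hypothetical helpers, and the declared shape `["batch_size", 1]` is an assumed example. With a real model, the declared shape can be read from `session.get_inputs()[i].shape` in ONNX Runtime.

```python
def shape_matches(declared, actual):
    """True if `actual` fits `declared`; non-integer (symbolic) dims accept any size."""
    if len(declared) != len(actual):
        return False
    return all(not isinstance(d, int) or d == a for d, a in zip(declared, actual))

def shape_of(batch):
    """Shape of a rectangular 2-D nested list: (rows, columns)."""
    return (len(batch), len(batch[0]))

# Hypothetical declared shape of `input_ids` in a decoder_with_past model
declared = ["batch_size", 1]

input_ids = [[101, 2023, 2003, 102]]          # full sequence, shape (1, 4)
last_token = [row[-1:] for row in input_ids]  # last token only, shape (1, 1)

print(shape_matches(declared, shape_of(input_ids)))   # full sequence does not fit
print(shape_matches(declared, shape_of(last_token)))  # single token fits
```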

	### 3. Memory

	#### Problem: Out-of-memory error
	```
	RuntimeError: CUDA out of memory
	```

	Solution:
	```python
	# Reduce the batch size
	batch_size = 1
	# Reduce the number of samples
	max_samples = 10
	# Run on the CPU
	providers = ['CPUExecutionProvider']
	```
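
The settings above can be applied to a calibration loop roughly as follows. This is a minimal sketch; `iter_batches` is a hypothetical helper and not part of ONNX Runtime.

```python
def iter_batches(samples, batch_size=1, max_samples=10):
    """Yield at most `max_samples` items in chunks of `batch_size`."""
    limited = samples[:max_samples]
    for start in range(0, len(limited), batch_size):
        yield limited[start:start + batch_size]

# 25 toy samples, capped at 10 and processed 4 at a time
for batch in iter_batches(list(range(25)), batch_size=4, max_samples=10):
    print(batch)
```

Only one batch is held at a time, so peak memory scales with `batch_size` rather than with the size of the dataset.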

	### 4. Hugging Face

	#### Problem: Dataset loading error
	```
	ConnectionError: Couldn't reach 'https://huggingface.co'
	```

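	Solution:
	```python
	# Use offline mode: reuse a previously cached copy of the dataset
	import os
	os.environ["HF_DATASETS_OFFLINE"] = "1"  # must be set before importing datasets

	from datasets import load_dataset
	dataset = load_dataset("TFMC/imatrix-dataset-for-japanese-llm",
	                       split="train", download_mode="reuse_cache_if_exists")
	```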
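
If the connection problem is transient and no cached copy exists, retrying the download is another option. The sketch below assumes the failure surfaces as Python's built-in `ConnectionError`, as the message above suggests; `with_retries` and `flaky_load` are hypothetical names, and `flaky_load` only simulates a download that fails twice.

```python
import time

def with_retries(fn, attempts=3, delay=2.0, exceptions=(ConnectionError,)):
    """Call fn(); on a listed exception, wait and retry, doubling the delay each time."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay)
            delay *= 2

# Stand-in for a flaky download: fails twice, then succeeds
calls = {"n": 0}

def flaky_load():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Couldn't reach 'https://huggingface.co'")
    return "dataset"

print(with_retries(flaky_load, delay=0.0))
```

In practice the wrapped call would be something like `with_retries(lambda: load_dataset("TFMC/imatrix-dataset-for-japanese-llm", split="train"))`. If your version of `datasets` raises a different exception class, pass it through the `exceptions` parameter.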

	## Debugging Tips

	### 1. Setting the log level
	```python
	import logging
	logging.basicConfig(level=logging.DEBUG)
	```

	### 2. Inspecting model information
	```python
	import onnx
	model = onnx.load("your_model.onnx")
	print("Inputs:", [inp.name for inp in model.graph.input])
	print("Outputs:", [out.name for out in model.graph.output])
	```
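
Input names alone do not reveal which dimensions are fixed. The sketch below renders a shape from the `dim_param` (symbolic name) and `dim_value` (fixed size) fields of ONNX dimension entries. `describe_shape` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the snippet runs without a model file.

```python
from types import SimpleNamespace

def describe_shape(dims):
    """Render ONNX dimensions: symbolic name if set, else the fixed size, else '?'."""
    return [d.dim_param or d.dim_value or "?" for d in dims]

# Stand-ins mimicking the dim_param / dim_value fields of ONNX dimension entries
dims = [
    SimpleNamespace(dim_param="batch_size", dim_value=0),
    SimpleNamespace(dim_param="", dim_value=1),
]
print(describe_shape(dims))
```

With a loaded model, the same helper could be called as `describe_shape(inp.type.tensor_type.shape.dim)` for each `inp` in `model.graph.input`.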

	### 3. Checking the tokenizer
	```python
	from transformers import MarianTokenizer
	tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-ja-en")
	print("Vocab size:", len(tokenizer))
	```

	## Performance Optimization

	### 1. CPU optimization
	```python
	import onnxruntime as ort

	session_options = ort.SessionOptions()
	session_options.intra_op_num_threads = 4  # threads used within a single operator
	session_options.inter_op_num_threads = 1  # threads used across operators
	session_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
	```

	### 2. Using the GPU (when CUDA is available)
	```python
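	import onnxruntime as ort

	model_path = "your_model.onnx"  # placeholder path
	# Providers are tried in the order listed: CUDA first, then CPU
	providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
	session = ort.InferenceSession(model_path, providers=providers)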
	```
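
To avoid requesting a provider that is not installed, the list can be filtered against what the runtime reports. This is a minimal sketch; `select_providers` is a hypothetical helper, and the `available` lists below are example values.

```python
def select_providers(available,
                     preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Keep the preferred providers that are available, in preference order."""
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]  # CPU as the last resort

print(select_providers(["CPUExecutionProvider"]))
print(select_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```

With ONNX Runtime installed, `available` would come from `ort.get_available_providers()`.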

	## Support Resources

	- [ONNX Runtime Documentation](https://onnxruntime.ai/docs/)
	- [Transformers Documentation](https://huggingface.co/docs/transformers/)
	- [GitHub Issues](https://github.com/your-repo/issues)