
中文预训练Longformer模型 | Longformer_ZH with PyTorch

相比于Transformer的O(n^2)复杂度,Longformer提供了一种以线性复杂度处理最长4K字符级别文档序列的方法。Longformer Attention包括了标准的自注意力与全局注意力机制,方便模型更好地学习超长序列的信息。

Compared with the O(n^2) complexity of the standard Transformer, Longformer processes document-level sequences of up to 4K characters with linear complexity. Longformer's attention mechanism is a drop-in replacement for standard self-attention and combines a local windowed attention with a task-motivated global attention.
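To make the attention pattern concrete, here is a toy sketch (not code from this repository) that builds the boolean sliding-window-plus-global mask for a short sequence; the window size and the single global token at position 0 are illustrative assumptions.

```python
# Toy illustration of the Longformer attention pattern (not from this repo):
# each token attends to a local window, while designated "global" tokens
# attend to, and are attended by, every position.
import torch

def longformer_attention_pattern(seq_len, window, global_positions):
    """Boolean [seq_len, seq_len] mask where True means attention is allowed."""
    idx = torch.arange(seq_len)
    # Local sliding window: token i sees tokens j with |i - j| <= window // 2.
    mask = (idx[:, None] - idx[None, :]).abs() <= window // 2
    # Global tokens see everything and are seen by everything.
    for g in global_positions:
        mask[g, :] = True
        mask[:, g] = True
    return mask

pattern = longformer_attention_pattern(seq_len=16, window=4, global_positions=[0])
# The number of allowed pairs grows roughly linearly with seq_len,
# instead of quadratically as in full self-attention.
print(pattern.sum().item(), "allowed pairs out of", 16 * 16)
```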

我们注意到关于中文Longformer或超长序列任务的资源较少,因此在此开源了我们预训练的中文Longformer模型参数, 并提供了相应的加载方法,以及预训练脚本。

There are few resources for Chinese Longformer models or long-sequence Chinese tasks, so we open-source our pretrained Longformer weights, together with loading code and the pretraining scripts, to help researchers.

加载模型 | Load the model

您可以使用谷歌云盘或百度网盘下载我们的模型
You can download Longformer_zh from Google Drive or Baidu Yun.

我们同样提供了Huggingface的自动下载
We also support automatic loading with HuggingFace Transformers.

from Longformer_zh import LongformerZhForMaksedLM
LongformerZhForMaksedLM.from_pretrained('ValkyriaLenneth/longformer_zh')

注意事项 | Notice

  • 直接使用 transformers.LongformerModel.from_pretrained 加载模型

  • Please use transformers.LongformerModel.from_pretrained to load the model directly (see the loading sketch after this list)

  • 以下内容已经被弃用

  • The following notes are deprecated; please ignore them.

  • 区别于英文原版Longformer, 中文Longformer的基础是Roberta_zh模型,其本质上属于 Transformers.BertModel 而非 RobertaModel, 因此无法使用原版代码直接加载。

  • Different from the original English Longformer, Longformer_zh is based on Roberta_zh, which is essentially a Transformers.BertModel rather than a RobertaModel, so it cannot be loaded directly with the original code.

  • 我们提供了修改后的中文Longformer文件,您可以使用其加载参数。

  • We provide a modified Longformer_zh class that you can use directly to load the weights.

  • 如果您想将此参数用于更多任务,请参考Longformer_zh.py替换Attention Layer.

  • If you want to use our weights on more downstream tasks, please refer to Longformer_zh.py and replace the attention layers with Longformer attention layers.
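Following the notice above, a minimal loading sketch with HuggingFace Transformers could look like this. The use of BertTokenizer (based on Roberta_zh's BERT-style vocabulary) and the choice of putting global attention only on the first token are assumptions, not part of this repository's documented API.

```python
# Minimal loading sketch via HuggingFace Transformers, per the notice above.
# Assumptions: the tokenizer follows Roberta_zh's BERT-style vocabulary
# (hence BertTokenizer), and global attention is placed only on the first token.
import torch
from transformers import BertTokenizer, LongformerModel

tokenizer = BertTokenizer.from_pretrained("ValkyriaLenneth/longformer_zh")
model = LongformerModel.from_pretrained("ValkyriaLenneth/longformer_zh")

text = "这是一段用来测试中文Longformer的长文本。"
inputs = tokenizer(text, return_tensors="pt")

# Mark the first token as global; the rest use local windowed attention.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```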

关于预训练 | About Pretraining

效果测试 | Evaluation

CCF Sentiment Analysis

  • 由于中文超长文本级别任务稀缺,我们采用了CCF-Sentiment-Analysis任务进行测试
  • Since open-source long-sequence Chinese NLP tasks are scarce, we use CCF-Sentiment-Analysis for evaluation (a fine-tuning sketch follows the results table below).
| Model | Dev F |
| --- | --- |
| Bert | 80.3 |
| Bert-wwm-ext | 80.5 |
| Roberta-mid | 80.5 |
| Roberta-large | 81.25 |
| Longformer_SC | 79.37 |
| Longformer_ZH | 80.51 |
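For readers who want to set up a comparable experiment, the sketch below shows one way to fine-tune the checkpoint for sentence-level sentiment classification with transformers. It is not the script behind the numbers above; the tokenizer, label set, learning rate, and example sentences are all illustrative assumptions, and real training would iterate over a full dataset.

```python
# Hypothetical fine-tuning sketch for sentiment classification.
# NOT the evaluation script used for the table above; the tokenizer,
# label ids, and hyperparameters are assumptions for illustration only.
import torch
from transformers import BertTokenizer, LongformerForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("ValkyriaLenneth/longformer_zh")
model = LongformerForSequenceClassification.from_pretrained(
    "ValkyriaLenneth/longformer_zh",
    num_labels=3,  # assumed: negative / neutral / positive
)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["物流很快,商品质量也不错。", "等了两个星期才到货,非常失望。"]
labels = torch.tensor([2, 0])  # assumed label ids: 0 = negative, 2 = positive

batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=4096, return_tensors="pt")
outputs = model(**batch, labels=labels)  # classification head is newly initialized
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```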

Pretraining BPC

  • 我们提供了预训练BPC(bits-per-character), BPC越小,代表语言模型性能更优。可视作PPL.
  • We also report the pretraining BPC (bits-per-character); the lower the BPC, the better the language model. It can be treated like PPL (see the relation below the table).
| Model | BPC |
| --- | --- |
| Longformer before training | 14.78 |
| Longformer after training | 3.10 |
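As a reference for interpreting these numbers, BPC is the average negative log-likelihood per character measured in bits, and it maps to a per-character perplexity by exponentiation (standard definitions, not specific to this repository):

$$
\mathrm{BPC} = -\frac{1}{N}\sum_{i=1}^{N}\log_2 p(c_i \mid c_{<i}),
\qquad
\mathrm{PPL}_{\text{char}} = 2^{\mathrm{BPC}}
$$

For example, the post-training BPC of 3.10 corresponds to a per-character perplexity of about 2^3.10 ≈ 8.6.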

CMRC (Chinese Machine Reading Comprehension)

| Model | F1 | EM |
| --- | --- | --- |
| Bert | 85.87 | 64.90 |
| Roberta | 86.45 | 66.57 |
| Longformer_zh | 86.15 | 66.84 |

Chinese Coreference Resolution

| Model | Conll-F1 | Precision | Recall |
| --- | --- | --- | --- |
| Bert | 66.82 | 70.30 | 63.67 |
| Roberta | 67.77 | 69.28 | 66.32 |
| Longformer_zh | 67.81 | 70.13 | 65.64 |

致谢 | Acknowledgements

感谢东京工业大学 奥村·船越研究室 提供算力。

Thanks to the Okumura-Funakoshi Lab at Tokyo Institute of Technology for providing the computing resources and the opportunity to finish this project.
