Mengzi Pretrained Models

Last update: Jan 04, 2023

Overview

Mengzi

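Although pre-trained language models are widely used across NLP, their high time and compute costs remain a pressing problem. This requires us to develop models that score better across metrics under a fixed compute budget.

Our goal is not to pursue ever-larger models, but to build models that are lightweight yet more powerful, and friendlier to deployment and industrial use.

Using techniques such as incorporating linguistic information and accelerating training, we developed the Mengzi family of models. Because the architecture is the same as BERT's, Mengzi models can quickly replace existing pre-trained models.

For details, see the technical report: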

Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese

Quick Start

Mengzi-BERT

# Load with Hugging Face transformers
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("Langboat/mengzi-bert-base")
model = BertModel.from_pretrained("Langboat/mengzi-bert-base")
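
As a usage sketch, the loaded tokenizer and model can encode a batch of sentences into fixed-size vectors. The helper name `encode_cls` and the choice of the `[CLS]` position as the sentence vector are assumptions made for illustration, not something the Mengzi report prescribes. Calling the function downloads the checkpoint and requires `torch` to be installed.

```python
MODEL_NAME = "Langboat/mengzi-bert-base"


def encode_cls(sentences, model_name=MODEL_NAME):
    """Return one [CLS] hidden-state vector per input sentence.

    Hypothetical helper for illustration. Imports are deferred so that
    defining the function does not itself require torch or transformers.
    """
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name)
    model.eval()  # disable dropout for deterministic inference

    # Pad to the longest sentence in the batch and return PyTorch tensors.
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)

    # last_hidden_state has shape (batch, sequence_length, hidden_size);
    # position 0 along the sequence axis is the [CLS] token.
    return outputs.last_hidden_state[:, 0]
```

Calling `encode_cls(["今天天气很好", "我喜欢自然语言处理"])` should return a tensor with one row per sentence. For real workloads the tokenizer and model would normally be loaded once and reused rather than reloaded on every call.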

Mengzi-T5

# Load with Hugging Face transformers
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
model = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base")
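
The following is a minimal sketch of running the loaded T5 checkpoint for span infilling. It assumes the checkpoint's tokenizer defines the standard T5 sentinel token `<extra_id_0>` and that the base model was not fine-tuned for any specific task, so the output is only a rough illustration. The helper names `build_infill_input` and `infill` are hypothetical.

```python
T5_NAME = "Langboat/mengzi-t5-base"


def build_infill_input(prefix, suffix, sentinel="<extra_id_0>"):
    """Join prefix and suffix around a sentinel that marks the span to fill.

    The sentinel string follows the usual T5 convention (an assumption here).
    """
    return f"{prefix}{sentinel}{suffix}"


def infill(prefix, suffix, model_name=T5_NAME, max_new_tokens=16):
    """Generate a candidate filling for the masked span (illustrative only).

    Imports are deferred so that defining the function does not itself
    require transformers, torch, or sentencepiece.
    """
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    inputs = tokenizer(build_infill_input(prefix, suffix), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep special tokens so the sentinel markers in the output stay visible.
    return tokenizer.decode(output_ids[0], skip_special_tokens=False)
```

For example, `infill("中国的首都是", "。")` asks the model to propose text for the gap between the two fragments. For downstream generation tasks the checkpoint would normally be fine-tuned first.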

Mengzi-Oscar

See the reference documentation.

Installing dependencies

pip install transformers
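
The snippets above also need a tensor backend such as PyTorch, and `T5Tokenizer` depends on the `sentencepiece` package. As a small sketch, the following check reports which of these packages are present in the current environment. The helper names and the package list are assumptions for illustration, not part of the Mengzi project.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("transformers", "torch", "sentencepiece")):
    """Map each package name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in packages}
```

Printing `report()` before loading a model makes a missing dependency visible up front rather than as an import error inside `from_pretrained`.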

Downstream tasks

CLUE scores

Model	AFQMC	TNEWS	IFLYTEK	CMNLI	WSC	CSL	CMRC2018	C3	CHID
RoBERTa-wwm-ext	74.30	57.51	60.80	80.70	67.20	80.67	77.59	67.06	83.78
Mengzi-BERT-base	74.58	57.97	60.68	82.12	87.50	85.40	78.54	71.70	84.16

The RoBERTa-wwm-ext scores are taken from the CLUE baseline.
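
To make the comparison in the table easier to read, the per-task differences can be computed directly. This is a small sketch that copies the scores from the table above; the function names are illustrative.

```python
TASKS = ["AFQMC", "TNEWS", "IFLYTEK", "CMNLI", "WSC", "CSL", "CMRC2018", "C3", "CHID"]

# Scores copied from the CLUE table above, in the same task order.
ROBERTA_WWM_EXT = [74.30, 57.51, 60.80, 80.70, 67.20, 80.67, 77.59, 67.06, 83.78]
MENGZI_BERT_BASE = [74.58, 57.97, 60.68, 82.12, 87.50, 85.40, 78.54, 71.70, 84.16]


def score_deltas():
    """Return Mengzi-BERT-base minus RoBERTa-wwm-ext for each task."""
    return {
        task: round(mengzi - roberta, 2)
        for task, roberta, mengzi in zip(TASKS, ROBERTA_WWM_EXT, MENGZI_BERT_BASE)
    }


def tasks_improved(deltas):
    """List the tasks on which Mengzi-BERT-base scores higher."""
    return [task for task, delta in deltas.items() if delta > 0]


if __name__ == "__main__":
    deltas = score_deltas()
    for task, delta in deltas.items():
        print(f"{task:10s} {delta:+.2f}")
    print("improved on", len(tasks_improved(deltas)), "of", len(TASKS), "tasks")
```

With the numbers in the table, Mengzi-BERT-base is higher on eight of the nine tasks; the only negative difference is IFLYTEK (-0.12) and the largest gain is WSC (+20.30).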

Corresponding hyperparameters

Task	Learning rate	Batch size	Epochs
AFQMC	3e-5	32	10
TNEWS	3e-5	128	10
IFLYTEK	3e-5	64	10
CMNLI	3e-5	512	10
WSC	8e-6	64	50
CSL	5e-5	128	5
CMRC2018	5e-5	8	5
C3	1e-4	240	3
CHID	5e-5	256	5
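
For scripted fine-tuning it can help to keep the table above as a lookup. The sketch below transcribes the table into a dictionary; the helper names, the dictionary keys, and the `num_examples` argument are hypothetical, and the table does not say whether batch sizes are per device or global, so no such mapping is assumed here.

```python
import math

# Values transcribed from the hyperparameter table above.
CLUE_HPARAMS = {
    "AFQMC": {"learning_rate": 3e-5, "batch_size": 32, "epochs": 10},
    "TNEWS": {"learning_rate": 3e-5, "batch_size": 128, "epochs": 10},
    "IFLYTEK": {"learning_rate": 3e-5, "batch_size": 64, "epochs": 10},
    "CMNLI": {"learning_rate": 3e-5, "batch_size": 512, "epochs": 10},
    "WSC": {"learning_rate": 8e-6, "batch_size": 64, "epochs": 50},
    "CSL": {"learning_rate": 5e-5, "batch_size": 128, "epochs": 5},
    "CMRC2018": {"learning_rate": 5e-5, "batch_size": 8, "epochs": 5},
    "C3": {"learning_rate": 1e-4, "batch_size": 240, "epochs": 3},
    "CHID": {"learning_rate": 5e-5, "batch_size": 256, "epochs": 5},
}


def get_hparams(task):
    """Return a copy of the hyperparameters for a task (case-insensitive)."""
    key = task.upper()
    if key not in CLUE_HPARAMS:
        raise KeyError(f"unknown CLUE task: {task!r}")
    return dict(CLUE_HPARAMS[key])


def total_steps(task, num_examples):
    """Number of optimizer steps for a training set of the given size.

    num_examples is supplied by the caller; dataset sizes are not
    stated in this document.
    """
    hp = get_hparams(task)
    return math.ceil(num_examples / hp["batch_size"]) * hp["epochs"]
```

For instance, `get_hparams("wsc")` returns the WSC row, and `total_steps("CMRC2018", 100)` gives the step count for a hypothetical 100-example training set, which is useful when configuring a learning-rate schedule.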

Download links

Contact

WeChat discussion group

Email

wangyulong[at]chuangxin[dot]com

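Disclaimer

The content of this project is provided for technical research reference only and should not be taken as the basis for any conclusion. Users may use the model freely within the scope of the license, but we are not liable for any direct or indirect loss arising from use of this project's content. The experimental results presented in the technical report only reflect performance on specific datasets and hyperparameter combinations and do not represent the intrinsic nature of each model. Results may vary with random seeds and computing hardware.

When using this model in any manner (including but not limited to modified use, direct use, or use through a third party), users must not use it, directly or indirectly, to engage in conduct that violates the laws and regulations of the applicable jurisdiction or public morality. Users are responsible for their own conduct and bear all legal and joint liability for any dispute arising from use of this model. We assume no legal or joint liability.

We reserve the right to interpret, modify, and update this disclaimer.

Citation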

@misc{zhang2021mengzi,
      title={Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese}, 
      author={Zhuosheng Zhang and Hanqing Zhang and Keming Chen and Yuhang Guo and Jingyun Hua and Yulong Wang and Ming Zhou},
      year={2021},
      eprint={2110.06696},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Owner

Langboat
