Code for the ACL 2021 paper "Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter"

Last update: Dec 06, 2022

Overview

Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter

Code and checkpoints for the ACL 2021 paper "Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter".

arXiv link to the paper: https://arxiv.org/abs/2105.07148

Requirements

Python 3.7.0
Transformers 3.4.0
Numpy 1.18.5
Packaging 17.1
scikit-learn 0.23.2
torch 1.6.0+cu92
tqdm 4.50.2
multiprocess 0.70.10
tensorflow 2.3.1
tensorboardX 2.1
seqeval 1.2.1

Input Format

CoNLL format (BIOES tagging scheme preferred), with one character and its label per line. Sentences are separated by a blank line.

美   B-LOC  
国   E-LOC  
的   O  
华   B-PER  
莱   I-PER  
士   E-PER  

我   O  
跟   O  
他   O  
谈   O  
笑   O  
风   O  
生   O
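
A minimal sketch of reading this format in Python, assuming whitespace separates each character from its label and a blank line ends a sentence. The function name `read_conll` is hypothetical and is not part of the repository.

```python
from typing import Iterable, List, Tuple

Sentence = Tuple[List[str], List[str]]  # (characters, labels)


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group character/label lines into sentences, splitting on blank lines."""
    sentences: List[Sentence] = []
    chars: List[str] = []
    labels: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if chars:
                sentences.append((chars, labels))
                chars, labels = [], []
            continue
        parts = line.split()
        if len(parts) < 2:
            raise ValueError(f"expected 'character label', got: {raw!r}")
        # The first field is the character, the last field is its label.
        chars.append(parts[0])
        labels.append(parts[-1])
    if chars:  # the input may not end with a blank line
        sentences.append((chars, labels))
    return sentences


sample = """美 B-LOC
国 E-LOC
的 O

我 O
跟 O
"""
parsed = read_conll(sample.splitlines())
```

For a file on disk, the same function could be called with an open file handle, for example `read_conll(open(path, encoding="utf-8"))`.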

Chinese BERT, Chinese Word Embedding, and Checkpoints

Chinese BERT

Chinese BERT: https://cdn.huggingface.co/bert-base-chinese-pytorch_model.bin

Chinese Word Embedding

Word Embedding: https://ai.tencent.com/ailab/nlp/en/data/Tencent_AILab_ChineseEmbedding.tar.gz

Checkpoints and Shells

Directory Structure of data

berts
- bert
  - config.json
  - vocab.txt
  - pytorch_model.bin
dataset
- NER
  - weibo
  - note4
  - msra
  - resume
- POS
  - ctb5
  - ctb6
  - ud1
  - ud2
- CWS
  - ctb6
  - msr
  - pku
vocab
- tencent_vocab.txt (the vocabulary of the pre-trained word embedding table)
embedding
- word_embedding.txt
result
- NER
  - weibo
  - note4
  - msra
  - resume
- POS
  - ctb5
  - ctb6
  - ud1
  - ud2
- CWS
  - ctb6
  - msr
  - pku
log
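
A minimal sketch that creates this empty directory skeleton with `pathlib`, so that the BERT files, datasets, and embeddings can be dropped into place. The helper name `make_layout` and the choice of root folder are assumptions for illustration; the task and dataset names are copied from the layout above.

```python
from pathlib import Path

# Dataset names per task, copied from the layout above.
TASKS = {
    "NER": ["weibo", "note4", "msra", "resume"],
    "POS": ["ctb5", "ctb6", "ud1", "ud2"],
    "CWS": ["ctb6", "msr", "pku"],
}


def make_layout(root: str) -> list:
    """Create the empty directory tree under `root` and return the paths."""
    base = Path(root)
    created = []
    # Folders that hold files directly.
    for name in ["berts/bert", "vocab", "embedding", "log"]:
        created.append(base / name)
    # dataset/ and result/ share the same task/dataset substructure.
    for top in ["dataset", "result"]:
        for task, datasets in TASKS.items():
            for dataset in datasets:
                created.append(base / top / task / dataset)
    for path in created:
        path.mkdir(parents=True, exist_ok=True)
    return created
```

Calling `make_layout("data")` would create the tree under a folder named `data`; existing folders are left untouched because of `exist_ok=True`.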

Run

1. Convert the .char.bmes files to .json files: python3 to_json.py
2. Run the shell script: sh run_ner.sh

If you want to load my checkpoints, you need to modify your transformers installation.

My model was trained in distributed mode, so the checkpoints cannot be loaded directly in single-GPU mode. Follow the steps below to modify transformers before loading my checkpoints.

1. Enter the source code directory of transformers: cd source/transformers-master
2. Open modeling_utils.py and go to around line 995.
3. Change the code as follows:
4. Compile and install the revised source code: python3 setup.py install
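
A common alternative to patching the library is to rename the checkpoint keys before loading. Checkpoints saved from a model wrapped in `torch.nn.DataParallel` or `DistributedDataParallel` usually carry a `module.` prefix on every parameter name. Whether that prefix is the mismatch in these checkpoints is an assumption, and this is not the author's method. The helper `strip_module_prefix` below is hypothetical and works on any mapping, so the sketch runs without PyTorch installed.

```python
from collections import OrderedDict
from typing import Any, Mapping

PREFIX = "module."  # prefix added by DataParallel / DistributedDataParallel


def strip_module_prefix(state_dict: Mapping[str, Any]) -> "OrderedDict[str, Any]":
    """Return a copy of `state_dict` whose keys no longer start with 'module.'."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Only the leading prefix is removed; other keys are kept unchanged.
        new_key = key[len(PREFIX):] if key.startswith(PREFIX) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in values; a real checkpoint would map parameter names to tensors.
example = {"module.bert.embeddings.weight": 1, "classifier.bias": 2}
cleaned = strip_module_prefix(example)
```

With PyTorch, the state dict returned by `torch.load(path, map_location="cpu")` could be passed through this helper and then given to `model.load_state_dict(...)`.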

Cite

@misc{liu2021lexicon,
      title={Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter}, 
      author={Wei Liu and Xiyan Fu and Yue Zhang and Wenming Xiao},
      year={2021},
      eprint={2105.07148},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
