Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime.

Last update: Dec 16, 2022

Overview

Optimum Transformers

Accelerated NLP pipelines for fast inference 🚀 on CPU and GPU. Built with 🤗 Transformers, Optimum and ONNX Runtime.

Installation:

With PyPI:

pip install optimum-transformers

Or directly from GitHub:

pip install git+https://github.com/AlekseyKorshuk/optimum-transformers

Usage:

The pipeline API is similar to the Transformers pipeline API, with a few differences that are explained below.

Provide the path or URL of a model. If needed, the pipeline downloads the model from the Hub, automatically creates the ONNX graph, and runs inference.

from optimum_transformers import pipeline

# Initialize a pipeline by passing the task name and
# setting use_onnx to True (True is also the default)
nlp = pipeline("sentiment-analysis", use_onnx=True)
nlp("Transformers and onnx runtime is an awesome combo!")
# [{'label': 'POSITIVE', 'score': 0.999721109867096}]

Or provide a different model using the model argument.

from optimum_transformers import pipeline

nlp = pipeline("question-answering", model="deepset/roberta-base-squad2", use_onnx=True)
nlp(question="What is ONNX Runtime ?",
    context="ONNX Runtime is a highly performant single inference engine for multiple platforms and hardware")
# {'answer': 'highly performant single inference engine for multiple platforms and hardware', 'end': 94,
# 'score': 0.751201868057251, 'start': 18}

from optimum_transformers import pipeline

nlp = pipeline("ner", model="mys/electra-base-turkish-cased-ner", use_onnx=True, optimize=True,
               grouped_entities=True)
nlp("adana kebap ülkemizin önemli lezzetlerinden biridir.")
# [{'entity_group': 'B-food', 'score': 0.869149774312973, 'word': 'adana kebap'}]

Set use_onnx to False for standard torch inference. Set optimize to True to quantize the model with ONNX (this requires use_onnx to be True).
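The relationship between the two flags can be sketched as a small helper. Note that `resolve_flags` is a hypothetical name and is not part of optimum-transformers; it only encodes the rule stated above, that `optimize` is meaningful only when `use_onnx` is enabled. Whether the library itself raises an error for the invalid combination is not confirmed here.

```python
def resolve_flags(use_onnx: bool = True, optimize: bool = False) -> dict:
    """Return keyword arguments for pipeline(), rejecting optimize without ONNX.

    Hypothetical helper: it only encodes the rule from this README that
    quantization (optimize=True) requires the ONNX backend (use_onnx=True).
    """
    if optimize and not use_onnx:
        raise ValueError("optimize=True requires use_onnx=True")
    return {"use_onnx": use_onnx, "optimize": optimize}


# Plain torch inference, ONNX inference, and quantized ONNX inference.
torch_kwargs = resolve_flags(use_onnx=False)
onnx_kwargs = resolve_flags(use_onnx=True)
quantized_kwargs = resolve_flags(use_onnx=True, optimize=True)
```

The resulting dictionaries can then be unpacked into the call, for example `pipeline("sentiment-analysis", **quantized_kwargs)`.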

Supported pipelines

You can create Pipeline objects for the following downstream tasks:

feature-extraction: Generates a tensor representation for the input sequence.
ner and token-classification: Generates a named-entity mapping for each word in the input sequence.
sentiment-analysis: Gives the polarity (positive / negative) of the whole input sequence. Can be used with any text classification model.
question-answering: Given some context and a question referring to that context, extracts the answer to the question from the context.
text-classification: Classifies sequences according to a given number of classes from training.
zero-shot-classification: Classifies sequences according to a given number of classes supplied directly at runtime.
fill-mask: The task of masking tokens in a sequence with a masking token, and prompting the model to fill that mask with an appropriate token.
text-generation: The task of generating text according to the previous text provided.

Calling the pipeline for the first time loads the model, creates the ONNX graph, and caches it for future use. Because of this, the first load takes some time. Subsequent calls with the same model load the ONNX graph automatically from the cache.
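The export-once, reuse-afterwards behaviour can be illustrated with a minimal sketch. This is not the library's actual implementation: the cache layout, the function names `cached_graph_path` and `load_or_export`, and the `fake_export` stand-in are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def cached_graph_path(model_name: str, cache_dir: Path) -> Path:
    """Map a model name to a file inside cache_dir (hypothetical layout)."""
    safe_name = model_name.replace("/", "--")
    return cache_dir / f"{safe_name}.onnx"


def load_or_export(model_name: str, cache_dir: Path, export) -> tuple:
    """Return (path, was_exported); `export` is called only on a cache miss."""
    path = cached_graph_path(model_name, cache_dir)
    if path.exists():
        return path, False  # cache hit: reuse the stored graph
    cache_dir.mkdir(parents=True, exist_ok=True)
    export(model_name, path)  # slow step, happens once per model
    return path, True


def fake_export(model_name: str, path: Path) -> None:
    """Stand-in for a real ONNX export; writes a placeholder file."""
    path.write_bytes(b"placeholder graph for " + model_name.encode())


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "onnx-cache"
    _, first_exported = load_or_export("deepset/roberta-base-squad2", cache, fake_export)
    _, second_exported = load_or_export("deepset/roberta-base-squad2", cache, fake_export)
    print(first_exported, second_exported)  # prints: True False
```

Only the first call pays the export cost; the second call finds the file on disk and skips the export.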

Benchmarks

Note: For reasons that are unclear, ONNX is slow in Colab notebooks, so you won't notice any speed-up there. Benchmark on your own hardware.

Check our benchmarking example.

For detailed benchmarks and other information, refer to this blog post and notebook.

Note: These results were collected on my local machine. If you have a high-performance machine to benchmark on, please contact me.
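For measuring on your own hardware, a minimal timing harness might look like the sketch below. It is not the project's benchmarking script; the `benchmark` function and its `warmup` and `runs` parameters are assumptions chosen for illustration, and the built-in `sum` is used as a stand-in workload so the snippet runs without downloading a model.

```python
import statistics
import time


def benchmark(fn, *args, warmup: int = 1, runs: int = 10, **kwargs) -> dict:
    """Time fn(*args, **kwargs) and return latency statistics in milliseconds."""
    for _ in range(warmup):
        fn(*args, **kwargs)  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "mean_ms": statistics.mean(timings),
        "median_ms": statistics.median(timings),
        "min_ms": min(timings),
        "max_ms": max(timings),
    }


# Stand-in workload; replace with a pipeline object, e.g. benchmark(nlp, "some text").
stats = benchmark(sum, range(10_000), warmup=2, runs=5)
print(stats)
```

Running the same harness once with a `use_onnx=True` pipeline and once with a `use_onnx=False` pipeline gives a like-for-like comparison on the same machine. The warm-up calls keep the one-time model load and graph export out of the measured runs.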

Benchmark sentiment-analysis pipeline

Benchmark zero-shot-classification pipeline

Benchmark token-classification pipeline

Benchmark question-answering pipeline

Benchmark fill-mask pipeline

About

Built by Aleksey Korshuk

🚀 If you want to contribute to this project or create something cool together, contact me.


Resources

Inspired by Hugging Face Infinity
First step done by Suraj Patil
Optimum
ONNX


Releases

v0.2.1-upd (Apr 1, 2022)
v0.2.1 (Apr 1, 2022)
v0.2.0 (Mar 31, 2022)
1.0.1 (Mar 26, 2022): Added text-generation pipeline
1.0.0 (Mar 24, 2022)
