nlpcommon is a python Open Source Toolkit for text classification.

Overview

nlpcommon

PyPI version Contributions welcome GitHub contributors License Apache 2.0 python_vesion GitHub issues Wechat Group

nlpcommon, Python Text Tool. Python3开发。

Guide

Feature

nlpcommon is a python Open Source Toolkit for text classification. The goal is to implement text analysis algorithm, so as to achieve the use in the production environment.

nlpcommon has the characteristics of clear algorithm, high performance and customizable corpus.

Functions:

Classifier

  • LogisticRegression
  • Random Forest
  • Decision Tree
  • K-Nearest Neighbours
  • Naive bayes
  • Xgboost
  • Support Vector Machine(SVM)
  • TextCNN
  • TextRNN_Att
  • Fasttext
  • Bert

Cluster

  • MiniBatchKmeans

While providing rich functions, nlpcommon internal modules adhere to low coupling, model adherence to inert loading, dictionary publication, and easy to use.

Install

  • Requirements and Installation
pip3 install nlpcommon

or

git clone https://github.com/shibing624/nlpcommon.git
cd nlpcommon
python3 setup.py install

Usage

data

Stopwrods

examples/base_demo.py:

import sys

sys.path.append('..')
from nlpcommon import stopwords

if __name__ == '__main__':
    print(len(stopwords), stopwords)

output:

2438 {'', '大家', '孰知', '至于', './', '知道', '二话没说', '一何', '从宽', 'especially' ... }

Contact

  • Issue(建议):GitHub issues
  • 邮件我:xuming: [email protected]
  • 微信我:加我微信号:xuming624, 进Python-NLP交流群,备注:姓名-公司名-NLP

Cite

如果你在研究中使用了nlpcommon,请按如下格式引用:

@software{nlpcommon,
  author = {Xu Ming},
  title = {nlpcommon: A Tool for Text NLP},
  year = {2021},
  url = {https://github.com/shibing624/nlpcommon},
}

License

授权协议为 The Apache License 2.0,可免费用做商业用途。请在产品说明中附加nlpcommon的链接和授权协议。

Contribute

项目代码还很粗糙,如果大家对代码有所改进,欢迎提交回本项目,在提交之前,注意以下两点:

  • tests添加相应的单元测试
  • 使用python setup.py test来运行所有单元测试,确保所有单测都是通过的

之后即可提交PR。

Reference

  • pytextclassifier
Owner
xuming
Researcher, Machine Learning Developer, Advertising Risk Control.
xuming
Russian GPT3 models.

Russian GPT-3 models (ruGPT3XL, ruGPT3Large, ruGPT3Medium, ruGPT3Small) trained with 2048 sequence length with sparse and dense attention blocks. We also provide Russian GPT-2 large model (ruGPT2Larg

Sberbank AI 1.6k Jan 05, 2023
Python library for Serbian Natural language processing (NLP)

SrbAI - Python biblioteka za procesiranje srpskog jezika SrbAI je projekat prikupljanja algoritama i modela za procesiranje srpskog jezika u jedinstve

Serbian AI Society 3 Nov 22, 2022
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language mod

20.5k Jan 08, 2023
中文无监督SimCSE Pytorch实现

A PyTorch implementation of unsupervised SimCSE SimCSE: Simple Contrastive Learning of Sentence Embeddings 1. 用法 无监督训练 python train_unsup.py ./data/ne

99 Dec 23, 2022
Text classification is one of the popular tasks in NLP that allows a program to classify free-text documents based on pre-defined classes.

Deep-Learning-for-Text-Document-Classification Text classification is one of the popular tasks in NLP that allows a program to classify free-text docu

Happy N. Monday 2 Mar 17, 2022
AEC_DeepModel - Deep learning based acoustic echo cancellation baseline code

AEC_DeepModel - Deep learning based acoustic echo cancellation baseline code

凌逆战 75 Dec 05, 2022
PG-19 Language Modelling Benchmark

PG-19 Language Modelling Benchmark This repository contains the PG-19 language modeling benchmark. It includes a set of books extracted from the Proje

DeepMind 161 Oct 30, 2022
🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy

floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy floret is an extended version of fastText that can produce word repr

Explosion 222 Dec 16, 2022
Code for Findings at EMNLP 2021 paper: "Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning"

Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning This repo is for Findings at EMNLP 2021 paper: Learn Cont

INK Lab @ USC 6 Sep 02, 2022
Neural-Machine-Translation - Implementation of revolutionary machine translation models

Neural Machine Translation Framework: PyTorch Repository contaning my implementa

Utkarsh Jain 1 Feb 17, 2022
Contains descriptions and code of the mini-projects developed in various programming languages

TexttoSpeechAndLanguageTranslator-project introduction A pleasant application where the client will be given buttons like play,reset and exit. The cli

Adarsh Reddy 1 Dec 22, 2021
Stanford CoreNLP provides a set of natural language analysis tools written in Java

Stanford CoreNLP Stanford CoreNLP provides a set of natural language analysis tools written in Java. It can take raw human language text input and giv

Stanford NLP 8.8k Jan 07, 2023
Extracting Summary Knowledge Graphs from Long Documents

GraphSum This repo contains the data and code for the G2G model in the paper: Extracting Summary Knowledge Graphs from Long Documents. The other basel

Zeqiu (Ellen) Wu 10 Oct 21, 2022
REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder.

What is MUSE? MUSE stands for Multilingual Universal Sentence Encoder - multilingual extension (16 languages) of Universal Sentence Encoder (USE). MUS

Dani El-Ayyass 47 Sep 05, 2022
This repository contains (not all) code from my project on Named Entity Recognition in philosophical text

NERphilosophy 👋 Welcome to the github repository of my BsC thesis. This repository contains (not all) code from my project on Named Entity Recognitio

Ruben 1 Jan 27, 2022
Need: Image Search With Python

Need: Image Search The problem is that a user needs to search for a specific ima

Surya Komandooru 1 Dec 30, 2021
Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT-Implementation In this repo we have replicated the paper CodeBERT: A Pre-Trained Model for Programming and Natural Languages. We are interest

Tanuj Sur 4 Jul 01, 2022
Enterprise Scale NLP with Hugging Face & SageMaker Workshop series

Workshop: Enterprise-Scale NLP with Hugging Face & Amazon SageMaker Earlier this year we announced a strategic collaboration with Amazon to make it ea

Philipp Schmid 161 Dec 16, 2022
Clone a voice in 5 seconds to generate arbitrary speech in real-time

This repository is forked from Real-Time-Voice-Cloning which only support English. English | 中文 Features 🌍 Chinese supported mandarin and tested with

Weijia Chen 25.6k Jan 06, 2023