Conversational text Analysis using various NLP techniques

Overview

PyConverse


Let me try first

Installation

pip install pyconverse

Usage

Please try this notebook that demos the core functionalities: basic usage notebook

Introduction

Conversation analytics plays an increasingly important role in shaping great customer experiences across various industries like finance/contact centres etc... primarily to gain a deeper understanding of the customers and to better serve their needs. This library, PyConverse is an attempt to provide tools & methods which can be used to gain an understanding of the conversations from multiple perspectives using various NLP techniques.

Why PyConverse?

I have been doing what can be called conversational text NLP with primarily contact centre data from various domains like Financial services, Banking, Insurance etc for the past year or so, and I have not come across any interesting open-source tools that can help in understanding conversational texts as such I decided to create this library that can provide various tools and methods to analyse calls and help answer important questions/compute important metrics that usually people want to find from conversations, in contact centre data analysis settings.

Where can I use PyConverse?

The primary use case is geared towards contact centre call analytics, but most of the tools that Converse provides can be used elsewhere as well.

There’s a lot of insights hidden in every single call that happens, Converse enables you to extract those insights and compute various kinds of KPIs from the point of Operational Efficiency, Agent Effectiveness & monitoring Customer Experience etc.

If you are looking to answer questions like these:-

  1. What was the overall sentiment of the conversation that was exhibited by the speakers?
  2. Was there periods of dead air(silence periods) between the agents and customer? if so how much?
  3. Was the agent empathetic towards the customer?
  4. What was the average agent response time/average hold time?
  5. What was being said on calls?

and more... pyconverse might be of small help.

What can PyConverse do?

At the moment pyconverse can do a few things that broadly fall into these categories:-

  1. Emotion identification
  2. Empathetic statement identification
  3. Call Segmentation
  4. Topic identification from call segments
  5. Compute various types of Speaker attributes:
    1. linguistic attributes like: word counts/number of words per utterance/negations etc.
    2. Identify periods of silence & interruptions.
    3. Question identification
    4. Backchannel identification
  6. Assess the overall nature of the speaker via linguistic attributes and tell if the Speaker is:
    1. Talkative, verbally fluent
    2. Informal/Personal/social
    3. Goal-oriented or Forward/future-looking/focused on past
    4. Identify inhibitions

What Next?

  1. Improve documentation.
  2. Add more use case notebooks/examples.
  3. Improve some of the functionalities and make it more streamlined.

Built with:

Transformers Spacy Pytorch

Credits:

Note: The backchannel Utterance classification method is inspired by facebook's Unsupervised Topic Segmentation of Meetings with BERT Embeddings paper (arXiv:2106.12978 [cs.LG])

You might also like...
nlabel is a library for generating, storing and retrieving tagging information and embedding vectors from various nlp libraries through a unified interface.
nlabel is a library for generating, storing and retrieving tagging information and embedding vectors from various nlp libraries through a unified interface.

nlabel is a library for generating, storing and retrieving tagging information and embedding vectors from various nlp libraries through a unified interface.

An easy-to-use framework for BERT models, with trainers, various NLP tasks and detailed annonations

FantasyBert English | δΈ­ζ–‡ Introduction An easy-to-use framework for BERT models, with trainers, various NLP tasks and detailed annonations. You can imp

Grading tools for Advanced NLP (11-711)Grading tools for Advanced NLP (11-711)

Grading tools for Advanced NLP (11-711) Installation You'll need docker and unzip to use this repo. For docker, visit the official guide to get starte

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Kashgari Overview | Performance | Installation | Documentation | Contributing πŸŽ‰ πŸŽ‰ πŸŽ‰ We released the 2.0.0 version with TF2 Support. πŸŽ‰ πŸŽ‰ πŸŽ‰ If you

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Kashgari Overview | Performance | Installation | Documentation | Contributing πŸŽ‰ πŸŽ‰ πŸŽ‰ We released the 2.0.0 version with TF2 Support. πŸŽ‰ πŸŽ‰ πŸŽ‰ If you

Using Bert as the backbone model for lime, designed for NLP task explanation (sentence pair text classification task)

Lime Comparing deep contextualized model for sentences highlighting task. In addition, take the classic explanation model "LIME" with bert-base model

Various capabilities for static malware analysis.

Malchive The malchive serves as a compendium for a variety of capabilities mainly pertaining to malware analysis, such as scripts supporting day to da

Python bindings to the dutch NLP tool Frog (pos tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)

Frog for Python This is a Python binding to the Natural Language Processing suite Frog. Frog is intended for Dutch and performs part-of-speech tagging

pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks

A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

Comments
  • SemanticTextSegmentation NaN With All Stop Words

    SemanticTextSegmentation NaN With All Stop Words

    When running semantic text segmentation, I found that if the input utterance line is all stop words, (i.e. "Bye. Uh huh. Yeah."), SemanticTextSegmentation._get_similarity fails with ValueError: Input contains NaN.

    I found that adding a check for nan in both embeddings could solve this problem.

    def _get_similarity(self, text1, text2):
        sentence_1 = [i.text.strip()
                      for i in nlp(text1).sents if len(i.text.split(' ')) > 1]
        sentence_2 = [i.text.strip()
                      for i in nlp(text2).sents if len(i.text.split(' ')) > 2]
        embeding_1 = model.encode(sentence_1)
        embeding_2 = model.encode(sentence_2)
        embeding_1 = np.mean(embeding_1, axis=0).reshape(1, -1)
        embeding_2 = np.mean(embeding_2, axis=0).reshape(1, -1)
    
        if np.any(np.isnan(embeding_1)) or np.any(np.isnan(embeding_2)):
                return 1
    
        sim = cosine_similarity(embeding_1, embeding_2)
        return sim
    

    I would like to have someone else look at it because I don't want to make any assumptions that the stop words should be part of the same segments.

    opened by Haowjy 1
  • Updated  lru_cache decorator.

    Updated lru_cache decorator.

    After installing and running the library pyconverse on python-3.7 or below and using the import statement it gives error in import itself. I went through the utils file and saw that the "@lru_cache" decorator was written as per the new python(i.e. 3.8+) style hence when calling in older versions(py 3.7 and below it raises a NoneType Error) as the LRU_CACHE decorator is written as -" @lru_cache() " with paranthesis for older versions . Hence made the changes. The changes made do not cause any error on the newer versions.

    opened by AkashKhamkar 0
  • Error in importing Callyzer, SpeakerStats

    Error in importing Callyzer, SpeakerStats

    When I want to load the model it's showing this error.Whether it is currently in devloped mode des

    KeyError: "[E002] Can't find factory for 'tok2vec'. This usually happens when spaCy callsnlp.create_pipewith a component name that's not built in - for example, when constructing the pipeline from a model's meta.json. If you're using a custom component, you can write to Language.factories['tok2vec'] or remove it from the ### model meta and add it vianlp.add_pipeinstead.

    opened by kalpa277 0
Releases(v0.2.0)
  • v0.2.0(Nov 21, 2021)

    First Release of PyConverse library.

    Conversational Transcript Analysis using various NLP techniques.

    1. Emotion identification
    2. Empathetic statement identification
    3. Call Segmentation
    4. Topic identification from call segments
    5. Compute various types of Speaker attributes:
      • linguistic attributes like : word counts/number of words per utterance/negations etc
      • Identify periods of silence & interruptions.
      • Question identification
      • Backchannel identification
    6. Assess the overall nature of the speaker via linguistic attributes and tell if the Speaker is:
      • Talkative, verbally fluent
      • Informal/Personal/social
      • Goal-oriented or Forward/future-looking/focused on past
      • Identify inhibitions
    Source code(tar.gz)
    Source code(zip)
Owner
Rita Anjana
ML engineer
Rita Anjana
Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification"

PTR Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification" If you use the code, please cite the following paper: @art

THUNLP 118 Dec 30, 2022
Optimal Transport Tools (OTT), A toolbox for all things Wasserstein.

Optimal Transport Tools (OTT), A toolbox for all things Wasserstein. See full documentation for detailed info on the toolbox. The goal of OTT is to pr

OTT-JAX 255 Dec 26, 2022
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

MMF is a modular framework for vision and language multimodal research from Facebook AI Research. MMF contains reference implementations of state-of-t

Facebook Research 5.1k Dec 26, 2022
Code for using and evaluating SpanBERT.

SpanBERT This repository contains code and models for the paper: SpanBERT: Improving Pre-training by Representing and Predicting Spans. If you prefer

Meta Research 798 Dec 30, 2022
PyWorld3 is a Python implementation of the World3 model

The World3 model revisited in Python Install & Hello World3 How to tune your own simulation Licence How to cite PyWorld3 with Bibtex References & ackn

Charles Vanwynsberghe 248 Dec 14, 2022
TweebankNLP - Pre-trained Tweet NLP Pipeline (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Models + Tweebank-NER

TweebankNLP This repo contains the new Tweebank-NER dataset and off-the-shelf Twitter-Stanza pipeline for state-of-the-art Tweet NLP, as described in

Laboratory for Social Machines 84 Dec 20, 2022
HAIS_2GNN: 3D Visual Grounding with Graph and Attention

HAIS_2GNN: 3D Visual Grounding with Graph and Attention This repository is for the HAIS_2GNN research project. Tao Gu, Yue Chen Introduction The motiv

Yue Chen 1 Nov 26, 2022
Predict the spans of toxic posts that were responsible for the toxic label of the posts

toxic-spans-detection An attempt at the SemEval 2021 Task 5: Toxic Spans Detection. The Toxic Spans Detection task of SemEval2021 required participant

Ilias Antonopoulos 3 Jul 24, 2022
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

Google Research Datasets 740 Dec 24, 2022
Clone a voice in 5 seconds to generate arbitrary speech in real-time

This repository is forked from Real-Time-Voice-Cloning which only support English. English | δΈ­ζ–‡ Features 🌍 Chinese supported mandarin and tested with

Weijia Chen 25.6k Jan 06, 2023
Backend for the Autocomplete platform. An AI assisted coding platform.

Introduction A custom predictor allows you to deploy your own prediction implementation, useful when the existing serving implementations don't fit yo

Tatenda Christopher Chinyamakobvu 1 Jan 31, 2022
Twewy-discord-chatbot - Build a Discord AI Chatbot that Speaks like Your Favorite Character

Build a Discord AI Chatbot that Speaks like Your Favorite Character! This is a Discord AI Chatbot that uses the Microsoft DialoGPT conversational mode

Lynn Zheng 231 Dec 30, 2022
Repository for the paper "Optimal Subarchitecture Extraction for BERT"

Bort Companion code for the paper "Optimal Subarchitecture Extraction for BERT." Bort is an optimal subset of architectural parameters for the BERT ar

Alexa 461 Nov 21, 2022
A CSRankings-like index for speech researchers

Speech Rankings This project mimics CSRankings to generate an ordered list of researchers in speech/spoken language processing along with their possib

Mutian He 19 Nov 26, 2022
Tool which allow you to detect and translate text.

Text detection and recognition This repository contains tool which allow to detect region with text and translate it one by one. Description Two pretr

Damian Panek 176 Nov 28, 2022
Milaan Parmar / Милан ΠΏΠ°Ρ€ΠΌΠ°Ρ€ / _η±³ε…° 帕尔马 170 Dec 13, 2022
OpenAI CLIP text encoders for multiple languages!

Multilingual-CLIP OpenAI CLIP text encoders for any language Colab Notebook Β· Pre-trained Models Β· Report Bug Overview OpenAI recently released the pa

Fredrik Carlsson 481 Dec 30, 2022
Creating a chess engine using GPT-3

GPT3Chess Creating a chess engine using GPT-3 Code for my article : https://towardsdatascience.com/gpt-3-play-chess-d123a96096a9 My game (white) vs GP

19 Dec 17, 2022
Natural Language Processing for Adverse Drug Reaction (ADR) Detection

Natural Language Processing for Adverse Drug Reaction (ADR) Detection This repo contains code from a project to identify ADRs in discharge summaries a

Medicines Optimisation Service - Austin Health 21 Aug 05, 2022
Trains an OpenNMT PyTorch model and SentencePiece tokenizer.

Trains an OpenNMT PyTorch model and SentencePiece tokenizer. Designed for use with Argos Translate and LibreTranslate.

Argos Open Tech 61 Dec 13, 2022