A proof-of-concept jupyter extension which converts english queries into relevant python code

Overview

Text2Code for Jupyter notebook

A proof-of-concept jupyter extension which converts english queries into relevant python code.

Blog post with more details:

Data analysis made easy: Text2Code for Jupyter notebook

Demo Video:

Text2Code for Jupyter notebook

Supported Operating Systems:

  • Ubuntu
  • macOS

Installation

NOTE: We have renamed the plugin from mopp to jupyter-text2code. Uninstall mopp before installing new jupyter-text2code version.

pip uninstall mopp

CPU-only install:

For Mac and other Ubuntu installations not having a nvidia GPU, we need to explicitly set an environment variable at time of install.

export JUPYTER_TEXT2CODE_MODE="cpu"

GPU install dependencies:

sudo apt-get install libopenblas-dev libomp-dev

Installation commands:

git clone https://github.com/deepklarity/jupyter-text2code.git
cd jupyter-text2code
pip install .
jupyter nbextension enable jupyter-text2code/main

Uninstallation:

pip uninstall jupyter-text2code

Usage Instructions:

  • Start Jupyter notebook server by running the following command: jupyter notebook
  • If you don't see Nbextensions tab in Jupyter notebook run the following command:jupyter contrib nbextension install --user
  • You can open the sample notebooks/ctds.ipynb notebook for testing
  • If installation happened successfully, then for the first time, Universal Sentence Encoder model will be downloaded from tensorflow_hub
  • Click on the Terminal Icon which appears on the menu (to activate the extension)
  • Type "help" to see a list of currently supported commands in the repo
  • Watch Demo video for some examples

Docker containers for jupyter-text2code

We have published CPU and GPU images to docker hub with all dependencies pre-installed.

Visit https://hub.docker.com/r/deepklarity/jupyter-text2code/ to download the images and usage instructions.
CPU image size: 1.51 GB
GPU image size: 2.56 GB

Model training:

Generate training data:

From a list of templates present at jupyter_text2code/jupyter_text2code_serverextension/data/ner_templates.csv, generate training data by running the following command:

cd scripts && python generate_training_data.py

This command will generate data for intent matching and NER(Named Entity Recognition).

Create intent index faiss

Use the generated data to create a intent-matcher using faiss.

cd scripts && python create_intent_index.py

Train NER model

cd scripts && python train_spacy_ner.py

Steps to add more intents:

  • Add more templates in ner_templates with a new intent_id
  • Generate training data. Modify generate_training_data.py if different generation techniques are needed or if introducing a new entity.
  • Train intent index
  • Train NER model
  • modify jupyter_text2code/jupyter_text2code_serverextension/__init__.py with new intent's condition and add actual code for the intent
  • Reinstall plugin by running: pip install .

TODO:

  • Publish Docker image
  • Refactor code and make it mode modular, remove duplicate code, etc
  • Add support for Windows
  • Add support for more commands
  • Improve intent detection and NER
  • Explore sentence Paraphrasing to generate higher-quality training data
  • Gather real-world variable names, library names as opposed to randomly generating them
  • Try NER with a transformer-based model
  • With enough data, train a language model to directly do English->code like GPT-3 does, instead of having separate stages in the pipeline
  • Create a survey to collect linguistic data
  • Add Speech2Code support

Authored By:

Owner
DeepKlarity
DeepKlarity
Program that shows all the details of the given IP address. Build with Python and ipinfo.io API

ip-details This is a program that shows all the details of the given IP address. Build with Python and ipinfo.io API Usage To use this program, run th

4 Mar 01, 2022
Bacon - Band-limited Coordinate Networks for Multiscale Scene Representation

BACON: Band-limited Coordinate Networks for Multiscale Scene Representation Project Page | Video | Paper Official PyTorch implementation of BACON. BAC

Stanford Computational Imaging Lab 144 Dec 29, 2022
A Python framework for building geospatial web-applications

Hey there, this is Greppo... A Python framework for building geospatial web-applications. Greppo is an open-source Python framework that makes it easy

Greppo 304 Dec 27, 2022
scalable analysis of images and time series

thunder scalable analysis of image and time series analysis in python Thunder is an ecosystem of tools for the analysis of image and time series data

thunder-project 813 Dec 29, 2022
Raster-based Spatial Analysis for Python

🌍 xarray-spatial: Raster-Based Spatial Analysis in Python 📍 Fast, Accurate Python library for Raster Operations ⚡ Extensible with Numba ⏩ Scalable w

makepath 649 Jan 01, 2023
Helping data scientists better understand their datasets and models in text classification. With love from ServiceNow.

Azimuth, an open-source dataset and error analysis tool for text classification, with love from ServiceNow. Overview Azimuth is an open source applica

ServiceNow 145 Dec 23, 2022
Read and write rasters in parallel using Rasterio and Dask

dask-rasterio dask-rasterio provides some methods for reading and writing rasters in parallel using Rasterio and Dask arrays. Usage Read a multiband r

Dymaxion Labs 85 Aug 30, 2022
Implementation of Trajectory classes and functions built on top of GeoPandas

MovingPandas MovingPandas implements a Trajectory class and corresponding methods based on GeoPandas. Visit movingpandas.org for details! You can run

Anita Graser 897 Jan 01, 2023
Python bindings and utilities for GeoJSON

geojson This Python library contains: Functions for encoding and decoding GeoJSON formatted data Classes for all GeoJSON Objects An implementation of

Jazzband 765 Jan 06, 2023
Python library to visualize circular plasmid maps

Plasmidviewer Plasmidviewer is a Python library to visualize plasmid maps from GenBank. This library provides only the function to visualize circular

Mori Hideto 9 Dec 04, 2022
Solving the Traveling Salesman Problem using Self-Organizing Maps

Solving the Traveling Salesman Problem using Self-Organizing Maps This repository contains an implementation of a Self Organizing Map that can be used

Diego Vicente 3.1k Dec 31, 2022
A Jupyter - Leaflet.js bridge

ipyleaflet A Jupyter / Leaflet bridge enabling interactive maps in the Jupyter notebook. Usage Selecting a basemap for a leaflet map: Loading a geojso

Jupyter Widgets 1.3k Dec 27, 2022
This GUI app was created to show the detailed information about the weather in any city selected by user

WeatherApp Content Brief description Tools Features Hotkeys How it works Screenshots Ways to improve the project Installation Brief description This G

TheBugYouCantFix 5 Dec 30, 2022
A Python interface between Earth Engine and xarray

eexarray A Python interface between Earth Engine and xarray Description eexarray was built to make processing gridded, mesoscale time series data quic

Aaron Zuspan 159 Dec 23, 2022
A simple reverse geocoder that resolves a location to a country

Reverse Geocoder This repository holds a small web service that performs reverse geocoding to determine whether a user specified location is within th

4 Dec 25, 2021
Computer Vision in Python

Mahotas Python Computer Vision Library Mahotas is a library of fast computer vision algorithms (all implemented in C++ for speed) operating over numpy

Luis Pedro Coelho 792 Dec 20, 2022
🌐 Local tile server for viewing geospatial raster files with ipyleaflet

🌐 Local Tile Server for Geospatial Rasters Need to visualize a rather large raster (gigabytes) you have locally? This is for you. A Flask application

Bane Sullivan 192 Jan 04, 2023
Pure Python NetCDF file reader and writer

Pyncf Pure Python NetCDF file reading and writing. Introduction Inspired by the pyshp library, which provides simple pythonic and dependency free data

Karim Bahgat 14 Sep 30, 2022
A set of utility functions for working with GeoJSON annotations in Kaibu

kaibu-utils A set of utility functions for working with Kaibu. Create a new repository Create a new repository and select imjoy-team/imjoy-python-temp

ImJoy Team 0 Dec 12, 2021
Evaluation of file formats in the context of geo-referenced 3D geometries.

Geo-referenced Geometry File Formats Classic geometry file formats as .obj, .off, .ply, .stl or .dae do not support the utilization of coordinate syst

Advanced Information Systems and Technology 11 Mar 02, 2022