Weaviate demo with the text2vec-openai module

Last update: Nov 11, 2022

Overview

This repository contains an example of how to use the Weaviate text2vec-openai module. When using this demo dataset, Weaviate will vectorize the data and the queries based on OpenAI's Babbage model.

What is Weaviate?

Weaviate is an open-source, modular vector search engine. It works like other databases you're used to (it has full CRUD support, it's cloud-native, etc.), but it is built around storing every data object together with its vector representation (i.e., its embedding). Within Weaviate you can combine traditional scalar search filters with vector search through its GraphQL API.
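To make the idea of combining a scalar filter with a vector search concrete, here is a minimal brute-force sketch in plain Python. It is an illustration only and not how Weaviate is implemented. The toy vectors, the `filtered_vector_search` helper, and the movie records are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_vector_search(objects, query_vector, max_year, limit):
    """Apply a scalar filter first, then rank the survivors by similarity."""
    # Scalar filter: keep only objects released before max_year.
    candidates = [o for o in objects if o["year"] < max_year]
    # Vector search: most similar embedding first.
    ranked = sorted(
        candidates,
        key=lambda o: cosine_similarity(o["vector"], query_vector),
        reverse=True,
    )
    return ranked[:limit]


# Hypothetical records with made-up 3-dimensional embeddings.
movies = [
    {"title": "A", "year": 1935, "vector": [0.9, 0.1, 0.0]},
    {"title": "B", "year": 1948, "vector": [0.1, 0.9, 0.0]},
    {"title": "C", "year": 1999, "vector": [1.0, 0.0, 0.0]},
]

results = filtered_vector_search(movies, [1.0, 0.0, 0.0], max_year=1950, limit=2)
print([m["title"] for m in results])  # ['A', 'B']
```

Movie "C" has the closest vector but is excluded by the year filter, which is the same interplay the example query later in this README relies on.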

Weaviate modules can be used to, among other things, vectorize the data objects you add to Weaviate. In this demo, the text2vec-openai module is used to vectorize all data using OpenAI's Babbage model.

You can read about Weaviate in more detail in the software docs.

About the Dataset

This dataset contains descriptions of 34,886 movies from around the world. The dataset is taken from Kaggle.

Run the setup

Before running this setup, make sure you have an OpenAI API key ready; you can create one here.

0. Update your OpenAI API key

$ export OPENAI_APIKEY=YOUR_API_KEY

1. Run the container

Start the container in the background:

$ docker-compose up -d
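The container may need a few seconds before it accepts requests. As a hedged sketch, the snippet below polls Weaviate's readiness endpoint (`/v1/.well-known/ready`) until it answers with HTTP 200. The helper names `readiness_url` and `wait_until_ready`, and the timeout values, are assumptions made for this example and are not part of the demo's scripts.

```python
import time
import urllib.error
import urllib.request


def readiness_url(base_url):
    """Build the readiness-probe URL for a Weaviate instance."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def wait_until_ready(base_url, timeout_s=60.0, poll_s=2.0):
    """Poll the readiness endpoint; return True once it answers 200."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(readiness_url(base_url), timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Not reachable yet (or not ready); try again after a pause.
            pass
        time.sleep(poll_s)
    return False


# Example (requires the container from this step to be running):
# if not wait_until_ready("http://localhost:8080"):
#     raise SystemExit("Weaviate did not become ready in time")
```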

2. Import the data

After the container starts up, you can import the data by running:

# Install the Weaviate Python client
$ pip3 install -r requirements.txt
# Import the data with the format `./import.py {URL} {OPENAI RATE LIMIT}`
$ ./import.py http://localhost:8080 550

Note: the OpenAI API is rate limited, so the import script throttles its requests according to the rate limit passed as the second argument above. If you work with your own dataset and have requested an increase or removal of your rate limit, you can raise this value to speed up the import. You can read here how to do this.
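For readers adapting the import to their own data, the following is a minimal sketch of one way to throttle requests on the client side. It assumes the rate limit is expressed as requests per minute, which this README does not state, and it is not the logic used by `import.py`. The `MinuteRateLimiter` class is a hypothetical helper written for this example.

```python
import time


class MinuteRateLimiter:
    """Space calls evenly so that at most max_per_minute happen per minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        # clock and sleep are injectable so the limiter can be tested
        # without real waiting.
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


# Example usage (send_batch is a placeholder for your own import call):
# limiter = MinuteRateLimiter(550)
# for batch in batches:
#     limiter.wait()
#     send_batch(batch)
```

Even spacing is the simplest choice; a token bucket would allow short bursts, at the cost of slightly more bookkeeping.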

3. Query the data

You can query the data via the GraphQL interface that's available in the Weaviate Console (under "Self Hosted Weaviate").

Or you can test the example queries below.

Example Query

Learn how to use the Get{} function of the Weaviate GraphQL-API here.

{
  Get {
    Movie(
      nearText: {
        concepts: ["Movie about Venice"]
      }
      where: {
        path: ["year"]
        operator: LessThan
        valueInt: 1950
      }
      limit: 5
    ) {
      title
      plot
      year
      director {
        ... on Director {
          name
        }
      }
      genre {
        ... on Genre {
          name
        }
      }
    }
  }
}
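The same kind of query can also be sent from a script. The sketch below posts a shortened version of the query above to Weaviate's GraphQL endpoint (`/v1/graphql`) using only the Python standard library. It assumes the instance from step 1 is reachable at `http://localhost:8080`. The helpers `build_graphql_request` and `run_query` are names chosen for this example.

```python
import json
import urllib.request

# Shortened version of the example query (title and year only).
QUERY = """
{
  Get {
    Movie(
      nearText: {concepts: ["Movie about Venice"]}
      where: {path: ["year"], operator: LessThan, valueInt: 1950}
      limit: 5
    ) {
      title
      year
    }
  }
}
"""


def build_graphql_request(base_url, query):
    """Wrap a GraphQL query string in a JSON POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/graphql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(base_url, query):
    """Send the query and return the decoded JSON response."""
    request = build_graphql_request(base_url, query)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (requires the running container and imported data):
# response = run_query("http://localhost:8080", QUERY)
# for movie in response["data"]["Get"]["Movie"]:
#     print(movie["year"], movie["title"])
```

The Weaviate Python client installed in step 2 offers a higher-level way to do the same thing; plain HTTP is used here only to keep the sketch free of dependencies.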


Owner

SeMI Technologies
