An introduction to the idea of semantic search and to how search systems work using the retrieve and rerank approach.

Last update: Jan 28, 2022

Related tags

Text Data & NLP information_retrieval

Overview

Main Idea

The following links briefly explain the idea of semantic search and how search systems work using the retrieve and rerank approach.

Setup

Download trained models

There are two models trained for Spanish: a bi-encoder and a cross-encoder. Together they form the retrieval system, following the retrieve and rerank idea:

make setup
pip install -r requirements.txt

Basic usage

Set up an Elasticsearch index with semantic vectors. This step assumes that a set of JSON files is available in a folder. Each JSON file can contain several optional fields but must contain the id and text fields.

from information_retrieval import SemanticEmbedder, CrossEncoder, Prepare, Search

data_folder = 'data/'
text_field = "texto_parrafo"
id_field = "id_parrafo"
elastic_index_name = "sentencias_2.0"

# Read the files, compute embeddings and upload them to elasticsearch
P = Prepare(data_folder, text_field, id_field, elastic_index_name)
P.prepare()
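Before running the preparation step, it can help to confirm that every JSON file actually carries the required id and text fields. The following is a minimal sketch that uses only the standard library; the helper name `find_invalid_records` is hypothetical and not part of `information_retrieval`, and it assumes each file holds either a single JSON object or a list of objects, which the original text does not specify.

```python
import json
from pathlib import Path


def find_invalid_records(data_folder, text_field, id_field):
    """Return (file name, reason) pairs for records missing a required field.

    Hypothetical helper, not part of the information_retrieval package.
    """
    problems = []
    for path in sorted(Path(data_folder).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            content = json.load(f)
        # Assumption: a file holds one record (object) or a list of records.
        records = content if isinstance(content, list) else [content]
        for record in records:
            if not isinstance(record, dict):
                problems.append((path.name, "record is not a JSON object"))
                continue
            for field in (id_field, text_field):
                if field not in record:
                    problems.append((path.name, f"missing field '{field}'"))
    return problems
```

Calling `find_invalid_records('data/', "texto_parrafo", "id_parrafo")` returns an empty list when every record has both fields.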

Make queries to retrieve documents:

from information_retrieval import SearchEngine

query = "la vida es bella"
S = SearchEngine(elastic_index_name)
S.retrieve(query) # Only semantic search

S.rerank(query) # Retrieve and rerank
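The two calls above correspond to the two stages of retrieve and rerank. The sketch below illustrates the idea with toy stand-ins, under stated assumptions: `embed` is a bag-of-words substitute for a bi-encoder, and `pair_score` is a word-pair overlap substitute for a cross-encoder. Neither is the project's Spanish model, and this is not how `SearchEngine` is implemented internally; all names here are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, doc_vectors, top_k):
    """Stage 1: compare the query vector with precomputed document vectors."""
    q = embed(query)
    ranked = sorted(
        doc_vectors, key=lambda doc_id: cosine(q, doc_vectors[doc_id]), reverse=True
    )
    return ranked[:top_k]


def bigrams(text):
    """Set of adjacent word pairs in the text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def pair_score(query, text):
    """Toy stand-in for a cross-encoder: scores query and document jointly,
    here as the fraction of query word pairs that also occur in the document."""
    q = bigrams(query)
    return len(q & bigrams(text)) / len(q) if q else 0.0


def rerank(query, candidates, docs):
    """Stage 2: reorder only the retrieved candidates with the pairwise scorer."""
    return sorted(
        candidates, key=lambda doc_id: pair_score(query, docs[doc_id]), reverse=True
    )


docs = {
    "d1": "la vida es dura",
    "d2": "bella es la vida",
    "d3": "la vida es bella",
    "d4": "el perro corre",
}
# Document vectors are computed once, ahead of query time.
doc_vectors = {doc_id: embed(text) for doc_id, text in docs.items()}

query = "la vida es bella"
candidates = retrieve(query, doc_vectors, top_k=3)
reranked = rerank(query, candidates, docs)
```

The first stage is cheap because document vectors are computed once and only compared at query time. The second stage is more expensive per document, so it runs only on the short candidate list. With these toy documents, the bag-of-words stage cannot tell "bella es la vida" from "la vida es bella", while the pairwise scorer moves the exact match to the top.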

Owner

Sergio Arnaud Gomez
