A Python library for generating new text from existing samples.

Last update: May 17, 2022

Overview

ReMarkov is a Python library for generating text from existing samples using Markov chains. You can use it to generate all sorts of writing, from birthday messages and horoscopes to Wikipedia articles or the utterances of your game's NPCs. Everything works without an omnipotent "AI": the code is dead simple and therefore fast.
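
To make the idea of Markov-chain text generation concrete, here is a minimal sketch of the general technique. It is not ReMarkov's actual implementation; the names build_chain and generate, and the order, length, and seed parameters, are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each tuple of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=12, seed=None):
    """Walk the chain from a random start state until `length` words or a dead end."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state never had a successor in the sample
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Hypothetical usage with the sample sentence from this README.
chain = build_chain("This is a sample text and this is another.")
print(generate(chain, seed=1))
```

Each generated word depends only on the preceding state, so building the chain is a single pass over the sample and generating is one lookup per word. A higher order keeps the output closer to the source text at the cost of less variety.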

Check out the examples and feel free to contribute!

Installation

pip3 install remarkov

Example

Scrape the Wikipedia page for "Computer Programming" and generate a new text from it:

./tools/scrape-wiki.py Computer_programming | remarkov build | remarkov generate

You can also use remarkov programmatically:

from remarkov import create_model

model = create_model()
model.add_text("This is a sample text and this is another.")

print(model.generate().text())
# "This is a sample text and this is a sample text and this is a sample text ..."

Development

Run pytest as a module so that the current directory is added to the import path:

python3 -m pytest

This project uses black for source code formatting:

black .

Generate documentation for the project (this uses the original pdoc at pdoc.dev):

git checkout gh-pages
pdoc -t pdoc/template -o public/docs <path_to_remarkov_module>

Run type checks using mypy:

mypy -p remarkov

To publish a release, bump the version in setup.py first, then run:

pip3 install twine # optional

git tag -a <version>
git push --tags

python3 setup.py clean --all
python3 setup.py sdist bdist_wheel
twine check "dist/*"
twine upload "dist/*"

Comments

Release schedule

[x] Add source code documentation
[x] Improve explanation on website
[x] Adapt syntax highlighting in docs
[x] Generate samples for showcase
    [x] Articles
    [x] Birthday
    [x] Horoscope
    [x] Utterance
[x] Enable gh-pages

Opened by lausek

Releases(v0.2.3)

v0.2.3(Jan 15, 2022)
ReMarkov Example Datasets - EN

Based on:

https://github.com/kavgan/OpinRank (Cars, Hotels)

https://github.com/dsnam/markovscope (Horoscopes)

https://github.com/hmi-utwente/video-game-text-corpora (NPC)

ReMarkov Wikipedia Scraper (Blockchain)

Source code(tar.gz)
Source code(zip)
remarkov-dataset.7z(6.16 MB)
remarkov-dataset.zip(9.05 MB)
