Toolkit for collecting and applying prompts

Overview

PromptSource

Promptsource is a toolkit for collecting and applying prompts to NLP datasets.

Promptsource uses a simple templating language to programatically map an example of a dataset into a text input and a text target.

Promptsource contains a growing collection of prompts (which we call P3: Public Pool of Prompts). As of October 18th, there are ~2'000 prompts for 170+ datasets in P3. Feel free to use these prompts as they are (you'll find citation details here).

Note that a subset of the prompts are still Work in Progress. You'll find the list of the prompts which will potentially be modified in the near future here. Modifications will in majority consist of metadata collection, but in some cases, will impact the templates themselves. To facilitate traceability, Promptsource is currently pinned at version 0.1.0.

Setup

  1. Download the repo
  2. Navigate to root directory of the repo
  3. Install requirements with pip install -r requirements.txt in a Python 3.7 environment

Running

You can browse through existing prompts on the hosted versiond of Promptsource.

If you want to launch a local version (in particular to write propmts, from the root directory of the repo, launch the editor with:

streamlit run promptsource/app.py

There are 3 modes in the app:

  • Helicopter view: aggregate high level metrics on the current state of the sourcing
  • Prompted dataset viewer: check the templates you wrote or already written on entire dataset
  • Sourcing: write new prompts

Running (read-only)

To host a public streamlit app, launch it with

streamlit run promptsource/app.py -- -r

Prompting an Example:

You can use Promptsource with Datasets to create prompted examples:

# Get an example
from datasets import load_dataset
dataset = load_dataset("ag_news")
example = dataset["train"][0]

# Prompt it
from promptsource.templates import TemplateCollection
# Get all the prompts
collection = TemplateCollection()
# Get all the AG News prompts
ag_news_prompts = collection.get_dataset("ag_news")
# Select a prompt by name
prompt = ag_news_prompts["classify_question_first"]

result = prompt.apply(example)
print("INPUT: ", result[0])
print("TARGET: ", result[1])

Contributing

Contribution guidelines and step-by-step HOW TO are described here.

Writing Prompts

A prompt is expressed in Jinja.

It is rendered using an example from the corresponding Hugging Face datasets library (a dictionary). The separator ||| should appear once to divide the template into input and target. Generally, the prompt should provide information on the desired behavior, e.g., text passage and instructions, and the output should be a desired response.

For more information, read the Contribution guidelines.

Known Issues

Warning or Error about Darwin on OS X: Try downgrading PyArrow to 3.0.0.

ConnectionRefusedError: [Errno 61] Connection refused: Happens occasionally. Try restarting the app.

Development structure

Promptsource was developed as part of the BigScience project for open research 🌸 , a year-long initiative targeting the study of large models and datasets. The goal of the project is to research language models in a public environment outside large technology companies. The project has 600 researchers from 50 countries and more than 250 institutions.

Citation

If you want to cite this P3 or Promptsource, you can use this bibtex:

@misc{sanh2021multitask,
      title={Multitask Prompted Training Enables Zero-Shot Task Generalization}, 
      author={Victor Sanh and Albert Webson and Colin Raffel and Stephen H. Bach and Lintang Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Teven Le Scao and Arun Raja and Manan Dey and M Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal Nayak and Debajyoti Datta and Jonathan Chang and Mike Tian-Jian Jiang and Han Wang and Matteo Manica and Sheng Shen and Zheng Xin Yong and Harshit Pandey and Rachel Bawden and Thomas Wang and Trishala Neeraj and Jos Rozen and Abheesht Sharma and Andrea Santilli and Thibault Fevry and Jason Alan Fries and Ryan Teehan and Stella Biderman and Leo Gao and Tali Bers and Thomas Wolf and Alexander M. Rush},
      year={2021},
      eprint={2110.08207},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
Comments
  • Bias/fairness quantitative measurements

    Bias/fairness quantitative measurements

    Two goals:

    • have quantitative measurements for the paper's Broader impact section
    • reflect these in the model cards when we release checkpoints

    4 held out evaluation sets identified by the Evaluation WG:

    • jigsaw_toxicity_pred
    • crows_pairs
    • winogender (AXG in SuperGLUE)
    • winobias

    At the current state, I see crows_pairs and winobias prompted, jigsaw_toxicity_pred has an opened PR (#451) we need to check, winogender needs to be prompted.

    Workflow:

    • [x] prompt those that were not prompted yet
    • [x] making sure these were actually cached (@VictorSanh can have this caching step done fairly quickly)
    • [x] evaluation (normal and score rank evaluation) or maybe they have some special evaluation?
    • [x] when we know which checkpoints exactly to eval, get the final numbers to report
    opened by VictorSanh 22
  • remove language restrictions in tydiqa + add arabic prompts

    remove language restrictions in tydiqa + add arabic prompts

    • [x] Remove the if statement to allow English prompts to work across the Dataset.
    • [x] Add Arabic prompts for primary task subset with if statement to include only Arabic set.
    • [x] Add Arabic prompts for secondary task subset.
    opened by KhalidAlt 21
  • Tracking trainings and evals

    Tracking trainings and evals

    Main runs

    • D4 only (finetune-t5-xxl-lm-d4-091621-512)
      • [x] Training
      • [x] Eval
        • [x] Last
          • [x] Normal
          • [x] Rank
        • [x] 1'112'200 (half fine-tuning) (cf #456)
          • [x] Normal
          • [x] Rank
        • [x] 1'124'700 (second to last checkpoint)
          • [x] Normal
          • [x] Rank
      • [ ] SuperGLUE test (1380485) - RUNNING
      • [x] BigBench - see #469
    • D4 + GPT (finetune-t5-xxl-lm-d4-gpt-091621/)
      • [x] Training
      • [x] Eval
        • [x] Last
          • [x] Normal
          • [x] Rank
        • [x] 1'112'200 (half fine-tuning)
          • [x] Normal
          • [x] Rank
      • [x] BigBench - see #469 - RUNNING
    • D4 + GPT + Sglue (finetune-t5-xxl-lm-d4-all-091621)
      • [X] Training
      • [X] Eval
        • [x] Last
          • [x] Normal
          • [x] Rank
        • [x] 1'112'000 (half fine-tuning)
          • [x] Normal
          • [x] Rank
      • [x] BigBench - see #469 - RUNNING

    Ablations

    • Nb Template - 1 OG Template (finetune-t5-xxl-lm-d4-og-091621)
      • [X] Training
      • [X] Eval
        • [x] Last
          • [x] Normal
          • [x] Rank
        • [x] 1'112'000 (half fine-tuning)
          • [x] Normal
          • [x] Rank
    • all OG Templates (finetune-t5-xxl-lm-d4-og-all-091621) #465
      • [x] Training - RUNNING
      • [ ] Eval
        • [ ] Last
          • [ ] Normal
          • [ ] Rank
        • [x] 1'112'000 (half fine-tuning)
          • [ ] Normal ``
          • [x] Rank ``
    • Size XL (finetune-t5-xl-lm-d4-091621)
      • [X] Training
      • [X] Eval
        • [x] Last
          • [x] Normal
          • [x] Rank
        • [x] 1'112'000 (half fine-tuning)
          • [x] Normal
          • [x] Rank
    • D4 only duplicate (finetune-t5-xxl-lm-d4-091621/)
      • [x] Training
      • [ ] Eval
        • [ ] Last
          • [ ] Normal
          • [ ] Rank
        • [X] 1'112'000 (half fine-tuning)
          • [X] Normal
          • [X] Rank

    Baseline

    • T5 zero-shot baseline
      • [x] Eval - RUNNING under finetune-t5-xxl-lm-d4-091621
        • [x] Normal 1384362 - RUNNING
        • [x] Rank 1384360 - RUNNING
    opened by VictorSanh 18
  • Special eval metrics and scripts

    Special eval metrics and scripts

    Since we have the string outputs of all tasks, in principal we should be able to run arbitrary metrics, especially for datasets require fancy [email protected] has imported the official eval scripts for ReCoRD, SQuAD v2, Natural Questions, TriviaQA, and DROP.

    Update: Even when using Lintang's eval scripts, all extractive QAs and closed-book (generative) QAs still have abnormally low numbers, namely:

    • [ ] ReCoRD
    • [x] SQuAD2 (contains unanswerables)
    • [x] DROP
    • [ ] CoQA (contains unanswerables, multiple questions per example)
    • [ ] Quac (contains unanswerables, multiple questions per example)
    • [ ] Natural Questions
    • [x] TriviaQA
    • [x] WebQuestions

    Also, I think all eval of extractive QA from the training mixture also failed.

    (Note that ARC is closed-book, but its performance is fine because it's multiple-choice. A great point in case that machine task categories care more about format way more than human skill/knowledge.)

    Others with issues to keep an eye on:

    • [x] HellaSwag
    • [x] Lambada
    • [x] Winogrande
    evaluations 
    opened by awebson 14
  • Fix rendering: use simple `st.text` instead of `st.markdown`

    Fix rendering: use simple `st.text` instead of `st.markdown`

    The rendering is messed up in a bunch of places. A few issues that are symptomatic: #355 #326

    Replace st.markdown by st.text and extensively test that the rendering is better.

    opened by VictorSanh 13
  • Templates for `ncbi_disease`

    Templates for `ncbi_disease`

    This one compared to the others has been a real pain, but I think the templates are quite interesting. Not sure the Jinja style is top notch, but it seems functional on the example sets I checked.

    opened by drugilsberg 12
  • Creating unique identifier in the template.yaml

    Creating unique identifier in the template.yaml

    For now, it looks like we can sort of uniquely identify each template using a combination of template name and dataset name, but I'm expecting potential collisions when a lot of people start contributing. Besides, naming each template might not be useful (like if we end up with names like template1 template2 etc...), and it would help contributors if they don't have to add a name/check conflicts on the naming part before merging their template.yaml.

    I was thinking that we could add an ID to each entry by getting the hash of timestamp + dataset + string of prompt python function or jinja template? That should be more than enough to prevent collisions

    enhancement 
    opened by arnaudstiegler 12
  • Add prompts for DiaBLa dataset

    Add prompts for DiaBLa dataset

    Added a range of prompts for different tasks:

    • sentence-level translation
    • analogy-based translation
    • contextual translation (with context in different languages, automatically vs. manually translated)
    • detection of errors in translations
    • identification of machine translated output compared to human-produced translations
    opened by rbawden 11
  • AttributeError: 'NoneType' object has no attribute 'session_id'

    AttributeError: 'NoneType' object has no attribute 'session_id'

    Hi,

    Is anyone facing this issue while running streamlit run promptsource/app.py ?

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\I355109\Anaconda3\envs\python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
        exitcode = _main(fd)
      File "C:\Users\I355109\Anaconda3\envs\python37\lib\multiprocessing\spawn.py", line 114, in _main
        prepare(preparation_data)
      File "C:\Users\I355109\Anaconda3\envs\python37\lib\multiprocessing\spawn.py", line 225, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "C:\Users\I355109\Anaconda3\envs\python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
        run_name="__mp_main__")
      File "C:\Users\I355109\Anaconda3\envs\python37\lib\runpy.py", line 263, in run_path
        pkg_name=pkg_name, script_name=fname)
      File "C:\Users\I355109\Anaconda3\envs\python37\lib\runpy.py", line 96, in _run_module_code
        mod_name, mod_spec, pkg_name, script_name)
      File "C:\Users\I355109\Anaconda3\envs\python37\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\I355109\promptsource\promptsource\app.py", line 59, in <module>
        state = _get_state()
      File "c:\users\i355109\promptsource\promptsource\session.py", line 84, in _get_state
        session = _get_session()
      File "c:\users\i355109\promptsource\promptsource\session.py", line 74, in _get_session
        session_id = get_report_ctx().session_id
    AttributeError: 'NoneType' object has no attribute 'session_id'
    
    opened by manandey 11
  • Add conll2003 ner,pos,chunk task.

    Add conll2003 ner,pos,chunk task.

    Prompt Description:

    1. flat_question_with_label : Regular task. Label are normalized label in-case of POS tagging.
    2. flat_question_with_random_label : It is not expected that user will always provide labels strictly from the dataset. They may provide a subset of labels. So here we provided subset of labels. If the gold labels in the sample are not available in the subset we replace the gold label with "O". In case of choosing random tags, We always include "O" tag for ner, pos and chunk labels.
    3. flat_question_without_label : Regular task. No label is provided.

    POS label Normalization

    Both NER and Chunk task contains "O" tags. But POS doesn't contain "O" tag.

    In case of parts-of-speech tags, there are few labels that are weird in natural sense. For example see a prompt with all pos labels,

    Generate parts of speech from the following sentence. The parts of speech tags are ", '', #, $, (, ), ,, ., :, ``, CC, CD, DT, EX, FW, IN, JJ, JJR, JJS, LS, MD, NN, NNP, NNPS, NNS, NN|SYM, PDT, POS, PRP, PRP$, RB, RBR, RBS, RP, SYM, TO, UH, VB, VBD, VBG, VBN, VBP, VBZ, WDT, WP, WP$, WRB
    

    Here first 9 labels are normalized to "O" tag.

    Earlier Pull

    Earlier zip was not available so I wrote the a brute force code in O(n^2) complexity. But now that zip is available, I wrote the code with simpler notation and loop (with O(n) complexity). While merging I messed up in previous pull https://github.com/bigscience-workshop/promptsource/pull/170 . So I closed that and created the new pull.

    opened by sbmaruf 11
  • Some tweaks to the Editor

    Some tweaks to the Editor

    • Default sort by popularity and include number of prompts

    image

    • Global progress table

    image

    • Separated out new prompts and select, and the columns (I think this is okay, but might break)

    image

    • Added text wrapping so that you can view long prompts

    • Added a template viewer section

    image

    opened by srush 11
  • Possibility to use custom datasets

    Possibility to use custom datasets

    Hi - I have a custom dataset (e.g. https://huggingface.co/datasets/lauritowal/redefine-math) and would like to create prompts for it by using PromptSource:

    -Download the repo
    -Navigate to the root directory of the repo
    -Run pip install -e . to install the promptsource module
    

    It seems there is no option for loading a custom dataset into the tool via the Web-GUI. Maybe, I am missing something, but this would be a very helpful feature... (If there is a way to achieve this already, please let me know!)

    Thanks!

    opened by lauritowal 0
  • Merging xP3 & eval-hackathon

    Merging xP3 & eval-hackathon

    I would like to get all xP3 prompts merged into eval-hackathon & then have eval-hackathon be merged into main once & for all, but I'm not sure if you want that? cc @VictorSanh @stephenbach

    It would include adding:

    • Normal english prompts for various new & existing datasets (many PRs already open - will clean them up & request reviews when ready if someone gives me rights to do so / merge if i can & tests pass)
    • Long prompts https://github.com/bigscience-workshop/promptsource/pull/823/files ; Would request reviews once ready
    • Human-translated & Machine-translated prompts (Always suffixed with ht or mt, roughly 20K lines; see https://github.com/Muennighoff/promptsource/pull/47/files)

    Lmk if you're okay with that & I'll get started 🙂

    opened by Muennighoff 0
  • Regenerate all templates in unicode

    Regenerate all templates in unicode

    @KhalidAlt made a great suggestion to change

    yaml.dump(self.format_for_dump(), open(self.yaml_path, "w"))
    

    to

    yaml.dump(self.format_for_dump(), open(self.yaml_path, "w"), allow_unicode=True)
    

    in templates.py so that the yaml files display as unicode (rather than the unicode code points). We should definitely do this, but we should do it when the corpus is relatively stable and we can regenerate (read and write) all the yaml as one commit.

    opened by stephenbach 0
  • fix fields names for TREC after

    fix fields names for TREC after

    changes were induced here https://huggingface.co/datasets/trec/commit/1f97567bdd2adedefe8abdaa9bd6ee0e6725b458 to the field names (coarse_label and fine_label)

    opened by VictorSanh 0
Releases(v0.2.3)
Owner
BigScience Workshop
Research workshop on large language models - The Summer of Language Models 21
BigScience Workshop
Mall-Customers-Segmentation - Customer Segmentation Using K-Means Clustering

Overview Customer Segmentation is one the most important applications of unsupervised learning. Using clustering techniques, companies can identify th

NelakurthiSudheer 2 Jan 03, 2022
[CVPR 2022] Back To Reality: Weak-supervised 3D Object Detection with Shape-guided Label Enhancement

Back To Reality: Weak-supervised 3D Object Detection with Shape-guided Label Enhancement Announcement 🔥 We have not tested the code yet. We will fini

Xiuwei Xu 7 Oct 30, 2022
Implementation of Vaswani, Ashish, et al. "Attention is all you need."

Attention Is All You Need Paper Implementation This is my from-scratch implementation of the original transformer architecture from the following pape

Brando Koch 195 Dec 30, 2022
Utility tools for the "Divide and Remaster" dataset, introduced as part of the Cocktail Fork problem paper

Divide and Remaster Utility Tools Utility tools for the "Divide and Remaster" dataset, introduced as part of the Cocktail Fork problem paper The DnR d

Darius Petermann 46 Dec 11, 2022
Code implementation from my Medium blog post: [Transformers from Scratch in PyTorch]

transformer-from-scratch Code for my Medium blog post: Transformers from Scratch in PyTorch Note: This Transformer code does not include masked attent

Frank Odom 27 Dec 21, 2022
Music source separation is a task to separate audio recordings into individual sources

Music Source Separation Music source separation is a task to separate audio recordings into individual sources. This repository is an PyTorch implmeme

Bytedance Inc. 958 Jan 03, 2023
ESGD-M - A stochastic non-convex second order optimizer, suitable for training deep learning models, for PyTorch

ESGD-M - A stochastic non-convex second order optimizer, suitable for training deep learning models, for PyTorch

Katherine Crowson 53 Dec 29, 2022
Deployment of PyTorch chatbot with Flask

Chatbot Deployment with Flask and JavaScript In this tutorial we deploy the chatbot I created in this tutorial with Flask and JavaScript. This gives 2

Patrick Loeber (Python Engineer) 107 Dec 29, 2022
Official repository for "On Improving Adversarial Transferability of Vision Transformers" (2021)

Improving-Adversarial-Transferability-of-Vision-Transformers Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Fahad Khan, Fatih Porikli arxiv link A

Muzammal Naseer 47 Dec 02, 2022
The code repository for "PyCIL: A Python Toolbox for Class-Incremental Learning" in PyTorch.

PyCIL: A Python Toolbox for Class-Incremental Learning Introduction • Methods Reproduced • Reproduced Results • How To Use • License • Acknowledgement

Fu-Yun Wang 258 Dec 31, 2022
FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation

FIRA is a learning-based commit message generation approach, which first represents code changes via fine-grained graphs and then learns to generate commit messages automatically.

Van 21 Dec 30, 2022
Code for the Higgs Boson Machine Learning Challenge organised by CERN & EPFL

A method to solve the Higgs boson challenge using Least Squares - Novae This project is the Project 1 of EPFL CS-433 Machine Learning. The project is

Giacomo Orsi 1 Nov 09, 2021
An example of semantic segmentation using tensorflow in eager execution.

Semantic segmentation using Tensorflow eager execution Requirement Python 2.7+ Tensorflow-gpu OpenCv H5py Scikit-learn Numpy Imgaug Train with eager e

Iñigo Alonso Ruiz 25 Sep 29, 2022
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

COCO-LM This repository contains the scripts for fine-tuning COCO-LM pretrained models on GLUE and SQuAD 2.0 benchmarks. Paper: COCO-LM: Correcting an

Microsoft 106 Dec 12, 2022
The code for paper Efficiently Solve the Max-cut Problem via a Quantum Qubit Rotation Algorithm

Quantum Qubit Rotation Algorithm Single qubit rotation gates $$ U(\Theta)=\bigotimes_{i=1}^n R_x (\phi_i) $$ QQRA for the max-cut problem This code wa

SheffieldWang 0 Oct 18, 2021
Chinese Mandarin tts text-to-speech 中文 (普通话) 语音 合成 , by fastspeech 2 , implemented in pytorch, using waveglow as vocoder,

Chinese mandarin text to speech based on Fastspeech2 and Unet This is a modification and adpation of fastspeech2 to mandrin(普通话). Many modifications t

291 Jan 02, 2023
Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125

Albumentations Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to inc

11.4k Jan 09, 2023
Project code for weakly supervised 3D object detectors using wide-baseline multi-view traffic camera data: WIBAM.

WIBAM (Work in progress) Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data 3D object dete

Matthew Howe 10 Aug 24, 2022
Add gui for YoloV5 using PyQt5

HEAD 更新2021.08.16 **添加图片和视频保存功能: 1.图片和视频按照当前系统时间进行命名 2.各自检测结果存放入output文件夹 3.摄像头检测的默认设备序号更改为0,减少调试报错 温馨提示: 1.项目放置在全英文路径下,防止项目报错 2.默认使用cpu进行检测,自

Ruihao Wang 65 Dec 27, 2022
BabelCalib: A Universal Approach to Calibrating Central Cameras. In ICCV (2021)

BabelCalib: A Universal Approach to Calibrating Central Cameras This repository contains the MATLAB implementation of the BabelCalib calibration frame

Yaroslava Lochman 55 Dec 30, 2022