skweak: A software toolkit for weak supervision applied to NLP tasks

Overview

skweak: Weak supervision for NLP




Labelled data remains a scarce resource in many practical NLP scenarios. This is especially the case when working with resource-poor languages (or text domains), or when using task-specific labels without pre-existing datasets. The only available option is often to collect and annotate texts by hand, which is expensive and time-consuming.

skweak (pronounced /skwi:k/) is a Python-based software toolkit that provides a concrete solution to this problem using weak supervision. skweak is built around a very simple idea: Instead of annotating texts by hand, we define a set of labelling functions to automatically label our documents, and then aggregate their results to obtain a labelled version of our corpus.

The labelling functions may take various forms, such as domain-specific heuristics (like pattern-matching rules), gazetteers (based on large dictionaries), machine learning models, or even annotations from crowd-workers. The aggregation is done using a statistical model that automatically estimates the relative accuracy (and confusions) of each labelling function by comparing their predictions with one another.

skweak can be applied to both sequence labelling and text classification, and comes with a complete API that makes it possible to create, apply and aggregate labelling functions with just a few lines of code. The toolkit is also tightly integrated with spaCy, which makes it easy to incorporate into existing NLP pipelines. Give it a try!


Full Paper:
Pierre Lison, Jeremy Barnes and Aliaksandr Hubin (2021), "skweak: Weak Supervision Made Easy for NLP", arXiv:2104.09683.

Documentation & API: See the Wiki for details on how to use skweak.



Dependencies

  • spacy >= 3.0.0
  • hmmlearn >= 0.2.4
  • pandas >= 0.23
  • numpy >= 1.18

You also need Python >= 3.6.

Install

The easiest way to install skweak is through pip:

pip install skweak

or if you want to install from the repo:

pip install --user git+https://github.com/NorskRegnesentral/skweak

The above installation only includes the core library (not the additional examples in the examples directory).

Basic Overview




Weak supervision with skweak goes through the following steps:

  • Start: First, you need raw (unlabelled) data from your text domain. skweak is built on top of spaCy and operates on spaCy Doc objects, so you first need to convert your documents to Doc objects using spaCy.
  • Step 1: Then, we need to define a range of labelling functions that will take those documents and annotate spans with labels. Those labelling functions can come from heuristics, gazetteers, machine learning models, etc. See the documentation for more details.
  • Step 2: Once the labelling functions have been applied to your corpus, you need to aggregate their results in order to obtain a single annotation layer (instead of the multiple, possibly conflicting annotations from the labelling functions). This is done in skweak using a generative model that automatically estimates the relative accuracy and possible confusions of each labelling function.
  • Step 3: Finally, based on those aggregated labels, we can train our final model. Step 2 gives us a labelled corpus that (probabilistically) aggregates the outputs of all labelling functions, and you can use this labelled data to estimate any kind of machine learning model. You are free to use whichever model/framework you prefer.
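The idea behind Steps 1 and 2 can be illustrated without any library. The sketch below is a plain-Python stand-in, not skweak's API: the three labelling functions (money_lf, year_lf, shape_lf) and the majority_vote helper are hypothetical names, documents are plain token lists rather than spaCy Doc objects, and a simple majority vote replaces the generative model that skweak fits.

```python
import re
from collections import Counter

TOKENS = ["Donald", "Trump", "paid", "$", "750", "in",
          "federal", "income", "taxes", "in", "2016"]

def money_lf(tokens):
    """Label a number and the '$' sign right before it as MONEY."""
    labels = ["O"] * len(tokens)
    for i in range(1, len(tokens)):
        if tokens[i][0].isdigit() and tokens[i - 1] == "$":
            labels[i - 1] = labels[i] = "MONEY"
    return labels

def year_lf(tokens):
    """Label four-digit tokens starting with 19 or 20 as DATE."""
    return ["DATE" if re.match(r"(19|20)\d{2}$", t) else "O" for t in tokens]

def shape_lf(tokens):
    """Noisy rule: four-digit numbers are DATE, other numbers CARDINAL."""
    return [("DATE" if len(t) == 4 else "CARDINAL") if t.isdigit() else "O"
            for t in tokens]

def majority_vote(votes_per_lf):
    """Pick, for each token, the label proposed by most functions.

    "O" counts as an abstention. On a tie, max() returns the first
    maximal label, i.e. the one from the earliest function in the list.
    """
    aggregated = []
    for votes in zip(*votes_per_lf):
        counts = Counter(v for v in votes if v != "O")
        aggregated.append(max(counts, key=counts.get) if counts else "O")
    return aggregated

# Step 1: apply every labelling function; Step 2: merge their outputs.
votes = [lf(TOKENS) for lf in (money_lf, year_lf, shape_lf)]
aggregated = majority_vote(votes)
print(list(zip(TOKENS, aggregated)))
```

Note that the token "750" is resolved only by the arbitrary tie-break rule. This is the kind of case where a model that estimates the relative accuracy of each labelling function has an advantage over plain voting.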

Quickstart

Here is a minimal example with three labelling functions (LFs) applied on a single document:

import spacy, re
from skweak import heuristics, gazetteers, aggregation, utils

# LF 1: heuristic to detect occurrences of MONEY entities
def money_detector(doc):
    for tok in doc[1:]:
        if tok.text[0].isdigit() and tok.nbor(-1).is_currency:
            yield tok.i-1, tok.i+1, "MONEY"
lf1 = heuristics.FunctionAnnotator("money", money_detector)

# LF 2: detection of years with a regex (raw string, so that \d is not treated as a string escape)
lf2 = heuristics.TokenConstraintAnnotator("years", lambda tok: re.match(r"(19|20)\d{2}$", tok.text), "DATE")

# LF 3: a gazetteer with a few names
NAMES = [("Barack", "Obama"), ("Donald", "Trump"), ("Joe", "Biden")]
trie = gazetteers.Trie(NAMES)
lf3 = gazetteers.GazetteerAnnotator("presidents", {"PERSON":trie})

# We create a corpus (here with a single text)
nlp = spacy.load("en_core_web_sm")
doc = nlp("Donald Trump paid $750 in federal income taxes in 2016")

# apply the labelling functions
doc = lf3(lf2(lf1(doc)))

# and aggregate them
hmm = aggregation.HMM("hmm", ["PERSON", "DATE", "MONEY"])
hmm.fit_and_aggregate([doc])

# we can then visualise the final result (in Jupyter)
utils.display_entities(doc, "hmm")

Obviously, to get the most out of skweak, you will need more than three labelling functions. And, most importantly, you will need a larger corpus including as many documents as possible from your domain, so that the model can derive good estimates of the relative accuracy of each labelling function.

Documentation

See the Wiki.

License

skweak is released under an MIT License.

The MIT License is a short and simple permissive license allowing both commercial and non-commercial use of the software. The only requirement is to preserve the copyright and license notices (see file License). Licensed works, modifications, and larger works may be distributed under different terms and without source code.

Citation

See our paper describing the framework:

Pierre Lison, Jeremy Barnes and Aliaksandr Hubin (2021), "skweak: Weak Supervision Made Easy for NLP", arXiv:2104.09683

@misc{lison2021skweak,
      title={skweak: Weak Supervision Made Easy for NLP}, 
      author={Pierre Lison and Jeremy Barnes and Aliaksandr Hubin},
      year={2021},
      eprint={2104.09683},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Comments
  • Label Function Analysis


    First of all, thanks for open sourcing such an awesome project!

    Our team has been playing around with skweak for a sequential labeling task, and we were wondering if there were any plans in the roadmap to include tooling that helps practitioners understand the "impact" of their label functions statistically.

    Snorkel for example, provides a LF Analysis tool to understand how one's label functions apply to a dataset statistically (e.g., coverage, overlap, conflicts). Similar functionality would be tremendously helpful in gauging the efficacy of one's label functions for each class in a sequential labeling problem.

    Are there any plans to add such functionality down the line as a feature enhancement?
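For readers unfamiliar with these statistics, the sketch below computes token-level coverage, overlap and conflict rates in plain Python. It is an illustration only: lf_summary is a hypothetical helper, not part of skweak or Snorkel, and it assumes that each label function's output is already available as a list of per-token labels in which "O" means abstention. Coverage is taken here as the share of tokens a function labels, overlap as the share it labels together with at least one other function, and conflict as the share where another function proposes a different label.

```python
def lf_summary(outputs):
    """Per-LF coverage, overlap and conflict rates over tokens.

    outputs maps an LF name to its per-token labels ("O" = abstain).
    """
    n_tokens = len(next(iter(outputs.values())))
    summary = {}
    for name, labels in outputs.items():
        covered = overlap = conflict = 0
        for i, label in enumerate(labels):
            if label == "O":
                continue
            covered += 1
            # Labels proposed for the same token by the other functions.
            others = [o[i] for other, o in outputs.items()
                      if other != name and o[i] != "O"]
            overlap += bool(others)
            conflict += any(o != label for o in others)
        summary[name] = {
            "coverage": covered / n_tokens,
            "overlap": overlap / n_tokens,
            "conflict": conflict / n_tokens,
        }
    return summary

# Made-up outputs of three label functions over four tokens.
outputs = {
    "money": ["O", "MONEY", "MONEY", "O"],
    "years": ["O", "O", "O", "DATE"],
    "numbers": ["O", "O", "CARDINAL", "DATE"],
}
summary = lf_summary(outputs)
for name, stats in summary.items():
    print(name, stats)
```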

    enhancement 
    opened by schopra8 20
  • Tokens with no possible state


    I very often get the error of this line that there is a "problem with token X", causing HMM training to be aborted after only a couple of documents in the very first iteration.

    I found out that this is due to framelogprob having all -np.inf for the token in question. So I checked what happens in self._compute_log_likelihood for the respective document and found that this document had only one labeling function firing and X[source] in this line was all False for the first token (or state?).

    This means that this token/state is also all masked with -np.inf in logsum in this line.

    Now, I am unsure how to fix that. This clearly does not look like the desired behavior but I suppose "testing for tokens with no possible states" is there for a reason. Can I simply replace -np.inf in self._compute_log_likelihood with -100000 ? Then, of course, the test will not fail and not abort training but there will be a token with only very improbable states. Is that ok?

    Or is that the wrong approach? Should tokens without observed labels from the labeling functions rather get a default label (e.g., O)? So why is that not done here? Is it a bug? I am not sure where I should look for a bug, if there is one. Can someone with a better knowledge of the code base give some advice on this?
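The failure mode described above can be reproduced in isolation. The sketch below is a generic illustration, not skweak's code: find_dead_tokens is a hypothetical helper and the matrix is made-up data. It shows why a row of log-probabilities that is entirely -inf leaves a token with no reachable state: the log-sum-exp over that row is itself -inf, so there is no probability mass left to propagate.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-probabilities."""
    top = max(values)
    if top == -math.inf:
        return -math.inf  # every state is impossible
    return top + math.log(sum(math.exp(v - top) for v in values))

def find_dead_tokens(frame_logprob):
    """Return the indices of tokens whose states all have log-probability -inf."""
    return [i for i, row in enumerate(frame_logprob)
            if logsumexp(row) == -math.inf]

# One row per token, one column per state (made-up values).
frame_logprob = [
    [-0.1, -2.3, -math.inf],
    [-math.inf, -math.inf, -math.inf],  # no state is possible here
    [-1.2, -0.4, -3.0],
]
dead = find_dead_tokens(frame_logprob)
print(dead)
```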

    opened by mnschmit 10
  • _do_forward_pass, _do_backward_pass, _compute_posteriors not defined in skweak.aggregation


    skweak/aggregation.py", line 405, in fit
        logprob, fwdlattice = self._do_forward_pass(framelogprob)
    AttributeError: 'HMM' object has no attribute '_do_forward_pass'

    opened by ManuBohra 10
  • TypeError: unhashable type: 'list'


    Upon applying a config file in order to train a textcat model using the following code:

    !spacy init config - --lang en --pipeline ner --optimize accuracy | \
    spacy train - --paths.train ./train.spacy --paths.dev ./train.spacy \
    --initialize.vectors en_core_web_md --output train

    I receive the following error message:

    [i] Saving to output directory: train
    [i] Using CPU

    =========================== Initializing pipeline ===========================
    2022-03-27 15:49:59.778883: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
    2022-03-27 15:49:59.778913: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
    2022-03-27 15:49:59.798942: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
    2022-03-27 15:49:59.798976: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
    [2022-03-27 15:50:05,376] [INFO] Set up nlp object from config
    [2022-03-27 15:50:05,395] [INFO] Pipeline: ['tok2vec', 'ner']
    [2022-03-27 15:50:05,395] [INFO] Created vocabulary
    [2022-03-27 15:50:07,968] [INFO] Added vectors: en_core_web_md
    [2022-03-27 15:50:08,292] [INFO] Finished initializing nlp object
    Traceback (most recent call last):
      File "C:\ProgramData\Anaconda3\lib\runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "C:\ProgramData\Anaconda3\lib\runpy.py", line 87, in run_code
        exec(code, run_globals)
      File "C:\ProgramData\Anaconda3\Scripts\spacy.exe_main.py", line 7, in
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\spacy\cli_util.py", line 71, in setup_cli
        command(prog_name=COMMAND)
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\click\core.py", line 829, in call
        return self.main(*args, **kwargs)
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\click\core.py", line 782, in main
        rv = self.invoke(ctx)
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\click\core.py", line 1259, in invoke
        return process_result(sub_ctx.command.invoke(sub_ctx))
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\click\core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\click\core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\typer\main.py", line 497, in wrapper
        return callback(**use_params) # type: ignore
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\spacy\cli\train.py", line 45, in train_cli
        train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\spacy\cli\train.py", line 72, in train
        nlp = init_nlp(config, use_gpu=use_gpu)
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\spacy\training\initialize.py", line 84, in init_nlp
        nlp.initialize(lambda: train_corpus(nlp), sgd=optimizer)
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\spacy\language.py", line 1308, in initialize
        proc.initialize(get_examples, nlp=self, **p_settings)
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\spacy\pipeline\tok2vec.py", line 215, in initialize
        validate_get_examples(get_examples, "Tok2Vec.initialize")
      File "spacy\training\example.pyx", line 65, in spacy.training.example.validate_get_examples
      File "spacy\training\example.pyx", line 44, in spacy.training.example.validate_examples
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\spacy\training\corpus.py", line 142, in call
        for real_eg in examples:
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\spacy\training\corpus.py", line 164, in make_examples
        for reference in reference_docs:
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\spacy\training\corpus.py", line 199, in read_docbin
        for doc in docs:
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\spacy\tokens_serialize.py", line 150, in get_docs
        doc.spans.from_bytes(self.span_groups[i])
      File "C:\Users\49176\AppData\Roaming\Python\Python39\site-packages\spacy\tokens_dict_proxies.py", line 54, in from_bytes
        group = SpanGroup(doc).from_bytes(value_bytes)
      File "spacy\tokens\span_group.pyx", line 170, in spacy.tokens.span_group.SpanGroup.from_bytes
      File "C:\ProgramData\Anaconda3\lib\site-packages\srsly_msgpack_api.py", line 27, in msgpack_loads
        msg = msgpack.loads(data, raw=False, use_list=use_list)
      File "C:\ProgramData\Anaconda3\lib\site-packages\srsly\msgpack_init.py", line 79, in unpackb
        return _unpackb(packed, **kwargs)
      File "srsly\msgpack_unpacker.pyx", line 191, in srsly.msgpack._unpacker.unpackb
    TypeError: unhashable type: 'list'

    Seems like a dependency issue. What is the reason for it? And is there a way to fix it?

    Also: is the following error message a problem? "[E1010] Unable to set entity information for token 10 which is included in more than one span in entities, blocked, missing or outside." Or can it be avoided by simply applying the following?

    for document in train_data:
        try:
            document.ents = document.spans["hmm"]
            skweak.utils.docbin_writer(train_data, "train.spacy")
        except Exception as e:
            print(e)

    opened by AlineBornschein 6
  • TypeError when nothing is found in a document

    Hi! I'm getting an exception from fit_and_aggregate. TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'. The exception is from line 227 in aggregation.py, np.apply_along_axis(...)

    This seems to happen when all of my labeling functions return empty on one of the docs so the DataFrame is empty.

    opened by oholter 6
  • Error in MultilabelNaiveBayes


    I am using Skweak Multilabel for classification and I am getting the following error message - RuntimeError: No valid state found at position 0

    I aggregated LFs using CombinedAnnotator, then initialized MultilabelNaiveBayes - MultilabelNaiveBayes("skweak_preds",final_label_list) and then trained the model - skweak_model.fit(d2s)

    Any help in fixing this is appreciated. Thanks!

    opened by sujeethrv 5
  • Converting .spacy files to conll format to train other models on it.


    Once I fit the aggregation model on the data, I used skweak's function to write it as a DocBin file, which is saved as a .spacy file. How do I convert this into a normal CoNLL format file? Are there any libraries or tools that can do that?
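One possible approach, sketched under stated assumptions, is to read the documents back from the .spacy file and write one token per line with a BIO tag. The helper to_conll_lines below is hypothetical and works on plain token and span data so that it runs without spaCy. With spaCy, the token texts and entity spans would come from each Doc, for instance by loading the file with DocBin.from_disk, iterating over DocBin.get_docs, and reading doc.ents or the relevant entry of doc.spans. The two-column layout is also an assumption, since CoNLL variants differ in their columns.

```python
def to_conll_lines(tokens, spans):
    """Convert one document to two-column CoNLL lines with BIO tags.

    tokens: list of token strings.
    spans: list of (start, end, label) with token offsets, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return [f"{tok}\t{tag}" for tok, tag in zip(tokens, tags)]

# Made-up document: token texts plus non-overlapping entity spans.
tokens = ["Donald", "Trump", "paid", "$", "750", "in", "2016"]
spans = [(0, 2, "PERSON"), (3, 5, "MONEY"), (6, 7, "DATE")]
lines = to_conll_lines(tokens, spans)
# Sentences are conventionally separated by a blank line in CoNLL-style files.
print("\n".join(lines) + "\n")
```

The sketch assumes that spans do not overlap, which is also what spaCy requires of doc.ents.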

    opened by Akshay0799 5
  • Gazetteer is not working with single tokens


    Hello.

    I can't figure out why the gazetteer doesn't match the single name 'Barack'.

    import spacy, re
    from skweak import heuristics, gazetteers, aggregation, utils, base
    nlp = spacy.load("en_core_web_sm", disable=["ner"])
    doc = nlp('Barack Obama and Donald Trump')
    NAMES = [("Barack"), ("Donald", "Trump")]
    lf3 = gazetteers.GazetteerAnnotator("presidents", {"PERSON":gazetteers.Trie(NAMES)})
    doc = lf3(doc)
    print(doc.spans)
    

    {'presidents': [Donald Trump]}

    Any ideas?

    Thanks for a remarkable lib!
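One thing worth checking in the snippet above, independently of skweak: in Python, parentheses alone do not create a tuple, so ("Barack") is the plain string "Barack", and iterating over it yields single characters. A one-element tuple needs a trailing comma. Whether this is the cause of the missing match is an assumption, but it is consistent with a gazetteer whose entries are tuples of tokens, as in the Quickstart. The following plain-Python lines show the difference:

```python
names = [("Barack"), ("Donald", "Trump")]
print(type(names[0]).__name__)   # str: parentheses alone do not make a tuple
print(list(names[0]))            # iterating over it gives single characters

fixed = [("Barack",), ("Donald", "Trump")]
print(type(fixed[0]).__name__)   # tuple: the trailing comma makes the difference
```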

    opened by slavaGanzin 5
  • [Question] Underspecified Labels w/ out Fine-Grained Label


    Context

    • I'm training an NER model using the HMM aggregator.
    • I have 2 label classes [A, B] and an under-specified label [C] which is a super-class of A and B within my ontology.
    • I have 3-sets of gazetteer label functions - one set for A, one set for B, and one set for C.

    Issue

    • When training the HMM, I have tokens which are annotated by label functions for C (superclass) but are not annotated by label functions for A and B (e.g., the term "Apple" is being labeled as an ENT but is not being captured by the LFs for PER or PROD).
    • Currently I'm calling the HMM function as follows:
    hmm = aggregation.HMM("hmm", [A, B], sequence_labelling=True)
    hmm.add_underspecified_label(C, [A, B])
    _ = hmm.fit_and_aggregate(annotated_docs)
    
    • This triggers an error from the below aggregation code, since all probability mass is being placed on a label that was not included in the HMM (i.e., the under-specified label C). https://github.com/NorskRegnesentral/skweak/blob/0613f20b9c8be3f22553e303ec22c72dea1f206a/skweak/aggregation.py#L397-L401

    Question(s)

    • Should I be including the under-specified label as a possible label option in the HMM?
    hmm = aggregation.HMM("hmm", [A, B, C], sequence_labelling=True)
    hmm.add_underspecified_label(C, [A, B])
    _ = hmm.fit_and_aggregate(annotated_docs)
    
    • How are underspecified labels "learned" or trained differently vs. the "specified labels" (e.g., A, B in the example)?

    Thanks in advance!

    opened by schopra8 5
  • use Flair with skweak


    Hello, has anyone here tried to use a model/framework other than spaCy (NER) as a labelling function? I tried to work with Flair, but it didn't work. Can anyone help me? Thanks in advance.

    opened by Ihebzayen 4
  • Runtime error in display_entities


    I am using the latest version of skweak: 0.2.17. I tried running the example (quick-start.ipynb) in the repo. When I try to execute

    skweak.utils.display_entities(docs[28], "other_org_detector")

    , I get this error.

    [image: screenshot of the error]

    opened by latchukarthick98 3
  • Step by step NER alternative 2


    Hello,

    First of all, thank you for the library.

    I'm kind of new to NER, and I'd like to know how the 2nd alternative of the NER process would be done, where a more sophisticated model is created, since I didn't find it in Step by Step NER.

    opened by boskis222 0
  • minimal example not working


    When I try to run the minimal example on the home page, an error appears: AttributeError: 'BaseHMM' object has no attribute '_do_forward_log_pass'

    Am I missing something from the install or is it just pip install skweak?

    opened by davidbetancur8 2
  • Support options in displacy.render


    This is an enhancement request: display_entities could be a bit more flexible if it accepted options={} as one of its parameters. Ex: def display_entities(doc: Doc, layer=None, add_tooltip=False, options={}):

    and then change the line below to: html = spacy.displacy.render(doc2, jupyter=False, style="ent", manual=True, options=options)

    That would extend the functionality of render when creating new entities.

    Thanks for the great work with SKWEAK.

    opened by lidiexy-palinode 0
  • Support for relation extraction


    Right now, skweak supports two main types of NLP tasks: (token-level) sequence labelling and text classification. Both rest on the idea that labelling functions associate labels with text spans, and the role of the aggregation model is then to merge the outputs of those labelling functions so as to get unified predictions.

    However, some NLP tasks cannot be easily associated with text spans. For instance, relation extraction requires a prediction on pairs of spans.

    The question is then how to provide support for this type of task, for instance by implementing a RelationAnnotator that could be used to associate pairs of spans with a label.

    Technically speaking, we could still encode the annotations internally as SpanGroup objects. One solution would be to only add one span of the pair in the SpanGroup, but then specify that this span is connected to a second span (SpanGroup objects allow the inclusion of JSON-serialised attributes). The method get_observation_df in the BaseAggregator class could then be extended to detect whether a span is a normal one, or is connected to a second span. If that is the case, the aggregation would then be done on pairs of spans instead of single spans.

    Do get in touch if this functionality is something you need, so that we know whether we should prioritise this in our next release :-)

    enhancement 
    opened by plison 4
  • Regression-based outcome


    Hello, thank you for sharing this repo. Do you have plans for providing capability for a regression-based outcome? Something along the lines of fine-grained sentiment on a scale from 1-5?

    enhancement 
    opened by dmracek 1
Releases (0.3.1)
  • 0.3.1 (Mar 25, 2022)

    Brand new version of skweak, including both a number of bug fixes and some new functionalities:

    1. skweak is now using the latest version of hmmlearn, thereby fixing a number of errors due to a mismatch between method names
    2. We now have a clearer split between aggregation models for sequence labelling and for text classification. Possible aggregators for sequence labelling are SequentialMajorityVoter and HMM (preferred), while the aggregators for non-sequential text classification are MajorityVoter and NaiveBayes.

    We also introduce a brand new functionality: multi-label classification! Instead of assuming that all labels are mutually exclusive, you can now aggregate the results of labelling functions without assuming that only one label is correct. This multi-label scheme is available for both sequence labelling (see MultilabelSequentialMajorityVoter and MultilabelHMM) and text classification (see MultilabelMajorityVoter and MultilabelNaiveBayes).

    By default, all labels can be simultaneously true for a given data point, but you can enforce exclusivity relations between labels through the method set_exclusive_labels. If all labels are set to be mutually exclusive, the aggregation is equivalent to a standard multi-class setup. Internally, this functionality is implemented by constructing and fitting separate aggregation models for each label.

    The code for the aggregation models has also been heavily refactored, making it hopefully easier to create new aggregation models.

Owner
Norsk Regnesentral (Norwegian Computing Center)
Norwegian Computing Center is a private foundation performing research in statistical modeling, machine learning and information/communication technology