(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"

Last update: Jul 01, 2022

Towards Abstractive Grounded Summarization of Podcast Transcripts

We provide the source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts" accepted at ACL'22. If you find the code useful, please cite the following paper.

@inproceedings{song-etal-2022-grounded,
    title = "Towards Abstractive Grounded Summarization of Podcast Transcripts",
    author = "Song, Kaiqiang and
              Li, Chen and
              Wang, Xiaoyang and
              Yu, Dong and
              Liu, Fei",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
    year = "2022"
}

Goal

We propose a grounded summarization system that links each summary sentence to a chunk of the original transcript and to the corresponding audio/video recording. This lets a human evaluator quickly verify the summary content against the source clips.
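
As a minimal sketch of what "grounded" means here, the snippet below pairs each summary sentence with one transcript chunk and its time span, so a reviewer can jump straight to the clip. The class and field names (`Chunk`, `GroundedSentence`, `start_sec`, `end_sec`) and the sample data are illustrative assumptions, not the data format used by this repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """A span of the transcript and its position in the recording (hypothetical schema)."""
    chunk_id: int
    text: str
    start_sec: float
    end_sec: float


@dataclass
class GroundedSentence:
    """One summary sentence plus the chunk it is linked to."""
    sentence: str
    chunk: Chunk


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS for display next to a summary sentence."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render(summary: List[GroundedSentence]) -> str:
    """Show each sentence with the clip a human evaluator should check."""
    lines = []
    for item in summary:
        start = format_timestamp(item.chunk.start_sec)
        end = format_timestamp(item.chunk.end_sec)
        lines.append(f"{item.sentence} [chunk {item.chunk.chunk_id}, {start}-{end}]")
    return "\n".join(lines)


# Made-up example data, for illustration only.
chunk = Chunk(chunk_id=3, text="today we talk about marathon training ...",
              start_sec=95.0, end_sec=140.0)
print(render([GroundedSentence("The hosts discuss marathon training.", chunk)]))
```

Keeping the link at the sentence level means a reviewer only has to listen to one short clip per sentence instead of the whole episode.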

News

03/04/2022 Trained model and processed testing data released.
03/03/2022 Code released. Paper link, trained model, and processed testing data will be released soon.
02/23/2022 Paper accepted at ACL 2022.

Experiments

Follow the four steps below to generate grounded podcast summaries, or directly download the generated summaries from this link.

Step 1: Download Code, Model & Data

Download the code

git clone https://github.com/tencent-ailab/GrndPodcastSum.git
cd GrndPodcastSum

Download the Trained Models to the GrndPodcastSum directory and unzip them.

unzip model.zip

Download the Processed Test Set (1027) to the GrndPodcastSum directory and unzip it.

unzip data.zip
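
Before unzipping, it can help to confirm that both downloads completed. The following is a small optional sketch using only the Python standard library; the helper name `list_archive` is hypothetical, and nothing here assumes what the archives contain.

```python
import zipfile
from pathlib import Path
from typing import List


def list_archive(path: Path) -> List[str]:
    """Return the member names of a zip archive, raising if it is missing or corrupt."""
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; download it first")
    with zipfile.ZipFile(path) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member in {path}: {bad}")
        return zf.namelist()


if __name__ == "__main__":
    # Archive names are taken from the unzip commands in this step.
    for name in ("model.zip", "data.zip"):
        try:
            print(name, len(list_archive(Path(name))), "entries")
        except (FileNotFoundError, zipfile.BadZipFile) as err:
            print(err)
```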

Step 2: Setup Environment

Create the conda environment from the env.yml file and activate it.

conda env create -f env.yml
conda activate GrndPodcastSum

Step 3: Offline Computing for Chunk Embeddings

Compute the chunk embeddings offline.

sh offline.sh
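
The idea behind offline computation can be sketched as follows: split each transcript into fixed-size, non-overlapping token chunks, embed every chunk once, and cache the vectors so that generation does not have to recompute them. This is only an illustration and does not reproduce `offline.sh`; the chunk size, the `toy_embed` stand-in encoder, and the JSON cache format are all assumptions.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split tokens into consecutive, non-overlapping chunks (the last may be shorter)."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def toy_embed(chunk: List[str], dim: int = 8) -> List[float]:
    """Stand-in for a real neural encoder: hashed bag-of-words counts, L1-normalised."""
    vec = [0.0] * dim
    for tok in chunk:
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def embed_offline(transcript: str, cache_path: Path, chunk_size: int = 64) -> List[List[float]]:
    """Compute chunk embeddings once and reuse the cached file on later calls.

    The cache is keyed by file path only, so delete the file if the transcript changes.
    """
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    chunks = split_into_chunks(transcript.split(), chunk_size)
    embeddings = [toy_embed(c) for c in chunks]
    cache_path.write_text(json.dumps(embeddings))
    return embeddings
```

Caching pays off because transcripts are long and the chunk vectors do not depend on the summary being generated, so they only need to be computed once per episode.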

Step 4: Generating Grounded Summary

Use the Grnd-token-nonoverlap model to generate the summaries.

sh test.sh

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Disclaimer

This repo is for research purposes only. It is not an officially supported Tencent product.
