A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

Last update: Dec 02, 2022

Overview

torch-cif

A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/abs/1905.11235.

Usage

from typing import Optional, Tuple

from torch import Tensor


def cif_function(
    input: Tensor,
    alpha: Tensor,
    beta: float = 1.0,
    padding_mask: Optional[Tensor] = None,
    target_lengths: Optional[Tensor] = None,
    max_output_length: Optional[int] = None,
    eps: float = 1e-4,
) -> Tuple[Tensor, Tensor, Tensor]:
    r""" A batched computation implementation of continuous integrate and fire (CIF)
    https://arxiv.org/abs/1905.11235

    Args:
        input (Tensor): (N, S, C) Input features to be integrated.
        alpha (Tensor): (N, S) Weights corresponding to each element in the
            input. Expected to be the output of a sigmoid function.
        beta (float): The threshold used to determine firing.
        padding_mask (Tensor, optional): (N, S) A binary mask representing
            padded elements in the input.
        target_lengths (Tensor, optional): (N,) Desired length of the targets
            for each sample in the minibatch.
        max_output_length (int, optional): The maximum valid output length used
            in inference. The alpha is scaled down if the sum exceeds this value.
        eps (float, optional): Epsilon to prevent underflow in divisions.
            Default: 1e-4

    Returns: Tuple (output, feat_lengths, alpha_sum)
        output (Tensor): (N, T, C) The output integrated from the source.
        feat_lengths (Tensor): (N,) The output length for each element in batch.
        alpha_sum (Tensor): (N,) The sum of alpha for each element in batch.
            Can be used to compute the quantity loss.
    """

Note

ℹ️ This is a WIP project. The implementation is still being tested.

This implementation uses cumsum and floor to determine the firing positions, and uses scatter to merge the weighted source features.
Run the tests with python test.py (requires pip install expecttest).
Feel free to contact me if there are bugs in the code.
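
The cumsum-and-floor idea can be sketched without any loop-carried state. The following is an illustration in plain Python with scalar features, not the repository's code: the name cumsum_floor_cif is hypothetical, every weight is assumed to satisfy 0 <= alpha <= beta so that a frame crosses at most one threshold, and the unfinished tail is dropped. In PyTorch the same steps would correspond to torch.cumsum, torch.floor and a scatter-add.

```python
from itertools import accumulate
from math import floor
from typing import List


def cumsum_floor_cif(
    feats: List[float], alpha: List[float], beta: float = 1.0
) -> List[float]:
    """Find firing positions with cumsum + floor, then scatter (sketch)."""
    csum = list(accumulate(alpha))
    # Number of completed fires after each frame, and before each frame.
    fired_after = [floor(c / beta) for c in csum]
    fired_before = [0] + fired_after[:-1]

    # One extra slot collects the unfinished tail, which is dropped below.
    n_fires = fired_after[-1]
    out = [0.0] * (n_fires + 1)

    for x, a, c, before, after in zip(
        feats, alpha, csum, fired_before, fired_after
    ):
        if after > before:
            # The frame crosses a threshold: split its weight there.
            right = c - after * beta  # weight past the boundary
            left = a - right          # weight that completes output `before`
        else:
            left, right = a, 0.0
        # "Scatter": add each weighted part into its output slot.
        out[before] += left * x
        out[after] += right * x

    return out[:n_fires]


if __name__ == "__main__":
    print(cumsum_floor_cif([1.0, 2.0], [0.75, 0.75]))
```

Because each frame's output indices depend only on the cumulative sum, all frames can be processed independently, which is what makes a batched formulation possible.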

Reference

CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition

Owner

張致強
