PyTorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Perspective"

Last update: Sep 14, 2022


Overview

Graph Neural Topic Model (GNTM)

This is the PyTorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Perspective".

Requirements

Python >= 3.6
PyTorch == 1.6.0
torch-geometric == 1.7.0
torch-scatter == 2.0.6
torch-sparse == 0.6.9
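The versions above are pinned exactly, and mismatched PyTorch Geometric extensions are a common cause of import errors. The following is a minimal sketch for checking an environment against these pins. It assumes Python 3.8+ (for importlib.metadata, while the README itself only requires Python >= 3.6) and assumes the PyPI distribution names shown in PINS. It is not part of the repository.

```python
"""Compare installed package versions against the pinned requirements (sketch)."""
from importlib import metadata
from typing import Callable, Dict, List, Optional

# Pins copied from the Requirements list; the PyPI distribution names are assumptions.
PINS: Dict[str, str] = {
    "torch": "1.6.0",
    "torch-geometric": "1.7.0",
    "torch-scatter": "2.0.6",
    "torch-sparse": "0.6.9",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def mismatches(pins: Dict[str, str],
               lookup: Callable[[str], Optional[str]] = installed_version) -> List[str]:
    """Describe every pinned package that is missing or has a different version."""
    problems: List[str] = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed (want {wanted})")
        elif found.split("+")[0] != wanted:
            # Drop local version tags such as '+cu101' before comparing.
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    for line in mismatches(PINS):
        print(line)
```

An empty output means every pinned package was found at the expected version. The lookup parameter makes it easy to test the comparison logic without touching the real environment.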

Dataset

Links to the datasets are given below:

The GloVe word embeddings can be downloaded from this link.

Place the datasets and word embeddings at the paths specified in settings.py.
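GloVe embeddings are distributed as plain text, one token per line followed by its vector components separated by spaces. The loader below is a minimal sketch of reading that format and building an embedding matrix for a vocabulary. It is not the repository's loading code. Zero-initialising out-of-vocabulary rows is an assumed choice, and the default dimension of 300 is assumed to match the --ni value in the example command further down.

```python
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str], dim: int = 300) -> Dict[str, List[float]]:
    """Parse GloVe's text format: a token followed by `dim` floats per line."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or malformed lines
        # The last `dim` fields form the vector; everything before them is the
        # token (a few tokens in the larger GloVe releases contain spaces).
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors


def embedding_matrix(vocab: List[str], vectors: Dict[str, List[float]],
                     dim: int = 300) -> List[List[float]]:
    """Row i holds the vector for vocab[i]; unknown words get a zero row (assumption)."""
    return [vectors.get(word, [0.0] * dim) for word in vocab]
```

In practice the lines would come from open(path, encoding="utf-8"), where path is the GloVe location configured in settings.py.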

Usage

Before training GNTM, preprocess the data with the following scripts (some parameters need to be adjusted for each dataset, as described in the paper):

cd dataPrepare
python preprocess.py
python graph_data.py
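graph_data.py turns each document into a graph, and --nwindow controls the window used to build it. The function below is a minimal sketch of one common sliding-window construction. It assumes three things: the nodes are the document's distinct words, two words are linked when they occur within nwindow consecutive tokens, and edge weights count those co-occurrences. Whether graph_data.py uses exactly this convention (undirected edges, no self-loops, this window definition) is an assumption, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def document_graph(tokens: List[str], window: int = 3
                   ) -> Tuple[List[str], Dict[Tuple[int, int], int]]:
    """Build an undirected, weighted word co-occurrence graph for one document.

    Nodes are distinct words in first-occurrence order. Tokens at positions
    i < j are linked when j - i < window, and the edge weight counts how often
    that happens. Pairs of identical words are skipped (no self-loops).
    """
    nodes: List[str] = []
    index: Dict[str, int] = {}
    for tok in tokens:
        if tok not in index:
            index[tok] = len(nodes)
            nodes.append(tok)

    edges: Counter = Counter()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            a, b = index[tokens[i]], index[tokens[j]]
            if a != b:
                edges[(min(a, b), max(a, b))] += 1
    return nodes, dict(edges)
```

For ["topic", "model", "graph", "topic"] with a window of 3, this yields the nodes topic, model and graph. The edges are topic-model and topic-graph (weight 2 each) and model-graph (weight 1). To pass such a graph to PyTorch Geometric, each undirected edge is typically listed in both directions in edge_index.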

Example script to train GNTM:

python main.py \
--device cuda:0 \
--dataset News20 \
--model GDGNNMODEL \
--num_topic 20 \
--num_epoch 400 \
--ni 300 \
--word \
--taskid 0 \
--nwindow 3

Here,

--dataset specifies the dataset name. Currently News20, TMN, BNC and Reuters are supported, corresponding to 20 Newsgroups, Tag My News, British National Corpus and Reuters, respectively.
--device specifies the computation device, such as cpu or cuda:0.
--model specifies the model to use; GDGNNMODEL corresponds to GNTM.
--num_topic specifies the number of topics.
--num_epoch specifies the maximum number of training epochs.
--ni specifies the dimension of the word embeddings.
--taskid corresponds to the random seed.
--nwindow specifies the window size used to construct document graphs.
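The options above can be mirrored with argparse, as in the sketch below. It is illustrative only and not the parser in main.py. The defaults are copied from the example command rather than from the repository. Because --word appears in the example command but is not described in the list, it is modelled here as a plain on/off switch; that is an assumption.

```python
import argparse

DATASETS = ["News20", "TMN", "BNC", "Reuters"]


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented GNTM options (not main.py's)."""
    p = argparse.ArgumentParser(description="Train GNTM (illustrative sketch)")
    p.add_argument("--device", default="cpu", help="computation device, e.g. cpu or cuda:0")
    p.add_argument("--dataset", choices=DATASETS, default="News20")
    p.add_argument("--model", default="GDGNNMODEL", help="GDGNNMODEL corresponds to GNTM")
    p.add_argument("--num_topic", type=int, default=20, help="number of topics")
    p.add_argument("--num_epoch", type=int, default=400, help="maximum training epochs")
    p.add_argument("--ni", type=int, default=300, help="word-embedding dimension")
    # Undocumented in the README; treated as a boolean switch (assumption).
    p.add_argument("--word", action="store_true")
    p.add_argument("--taskid", type=int, default=0, help="used as the random seed")
    p.add_argument("--nwindow", type=int, default=3, help="window size for document graphs")
    return p


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Restricting --dataset with choices makes argparse reject unsupported dataset names before any data is loaded. A training script would typically pass --taskid to seeding calls such as random.seed and torch.manual_seed.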

Reference

If you find our methods or code helpful, please kindly cite the paper:

@inproceedings{shen2021topic,
  title={Topic Modeling Revisited: A Document Graph-based Neural Network Perspective},
  author={Shen, Dazhong and Qin, Chuan and Wang, Chao and Dong, Zheng and Zhu, Hengshu and Xiong, Hui},
  booktitle={Proceedings of Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS-2021)},
  year={2021}
}

Owner

Dazhong Shen
