Abstractive text summarisation of GitHub issues

Overview

This project trains a sequence-to-sequence model with GRUs on a dataset of GitHub issue titles, bodies and URLs to summarise the body of a GitHub issue. The machine-generated title is a more compact yet still accurate representation of the issue.
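A decoder in a sequence-to-sequence model of this kind is commonly trained with teacher forcing: at each step it receives the true previous title token rather than its own prediction. The sketch below shows only how the input and target sequences could be prepared for one title. The `<start>` and `<end>` marker tokens and the function name `teacher_forcing_pairs` are assumptions made for illustration, not names taken from this repository.

```python
# Hypothetical marker tokens; the project's actual vocabulary may differ.
START, END = "<start>", "<end>"


def teacher_forcing_pairs(title_tokens):
    """Return (decoder_input, decoder_target) for one tokenised title.

    The target is the input shifted left by one position, so at step t
    the decoder sees the true token t-1 and must predict token t.
    """
    seq = [START] + list(title_tokens) + [END]
    decoder_input = seq[:-1]   # <start>, w1, ..., wn
    decoder_target = seq[1:]   # w1, ..., wn, <end>
    return decoder_input, decoder_target


dec_in, dec_out = teacher_forcing_pairs(["fix", "login", "crash"])
for step, (given, expected) in enumerate(zip(dec_in, dec_out)):
    print(step, given, "->", expected)
```

At inference time there is no true title to feed in, so the decoder would instead consume its own previous prediction until it emits the end marker.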

The project includes:

RNNs that form a sequence-to-sequence model for abstractive text summarisation.
The teacher forcing algorithm, used to train the decoder model.
A recommender that suggests GitHub issues with similar titles, built with Spotify's ANNOY package.
An evaluation of the model's performance using its BLEU score.
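To make the BLEU evaluation concrete, here is a minimal sketch of sentence-level BLEU in plain Python: clipped n-gram precision combined with a brevity penalty, using uniform weights and no smoothing. It is a simplified stand-in written for illustration, not the project's evaluation code, which may well use a library implementation. The function names and the example titles are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU against one reference, uniform weights, no smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "app crashes on login page".split()
generated = "app crashes on login".split()
print(round(bleu(reference, generated), 4))
```

Generated titles are short, so a lower `max_n` (for example 2) or a smoothed variant may be more informative than the default 4-gram setting.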

Architecture

Dataset

The dataset has over 8M entries, so training the model takes a considerable amount of time.
You can find the dataset here.
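With this many rows, it can help to stream the data instead of loading it all at once. The sketch below assumes the dataset is a CSV file and reads it one row at a time with the standard library. The column names `issue_url`, `issue_title` and `body` are hypothetical, as is the helper `iter_issues`; adjust them to the actual file.

```python
import csv
import io


def iter_issues(handle, title_col="issue_title", body_col="body"):
    """Yield (title, body) pairs one row at a time, skipping incomplete rows."""
    for row in csv.DictReader(handle):
        title = (row.get(title_col) or "").strip()
        body = (row.get(body_col) or "").strip()
        if title and body:
            yield title, body


# Small in-memory sample standing in for the real file.
sample = io.StringIO(
    "issue_url,issue_title,body\n"
    "https://example.com/1,Crash on login,App crashes when I log in\n"
    "https://example.com/2,,Row with no title\n"
)
pairs = list(iter_issues(sample))
print(pairs)
```

For a file on disk, the same generator can be passed a handle opened with `newline=""` and an explicit encoding, so only one row is held in memory at a time.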
