Abstractive text summarisation of GitHub issues

Overview

A sequence-to-sequence model with GRUs is trained on a dataset of GitHub issue titles, bodies and URLs to summarise the body of an issue into a title. The machine-generated title is a compact yet accurate representation of the issue.

The project includes:

  • RNNs to create a sequence-to-sequence model for abstractive text summarisation.
  • Teacher forcing to train the decoder model.
  • A recommender that suggests GitHub issues with similar titles, built with Spotify's Annoy package.
  • Evaluation of the model's performance through its BLEU score.
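
As a rough illustration of teacher forcing, the sketch below builds the decoder input and target for a single reference title. It is not taken from the project's code: the `<start>` and `<end>` tokens and the `teacher_forcing_pair` helper are hypothetical names chosen for this example, and tokenisation is assumed to have happened already.

```python
# Hypothetical special tokens; the project's actual vocabulary is not shown here.
START, END = "<start>", "<end>"


def teacher_forcing_pair(title_tokens):
    """Build (decoder_input, decoder_target) for one reference title.

    With teacher forcing, the decoder is fed the ground-truth token from the
    previous step instead of its own prediction, so the decoder input is the
    target sequence shifted right by one position.
    """
    sequence = [START] + list(title_tokens) + [END]
    decoder_input = sequence[:-1]   # what the decoder sees at each step
    decoder_target = sequence[1:]   # what it should predict at each step
    return decoder_input, decoder_target


decoder_input, decoder_target = teacher_forcing_pair(["fix", "login", "crash"])
for seen, expected in zip(decoder_input, decoder_target):
    print(f"{seen!r} -> {expected!r}")
```

At inference time no reference title exists, so the decoder would instead be fed its own previous prediction until it emits the end token.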

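To give a feel for what a BLEU score measures, here is a simplified sentence-level version: clipped n-gram precision up to bigrams, a single reference, uniform weights, a brevity penalty and no smoothing. The function names and the choice of `max_n=2` are assumptions made for this sketch, not the project's evaluation code; in practice a library implementation such as NLTK's `sentence_bleu` would be the more usual choice.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference, candidate, max_n=2):
    """Simplified BLEU for one reference and one candidate (no smoothing)."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any zero precision gives zero
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty: penalise candidates shorter than the reference.
    if len(candidate) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference_title = ["fix", "crash", "on", "login"]
generated_title = ["fix", "crash"]
print(simple_bleu(reference_title, generated_title))
```

Because issue titles are short, unsmoothed higher-order n-grams often have zero overlap, which is why this sketch stops at bigrams.
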
Architecture

Architecture diagram

Dataset

  • The dataset has over 8M entries, so training the model takes a considerable amount of time.
  • You can find the dataset here.