Interactive Keras Captioning

Interactive multimedia captioning with Keras (Theano and TensorFlow). Given an input image or video, the system generates a description of its content.

Documentation: https://interactive-keras-captioning.readthedocs.io

Recurrent neural network model with attention

[Figure: recurrent neural network model with attention]

Transformer model

[Figure: Transformer model]

Interactive captioning

Interactive-predictive pattern recognition is a collaborative human-machine framework for obtaining high-quality predictions while minimizing the human effort spent during the process.

It consists of an iterative prediction-correction process: each time the user corrects a hypothesis, the system reacts by offering an alternative hypothesis that takes the user feedback into account.

For further reading about this framework, please refer to Interactive Neural Machine Translation, Online Learning for Effort Reduction in Interactive Neural Machine Translation and Active Learning for Interactive Neural Machine Translation of Data Streams.
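The prediction-correction loop described above can be sketched with a toy example. This is only an illustration under stated assumptions, not the library's implementation: the function names (`toy_predict`, `simulate_session`), the prefix-based feedback protocol, and the candidate captions are all hypothetical, and a lookup over fixed sentences stands in for the neural captioning model.

```python
# Toy stand-in for a captioning model (hypothetical; not part of the library).
CANDIDATES = [
    "a man is playing a guitar".split(),
    "a man is riding a horse".split(),
    "a woman is riding a horse".split(),
]


def toy_predict(prefix):
    """Return the first candidate caption that starts with the validated prefix.

    If no candidate matches, the prefix itself is returned unchanged.
    """
    for candidate in CANDIDATES:
        if candidate[:len(prefix)] == prefix:
            return list(candidate)
    return list(prefix)


def simulate_session(predict, reference, max_iterations=50):
    """Simulate a user who corrects the first wrong word of each hypothesis.

    Returns the final hypothesis and the number of corrections the user made.
    """
    corrections = 0
    hypothesis = predict([])  # initial hypothesis, no feedback yet
    for _ in range(max_iterations):
        if hypothesis == reference:
            break  # the user accepts the hypothesis
        # Find the first position where the hypothesis departs from the reference.
        pos = 0
        while (pos < len(hypothesis) and pos < len(reference)
               and hypothesis[pos] == reference[pos]):
            pos += 1
        corrections += 1
        if pos == len(reference):
            # The hypothesis only has extra trailing words: the user truncates it.
            hypothesis = list(reference)
            break
        # The user validates the correct prefix and types the right word;
        # the system then proposes a new completion of that prefix.
        hypothesis = predict(reference[:pos + 1])
    return hypothesis, corrections


if __name__ == "__main__":
    target = "a woman is riding a horse".split()
    final, effort = simulate_session(toy_predict, target)
    print(" ".join(final), "| corrections:", effort)
```

In this sketch the human effort is measured as the number of corrected words. A system that reacts well to feedback needs fewer corrections than typing the whole caption from scratch.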

Features (in addition to the full Keras cosmos):

Installation

Assuming that you have pip installed, run:

git clone https://github.com/lvapeab/interactive-keras-captioning
cd interactive-keras-captioning
pip install -r requirements.txt

to obtain the packages required to run this library.

Requirements

Interactive Keras Captioning requires the following libraries:

For accelerating the training and decoding on CUDA GPUs, you can optionally install:

Usage

Preprocessing

The instructions for data preprocessing (images or videos) are here.

Training

  1. Set a training configuration in the config.py script. Each parameter is documented with a comment. You can also override parameters when calling the main.py script, using the syntax Key=Value.

  2. Train!:

python main.py
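As an illustration of the Key=Value syntax from step 1, the following sketch shows one way such command-line overrides could be merged into a parameter dictionary. It is an assumption about the general mechanism, not the code of main.py: the function `apply_overrides` and the parameter names `BATCH_SIZE`, `LR` and `MODEL_NAME` are hypothetical.

```python
import ast


def apply_overrides(params, overrides):
    """Return a copy of `params` updated with 'Key=Value' strings.

    Values are parsed as Python literals when possible (50 -> int,
    0.001 -> float, True -> bool); anything else is kept as a string.
    """
    updated = dict(params)
    for item in overrides:
        key, sep, raw_value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected Key=Value, got {item!r}")
        try:
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            value = raw_value  # not a literal: keep the raw string
        updated[key] = value
    return updated


if __name__ == "__main__":
    # Hypothetical defaults, standing in for the contents of config.py.
    defaults = {"BATCH_SIZE": 32, "LR": 0.01, "MODEL_NAME": "baseline"}
    print(apply_overrides(defaults, ["BATCH_SIZE=50", "MODEL_NAME=attention_v2"]))
```

Parsing with `ast.literal_eval` keeps numeric and boolean overrides typed without evaluating arbitrary code.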

Decoding

Once the model is trained, we can caption new images or videos using the caption.py script. For example, to evaluate the test split of the MSVD dataset with an ensemble of two models, we would run something like:

python caption.py \
       --models trained_models/epoch_1 \
                trained_models/epoch_2 \
       --dataset datasets/Dataset_MSVD.pkl \
       --splits test

Acknowledgement

This library is strongly based on NMT-Keras. Much of the library has been developed together with Marc Bolaños (web page) for other sequence-to-sequence problems.

To see other projects following the same philosophy and style as Interactive Keras Captioning, take a look at:

NMT-Keras: Neural Machine Translation.

ABiViRNet: Video description.

TMA: Egocentric captioning based on temporally-linked sequences.

VIBIKNet: Visual question answering.

Sentence SelectioNN: Sentence classification and selection.

DeepQuest: State-of-the-art models for multi-level Quality Estimation.

Warning!

There is a known issue with the Theano backend. When running main.py with this backend, it will show the following message:

[...]
raise theano.gof.InconsistencyError("Trying to reintroduce a removed node")
InconsistencyError: Trying to reintroduce a removed node

This is not a critical error: the model keeps working, and the message is safe to ignore. However, if you want to suppress it, use the Theano flag optimizer_excluding=scanOp_pushout_output.
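One way to apply this flag is through the THEANO_FLAGS environment variable, which Theano reads when it is first imported. The sketch below sets it from Python; the helper `add_theano_flag` is a hypothetical name introduced here, and the snippet assumes it runs before Theano (or Keras with the Theano backend) is imported.

```python
import os


def add_theano_flag(flag, environ=os.environ):
    """Append `flag` to the comma-separated THEANO_FLAGS variable.

    Flags that are already set are preserved, and a flag that is
    already present is not added twice.
    """
    existing = environ.get("THEANO_FLAGS", "")
    flags = [f for f in existing.split(",") if f]
    if flag not in flags:
        flags.append(flag)
    environ["THEANO_FLAGS"] = ",".join(flags)
    return environ["THEANO_FLAGS"]


# Must be executed before `import theano`, since the flags are read at import time.
add_theano_flag("optimizer_excluding=scanOp_pushout_output")
```

Note that this helper only appends the flag. If THEANO_FLAGS already contains a different optimizer_excluding entry, the two entries are not merged and should be combined by hand.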

Contact

Álvaro Peris (web page): [email protected]