TGen for CS Restaurant

To train and evaluate TGen on the CS Restaurant dataset, you need to:

Convert the CS Restaurant data into the format used by TGen, using the input/convert.py script. Several slots (those listed after -a in the commands below) are delexicalized. The output files are:
- *-abst.txt -- lexicalization instructions (what was delexicalized at which position in the references, can be used to lexicalize the outputs)
- *-das.txt -- delexicalized DAs
- *-das_l.txt -- original, lexicalized DAs (converted to TGen's representation, semantically equivalent)
- *-text.conll -- delexicalized reference texts -- CoNLL-U format (morphology level only)
- *-text_l.conll -- original, lexicalized reference texts -- CoNLL-U format (morphology level only)
- *-text.txt -- delexicalized reference texts -- plain text
- *-text_l.txt -- original, lexicalized reference texts -- plain text
- *-tls.txt -- delexicalized reference texts -- interleaved forms/lemmas/tags
- *-tls_l.txt -- original, lexicalized reference texts -- interleaved forms/lemmas/tags
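
The conversion requires MorphoDiTa to be installed and a Czech tagger model (czech-morfflex-pdt-160310.tagger) saved in the current directory. The commands below call the script as ./convert.py, so they are run from the input directory: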

./convert.py -a name,area,address,phone,good_for_meal,near,food,price_range,count,price,postcode \
    czech-morfflex-pdt-160310.tagger surface_forms.json train.json train
./convert.py -a name,area,address,phone,good_for_meal,near,food,price_range,count,price,postcode \
    czech-morfflex-pdt-160310.tagger surface_forms.json devel.json devel
./convert.py -a name,area,address,phone,good_for_meal,near,food,price_range,count,price,postcode \
    czech-morfflex-pdt-160310.tagger surface_forms.json test.json test
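
The three calls above differ only in the name of the data split, so they can be generated in a loop. The following is a minimal sketch, not part of the original scripts: the convert_cmd helper and the SLOTS and TAGGER variables are hypothetical names introduced here, and the loop only prints the commands (a dry run) rather than running them.

```shell
#!/bin/sh
# Slots to delexicalize, copied from the conversion commands above.
SLOTS="name,area,address,phone,good_for_meal,near,food,price_range,count,price,postcode"
# Tagger model expected in the current directory.
TAGGER="czech-morfflex-pdt-160310.tagger"

# Hypothetical helper: print the conversion command for one data split
# (train, devel or test). It does not run anything itself.
convert_cmd() {
    echo "./convert.py -a $SLOTS $TAGGER surface_forms.json $1.json $1"
}

# Dry run: print one command per split.
for split in train devel test; do
    convert_cmd "$split"
done
```

To actually run the conversion, pipe the printed commands to sh from the input directory, once MorphoDiTa and the tagger model are in place.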

Train TGen on the training set. This uses the default configuration file, the converted data, and the default random seed. It will save the model into model.pickle.gz (and several other files starting with model). If you want to use the development set for validation, add -v input/devel-das.txt,input/devel-text.conll as a parameter.

../run_tgen.py seq2seq_train config/config.yaml \
    input/train-das.txt input/train-text.conll \
    model.pickle.gz
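
With validation on the development set, the training call might look like the sketch below. It assumes that -v is placed directly after the seq2seq_train subcommand, the same position that -w and -a take in the generation command; check ../USAGE.md if this does not work. The check_inputs helper is hypothetical and only verifies that the converted files exist before training starts.

```shell
#!/bin/sh
# Hypothetical helper: report every listed file that is missing.
# Returns 0 if all files exist, 1 otherwise.
check_inputs() {
    rc=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            rc=1
        fi
    done
    return $rc
}

# Train with validation on the development set (run from this directory).
# Training is skipped if any converted file is missing.
if check_inputs input/train-das.txt input/train-text.conll \
                input/devel-das.txt input/devel-text.conll; then
    ../run_tgen.py seq2seq_train \
        -v input/devel-das.txt,input/devel-text.conll \
        config/config.yaml \
        input/train-das.txt input/train-text.conll \
        model.pickle.gz
fi
```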

Generate outputs on the development set. This also lexicalizes the outputs. To generate outputs on the test set instead, replace devel with test in the command.

../run_tgen.py seq2seq_gen -w outputs.txt -a input/devel-abst.txt \
    model.pickle.gz input/devel-das.txt
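
Since the development and test runs differ only in the split name, the command can be wrapped in a small function. This is a sketch under assumptions: the generate function is a hypothetical name, and writing to a separate outputs-SPLIT.txt file per split is a choice made here so that the two runs do not overwrite each other (the original command writes to outputs.txt).

```shell
#!/bin/sh
# Hypothetical wrapper: generate and lexicalize outputs for one data split.
# $1 is the split name, devel or test. Run from this directory.
generate() {
    split=$1
    ../run_tgen.py seq2seq_gen -w "outputs-$split.txt" -a "input/$split-abst.txt" \
        model.pickle.gz "input/$split-das.txt"
}
```

After training, calling generate devel and then generate test would produce outputs-devel.txt and outputs-test.txt.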

Remarks

Please refer to ../USAGE.md for TGen installation instructions.

The full configuration used Treex for data storage, tree-based generation, and output postprocessing. Getting Treex to install is a little tricky. Please contact me if you want to use it.

The Makefile in this directory contains a simple experiment management system, but it assumes an SGE computing cluster and has site-specific settings hardcoded. Please contact me if you want to use it.
