Automatic Error Analysis for Document-level Information Extraction

Software for our ACL 2022 Main Conference Long paper (Link to be posted)

We propose a transformation-based framework for automating error analysis in document-level event and (N-ary) relation extraction.

From the output, one can further generate an error profile graph like the one below:

Cite

If you use our code or data/outputs, please cite:

@InProceedings{auto_error,
  author = {Aliva Das and Xinya Du and Barry Wang and Kejian Shi and Jiayuan Gu and Thomas Porter and Claire Cardie},
  title = {Automatic Error Analysis for Document-level Information Extraction},
  booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics},
  year = {2022},
}

INSTALLATION

Requires Python 3.6 or above. Please install the following packages:

numpy

tqdm

psutil

spacy

The script also uses json, re, argparse, textwrap, copy, and os, which are part of the Python standard library and need no installation.

Also download the en_core_web_sm spaCy model using the following command:

python -m spacy download en_core_web_sm
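To confirm the environment is ready before running the script, one option is to check that the third-party packages and the spaCy model can be found by Python. The sketch below is not part of the repository; the helper name `missing_modules` is hypothetical, and it assumes the spaCy model was installed as an importable package named en_core_web_sm, which is what the download command above does.

```python
import importlib.util

# Third-party packages this README asks for (assumption: these are the
# top-level import names), plus the spaCy model package.
THIRD_PARTY = ["numpy", "tqdm", "psutil", "spacy"]
SPACY_MODEL = "en_core_web_sm"


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(THIRD_PARTY + [SPACY_MODEL])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

An empty result means every listed module was located; it does not check package versions.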

USAGE

Error_Analysis.py script command line arguments:

-h, --help

show this help message and exit

-i INPUT_FILE, --input_file INPUT_FILE

The path to the input file given to the system

-v, --verbose

Increase output verbosity

-at, --analyze_transformed

Analyze transformed data

-s {all,msp,mmi,mat}, --scoring_mode {all,msp,mmi,mat}

Choose scoring mode according to MUC:

1. all - All Templates

2. msp - Matched/Spurious

3. mmi - Matched/Missing

4. mat - Matched Only

-m {MUC_Errors,Errors}, --mode {MUC_Errors,Errors}

Choose evaluation mode:

1. MUC_Errors - MUC evaluation with the added constraint that the incident_types of the templates must match

2. Errors - General evaluation with no added constraints

-o OUTPUT_FILE, --output_file OUTPUT_FILE

The path to the output file the system writes to

-j OUTPUT_JSON, --output_json OUTPUT_JSON

The path to the output file the system writes to as JSON
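For readers who want to see how the options above fit together, here is a minimal argparse sketch reconstructed from this help text. It is not the code in Error_Analysis.py: the function name `build_parser` is hypothetical, and the defaults, and whether any argument is required, are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options (not the real one)."""
    parser = argparse.ArgumentParser(prog="Error_Analysis.py")
    parser.add_argument("-i", "--input_file",
                        help="The path to the input file given to the system")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Increase output verbosity")
    parser.add_argument("-at", "--analyze_transformed", action="store_true",
                        help="Analyze transformed data")
    # Assumption: no default is set here; the real script may define one.
    parser.add_argument("-s", "--scoring_mode",
                        choices=["all", "msp", "mmi", "mat"],
                        help="Choose scoring mode according to MUC")
    parser.add_argument("-m", "--mode", choices=["MUC_Errors", "Errors"],
                        help="Choose evaluation mode")
    parser.add_argument("-o", "--output_file",
                        help="The path to the output file the system writes to")
    parser.add_argument("-j", "--output_json",
                        help="The path to the output file the system writes to as JSON")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-i", "model_preds.out", "-o", "err_file.out", "-s", "all", "-m", "Errors"]
    )
    print(args.input_file, args.scoring_mode, args.mode)
```

Because `-s` and `-m` are declared with `choices`, any value outside the listed sets is rejected at parse time, which matches the braces shown in the help text.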

EXAMPLE:

For MUC-4 data:

python3 Error_Analysis.py -i "model_preds.out" -o "err_file.out" --verbose -s all -m "MUC_Errors" -at

For other datasets:

python3 Error_Analysis.py -i "model_preds.out" -o "err_file.out" --verbose -s all -m "Errors" -at
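When running the analysis over several prediction files, it can help to assemble these command lines programmatically. The following is a sketch only: `build_command` is a hypothetical helper, not something shipped with the repository, and it uses only the flags shown in the two commands above.

```python
import shlex
import sys


def build_command(input_file, output_file, mode, scoring_mode="all",
                  verbose=True, analyze_transformed=True,
                  script="Error_Analysis.py"):
    """Assemble the argument list for one Error_Analysis.py run."""
    cmd = [sys.executable, script,
           "-i", input_file, "-o", output_file,
           "-s", scoring_mode, "-m", mode]
    if verbose:
        cmd.append("--verbose")
    if analyze_transformed:
        cmd.append("-at")
    return cmd


if __name__ == "__main__":
    for mode in ("MUC_Errors", "Errors"):
        cmd = build_command("model_preds.out", "err_file.out", mode)
        # Print the command; to execute it, pass cmd to subprocess.run(cmd, check=True).
        print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list rather than a single string avoids quoting problems when file paths contain spaces.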

Remember to change the global variable role_names in the Error_Analysis.py script to match the roles associated with your dataset.

See the model_outputs folder for examples of the input files the Error_Analysis.py script requires to run.
See the error_outputs folder for examples of the outputs given by the Error_Analysis.py script on the input files in the model_outputs folder.
See the datasets folder for the processed versions of the datasets (only MUC-4 and SciREX) as well as the scripts used to process the data (all datasets).
