Hiercon

Requirements

The instructions below use Ubuntu as an example.

Python 3.6

$ sudo apt-get install python3.6

Other Python packages

$ pip install -r requirements.txt

Pre-requisites

Install AutoPhrase by first cloning its repository:

$ git clone https://github.com/shangjingbo1226/AutoPhrase.git

Then follow the instructions there. Additionally, configure Hiercon by setting AUTOPHRASE_PATH to the AutoPhrase installation path.

Training and testing

Our model works in a weakly supervised setting: given a single text file in which each row is one document, along with training labels for a few rows, it predicts the labels of all documents.

Input Format

The input text file is named {your prefix name here}_merged_tokenized; the prefix name is set in ./run.sh.
The training documents are specified by {your prefix name here}_merged_tokenized_training_inds_HANsFile.bin, created by calling pickle.dump() on a Python list containing the row indices of the training documents.
The training labels are specified by {your prefix name here}_merged_tokenized_superspan_HANs_labels.txt; each row contains the label for the corresponding row in the input text file. Only the rows listed in the training indices are used.
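
As a minimal sketch, the three input files could be prepared as shown below. The helper name `write_hiercon_inputs`, the `out_dir` argument, and the assumption that row indices are 0-based are illustrative and not taken from the Hiercon code; check ./run.sh and the sample data for the exact conventions.

```python
import os
import pickle


def write_hiercon_inputs(prefix, documents, training_inds, labels, out_dir="."):
    """Write the text, training-index, and label files for a given prefix.

    Hypothetical helper: `documents` and `labels` are parallel lists with one
    entry per row, and `training_inds` lists the rows whose labels are used.
    """
    if len(documents) != len(labels):
        raise ValueError("documents and labels must have the same number of rows")

    base = os.path.join(out_dir, f"{prefix}_merged_tokenized")
    text_path = base
    inds_path = base + "_training_inds_HANsFile.bin"
    labels_path = base + "_superspan_HANs_labels.txt"

    # One tokenized document per row.
    with open(text_path, "w", encoding="utf-8") as f:
        f.write("\n".join(documents) + "\n")

    # A pickled Python list of training row indices (assumed 0-based).
    with open(inds_path, "wb") as f:
        pickle.dump(list(training_inds), f)

    # One label per row, aligned with the text file.
    with open(labels_path, "w", encoding="utf-8") as f:
        f.write("\n".join(str(label) for label in labels) + "\n")

    return text_path, inds_path, labels_path
```

For example, `write_hiercon_inputs("demo", docs, [0, 1], labels)` would create the three files in the current directory, where `demo` stands in for the prefix configured in ./run.sh.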

Output Format

The output predictions are written to {your prefix name here}_merged_tokenized_prediction_result.txt. Each row contains the predicted label for the corresponding row in the input text file.
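
Because the prediction file is row-aligned with the input, it can be read back with a few lines of Python. The sketch below is illustrative only: `load_predictions` is a hypothetical helper, and it treats labels as plain strings since the label format is not specified here.

```python
from collections import Counter


def load_predictions(path):
    """Return the predicted labels as a list; entry i belongs to input row i."""
    with open(path, encoding="utf-8") as f:
        # Keep every row (even empty ones) so indices stay aligned with the input.
        return [line.rstrip("\n") for line in f]


def label_distribution(predictions):
    """Count how often each label was predicted."""
    return Counter(predictions)
```

A quick look at `label_distribution(load_predictions(path))` is a cheap sanity check, for instance to spot a run that predicted a single label for every document.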

Hyper-Parameters

The parameters, their default values, and short descriptions are listed below.

parser.add_argument("--batch_size", type=int, default=16)
parser.add_argument("--num_epoches", type=int, default=5)
parser.add_argument("--log_interval", type=int, default=5)
parser.add_argument("--lr", type=float, default=0.0001)
parser.add_argument("--momentum", type=float, default=0.9)
parser.add_argument("--word_feature_size", type=int, default=4)
parser.add_argument("--sent_feature_size", type=int, default=3)
parser.add_argument("--num_bins", type=int, default=10)
parser.add_argument("--es_min_delta", type=float, default=0.0,
                    help="Early stopping's parameter: minimum change loss to qualify as an improvement")
parser.add_argument("--es_patience", type=int, default=5,
                    help="Early stopping's parameter: number of epochs with no improvement after which training will be stopped. Set to 0 to disable this technique.")
parser.add_argument("--test_interval", type=int, default=1,
                    help="Number of epoches between testing phases")
parser.add_argument("--log_path", type=str, default="tensorboard/han_voc")
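
To illustrate how --es_min_delta and --es_patience interact, here is a minimal sketch of a common early-stopping rule. It is an assumption about the intended behavior based on the help strings above, not a copy of Hiercon's training loop; the function name `epoch_of_early_stop` and the loss values in the usage note are hypothetical.

```python
def epoch_of_early_stop(losses, es_min_delta=0.0, es_patience=5):
    """Return the 1-based epoch at which training would stop, or None.

    A loss counts as an improvement only if it is lower than the best loss
    so far by more than es_min_delta. Training stops after es_patience
    consecutive evaluations without improvement. es_patience == 0 disables
    early stopping.
    """
    if es_patience == 0:
        return None

    best_loss = float("inf")
    evals_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if best_loss - loss > es_min_delta:
            best_loss = loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= es_patience:
                return epoch
    return None
```

With losses `[1.0, 0.9, 0.95, 0.92, 0.91]` and `es_patience=2`, the best loss is reached at epoch 2 and the rule stops at epoch 4. With the default `--test_interval` of 1 each epoch is evaluated; a larger interval would mean fewer evaluations between checks.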

Sample data

A set of sample data, containing all the intermediate results needed to run the prediction, is available on Google Drive.
