The dataset and code are for research purposes only. The results of this study are published in the following paper:
Jimmy Dani and Boyang Wang, "HiddenText: Cross-Trace Website Fingerprinting Over Encrypted Traffic," IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI'21), August, 2021.
The src directory comprises sub-directories, each named after the experiment it implements from the performance evaluation section of the paper. Each script begins with instructions for running it.
- For experiments A1 and A2, execute the following command
python <name-of-script.py> /path/to/save/model/model-name.h5 /path/to/dataset
- For experiments A3, A4, and A5 execute the following command
python <name-of-script.py> /path/to/wt-def-model.h5 /path/to/paired/dataset
- For experiment A6, the directory contains two sub-directories model-training and nlp-analysis
- model-training: contains the script for training a model on traces defended with the MockingBird defense. To run this script, execute the following command
python <name-of-script.py> /path/to/save/model/model-name.h5 /path/to/dataset
- nlp-analysis: contains the scripts for performing the cross-trace attack on the MockingBird-defended dataset. To run a script, execute the following command
python <name-of-script.py> /path/to/mb-def-model.h5 /path/to/paired/dataset
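All of the commands above follow the same pattern: a script name followed by two positional arguments, a model path and a dataset path. The sketch below shows one way a script could accept those two arguments. It is only an illustration: the names `build_parser`, `model_path`, and `dataset_path` are hypothetical, and the actual scripts may read their arguments differently, so follow the instructions at the top of each script.

```python
import argparse


def build_parser():
    """Build a parser for the two positional arguments used by the commands above.

    The argument names are hypothetical and chosen for illustration only.
    """
    parser = argparse.ArgumentParser(description="Hypothetical experiment entry point")
    parser.add_argument("model_path", help="path to the .h5 model file to save or load")
    parser.add_argument("dataset_path", help="path to the dataset directory")
    return parser


# An explicit argument list is parsed here so the example runs anywhere.
# A real script would call parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(
    ["/path/to/save/model/model-name.h5", "/path/to/dataset"]
)
print(args.model_path)
print(args.dataset_path)
```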
In addition to the packages that ship with Python, the scripts require the following libraries:
- tensorflow-gpu==2.3.1
- sentence-transformers==0.4.1.2
- gensim==3.8.3
- pandas
- numpy==1.19.5
- scikit-learn==0.23.2
- nltk==3.5
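Because most of these libraries are pinned to exact versions, it can help to check the active environment before running an experiment. The following is a minimal sketch, not part of the repository, that compares installed versions against the pins listed above. It assumes Python 3.8 or later (for `importlib.metadata`); the `PINNED` table and the `check_pins` helper are hypothetical names introduced here, and pandas is accepted at any version because the list does not pin it.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above; None means "any version is accepted".
PINNED = {
    "tensorflow-gpu": "2.3.1",
    "sentence-transformers": "0.4.1.2",
    "gensim": "3.8.3",
    "pandas": None,
    "numpy": "1.19.5",
    "scikit-learn": "0.23.2",
    "nltk": "3.5",
}


def check_pins(pins):
    """Return a {package: status} report for the given {package: version} pins."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if wanted is None or found == wanted:
            report[name] = "ok"
        else:
            report[name] = f"found {found}, expected {wanted}"
    return report


for name, status in check_pins(PINNED).items():
    print(f"{name}: {status}")
```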
The dataset used for this research is available at this link.
The dataset is provided as CSV files. Each experiment requires six CSV files, described below:
- x_train: This file contains the traffic traces used as input to the CNN during training
- y_train: This file contains labels corresponding to the traffic traces available in x_train
- x_valid: This file contains the traffic traces used for validation
- y_valid: This file contains labels corresponding to the traffic traces available in x_valid
- x_test: This file contains the traffic traces for testing/evaluating the trained CNN model
- y_test: This file contains labels corresponding to the traffic traces in x_test
Note: The data needed for various experiments are organized in subdirectories similar to the code.
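The six-file layout described above can be read with a short helper. The sketch below rests on several assumptions that are not confirmed by the dataset itself: each file is named after its split with a `.csv` extension (for example `x_train.csv`), there is no header row, and each row holds one trace or one label. The names `SPLITS`, `read_rows`, and `load_splits` are hypothetical. It uses only the standard library and keeps values as strings; the experiment scripts may load the data differently, for instance with pandas.

```python
import csv
from pathlib import Path

# Split names taken from the list above; the ".csv" file names are an assumption.
SPLITS = ("x_train", "y_train", "x_valid", "y_valid", "x_test", "y_test")


def read_rows(csv_path):
    """Read one CSV file into a list of rows, skipping blank lines."""
    with open(csv_path, newline="") as handle:
        return [row for row in csv.reader(handle) if row]


def load_splits(dataset_dir):
    """Load all six files and check that traces and labels line up per split."""
    dataset_dir = Path(dataset_dir)
    data = {name: read_rows(dataset_dir / f"{name}.csv") for name in SPLITS}
    for part in ("train", "valid", "test"):
        n_traces = len(data[f"x_{part}"])
        n_labels = len(data[f"y_{part}"])
        if n_traces != n_labels:
            raise ValueError(
                f"{part}: {n_traces} traces but {n_labels} labels"
            )
    return data
```

Calling `load_splits("/path/to/dataset")` returns a dictionary keyed by split name, and raises `ValueError` if a trace file and its label file have different row counts.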
- Jimmy Dani ([email protected]), University of Cincinnati
- Boyang Wang ([email protected]), University of Cincinnati