Evaluation code for the paper "Cost-Effective Empirical Performance Modeling." This repository contains the code used to run the synthetic analysis and the case study analysis that examine Extra-P's accuracy, predictive power, budget usage/modeling cost, and number of used measurement points under different measurement point selection strategies. The evaluated strategies are: Cheapest Point First (CPF), measurement point prediction via Gaussian Process Regression (GPR), and Hybrid (a combination of the CPF and GPR strategies). The code in this repository can be used to reproduce the results and plots.
For a quick setup of the evaluation environment, we provide a Dockerfile that can be used to build an image that has all dependencies installed to run the evaluation scripts. The Dockerfile also downloads the required performance measurement dataset automatically.
Steps:
- Build a docker image from the provided Dockerfile:
docker build -t extrap-gpr .
- Run the image in a container:
docker run -it extrap-gpr /bin/bash
- Run the analysis scripts inside the Docker container.
NOTE: Building the image from the Dockerfile might take from several minutes up to half an hour, as many dependencies have to be installed, including TeX (to generate the plots), the datasets, Extra-P, pyCubexR, and more.
NOTE: Ideally, do the manual setup and run the analysis scripts on an HPC system in parallel. The analysis, especially for the case studies and the synthetic evaluation with four model parameters and many evaluation functions, is very slow when run serially and might take days to finish a single configuration.
First, run the analysis for each case study. The following steps are an example for RELeARN; follow the same steps for all other case studies.
cd relearn
./docker_run_analysis.sh
Second, in the root folder, create the plots for the case studies and run the noise analysis.
python paper_plot_case_studies_gpr_only.py
./analyze_noise.sh
First, run the analysis for each combination of the number of model parameters and noise level. The following steps are an example for 2 parameters and 1% noise; follow the same steps for all other experiments.
cd synthetic_evaluation/2_parameter/1_noise/
./docker_run_analysis.sh
Second, in the folder synthetic_evaluation/, run the scripts to plot the results of the synthetic data experiments.
python paper_plot_gpr_only.py
python paper_plot_buckets_gpr_only.py
python paper_plot_costs_gpr_only.py
python paper_plot_points_gpr_only.py
To run and reproduce the results for each benchmark, one first needs the performance measurements that we conducted for the evaluation. The performance measurement datasets can be found and downloaded at: Datasets. There is one .tar.gz file for each benchmark. Download and unpack them by following the steps below.
Steps:
- wget https://zenodo.org/records/10085298/files/fastest.tar.gz
- wget https://zenodo.org/records/10085298/files/kripke.tar.gz
- wget https://zenodo.org/records/10085298/files/lulesh.tar.gz
- wget https://zenodo.org/records/10085298/files/minife.tar.gz
- wget https://zenodo.org/records/10085298/files/quicksilver.tar.gz
- wget https://zenodo.org/records/10085298/files/relearn.tar.gz
- tar -xzf fastest.tar.gz
- tar -xzf kripke.tar.gz
- tar -xzf lulesh.tar.gz
- tar -xzf minife.tar.gz
- tar -xzf quicksilver.tar.gz
- tar -xzf relearn.tar.gz
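If wget is unavailable, the same downloads can be scripted in Python using only the standard library. This is a sketch equivalent to the steps above, not part of the provided scripts:

```python
# Download and unpack all benchmark datasets into the current directory.
import tarfile
import urllib.request

BASE_URL = "https://zenodo.org/records/10085298/files"
BENCHMARKS = ["fastest", "kripke", "lulesh", "minife", "quicksilver", "relearn"]

for name in BENCHMARKS:
    archive = f"{name}.tar.gz"
    urllib.request.urlretrieve(f"{BASE_URL}/{archive}", archive)  # download
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall()  # unpack
```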
The code in this repository uses a specific version of Extra-P for the analysis. If you are not using the provided Dockerfile, you have to build this version of Extra-P from source. It can be found at Extra-P. Follow the steps below.
Steps:
- Download the .zip file at Extra-P.
- Unzip the downloaded file: unzip extrap.zip -d extrap
- Install it: pip install -e extrap/extrap-vNext/
- After installation, add the Extra-P user installation to your PATH if necessary: export PATH="$HOME/.local/bin:$PATH"
If you do not use the Dockerfile for the quick setup, you also have to install the other Python packages used by the analysis code in this repository, e.g., scipy and pandas, if you do not have them installed already.
The code in this repository uses a specific version of pyCubexR for the analysis. This software is used by Extra-P to read files in the Cube4 format. If you are not using the provided Dockerfile, you have to build this version of pyCubexR from source. It is available at pyCubexR. Follow the steps below.
Steps:
- Download the .zip file at pyCubexR.
- Unzip the downloaded file: unzip pycubexr-master.zip -d pycubexr
- Install it: pip install -e pycubexr/pycubexr-master/
General dependencies that should be installed:
- python3
- wget
- git
- unzip
- texlive
- latex
Install the specific version of Extra-P used for this analysis:
wget https://zenodo.org/records/10086772/files/extrap.zip
unzip extrap.zip -d extrap
pip install -e extrap/extrap-vNext/
Install the specific version of pyCubexR used for this analysis:
wget https://zenodo.org/records/10092353/files/pycubexr-master.zip
unzip pycubexr-master.zip -d pycubexr
pip install -e pycubexr/pycubexr-master/
Install the pip dependencies: sympy, scikit-learn, natsort, pandas.
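For example: pip install sympy scikit-learn natsort pandas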
To recreate the evaluation and case study results using the manual setup follow the instructions in the following sections.
Using the following commands and the provided scripts, one can reproduce the results shown in the paper. For all case studies except RELeARN, the path to the data needs to be changed, depending on the directory to which you downloaded and unpacked the datasets.
The ./run_analysis.sh and ./process.sh scripts are built for use on a cluster or HPC system. They parallelize the analysis using jobs to speed it up.
cd relearn
- (single run)
python ../case_study.py --text relearn_data.txt --processes 0 --parameters "p","n" --eval_point "512","9000" --filter 1 --budget 100 --plot True --normalization True --grid-search 3 --base-values 1 --hybrid-switch 20 --repetition 2
./run_analysis.sh
./process.sh
./archive.sh filtered
python single_plot.py --path filtered/analysis_results/ --name results_filtered --reps 2 --min 9 --filter 1
python budget_usage_plot.py --path filtered/analysis_results/ --name budget_usage --reps 2 --min 9
Use --min 9 for the filtered run (only kernels with >1% of the total runtime). Use --min 13 for the run with all available kernels.
cd lulesh/lichtenberg
- (single run)
python ../../case_study.py --cube /work/scratch/mr52jiti/data/lulesh/ --processes 0 --parameters "p","s" --eval_point "1000","35" --filter 1 --budget 100 --plot True --normalization True --grid-search 3 --base-values 2 --hybrid-switch 20 --repetition 5
./run_analysis.sh
./process.sh
./archive.sh filtered
python single_plot.py --path filtered/analysis_results/ --name results_filtered --reps 5
python budget_usage_plot.py --path filtered/analysis_results/ --name budget_usage_filtered --reps 5
cd minife/lichtenberg
- (single run)
python ../../case_study.py --cube /work/scratch/mr52jiti/data/minife/ --processes 0 --parameters "p","n" --eval_point "2048","350" --filter 1 --budget 100 --plot True --normalization True --grid-search 3 --base-values 2 --hybrid-switch 20 --repetition 5
./run_analysis.sh
./process.sh
./archive.sh filtered
python single_plot.py --path filtered/analysis_results/ --name results_filtered --reps 5
python budget_usage_plot.py --path filtered/analysis_results/ --name budget_usage_filtered --reps 5
cd fastest
- (single run)
python ../case_study.py --cube /work/scratch/mr52jiti/data/fastest/ --processes 0 --parameters "p","size" --eval_point "512","65536" --filter 1 --budget 100 --plot True --normalization True --grid-search 3 --base-values 2 --hybrid-switch 20 --repetition 5
./run_analysis.sh
./process.sh
./archive.sh filtered
python single_plot.py --path filtered/analysis_results/ --name results_filtered --reps 5
python budget_usage_plot.py --path filtered/analysis_results/ --name budget_usage_filtered --reps 5
cd kripke
- (single run)
python ../case_study.py --cube /work/scratch/mr52jiti/data/kripke/ --processes 0 --parameters "p","d","g" --eval_point "32768","12","160" --filter 1 --budget 100 --plot True --normalization True --grid-search 3 --base-values 2 --hybrid-switch 20 --repetition 5
./run_analysis.sh
./process.sh
./archive.sh filtered
python single_plot.py --path filtered/analysis_results/ --name results_filtered --reps 5
python budget_usage_plot.py --path filtered/analysis_results/ --name budget_usage_filtered --reps 5
cd quicksilver
- (single run)
python ../../case_study.py --cube /work/scratch/mr52jiti/data/quicksilver/ --processes 0 --parameters "p","m","n" --eval_point "512","20","60" --filter 1 --budget 100 --plot True --normalization True --grid-search 3 --base-values 2 --hybrid-switch 20 --repetition 5
./run_analysis.sh
./process.sh
./archive.sh filtered
python single_plot.py --path filtered/analysis_results/ --name results_filtered --reps 5
python budget_usage_plot.py --path filtered/analysis_results/ --name budget_usage_filtered --reps 5
To check the number of measurement runs: ls /work/scratch/mr52jiti/data/quicksilver/ | wc -l. Some of the measurements did not run successfully. To count only the runs that actually produced a profile.cubex: ls /work/scratch/mr52jiti/data/quicksilver/quicksilver.p*/profile.cubex | wc -l
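Alternatively, the same check can be done with a short Python snippet (a sketch using the example scratch path from above, not part of the provided scripts):

```python
# Count how many Quicksilver runs actually produced a profile.cubex.
from pathlib import Path

data_dir = Path("/work/scratch/mr52jiti/data/quicksilver")
runs = [p for p in data_dir.iterdir() if p.is_dir()]
profiles = list(data_dir.glob("quicksilver.p*/profile.cubex"))
print(f"{len(profiles)} of {len(runs)} runs have a profile.cubex")
```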
After completing the analysis for all of the above case studies, follow the steps below to recreate the plots containing the results for all case studies.
Steps:
python paper_plot_case_studies_gpr_only.py
To reproduce the noise analysis plot for the case studies use the analysis script provided.
./analyze_noise.sh
It will run the noise analysis for each case study, plot all of the data into a single plot, and put everything into a folder noise_analysis/.
For individual runs use:
- RELEARN:
python noise_analysis.py --text relearn/relearn_data.txt --total-runtime 31978.682999999997 --name relearn
- LULESH:
python noise_analysis.py --cube /work/scratch/mr52jiti/data/lulesh/ --name lulesh
- FASTEST:
python noise_analysis.py --cube /work/scratch/mr52jiti/data/fastest/ --name fastest
- KRIPKE:
python noise_analysis.py --cube /work/scratch/mr52jiti/data/kripke/ --name kripke
- MiniFE:
python noise_analysis.py --cube /work/scratch/mr52jiti/data/minife/ --name minife
- Quicksilver:
python noise_analysis.py --cube /work/scratch/mr52jiti/data/quicksilver/ --name quicksilver
Navigate to the folder of the analysis you want to reproduce, e.g., cd 2_parameters/1_noise/. Then follow the steps below.
Steps:
- ./run_analysis.sh to run the analysis in parallel on a cluster.
- ./process.sh to run the postprocessing after all jobs have finished.
- ./archive.sh <folder_name> to archive all of the result data into the given folder name.
- python single_plot.py --path <folder_name>/analysis_results/ --name results_<folder_name> --reps 4 to create a result plot using the data found in the archived experiment folder.
- python budget_usage_plot.py --path final/analysis_results/ --name budget_usage --reps 4 to run the budget analysis. It shows how efficiently the different point selection strategies utilize the available budgets.
Basic usage for single runs: python synthetic_evaluation.py --nr-parameters 2 --nr-functions 100 --nr-repetitions 4 --noise 1 --mode budget --budget 10 --plot True --normalization True --grid-search 3 --base-values 2
Use --nr-parameters <nr_params>
to set the number of model parameters considered for the evaluation.
Use --noise <noise_percent>
to set the artificially induced noise (±) on the measurements, in %.
Use --nr-functions <nr_functions>
to set the number of synthetically generated functions to run the evaluation for.
Leave all other parameters as they are. Their values have been carefully selected using a grid search for the best configuration for each supported number of model parameters and noise level.
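For example, to evaluate 3 model parameters at 5% noise, change only the two corresponding flags and copy the remaining flags from the basic usage above:
python synthetic_evaluation.py --nr-parameters 3 --nr-functions 100 --nr-repetitions 4 --noise 5 --mode budget --budget 10 --plot True --normalization True --grid-search 3 --base-values 2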
- Measure the cheapest available lines of five points per parameter. Use these points to create a first model using the sparse modeling technique. 1b. Take at least one additional point that is not on these axes per dimension (model parameter): base_points + (nr_dimensions - 1).
- Perform an additional measurement, starting from the cheapest one available. Using the previously determined model, assess whether the model quality is sufficient (by comparing the accuracy metrics of the two models on the points used for modeling) or whether additional points are required.
- Recreate the model using all available points.
- If the quality of the model evaluated in step 2 is insufficient, or if there is more budget available for modeling, return to step 2.
This strategy always uses 4 repetitions per measurement point. In our previous paper, we found that 4 repetitions are optimal.
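The greedy core of CPF can be sketched as follows (hypothetical names; the actual implementation lives in case_study.py and synthetic_evaluation.py):

```python
def cpf_select(candidates, budget, repetitions=4):
    """Cheapest Point First: greedily take the cheapest remaining point,
    measured `repetitions` times, until the modeling budget is exhausted."""
    selected, spent = [], 0.0
    for point, cost in sorted(candidates, key=lambda pc: pc[1]):
        total = cost * repetitions
        if spent + total > budget:
            break  # no budget left for this point
        selected.append(point)
        spent += total
    return selected, spent

# Example: points (p, n) with per-run costs in core-hours, budget of 100.
points, used = cpf_select([((64, 10), 2.0), ((128, 10), 4.0), ((256, 10), 8.0)], 100)
```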
- Measure the cheapest available lines of five points per parameter. Use these points to create a first model using the sparse modeling technique. 1b. Take at least one additional point that is not on these axes per dimension (model parameter): base_points + (nr_dimensions - 1).
- Use these points as input to the GPR. 2b. The noise level on these points is estimated as the percentage divergence from the arithmetic mean (a sketch follows after this list) and then used as the input value for a WhiteKernel() that is added to the GPR's Matern covariance function.
- Train the GPR.
- Suggest a new point using the trained GPR.
- Take this measurement and add it to the experiment. Create a new model.
- Add the new point to the GPR. Train the GPR again.
- Continue this process (repeating steps 4-6) until the budget for modeling is exhausted.
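The repository's exact noise-estimation code is not reproduced here; a minimal sketch matching the description in step 2b could look like this (hypothetical function name, assuming repetitions holds the repeated runtimes of one measurement point):

```python
import numpy as np

def estimate_noise_percent(repetitions):
    """Noise as the mean percentage divergence of the repeated
    measurements from their arithmetic mean."""
    reps = np.asarray(repetitions, dtype=float)
    mean = reps.mean()
    return float(np.mean(np.abs(reps - mean) / mean) * 100.0)

# Example: three repetitions of one measurement point -> ~1.5% noise.
mean_noise = estimate_noise_percent([10.2, 9.8, 10.1])
```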
The GPR strategy can, if specified, start with fewer repetitions for the basic lines of points (the minimal measurement point set), e.g., only 2 instead of 4, compared to the CPF strategy. Enabling this feature gives the best results, as it provides the GPR strategy with the most freedom to choose from a larger set of points and gives it more flexibility. Furthermore, the GPR strategy considers the number of repetitions of a measurement point. This means it always selects one repetition at a time, compared to the 4 of the CPF strategy. Therefore, it can reason about whether it is better to measure a new measurement point or repeat the measurement of an already measured point. Many repetitions of a point are not always required; they can even reduce model accuracy. Moreover, giving the GPR strategy this freedom enables a more optimal usage of the allowed modeling budget.
To reason about the trade-off between new points and repetitions, the GPR strategy internally uses a heuristic weight function to calculate which is more appropriate, depending on factors such as the noise level of the measurements, the number of repetitions already available for a specific measurement point, and the cost of the potential additional modeling points.
The exact weight function, including its parameter definitions and the lower and upper bounds of its terms, is given in the paper. It is optimized so that at low noise levels, taking new points is cheaper, as more repetitions of the same measurement point are less useful. For high noise levels, repetitions are favored over new points if they are cheaper, to counter the effects of noise.
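The following is only an illustrative sketch of such a heuristic (invented names and scaling, NOT the paper's formula) that reproduces the behavior described above:

```python
def selection_weight(cost, noise_percent, repetitions_done, is_new_point):
    """Illustrative heuristic (not the paper's exact weight function):
    a lower weight marks a more attractive candidate."""
    if is_new_point:
        # New points become less attractive as noise grows: a single noisy
        # value at a new location carries less reliable information.
        return cost * (1.0 + noise_percent / 100.0)
    # Repetitions become more attractive under high noise, but their
    # benefit decays with the number of repetitions already taken.
    return cost * (1.0 + repetitions_done / (1.0 + noise_percent))
```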
As covariance function, we use a Matern kernel with nu = 3/2:

k(x_i, x_j) = (1 + sqrt(3) * d(x_i, x_j) / l) * exp(-sqrt(3) * d(x_i, x_j) / l)

where the distance d(x_i, x_j) is the Euclidean distance between the two points and l is the length-scale parameter. In addition, we use a white kernel to explain the noise of the signal as independently and identically normally distributed. The parameter noise_level equals the variance of this noise. The white kernel:

k(x_i, x_j) = noise_level if x_i == x_j, else 0
Our GPR kernel then looks like this:
kernel = 1.0 * Matern(length_scale=1.0, length_scale_bounds=(1e-5, 1e5), nu=1.5) + WhiteKernel(noise_level=mean_noise)
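Putting the pieces together with scikit-learn looks like the following sketch (X_measured, y_runtimes, and X_candidates are placeholder data, not values from the repository):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

mean_noise = 0.02  # assumed value: estimated noise variance of the measurements
kernel = 1.0 * Matern(length_scale=1.0, length_scale_bounds=(1e-5, 1e5), nu=1.5) \
    + WhiteKernel(noise_level=mean_noise)
gpr = GaussianProcessRegressor(kernel=kernel)

# Fit on the points measured so far: rows are (p, n), targets are runtimes.
X_measured = np.array([[64, 10], [128, 10], [256, 10], [64, 20], [64, 30]])
y_runtimes = np.array([1.2, 1.9, 3.1, 1.6, 2.2])
gpr.fit(X_measured, y_runtimes)

# Predicted mean and uncertainty guide the choice of the next point.
X_candidates = np.array([[128, 20], [256, 30]])
mean, std = gpr.predict(X_candidates, return_std=True)
```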
- Measure the cheapest available lines of five points per parameter. Use these points to create a first model using the sparse modeling technique. 1b. Take at least one additional point that is not on these axes per dimension (model parameter): base_points + (nr_dimensions - 1).
- Uses the generic (CPF) strategy until a switching point is hit, e.g., 13 selected points (base points + additional points) for 2 parameters.
- Then uses the GPR strategy to select points.
- Uses these points as input to the GPR. 4b. The noise level on these points is estimated as the percentage divergence from the arithmetic mean and then used as the input value for a WhiteKernel() that is added to the GPR's Matern covariance function.
- Continues selecting points with steps 2-4 until the given budget is exhausted.
For the hybrid strategy, the same applies as for the GPR strategy regarding the repetitions of measurement points and their selection.
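Conceptually, the switch can be sketched as follows (hypothetical helper callables cpf_next and gpr_next stand in for the two selection routines):

```python
def hybrid_select_next(selected_points, switching_point, cpf_next, gpr_next):
    """Hybrid strategy: CPF-style selection until `switching_point`
    points have been chosen, then GPR-based selection."""
    if len(selected_points) < switching_point:
        return cpf_next()  # still filling the base/additional points
    return gpr_next()      # hand over to the GPR-suggested points
```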
BSD 3-Clause "New" or "Revised" License
Please cite Extra-P in your publications as follows if it helps your research:
@inproceedings{calotoiu_ea:2013:modeling,
author = {Calotoiu, Alexandru and Hoefler, Torsten and Poke, Marius and Wolf, Felix},
month = {November},
title = {Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes},
booktitle = {Proc. of the ACM/IEEE Conference on Supercomputing (SC13), Denver, CO, USA},
year = {2013},
pages = {1--12},
publisher = {ACM},
isbn = {978-1-4503-2378-9},
doi = {10.1145/2503210.2503277}
}
Please cite the CPF method and results in your publication as follows:
@inproceedings{ritter_ea:2020:ipdps,
title={Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling},
author={Ritter, Marcus and Calotoiu, Alexandru and Rinke, Sebastian and Reimann, Thorsten and Hoefler, Torsten and Wolf, Felix},
booktitle={Proceedings of the 34th International Parallel
and Distributed Processing Symposium (IPDPS)},
pages={884--895},
year={2020},
organization={IEEE}
}
If it helps your research, please cite the performance measurement dataset used for this work in your publications using:
@dataset{ritter_2023_10085298,
author = {Ritter, Marcus and
Calotoiu, Alexandru and
Rinke, Sebastian and
Reimann, Thorsten and
Hoefler, Torsten and
Wolf, Felix},
title = {{Performance Measurement Dataset of the HPC
Benchmarks FASTEST, Kripke, LULESH, MiniFE,
Quicksilver, and RELeARN for Scalability Studies
with Extra-P}},
month = nov,
year = 2023,
publisher = {Zenodo},
version = {1.0},
doi = {10.5281/zenodo.10085298},
url = {https://doi.org/10.5281/zenodo.10085298}
}
If it helps your research, please cite the version of Extra-P used for the evaluation of this work in your publications using:
@software{ritter_2023_10086772,
author = {Ritter, Marcus and
Geiß, Alexander and
Calotoiu, Alexandru and
Wolf, Felix},
title = {{Extra-P: Automated performance modeling for HPC
applications}},
month = nov,
year = 2023,
publisher = {Zenodo},
version = {1.0},
doi = {10.5281/zenodo.10086772},
url = {https://doi.org/10.5281/zenodo.10086772}
}
If it helps your research, please cite the version of pyCubexR used for the evaluation of this work in your publications using:
@software{ritter_2023_10092353,
author = {Ritter, Marcus and
Geiß, Alexander},
title = {{pyCubexR: a Python package for reading the Cube4
(.cubex) file format}},
month = nov,
year = 2023,
publisher = {Zenodo},
version = {1.0},
doi = {10.5281/zenodo.10092353},
url = {https://doi.org/10.5281/zenodo.10092353}
}