This repository holds the LLM agents for solving CTF challenges developed for the NYU CTF Bench. The agents are:
- D-CIPHER: A multi-agent framework with planner, executor, and auto-prompter agents and enhanced interactions between them.
- NYU CTF Baseline: The baseline agent presented along with the NYU CTF Bench.
The LLM agents operate in a docker environment and interact with CTF challenges to solve them.
The setup requires Docker to be installed on the system; please follow the installation instructions for your OS. The code is tested with Python 3.10 and later; earlier versions may work but are untested. It is recommended to create a Python virtualenv or conda environment for this setup.
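The prerequisites above can be checked before running the setup scripts. The following is a minimal sketch, not part of this repository: the function name `check_prerequisites` is hypothetical, and it only verifies the Python version and that a `docker` executable is on the `PATH` (it does not check that the Docker daemon is running).

```python
import shutil
import sys

# Lowest Python version the README reports as tested.
MIN_PYTHON = (3, 10)


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of warnings; an empty list means the basics look fine.

    `version_info` and `which` are parameters only so the check can be
    exercised without depending on the local machine.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            "Python %d.%d is older than the tested 3.10; it may still work"
            % (version_info[0], version_info[1])
        )
    if which("docker") is None:
        problems.append("docker executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Because earlier Python versions may still work, the sketch reports warnings rather than aborting.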
Follow these instructions to set up D-CIPHER, the baseline, or both:
- Clone this repository:
git clone https://github.com/NYU-LLM-CTF/llm_ctf_automation
cd llm_ctf_automation
- Run the setup script (will take a few minutes):
./setup_dcipher.sh
or ./setup_baseline.sh
- The setup script will build the corresponding Docker image, set up the Docker network, and install the Python dependencies.
- You should re-run this setup if the Dockerfile or dependencies are updated.
- Download the NYU CTF dataset (will take a few minutes):
python3 -m nyuctf.download
The main D-CIPHER multi-agent system runs the planner, executor and (optionally) auto-prompter agents. Use the following command to run it:
python3 run_dcipher.py --split <test|development> --challenge <challenge-name> [--enable-autoprompt]
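To run D-CIPHER on several challenges in a row, the command above can be wrapped in a small driver script. This is a rough sketch under stated assumptions: the helpers `build_dcipher_command` and `run_challenges` are hypothetical and not part of this repository, the script is assumed to be launched from the repository root, and the challenge names you pass in must be real names from the chosen split. Only the flags shown in the command above are used.

```python
import subprocess

# Split names taken from the usage line above.
VALID_SPLITS = ("test", "development")


def build_dcipher_command(split, challenge, enable_autoprompt=False):
    """Build the argument list for one run_dcipher.py invocation."""
    if split not in VALID_SPLITS:
        raise ValueError("split must be one of %s, got %r" % (VALID_SPLITS, split))
    cmd = ["python3", "run_dcipher.py", "--split", split, "--challenge", challenge]
    if enable_autoprompt:
        cmd.append("--enable-autoprompt")
    return cmd


def run_challenges(split, challenges, enable_autoprompt=False, runner=subprocess.run):
    """Run each challenge in turn and collect the process exit codes.

    `runner` is a parameter only so the loop can be exercised without
    actually launching the agents.
    """
    exit_codes = {}
    for name in challenges:
        completed = runner(build_dcipher_command(split, name, enable_autoprompt))
        exit_codes[name] = completed.returncode
    return exit_codes
```

The exit code only reports whether the process finished without an error; whether it also reflects that the challenge was solved is not something this sketch assumes.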
To run the single-executor ablation experiment (i.e., without the planner), use the following command:
python3 run_single_executor.py --split <test|development> --challenge <challenge-name> [--enable-autoprompt]
Use the following command to run the baseline agent:
python3 run_baseline.py -c configs/baseline/base_config.yaml --split <test|development> --challenge <challenge-name>
While the baseline agent code is present in the main branch, you can access the baseline's last updated version at v20250206. This is the code used for the NYU CTF Bench paper.