Processing pipeline for preparing data for a PPseq run and executing a PPseq run for the sequences project.
- For more details, see our publication: Replay of Procedural Experience is Independent of the Hippocampus
Author: Emmett J. Thompson
This repository aims to:
- Process data and determine an appropriate timing range for running PPseq.
- Generate the input files required for a PPseq run.
- Execute a PPseq run in either "awake" or "replay" mode.
(Note: To run PPseq in replay mode, the data must first be processed in awake mode to train the model.)
Prerequisite:
This pipeline assumes data has been preprocessed using the following repository:
https://github.com/EmmettJT/sequences_neuropixel_preprocess
The scripts expect specific files and a predefined directory structure.
Organized data should follow this structure, containing folders for animals and recordings:
organised_data
└── animals
    ├── [animal_1]
    │   ├── [recording_1]: behav_sync, ephys, video
    │   ├── [recording_2]: behav_sync, ephys, video
    │   └── ...
    ├── [animal_2]
    │   ├── [recording_1]: behav_sync, ephys, video
    │   ├── [recording_2]: behav_sync, ephys, video
    │   └── ...
    └── ...
- behav_sync: Contains folders for each experimental session with data aligning ephys, video, and behavioral timestamps.
- ephys:
  - Preprocessed data for each probe, including Kilosort output.
  - Contains "good" and "MUA" spike cluster files, which include unit ID, spike times, unit depth, and region (e.g., striatum, motor cortex).
- video:
  - Raw video files with uncycled timestamp/trigger time dataframes.
  - Tracking data for each video and tracking type.
Note:
While paths in the notebooks can be modified, the scripts assume a specific data format. Refer to the preprocessing repo for details on the required file contents.
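Before running the notebooks, it can help to confirm that every recording folder has the three expected subfolders. The following is a minimal sketch, not part of the repository: the function name `find_incomplete_recordings` is hypothetical, and it only checks that the `behav_sync`, `ephys`, and `video` folders exist, not that their contents match the format the scripts expect.

```python
from pathlib import Path

# Subfolder names taken from the directory layout described above.
REQUIRED_SUBFOLDERS = ("behav_sync", "ephys", "video")


def find_incomplete_recordings(root):
    """Map 'animal/recording' to the list of missing subfolders.

    `root` is the organised_data folder. Recordings that contain all
    required subfolders are not included in the result.
    """
    animals_dir = Path(root) / "animals"
    if not animals_dir.is_dir():
        raise FileNotFoundError(f"No 'animals' folder found in {root}")

    problems = {}
    for animal in sorted(p for p in animals_dir.iterdir() if p.is_dir()):
        for recording in sorted(p for p in animal.iterdir() if p.is_dir()):
            missing = [
                name for name in REQUIRED_SUBFOLDERS
                if not (recording / name).is_dir()
            ]
            if missing:
                key = recording.relative_to(animals_dir).as_posix()
                problems[key] = missing
    return problems
```

Calling `find_incomplete_recordings("organised_data")` returns an empty dictionary when every recording has all three subfolders.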
This step calculates task performance for the given animal based on the behavior synchronization file.
- Performance score: Saved as a .csv file and plotted as a .png in a new post_process_ppseq folder.
- Time_intervals.txt: Used to define the ephys timeframe for PPseq.
Identify an ephys timeframe when the animal performed the task well/consistently. It’s recommended to finalize this decision after running Step 3 once for better context.
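One way to shortlist a candidate timeframe is to slide a fixed-length window over the trials and keep the window with the highest mean performance. The sketch below is illustrative only: the function name, the idea of one score per trial, and the `window_s` and `min_trials` defaults are assumptions, not the repository's method, and the result should still be checked against the plots.

```python
def best_performance_window(trial_times, scores, window_s=600.0, min_trials=20):
    """Return the window with the highest mean score, or None.

    trial_times: trial start times in seconds (hypothetical input).
    scores: one performance score per trial (hypothetical input).
    Candidate windows start at each trial time and last window_s seconds.
    Windows with fewer than min_trials trials are ignored.
    """
    trials = sorted(zip(trial_times, scores))
    best = None
    for start, _ in trials:
        end = start + window_s
        window_scores = [s for t, s in trials if start <= t < end]
        if len(window_scores) < min_trials:
            continue
        mean_score = sum(window_scores) / len(window_scores)
        # Strict comparison keeps the earliest window when scores tie.
        if best is None or mean_score > best["mean_score"]:
            best = {
                "start": start,
                "end": end,
                "mean_score": mean_score,
                "n_trials": len(window_scores),
            }
    return best


if __name__ == "__main__":
    # Synthetic example: performance is high between 1000 s and 1600 s.
    times = list(range(0, 2000, 10))
    example_scores = [1.0 if 1000 <= t < 1600 else 0.2 for t in times]
    print(best_performance_window(times, example_scores))
```

This brute-force scan is quadratic in the number of trials, which is usually acceptable for a single session.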
This step uses sleep tracking and synchronization files to identify sleep periods.
- Average movement velocity: Saved as .csv files in the post_process_ppseq folder.
Determine sleep time periods in Time_intervals.txt based on low movement velocity. It’s recommended to finalize this decision after running Step 3 once for better context.
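The low-velocity criterion can be sketched as a simple threshold followed by a minimum-duration filter. This is a hedged illustration: the function name, the velocity threshold, and the minimum duration are all assumptions to be tuned per dataset, and the returned intervals still need to be entered into Time_intervals.txt by hand in whatever format the notebooks expect.

```python
def low_movement_periods(times, velocities, max_velocity, min_duration_s):
    """Return [start, end] intervals of sustained low movement.

    times: sample times in seconds, in ascending order.
    velocities: average movement velocity at each sample time.
    A period is kept only if the velocity stays at or below
    max_velocity for at least min_duration_s seconds.
    """
    periods = []
    start = None
    last_quiet = None
    for t, v in zip(times, velocities):
        if v <= max_velocity:
            if start is None:
                start = t
            last_quiet = t
        else:
            if start is not None and last_quiet - start >= min_duration_s:
                periods.append([start, last_quiet])
            start = None
    # Close a quiet period that runs to the end of the recording.
    if start is not None and last_quiet - start >= min_duration_s:
        periods.append([start, last_quiet])
    return periods


if __name__ == "__main__":
    # Synthetic trace: still from 20 s to 69 s, with a brief pause at 75-79 s.
    sample_times = list(range(100))
    trace = [0.1 if 20 <= t < 70 or 75 <= t < 80 else 5.0 for t in sample_times]
    print(low_movement_periods(sample_times, trace, max_velocity=0.5, min_duration_s=30))
```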
This step prepares data for PPseq runs. Run this step twice:
- Initial run to generate plots for choosing a time interval.
- Use the plots to refine the time range and run again.
Tip: For awake behaviour, aim to choose a time period (around 500-1200 s) with lots of trials/repeats of the target behaviour.
- Time_span: Defines the timeframe (e.g., "Awake" or "PostSleep"). Multiple timeframes can be defined as lists of intervals (e.g., [[time1, time2], [time3, time4]]).
- region: Specifies the brain region of interest based on spike data.
- Other parameters:
  - min_fano_factor / max_fano_factor: Filter neurons based on the variance-to-mean spike ratio.
  - max_firing_rate: Excludes neurons with excessively high firing rates.
  - single_or_multiunits: Options are "good", "MUA", or "both" (recommended).
  - shuffle: Shuffles neuron IDs (for testing PPseq).
  - visualise: If True, generates diagnostic plots (recommended but slower).
- Prepared data: Parameters JSON and spikes file saved in a "prepared data" folder.
- Plots: Visualizations of time ranges, firing rate, movement velocity, trial occurrence, and task performance.
Refine the time range and produce input files for PPseq.
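To make the Fano factor and firing rate filters described above more concrete, here is a minimal sketch of how such a filter could work for a single unit. It is not the repository's implementation: the function names, the 1 s bin width, and the use of population variance are assumptions, and the thresholds are left as required arguments because suitable values depend on the dataset.

```python
from statistics import mean, pvariance


def spike_counts(spike_times, t_start, t_end, bin_s):
    """Count spikes in consecutive bins of width bin_s within [t_start, t_end)."""
    n_bins = int((t_end - t_start) // bin_s)
    counts = [0] * n_bins
    for t in spike_times:
        idx = int((t - t_start) // bin_s)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def keep_unit(spike_times, t_start, t_end, *, min_fano_factor,
              max_fano_factor, max_firing_rate, bin_s=1.0):
    """Return True if the unit passes both filters.

    Fano factor = variance of binned spike counts / mean of binned counts.
    Firing rate = total spikes / total binned duration, in spikes per second.
    """
    counts = spike_counts(spike_times, t_start, t_end, bin_s)
    if not counts or sum(counts) == 0:
        return False  # Silent units have an undefined Fano factor.
    firing_rate = sum(counts) / (len(counts) * bin_s)
    fano_factor = pvariance(counts) / mean(counts)
    return (firing_rate <= max_firing_rate
            and min_fano_factor <= fano_factor <= max_fano_factor)


if __name__ == "__main__":
    # Synthetic unit firing 4 spikes in every other 1 s bin.
    bursty = [b + k / 10 for b in (1, 3, 5, 7, 9) for k in (1, 2, 3, 4)]
    print(keep_unit(bursty, 0, 10, min_fano_factor=0.5,
                    max_fano_factor=5.0, max_firing_rate=10.0))
```

A perfectly regular unit has a Fano factor of 0 and would be rejected by any positive `min_fano_factor`, while a Poisson-like unit sits near 1.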
Once the input files are ready, you can run PPseq. This process is computationally intensive and should ideally be executed on an HPC system.
- Create a new conda environment.
- Activate the environment and clone the "emmett" branch of the PPseq repository:
    git clone --branch emmett https://github.com/EmmettJT/sequences_PPseq.git
  Note: This is a forked version of a private repository, so you will need to request access.
- Clone the submodule:
    git submodule update --init --recursive
- Switch the submodule to the branch "sacredSeqBranch":
    cd PPSeq.jl
    git checkout sacredSeqBranch
Awake timeframes (during behavior) should be processed using the Julia script PPSeq_awake_emmett.jl.
- Edit the Julia script: Open PPSeq_awake_emmett.jl in a text editor and update the list_of_animals variable. This should include all recordings (in the format animalID_implant_recording) for which you have prepared data.
- Run the script directly:
    julia PPSeq_awake_emmett.jl --data-directory <PATH_TO_PREPARED_DATA> \
        --num-threads <NUM_THREADS> \
        --results-directory <PATH_TO_OUTPUT> \
        --slurm-array-task-id <INDEX_FOR_LIST_OF_ANIMALS>
  - Replace <PATH_TO_PREPARED_DATA>, <NUM_THREADS>, <PATH_TO_OUTPUT>, and <INDEX_FOR_LIST_OF_ANIMALS> with the appropriate values.
- Or use the provided SLURM batch file:
  - Update the paths and modify the #SBATCH settings in batch_awake_emmett (e.g., adjust --array=0-[NUMBER_OF_RECORDINGS_TO_RUN]).
  - Execute the SLURM file:
      sbatch batch_awake_emmett
  - Use the command squeue to monitor job progress.
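When running several recordings without SLURM, it can be convenient to generate one awake command per entry in list_of_animals. The sketch below only assembles and prints the command lines using the flags documented above. The helper name, recording names, and paths are hypothetical, and it assumes the task ID is the zero-based position in list_of_animals (following the --array=0-... example), which should be checked against the Julia script.

```python
import shlex


def build_awake_command(data_directory, results_directory, num_threads, task_id):
    """Assemble the argument list for one awake PPseq run."""
    return [
        "julia", "PPSeq_awake_emmett.jl",
        "--data-directory", str(data_directory),
        "--num-threads", str(num_threads),
        "--results-directory", str(results_directory),
        "--slurm-array-task-id", str(task_id),
    ]


if __name__ == "__main__":
    # Hypothetical recording names and paths; replace with your own.
    list_of_animals = ["animalA_1_1", "animalA_1_2"]
    for task_id, recording in enumerate(list_of_animals):
        cmd = build_awake_command(
            "/path/to/prepared_data", "/path/to/output", 8, task_id
        )
        print(f"# {recording}")
        print(shlex.join(cmd))
```

Keeping the command as a list means it could also be passed directly to `subprocess.run` without shell quoting issues.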
Sleep timeframes should be processed using the Julia script PPSeq_sleep_emmett.jl.
- Edit the Julia script: Open PPSeq_sleep_emmett.jl in a text editor and update the list_of_animals variable. This should include all recordings (in the format animalID_implant_recording) for which you have prepared data.
- Run the script directly:
    julia PPSeq_sleep_emmett.jl --data-directory <PATH_TO_PREPARED_DATA> \
        --num-threads <NUM_THREADS> \
        --results-directory <PATH_TO_OUTPUT> \
        --number-of-sequence-types <NUM_SEQUENCE_TYPES> \
        --sacred-directory <PATH_TO_AWAKE_PPSEQ_OUTPUT> \
        --slurm-array-task-id <INDEX_FOR_LIST_OF_ANIMALS>
  - Replace <PATH_TO_PREPARED_DATA>, <NUM_THREADS>, <PATH_TO_OUTPUT>, <NUM_SEQUENCE_TYPES>, <PATH_TO_AWAKE_PPSEQ_OUTPUT>, and <INDEX_FOR_LIST_OF_ANIMALS> with the appropriate values.
  - Key flags:
    - --sacred-directory: Points PPseq to the Awake output folder so that parameters determined during Awake training are used for sequence search in sleep.
    - --number-of-sequence-types: Specifies how many sequences to fit. The default is 6. For sleep runs, it's recommended to use the number from Awake runs plus 2 (to account for non-task-related activity).
- Or use the provided SLURM batch file:
  - Update the paths and flags, and modify the #SBATCH settings in batch_sleep_emmett (e.g., adjust --array=0-[NUMBER_OF_RECORDINGS_TO_RUN]).
  - Execute the SLURM file:
      sbatch batch_sleep_emmett
  - Use the command squeue to monitor job progress.
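The "awake number plus 2" recommendation for --number-of-sequence-types can be captured in a small helper so the sleep command always stays in step with the awake run. This is a sketch under assumptions: the helper names and paths are hypothetical, and only the flags documented above are used.

```python
import shlex


def sleep_sequence_types(awake_sequence_types, extra_types=2):
    """Awake sequence count plus extra types for non-task-related activity."""
    return awake_sequence_types + extra_types


def build_sleep_command(data_directory, results_directory, sacred_directory,
                        num_threads, awake_sequence_types, task_id):
    """Assemble the argument list for one sleep (replay) PPseq run."""
    n_types = sleep_sequence_types(awake_sequence_types)
    return [
        "julia", "PPSeq_sleep_emmett.jl",
        "--data-directory", str(data_directory),
        "--num-threads", str(num_threads),
        "--results-directory", str(results_directory),
        "--number-of-sequence-types", str(n_types),
        "--sacred-directory", str(sacred_directory),
        "--slurm-array-task-id", str(task_id),
    ]


if __name__ == "__main__":
    # Hypothetical paths; the awake run here used the default of 6 sequences.
    cmd = build_sleep_command(
        "/path/to/prepared_data", "/path/to/sleep_output",
        "/path/to/awake_output", 8, awake_sequence_types=6, task_id=0,
    )
    print(shlex.join(cmd))
```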
For more information on running PPseq on the cluster and changing PPseq parameters, please refer to the following:
- README of the PPseq repository
- The Methods section of our publication Replay of Procedural Experience is Independent of the Hippocampus
- The original PPseq publication
Note:
PPseq is computationally intensive and may take several hours or even days to complete, depending on the dataset size.