Cohort Operations¶
A Cohort is a managed group of subjects with associated imaging data. This section covers the four main cohort operations.
Pipeline Stages¶
The pipeline runs in a fixed order (anonymization is optional):
- Anonymization - Remove PHI from DICOM headers (optional)
- Extraction - Parse DICOM metadata into the staging catalog
- Sorting - Label, group, and QC imaging sequences
- BIDS Export - Organize outputs into BIDS or flat layouts
What is a Cohort?¶
A cohort represents a dataset you want to process. It contains:
| Field | Description |
|---|---|
name |
Unique identifier for the cohort |
source_path |
Root directory containing raw DICOM files |
description |
Human-readable description |
owner |
Creator/owner of the cohort |
tags |
Categorization tags (JSON array) |
anonymization_enabled |
Whether to run anonymization stage |
Cohort Status¶
Cohorts track their processing status:
- idle - Not currently processing
- in_progress - Pipeline is running
- completed - All processing finished
Cohort Metrics¶
Progress metrics are tracked:
total_subjects- Number of subjects in cohorttotal_sessions- Number of imaging sessionstotal_series- Number of seriescompletion_percentage- Overall progress (0-100)
The Cohort Pipeline¶
flowchart TB
subgraph anonymize["1. ANONYMIZATION (optional)"]
a1["Remove PHI from DICOM headers"]
a2["Remap patient IDs via folder structure"]
a3["Generate encrypted audit logs"]
end
subgraph extract["2. EXTRACTION"]
e1["Scan DICOM files recursively"]
e2["Parse metadata into staging catalog"]
e3["Create Subject, Study, Series, Instance records"]
end
subgraph sort["3. SORTING (4 sub-steps)"]
s1["Checkup → Stack Fingerprint → Classification → Completion"]
s2["Output: StackFingerprint, SeriesClassificationCache"]
end
subgraph bids["4. BIDS EXPORT"]
b1["Filter by intent and provenance"]
b2["Convert to DICOM or NIfTI"]
b3["Generate BIDS or flat structure"]
end
anonymize --> extract
extract --> sort
sort --> bids
Stage Details¶
1. Anonymization (Optional)¶
Runs first when enabled. Removes Protected Health Information (PHI) from DICOM headers before any metadata is extracted.
| Configuration | Description |
|---|---|
patient_id.strategy |
How to derive new patient IDs (folder, hash, sequential) |
study_dates.enabled |
Whether to shift study dates |
audit_export.format |
Audit log format (encrypted_excel, csv) |
anonymize_categories |
DICOM tag categories to scrub |
2. Extraction¶
Scans the source directory and parses DICOM metadata into the database.
| Configuration | Description |
|---|---|
process_pool_workers |
Parallel processes for parsing |
scan_thread_workers |
Threads for file scanning |
folder_thread_workers |
Threads for folder traversal |
Output tables: Subject, Study, Series, Instance, SeriesStack
3. Sorting¶
A multi-step stage that classifies all series using the detection infrastructure.
| Step | Description |
|---|---|
| Checkup | Verify cohort scope and data integrity |
| Stack Fingerprint | Build classification features for each stack |
| Classification | Apply detection rules (10-stage pipeline) |
| Completion | Fill gaps and flag for manual review |
| Configuration | Description |
|---|---|
profile |
Detection profile (standard, research) |
selectedModalities |
Modalities to process (MR, CT, PT) |
skipClassified |
Skip already-classified stacks |
forceReprocess |
Force re-classification of all stacks |
4. BIDS Export¶
Generates the final output in BIDS or flat directory structure.
| Configuration | Description |
|---|---|
outputModes |
Output formats (dcm, nii, nii.gz) |
layout |
Directory structure (bids, flat) |
includeIntents |
BIDS intents to include (anat, dwi, func, fmap, perf) |
includeProvenance |
Provenance types to include (SyMRI, SWIRecon, EPIMix) |
groupSyMRI |
Group SyMRI outputs in subdirectory |
Creating a Cohort¶
- Navigate to Cohorts in the sidebar
- Click New Cohort
- Enter:
- Name: Unique identifier (e.g., "MS_Study_2024")
- Source Path: Path to DICOM data (e.g.,
/data/raw/ms_study) - Description: Study description
- Anonymization: Enable if processing PHI data
- Click Create
The cohort is now ready for processing.
Operation Order¶
Stages must run in sequence:
- Anonymization (if enabled) must complete before Extraction
- Extraction must complete before Sorting
- Sorting must complete before BIDS Export
Each stage depends on the output of the previous stage.
Job Management¶
All cohort operations run as Jobs. Jobs provide:
- Progress tracking - Percentage complete, items processed
- Status monitoring - Pending, running, completed, failed
- Error reporting - Detailed error messages
- Cancellation - Stop a running job
View jobs in the Jobs tab of the web interface.
Next Steps¶
- Anonymization - Remove PHI from DICOM data
- Extraction - Import DICOM metadata
- Sorting - Classify series
- BIDS Export - Generate output