Sorting¶
Sorting runs the classification pipeline on all extracted series. It consists of four steps that must run in sequence.
The Four Steps¶
flowchart LR
A["Step 1: Checkup"] --> B["Step 2: Stack Fingerprint"]
B --> C["Step 3: Classification"]
C --> D["Step 4: Output Generation"]
Step 1: Checkup¶
Purpose: Validate data and prepare series for processing.
What It Does¶
- Cohort Subject Resolution
  - Gets all subjects in the cohort
  - Validates subject membership
- Study Discovery
  - Finds all studies for these subjects
  - Validates study-subject relationships
- Study Date Validation & Repair
  - Checks for missing `study_date`
  - Attempts recovery from `acquisition_date` or `content_date`
  - Flags studies with unrecoverable dates
- Series Collection
  - Gets all series from valid studies
  - Filters by modality if configured
- Existing Classification Filter
  - Checks whether series are already classified
  - Skips or reprocesses them based on configuration
Output¶
`Step1Handover` containing:
- List of `SeriesForProcessing` (series_id, study_id, subject_id)
- Validation results
- Excluded series with reasons
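The handover shape can be sketched as plain dataclasses. The class and field names beyond those listed above (for example, the `excluded` mapping) are assumptions for illustration, not the pipeline's exact definitions:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SeriesForProcessing:
    series_id: int
    study_id: int
    subject_id: int

@dataclass
class Step1Handover:
    # Series that passed all Step 1 validation checks.
    series: list
    # Hypothetical field: series_id -> human-readable exclusion reason.
    excluded: dict = field(default_factory=dict)

handover = Step1Handover(
    series=[SeriesForProcessing(series_id=1, study_id=10, subject_id=100)],
    excluded={2: "unrecoverable study_date"},
)
```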
Step 2: Stack Fingerprint¶
Purpose: Build classification-ready feature vectors for each stack.
What It Does¶
- Load Handover
  - Receives series IDs from Step 1
- Query Stack Data
  - Fetches all SeriesStack records
  - Joins with Series, Study, and modality-specific details
- Build Fingerprints (Polars)
  - Vectorized transformations using Polars
  - Normalizes values across modalities
  - Aggregates text fields into searchable blobs
  - Computes geometry features (FOV, aspect ratio)
- Database Upsert
  - Bulk COPY into `stack_fingerprint` table
  - UPSERT for existing fingerprints
- Batched Commits
  - Commits in batches to prevent OOM
  - Enables progress tracking
Performance¶
- Processes ~450K stacks in 45-60 seconds
- Previous ORM-based approach caused OOM on large datasets
- Polars vectorization provides 10-50x speedup
Output¶
`Step2Handover` containing:
- List of `fingerprint_id` values
- Processing statistics
Step 3: Classification¶
Purpose: Run the 10-stage classification pipeline on each fingerprint.
What It Does¶
For each `StackFingerprint`:
- Stage 0: Exclusion Check
  - Filters screenshots, secondary reformats
  - Checks ImageType flags
- Stage 1: Provenance Detection
  - Determines processing pipeline
  - Routes to appropriate branch
- Stage 2: Technique Detection
  - Identifies pulse sequence family
- Stage 3: Branch Logic
  - Executes provenance-specific logic:
    - `SWI Branch` → SWI/QSM classification
    - `SyMRI Branch` → Synthetic MRI classification
    - `EPIMix Branch` → Multi-contrast EPI
    - `RawRecon Branch` → Standard detection
- Stage 4: Modifier Detection
  - Detects FLAIR, FatSat, MT, etc.
- Stage 5: Acceleration Detection
  - Detects GRAPPA, SMS, etc.
- Stage 6: Contrast Agent Detection
  - Pre/post contrast determination
- Stage 7: Body Part Detection
  - Spinal cord flagging
- Stage 8: Intent Synthesis
  - Maps to BIDS directory_type (anat, dwi, func, fmap)
- Stage 9: Review Flag Aggregation
  - Combines all review triggers
  - Sets `manual_review_required`
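The staged structure can be sketched as a chain of functions that each read the fingerprint and accumulate results. Only stages 0 and 9 are shown, and the field names are assumptions, not the pipeline's exact schema:

```python
def stage_exclusion(fp, result):
    # Stage 0 sketch: drop screenshots / secondary reformats via ImageType flags.
    if "SECONDARY" in fp.get("image_type", []):
        result["excluded"] = True
    return result

def stage_review_flags(fp, result):
    # Stage 9 sketch: aggregate accumulated review triggers into one flag.
    result["manual_review_required"] = bool(result.get("review_reasons"))
    return result

# Stages 1-8 omitted; each would append to the result dict in the same way.
STAGES = [stage_exclusion, stage_review_flags]

def classify(fp):
    result = {"excluded": False, "review_reasons": []}
    for stage in STAGES:
        result = stage(fp, result)
        if result["excluded"]:
            break  # excluded stacks skip the remaining stages
    return result
```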
Output¶
`SeriesClassificationCache` records containing:
- All six classification axes
- Flags (post_contrast, localizer, spinal_cord)
- BIDS intent (directory_type)
- Review requirements
Step 4: Output Generation¶
Purpose: Export classified data to target structure.
What It Does¶
- Filter by Classification
  - Include/exclude by provenance
  - Include/exclude by intent
- Organize Output
  - BIDS structure or flat layout
  - Provenance-specific routing
- Copy/Convert Files
  - DICOM copy or NIfTI conversion
  - Parallel processing
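The routing decision can be sketched as a small path builder. The BIDS-style naming shown here follows the standard `sub-/ses-/<datatype>` convention; the function and its parameters are illustrative, not the pipeline's actual API:

```python
from pathlib import PurePosixPath

def output_path(subject, session, directory_type, basename, layout="bids"):
    # Route a classified series into a BIDS tree or a flat layout.
    if layout == "bids":
        return (
            PurePosixPath(f"sub-{subject}")
            / f"ses-{session}"
            / directory_type      # anat, dwi, func, fmap from Stage 8
            / basename
        )
    return PurePosixPath(basename)  # flat layout: everything in one directory

p = output_path("001", "01", "anat", "sub-001_ses-01_T1w.nii.gz")
```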
Output Modes¶
| Mode | Description |
|---|---|
| `dcm` | Copy DICOM files |
| `nii` | Convert to NIfTI |
| `nii.gz` | Convert to compressed NIfTI |
Running Sorting¶
From Web Interface¶
- Navigate to the cohort
- Click Sort
- Steps run automatically in sequence
- Monitor in Jobs tab
Step-by-Step Execution¶
You can also run steps individually:
- Run Step 1 (Checkup)
- Review validation results
- Run Step 2 (Fingerprint)
- Run Step 3 (Classification)
- Run Step 4 (Output) when ready
Configuration Options¶
| Option | Description | Default |
|---|---|---|
| `reprocess` | Reclassify already-classified series | `false` |
| `include_modalities` | Filter to specific modalities | all |
| `parallel_workers` | Classification workers | 4 |
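These options map naturally onto a small config object. This dataclass is a sketch mirroring the defaults in the table above, not the pipeline's actual configuration class:

```python
from dataclasses import dataclass, field

@dataclass
class SortingConfig:
    # Defaults mirror the options table; an empty modality list means "all".
    reprocess: bool = False
    include_modalities: list = field(default_factory=list)
    parallel_workers: int = 4

cfg = SortingConfig()
```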
Date Recovery¶
Step 1 attempts to repair a missing `study_date`:
- Check `acquisition_date` from Instance
- Check `content_date` from Instance
- Mark the study as excluded if the date is unrecoverable
This handles DICOM files with missing or corrupted dates.
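The fallback chain above amounts to a first-non-empty lookup; a minimal sketch, with the function name and return shape as assumptions:

```python
def recover_study_date(study_date, acquisition_date, content_date):
    """Return (date, recovered) using the Step 1 fallback order."""
    # Try study_date first, then the Instance-level fallbacks in order.
    for candidate in (study_date, acquisition_date, content_date):
        if candidate:  # skip None and empty strings
            return candidate, True
    return None, False  # unrecoverable: the caller excludes the study
```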
Stack Key¶
Each `SeriesStack` has a deterministic `stack_key`.
This enables:
- Duplicate detection across reruns
- Idempotent classification
- Stack grouping within series
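The properties above follow from hashing a fixed, ordered tuple of identifying fields. The field set below is hypothetical (the real recipe is defined by the pipeline), but any such fixed tuple yields the same determinism and idempotence:

```python
import hashlib

def stack_key(series_uid, orientation, echo_time):
    # Hypothetical key recipe: join identifying fields in a fixed order,
    # then hash, so identical inputs always produce the same key.
    payload = "|".join(str(v) for v in (series_uid, orientation, echo_time))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```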
Fingerprint Features¶
`StackFingerprint` contains normalized features:
General Features¶
- `modality` - MR, CT, PET
- `manufacturer` - Normalized (GE, SIEMENS, PHILIPS, etc.)
- `text_search_blob` - Concatenated descriptions
Geometry Features¶
- `stack_orientation` - Axial, Coronal, Sagittal
- `fov_x`, `fov_y` - Field of view in mm
- `aspect_ratio` - FOV ratio
MR Features¶
- `mr_te`, `mr_tr`, `mr_ti` - Timing parameters (ms)
- `mr_flip_angle` - Flip angle (degrees)
- `mr_acquisition_type` - 2D or 3D
- `mr_diffusion_b_value` - Diffusion b-value
CT Features¶
- `ct_kvp` - Tube voltage
- `ct_tube_current` - Tube current
- `ct_convolution_kernel` - Reconstruction kernel
PET Features¶
- `pet_tracer` - Radiopharmaceutical
- `pet_reconstruction_method` - Recon algorithm
- `pet_suv_type` - SUV calculation type
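The `manufacturer` normalization mentioned above collapses vendor-string variants into canonical names. A sketch with an illustrative mapping (the pipeline's full table is not shown here):

```python
def normalize_manufacturer(raw):
    # Collapse vendor-string variants into canonical names; the rules
    # below are illustrative examples, not the complete mapping.
    value = (raw or "").strip().upper()
    if "SIEMENS" in value:
        return "SIEMENS"
    if value.startswith("GE") or "GENERAL ELECTRIC" in value:
        return "GE"
    if "PHILIPS" in value:
        return "PHILIPS"
    return value or None  # unknown vendors pass through; empty becomes None
```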
Troubleshooting¶
"No series to process"¶
- Check extraction completed successfully
- Verify series exist in database
- Check modality filters
Classification Issues¶
- Review `manual_review_required` flags
- Check `manual_review_reasons_csv` for details
- Use the QC interface to review flagged series
Performance¶
- Step 2 is typically the bottleneck
- Ensure adequate RAM (8GB+ for large datasets)
- Reduce batch size if memory issues occur