Changelog
All notable changes to NILS will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.3.0] - 2026-03-20
Added
- Classification Engine — Complete rewrite of MRI sequence classification as a modular, YAML-driven 10-stage pipeline
- Six orthogonal detection axes (base contrast, technique, modifier, construct, provenance, acceleration) each backed by its own YAML config and detector class
- Semantic text normalizer that tokenizes DICOM descriptions — handles `*`→`star`, vendor-specific abbreviations (`pha`→`phase`, `mag`→`magnitude`), and context-aware replacements (`mt` only maps to `magnetization-transfer` when not inside "metric")
- Branch-based routing: provenance detection runs first and routes multi-output sequences (SWI, SyMRI, EPIMix/NeuroMix, MP2RAGE) into specialized sub-pipelines that override only base contrast and construct — all other axes still run the standard detectors
- SWI branch distinguishes 6 output types (QSM, MinIP, MIP, Phase, SWI Processed, Magnitude) using ImageType flags and text keywords with per-type confidence scores
- SyMRI branch classifies 16+ outputs across quantitative maps (T1map, T2map, PDmap, Myelin, B1map), synthetic weighted images (SyntheticT1w, SyntheticFLAIR, etc.), and raw source components
- EPIMix/NeuroMix branch handles 11 output types with physics-based fallback: uses TI thresholds (T1-FLAIR vs T2-FLAIR), TE ranges (T2*-w vs T2-w), and readout type (EPI vs SSFSE) when text keywords are ambiguous
- Technique detector covers 30+ sequences across SE, GRE, EPI, and MIXED physics families — detection via exclusive flags, combination logic (AND of multiple flags), and keyword fallback
- Modifier detector with mutual exclusion groups: IR contrasts pick highest priority (FLAIR > STIR > DIR > PSIR > IR), trajectory picks one (Radial > Spiral), independent modifiers always additive (FatSat, WaterExcitation, MT)
- Construct detector additively collects derived maps — diffusion (ADC, FA, MD, Trace), perfusion (CBF, CBV, MTT), quantitative (T1map, T2map), SWI (QSM, Phase), projection (MIP, MinIP, MPR), Dixon (Water, Fat, InPhase, OutPhase)
- Acceleration detector with bounded regex to avoid false positives — e.g. `\bmb\d` matches "mb2" but not "combat"
- Body part detector classifying brain, spine, neck, and brain-neck from DICOM keywords, used for BIDS directory naming (SC_ prefix for spinal cord)
- Intent synthesis maps detected axes to BIDS directory types (anat/dwi/func/fmap/perf/misc) using a priority chain: provenance → construct → functional keywords → base+modifier → fallback
- 55+ unified boolean flags extracted from DICOM headers, scanner-agnostic
- Confidence tracking per axis — stacks below 0.6 are automatically flagged for manual review
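The mutual-exclusion behaviour described for the modifier detector can be sketched in Python as follows. This is a minimal illustration, not the NILS implementation: the function name `resolve_modifiers` and the data layout are assumptions, and only the priority orders and the list of independent modifiers are taken from the entries above.

```python
# Illustrative sketch only; names and data layout are assumptions, not NILS code.

# Exclusive groups, listed from highest to lowest priority (orders from the changelog).
EXCLUSIVE_GROUPS = {
    "ir_contrast": ["FLAIR", "STIR", "DIR", "PSIR", "IR"],
    "trajectory": ["Radial", "Spiral"],
}

# Independent modifiers are always kept when detected.
INDEPENDENT = ["FatSat", "WaterExcitation", "MT"]


def resolve_modifiers(detected):
    """Keep one winner per exclusive group plus every independent modifier."""
    detected = set(detected)
    resolved = []
    for priority_order in EXCLUSIVE_GROUPS.values():
        # The first hit in priority order wins; the rest of the group is dropped.
        winner = next((m for m in priority_order if m in detected), None)
        if winner is not None:
            resolved.append(winner)
    resolved.extend(m for m in INDEPENDENT if m in detected)
    return resolved


print(resolve_modifiers(["IR", "FLAIR", "FatSat", "Spiral", "Radial"]))
# ['FLAIR', 'Radial', 'FatSat']
```

Keeping the priority orders as plain data mirrors the YAML-driven design mentioned above, although the real configuration format is not shown in this changelog.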
- Sorting Pipeline Rebuild — Four-step pipeline with independent execution, typed handovers, and real-time progress streaming
- Step 1 (Checkup): validates subjects/studies, repairs missing study dates from 4 fallback sources (series_date → acquisition_date → content_date → UID date extraction), filters by modality (MR/CT/PET), supports incremental mode (skip already-classified series)
- Step 2 (Stack Fingerprint): single JOIN query loads all stack data, Polars vectorized transforms compute FOV, orientation confidence, text/contrast search blobs, manufacturer normalization; bulk COPY + UPSERT in 50K-row batches — 10-100x faster than v0.2 row-by-row approach
- Step 3 (Classification): runs the classification engine on each fingerprint in batches of 1000, bulk upserts results to `series_classification_cache`
- Step 4 (Completion): 5-phase post-processing — normalizes field strength (handles Gauss scale, ±tolerance), flags low orientation confidence (<0.85), fills missing 2D/3D acquisition type from scan options/text/technique inference, fills missing base/technique via physics-similarity matching against all previously classified stacks in the database (binned by TR/TE/TI/FA/slice count), re-routes newly-detected SWI through the SWI branch, re-synthesizes intent for stacks stuck in "misc"
- Handover mechanism: each step produces a typed dataclass persisted to `nils_dataset_pipeline_step` — stores IDs (not full data) so downstream steps re-query fresh state
- Step-wise execution: any step can run independently by loading the previous step's persisted handover, enabling re-runs with different config without starting from scratch
- Preview mode: run steps without committing results to database
- SSE progress streaming with rolling 100-line log buffer displayed in frontend
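The study-date repair in Step 1 follows a fixed fallback order, which can be sketched as below. This is an illustration under stated assumptions: the dictionary keys, the function name `repair_study_date`, and the rule for spotting a date inside a UID (an isolated run of eight digits starting with 19 or 20) are hypothetical, and only the order of the four sources comes from the entry above.

```python
import re
from datetime import datetime

# Fallback order quoted in the changelog; the dict keys are assumed field names.
FALLBACK_FIELDS = ["series_date", "acquisition_date", "content_date"]

# Assumption: a UID may embed a YYYYMMDD date as an isolated run of 8 digits.
_UID_DATE = re.compile(r"(?<!\d)((?:19|20)\d{6})(?!\d)")


def _valid_date(value):
    """Return value if it is a real calendar date in YYYYMMDD form, else None."""
    if not value or len(value) != 8 or not value.isdigit():
        return None
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return None
    return value


def repair_study_date(series, series_uid=""):
    """Return (date, source) from the first fallback that yields a valid date."""
    for field in FALLBACK_FIELDS:
        date = _valid_date(series.get(field))
        if date:
            return date, field
    # Last resort: look for a plausible date embedded in the UID.
    for match in _UID_DATE.finditer(series_uid):
        date = _valid_date(match.group(1))
        if date:
            return date, "uid"
    return None, None


print(repair_study_date({"series_date": "", "acquisition_date": "20250315"}))
```

Returning the source alongside the date makes it possible to log which fallback was used, which may help when auditing repaired studies.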
- Quality Control Pipeline — Full QC review system with draft-based workflow and DICOM viewer
- Axes QC Page: image-centric view with Cornerstone.js WebGL rendering, HUD overlays showing acquisition parameters (TE/TR/TI/FA/ImageType) and current classification, keyboard navigation (arrow keys to browse stacks, number keys to select correction options)
- QC Viewer Page: three-level hierarchy (subjects → sessions → stacks) with searchable subject list, sessions grouped by study date, stacks grouped by intent with provenance sub-grouping (SyMRI in purple, SWI in green, EPIMix in orange)
- Draft-based workflow: changes saved to app_db as drafts, not touching metadata_db until user confirms — discard reverts everything, confirm pushes all drafts atomically and clears manual_review_required flag
- Rules engine with 9 configurable rules, including TechniqueFamilyMismatch (validates physics family), BrainAspectRatio (flags elongated FOV on brain scans), SpineAspectRatio, LocalizerSliceCount (>20 slices suspicious), ProvenanceMismatch (SWI constructs must have SWI provenance), ContrastUndetermined (T1w without known gadolinium status), and BaseMissing
- 5 flag severities: missing (red), conflict (orange), low_confidence (yellow), ambiguous (purple), review (gray) — priority scoring determines QC item ordering
- Dynamic filtering by axis and flag type, with filters only showing options that have items
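The draft-based QC workflow (save, discard, confirm) can be pictured with a small in-memory model. This is only a sketch: `DraftStore` and its methods are hypothetical names, and plain dictionaries stand in for the application and metadata databases.

```python
class DraftStore:
    """In-memory stand-in for drafts (app_db) and live classifications (metadata_db)."""

    def __init__(self, live):
        # live: {stack_id: {"base": ..., "manual_review_required": bool, ...}}
        self.live = live
        self.drafts = {}  # {stack_id: {field: new_value}}

    def save_draft(self, stack_id, field, value):
        # Drafts accumulate separately; live data is not touched here.
        self.drafts.setdefault(stack_id, {})[field] = value

    def discard(self):
        # Dropping the drafts reverts everything, because nothing was applied yet.
        self.drafts.clear()

    def confirm(self):
        # Build the complete new state first, then swap it in, so an error
        # while preparing the changes leaves the live data unchanged.
        updated = {sid: dict(row) for sid, row in self.live.items()}
        for stack_id, changes in self.drafts.items():
            row = updated[stack_id]  # a KeyError here aborts before any write
            row.update(changes)
            row["manual_review_required"] = False
        self.live = updated
        self.drafts.clear()


store = DraftStore({1: {"base": "T1w", "manual_review_required": True}})
store.save_draft(1, "base", "T2w")
print(store.live[1]["base"])  # still the original value until confirm()
store.confirm()
print(store.live[1])
```

In a real database the swap in `confirm` would correspond to a single transaction; the copy-then-replace step here is just the simplest way to show the all-or-nothing behaviour.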
- BIDS Export — Background job processing with cross-cohort resolution and field strength filtering
- Stack naming includes body part prefix (SC_, Neck, BrainNeck), orientation (Ax/Cor/Sag), base contrast, 2D/3D, modifiers, technique, acceleration, constructs, and contrast suffix (_CE)
- DWI stacks self-describe: `Ax_DWI_EPI_b1000_AP_32dir` includes b-value, phase encoding direction, and number of gradient directions extracted from vendor-private DICOM tags (Siemens, GE, Philips)
- Multi-stack series handling: echo suffixes (_e1, _e2), TI suffixes (_ti1, _ti2), plus collision resolution with numbered suffixes only when names actually collide
- Cross-cohort DICOM path resolution: when a subject's files live in a different cohort's dcm-raw folder, the exporter falls back through all known cohort paths
- SQL-level field strength filtering (0.5/1.0/1.5/3.0/7.0T) — avoids loading irrelevant stacks from large databases
- Provenance filtering with allow-list/block-list: include specific provenances (SyMRI, SWIRecon) or exclude others (ProjectionDerived)
- `completed_with_warnings` job status — exports that succeed but had skipped files or NIfTI conversion errors are flagged separately from clean completions
- Standalone export system: define a manifest (CSV/JSON) of subject/session/stack selections, resolve to stack IDs, run as a named export job independent of any cohort
- `sub-` prefix guard prevents `sub-sub-XXXX` directory names when PatientID already contains the prefix
- Process pool uses `spawn` context (not `fork`) to avoid virtual memory exhaustion on systems with strict overcommit
- Data Import System — CSV import for 10+ entities with preview/apply pattern
- Subjects, cohorts, subject-cohort memberships, subject identifiers (alternative IDs)
- Events with observation type registry and imaging session backfill
- Diseases, disease types, subject diseases, subject disease types — full longitudinal clinical metadata
- Every import has a preview endpoint (dry-run with sample rows + validation errors) and an apply endpoint (commits to database)
- Validation: column existence, type coercion, required field enforcement, foreign key checking, duplicate handling policy (skip/update/error)
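The preview half of the preview/apply pattern can be sketched as a dry-run validator. Everything named here is hypothetical (the `preview_import` function, the `subject_code` and `birth_year` columns, the shape of the result); the sketch covers column existence, required fields, type coercion, and the three duplicate policies, and leaves out foreign key checking.

```python
import csv
import io

# Hypothetical required columns and the type each value is coerced to.
REQUIRED = {"subject_code": str, "birth_year": int}


def preview_import(csv_text, existing_codes, on_duplicate="skip", sample_size=5):
    """Dry run: validate rows and report what an apply step would do."""
    if on_duplicate not in {"skip", "update", "error"}:
        raise ValueError(f"unknown duplicate policy: {on_duplicate}")
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return {"errors": [f"missing column: {c}" for c in missing],
                "sample": [], "actions": {}}
    errors, sample = [], []
    actions = {"insert": 0, "update": 0, "skip": 0}
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            clean = {}
            for col, cast in REQUIRED.items():
                raw = (row[col] or "").strip()
                if not raw:
                    raise ValueError(f"{col} is required")
                clean[col] = cast(raw)  # type coercion; int() raises ValueError
        except ValueError as exc:
            errors.append(f"line {line_no}: {exc}")
            continue
        if clean["subject_code"] in existing_codes:
            if on_duplicate == "error":
                errors.append(f"line {line_no}: duplicate {clean['subject_code']}")
                continue
            actions[on_duplicate] += 1
        else:
            actions["insert"] += 1
        if len(sample) < sample_size:
            sample.append(clean)
    return {"errors": errors, "sample": sample, "actions": actions}


text = "subject_code,birth_year\nS1,1980\nS2,abc\nS4,1975\n"
print(preview_import(text, existing_codes={"S1"}, on_duplicate="update"))
```

Because the function never writes anything, an apply step could reuse the same validation and commit only when `errors` is empty, though how NILS shares code between the two endpoints is not stated here.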
- Database Management UI — Tabbed interface organizing tables by domain (subjects, events, clinical measures, imaging, system)
- Identifier type creation with validation
- Application and metadata backup/restore with optional user notes
- 3-phase parallel restore: schema (sequential) → data (parallel, 4 workers) → indexes/constraints (parallel)
- Post-restore migrations automatically apply schema changes from newer versions to older backups
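The three restore phases listed above (schema sequentially, data with 4 workers, then indexes and constraints in parallel) can be sketched with a thread pool. The `restore` function, its `run` callback, and the worker count for the index phase are assumptions made for illustration; the changelog does not describe how the actual restore is driven.

```python
from concurrent.futures import ThreadPoolExecutor


def restore(schema_steps, data_steps, index_steps, run,
            data_workers=4, index_workers=4):
    """Run restore steps in three phases; `run` executes a single step."""
    # Phase 1: schema statements are applied one after another, in order.
    for step in schema_steps:
        run(step)
    # Phase 2: table data loads concurrently once every table exists.
    with ThreadPoolExecutor(max_workers=data_workers) as pool:
        list(pool.map(run, data_steps))  # list() waits and re-raises errors
    # Phase 3: indexes and constraints are built only after all data is in.
    # The worker count for this phase is an assumption.
    with ThreadPoolExecutor(max_workers=index_workers) as pool:
        list(pool.map(run, index_steps))


log = []
restore(["create tables"], ["load subject", "load study"], ["build indexes"], log.append)
print(log)
```

Each `with` block acts as a barrier, so no index build can start while data is still loading.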
- Authentication — Token-based middleware with login page and asset caching
- Podman Support — `--podman` flag in `manage.sh` with `:Z` SELinux labels on all volume mounts
- Docker Health Checks — Startup ordering with health check dependencies to prevent connection errors
Changed
- Backend Architecture — `server.py` refactored from ~2000 lines to ~120 lines
- 12+ route modules (cohorts, imports, backups, jobs, qc, export, csv, metadata, database, system, etc.)
- `api/schemas` renamed to `api/models`, utility functions extracted to `api/utils/`
- GZip middleware added (60-80% response size reduction)
- Stack Creation — Stacks now created during extraction via per-instance signature hashing, eliminating the 30-minute post-extraction UPDATE bottleneck
- Signature computed from series UID + modality-specific fields (echo time, inversion time, flip angle, b-value, orientation, image type)
- Cache tracks signature → stack_id; new signature triggers immediate stack row creation
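Per-instance signature hashing with a signature-to-stack cache can be sketched as follows. The field names, the choice of SHA-256, and the `StackCache` class are assumptions for illustration; the changelog only states which kinds of fields feed the signature and that a new signature creates a stack row immediately.

```python
import hashlib

# Assumed names for the MR fields listed in the changelog entry above.
SIGNATURE_FIELDS = ["echo_time", "inversion_time", "flip_angle",
                    "b_value", "orientation", "image_type"]


def stack_signature(series_uid, instance):
    """Hash the series UID together with the fields that distinguish stacks."""
    parts = [series_uid] + [repr(instance.get(f)) for f in SIGNATURE_FIELDS]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


class StackCache:
    """Maps signature -> stack_id, allocating an ID the first time one is seen."""

    def __init__(self):
        self._ids = {}
        self._next_id = 1

    def stack_id_for(self, series_uid, instance):
        sig = stack_signature(series_uid, instance)
        if sig not in self._ids:
            # New signature: a real system would insert the stack row here.
            self._ids[sig] = self._next_id
            self._next_id += 1
        return self._ids[sig]


cache = StackCache()
first = cache.stack_id_for("1.2.3", {"echo_time": 30.0})
second = cache.stack_id_for("1.2.3", {"echo_time": 60.0})
print(first, second)
```

Note that `repr` treats `30` and `30.0` as different values, so a real implementation would probably need to normalize numeric fields before hashing.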
- Extraction Performance — Adaptive batch sizing based on execution timing (target: 1000ms/batch), series-level processing with configurable worker pool, comprehensive MRI/CT/PET field mappings
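One simple way to size batches from execution timing is to scale the batch proportionally toward the target duration. The sketch below assumes that approach; apart from the 1000 ms target, every number and the `next_batch_size` name are hypothetical, and the actual adjustment rule used by the extractor is not described in this changelog.

```python
def next_batch_size(current, elapsed_ms, target_ms=1000,
                    min_size=100, max_size=50_000):
    """Scale the batch so the next one should take roughly target_ms."""
    if elapsed_ms <= 0:
        # Too fast to measure: grow, but stay within the upper bound.
        return min(current * 2, max_size)
    scaled = int(current * target_ms / elapsed_ms)
    # Limit each adjustment to halving or doubling to avoid oscillation.
    scaled = max(current // 2, min(scaled, current * 2))
    return max(min_size, min(scaled, max_size))


print(next_batch_size(1000, elapsed_ms=2000))  # 500
print(next_batch_size(1000, elapsed_ms=100))   # 2000
```

Capping each step keeps a single slow or fast batch from swinging the size too far in one go.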
- Database — Date and time columns migrated from text to native PostgreSQL types; new indexes on frequently queried columns; API response caching
- Anonymization — V2 pipeline with compression, audit resume capability, multiprocessing with streaming, leaf event management
- Frontend — Complete redesign with NILS branding, dark-theme-first flat design, Mantine UI components; cohort detail page with pipeline stage stepper and run button loading state; job center with cohort-centric view and progress animations
- Observation Types — Event types and clinical measure types unified into a single observation type taxonomy
- SWI/Provenance Reclassification — SWI reclassified as provenance (not technique), Radial/Spiral reclassified as modifiers (not standalone techniques)
- Semantic Token Map — Expanded to v1.2.0 with vendor-specific mappings for Swedish/Scandinavian protocols (`gd` suffixes, `da-fl` for FLAIR, `direkt`, `syntetisk`)
Fixed
- Series time formatting — `series_time` serialized as ISO string, preventing sorting/export crashes when datetime.time objects were passed raw
- Extraction writer crash handling — Writer task crash now surfaces immediately instead of hanging forever on an undrainable queue
- Missing measurement values — Event import preview handles NULL measurements gracefully
- Semantic normalizer — Correctly tokenizes `+/-` characters (contrast notation like `+Gd`) and improves `mp2rage` keyword matching
- Diffusion b-values — Upper bound validation filters garbage data from vendor-private tags
- Localizer detection priority — Scout MPR reformats correctly classified before other detectors claim them
- SWI classification — Robust ImageType-based detection prevents technique override from misclassifying SWI outputs
- Bulk classification OOM — Batched upserts prevent PostgreSQL out-of-memory on large cohorts (450K+ stacks)
- Study date format — Hyphens removed for consistent BIDS session naming (`ses-20250315` not `ses-2025-03-15`)
- PostgreSQL cast syntax — `CAST()` used instead of `::` for migration compatibility
- Migration transaction nesting — Per-index transactions with column validation prevent partial migration failures
- Warning handling — Cohort path resolution no longer raises on non-critical warnings
- CHOKIDAR polling — Enabled in Docker to resolve inotify issues causing phantom file change detection
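The study date fix above, together with the `sub-` prefix guard listed under Added for this release, amounts to two small label-normalization rules. The helper names below are hypothetical and the sketch ignores other BIDS label constraints.

```python
def bids_session_label(study_date):
    """Turn '2025-03-15' or '20250315' into 'ses-20250315'."""
    compact = study_date.replace("-", "")
    if len(compact) != 8 or not compact.isdigit():
        raise ValueError(f"unexpected study date: {study_date!r}")
    return f"ses-{compact}"


def bids_subject_label(patient_id):
    """Prefix with 'sub-' unless the ID already carries it."""
    return patient_id if patient_id.startswith("sub-") else f"sub-{patient_id}"


print(bids_session_label("2025-03-15"))  # ses-20250315
print(bids_subject_label("sub-001"))     # sub-001
print(bids_subject_label("001"))         # sub-001
```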
[0.2.1] - 2025-12-29
Fixed
- BIDS Export Memory Error: Fixed `[Errno 12] Cannot allocate memory` when running NIfTI conversion
- Affected systems with strict memory overcommit (`vm.overcommit_memory=2`)
- Now uses `spawn` instead of `fork` for process pool to avoid virtual memory exhaustion
- Database Restore Failures: Fixed `pg_restore` failing due to foreign key constraint errors
- Tables are now dropped in correct dependency order before restore
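Dropping tables in dependency order is a topological sort over the foreign key graph: a table must be dropped before any table it references. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the `drop_order` helper and the example table names are illustrative, and the changelog does not say how NILS computes the order.

```python
from graphlib import TopologicalSorter


def drop_order(references):
    """references maps each table to the set of tables its foreign keys point at."""
    # static_order() yields referenced tables first, which is a valid creation
    # order; reversing it gives an order in which drops never violate a key.
    create_order = list(TopologicalSorter(references).static_order())
    return create_order[::-1]


tables = {
    "subject": set(),
    "study": {"subject"},
    "series": {"study"},
    "instance": {"series"},
}
print(drop_order(tables))
# ['instance', 'series', 'study', 'subject']
```

`TopologicalSorter` raises `CycleError` for circular references, which would need separate handling, for example by dropping the constraints first.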
Changed
- Cohort Detail API Performance: Response payload reduced by ~90%
- Job history now uses slim serialization (full details available via `/jobs/{id}`)
- Metrics cache extended from 30 seconds to 2 minutes
- Frontend Code Organization: Extracted BIDS configuration into dedicated component
- Centralized status colors and configuration
- Added cohort prefetching on hover for faster navigation
[0.2.0] - 2025-12-25
Added
- Extraction Retry with Exponential Backoff: Transient database errors (OOM, timeouts) now trigger automatic retry
- Retries indefinitely until all data is written - never skips data
- Adaptive batch size reduction during memory pressure
- Initial delay of 2s, max delay capped at 2 minutes
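The retry schedule described above (2 s initial delay, 2 minute cap, no retry limit) can be sketched as a delay generator plus a retry loop. The doubling factor, the `is_transient` callback, and the function names are assumptions; the adaptive batch size reduction mentioned above is not modelled.

```python
import time


def backoff_delays(initial=2.0, maximum=120.0, factor=2.0):
    """Yield delays that grow geometrically and then stay at the cap."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


def retry_forever(operation, is_transient, sleep=time.sleep):
    """Call operation until it succeeds; only transient errors are retried."""
    for delay in backoff_delays():
        try:
            return operation()
        except Exception as exc:
            if not is_transient(exc):
                raise  # non-transient errors surface immediately
            sleep(delay)


schedule = backoff_delays()
print([next(schedule) for _ in range(8)])
# [2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 120.0, 120.0]
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.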
- Periodic Cache Pruning: In-memory lookup caches are pruned during long-running extractions
- Prevents unbounded memory growth over multi-day extractions (previously could reach several GB)
- Prunes after every 100 subjects processed
- Orphaned Job Recovery on Startup: Jobs that were running when backend crashed/restarted are now marked as failed
- Clear error message explaining the interruption and how to resume
- Enables resume from where extraction left off
- Metrics Caching: Cohort metrics cached for 30 seconds to avoid repeated expensive COUNT queries
- Fast approximate counts using PostgreSQL statistics for instant response
- Cache invalidation after extraction completes
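A time-to-live cache with explicit invalidation is enough to illustrate the metrics caching described above. `MetricsCache` and its methods are hypothetical names, and the injectable clock exists only to make the sketch easy to test.

```python
import time


class MetricsCache:
    """Caches computed metrics per cohort for a fixed number of seconds."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # cohort_id -> (expires_at, value)

    def get(self, cohort_id, compute):
        now = self._clock()
        entry = self._entries.get(cohort_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive query
        value = compute()
        self._entries[cohort_id] = (now + self._ttl, value)
        return value

    def invalidate(self, cohort_id):
        # Intended to be called when an extraction for this cohort completes.
        self._entries.pop(cohort_id, None)


cache = MetricsCache()
print(cache.get("cohort-1", lambda: {"subjects": 42}))
print(cache.get("cohort-1", lambda: {"subjects": 0}))  # served from the cache
```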
- Parents-First Write Pattern: New insertion strategy that prevents orphan database records
- Pre-filters duplicates before creating parent records (subject/study/series)
- Eliminates dead rows from PostgreSQL MVCC overhead (~50% storage savings on large extractions)
- Comprehensive test suite validates no orphan records are created
- Database Foreign Key Constraints: Added explicit FK constraints with CASCADE delete
- Ensures referential integrity across subject → study → series → instance hierarchy
- Frontend Query Garbage Collection: Unused cached queries now garbage collected after 5 minutes
Fixed
- PostgreSQL Out-of-Memory During Large Extractions (30M+ instances)
- Reduced work memory from 256MB to 32MB per query
- Disabled parallel query workers during extraction
- Added 48GB memory limit to metadata database container
- Increased shared memory allocation to 4GB
- Memory Growth in Extraction Writer
- Eliminated reverse lookup cache that could grow to ~850MB for large cohorts
- Stack queries now use efficient JOIN instead of in-memory lookup
- Frontend Memory Growth
- Removed aggressive polling on cohorts list (now manual refresh)
- Reduced job list polling from 5s to 15s
- Disabled automatic polling on administrative pages (backups, database info)
- Disabled polling on health/readiness endpoints
- Cohort Detail API Performance: Metrics now fetched once and reused for all job history entries
- Modality Details Conflict Handling: Fixed edge case where series processed after rollback could fail
Changed
- PostgreSQL Configuration optimized for large extraction workloads
- Shared buffers: 2GB → 4GB
- Work memory: 256MB → 32MB (conservative for concurrent writes)
- Effective cache size: 4GB → 32GB
- Added connection limit of 50
- Added query timeout of 120s to kill runaway queries
- Added idle transaction timeout of 5 minutes
- Added slow query logging (>10s)
- Frontend Independence: Frontend container no longer waits for backend to be healthy
- Prevents frontend restarts from interrupting long-running backend extraction jobs
- Frontend gracefully handles backend unavailability
- Production Build Optimization: Removes debugger statements and console.log in production builds
[0.1.0] - 2025-12-18
Added
- Initial release of NILS - Neuroimaging Intelligent Linked System
- DICOM Classification System: Rule-based classification with YAML configuration
- Base sequence detection (T1w, T2w, FLAIR, DWI, etc.)
- Technique detection (acceleration, contrast, orientation)
- Special case handling (EPIMix, SWI, SyMRI, Dixon, MP2RAGE)
- Sorting Pipeline: Automated DICOM organization and file management
- Pseudo-anonymization: Secure patient data de-identification
- Metadata Extraction: DICOM tag extraction and CSV/Excel import
- BIDS Export: Brain Imaging Data Structure compliant export
- Quality Control: Visual QC workflow with DICOM viewer integration
- Web Interface: React-based UI with dark theme
- Dashboard overview
- Database browser
- Cohort management
- Job monitoring
- Docker Compose Deployment: Containerized full-stack application
- Dual Database System: Separate application and metadata PostgreSQL databases