Anonymization¶
Anonymization (strictly speaking, pseudo-anonymization) de-identifies DICOM files for research use. NILS scrubs identifying tags, remaps patient IDs, can map study dates to relative timepoints, and logs every modification for audit.
What Anonymization Does¶
- DICOM Tag Scrubbing - Removes or modifies more than 80 patient-identifying tags
- Patient ID Remapping - Multiple strategies for ID substitution
- Date Manipulation - Maps study dates to relative timepoints
- Audit Logging - Records all modifications for compliance
- Folder Renaming - Renames directories to new IDs
- Output Export - Generates CSV/Excel audit reports
Tag Categories¶
Tags are organized into categories that can be enabled/disabled:
Patient Information¶
- Patient Name
- Patient Birth Date
- Patient Age
- Patient Address
- Patient Phone
- Patient Email
- Patient Occupation
- Patient ID numbers
Clinical Trial Information¶
- Protocol ID
- Enrollment Date
- Subject Number
- Site Information
Healthcare Provider Information¶
- Physician Names
- Referring Physician
- Performing Physician
- Department
- Facility Codes
Institution Information¶
- Institution Name
- Institution Address
- Institution Contact
Time and Date Information¶
- Study Date/Time
- Series Date/Time
- Acquisition Date/Time
- Content Date/Time
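As a rough illustration of category-based scrubbing, the sketch below treats a DICOM header as a plain dictionary keyed by standard DICOM keywords. The `TAG_CATEGORIES` mapping, the category names, and `scrub_tags` are hypothetical and cover only a handful of tags; they are not NILS's actual tag list or API.

```python
# Hypothetical category map: a few standard DICOM keywords per category.
TAG_CATEGORIES = {
    "patient": ["PatientName", "PatientBirthDate", "PatientAge", "PatientAddress"],
    "provider": ["ReferringPhysicianName", "PerformingPhysicianName"],
    "institution": ["InstitutionName", "InstitutionAddress"],
    "dates": ["StudyDate", "SeriesDate", "AcquisitionDate", "ContentDate"],
}


def scrub_tags(header, enabled_categories):
    """Return (scrubbed copy of header, sorted list of removed keywords)."""
    to_remove = {
        keyword
        for category in enabled_categories
        for keyword in TAG_CATEGORIES[category]
    }
    scrubbed = {k: v for k, v in header.items() if k not in to_remove}
    removed = sorted(k for k in header if k in to_remove)
    return scrubbed, removed


header = {
    "PatientName": "DOE^JANE",
    "InstitutionName": "General Hospital",
    "StudyDate": "20200115",
    "Modality": "MR",
}
# Only two categories are enabled, so StudyDate is kept here.
scrubbed, removed = scrub_tags(header, ["patient", "institution"])
print(scrubbed)
print(removed)
```

Returning the list of removed keywords alongside the scrubbed header makes it straightforward to feed the result into an audit record.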
Patient ID Strategies¶
NILS supports five patient ID remapping strategies:
1. NONE¶
No ID remapping. Original Patient IDs are preserved.
Warning
Use only when original IDs are already anonymized.
2. SEQUENTIAL¶
Discover patients and assign sequential IDs.
| Option | Description |
|---|---|
| `starting_number` | First ID number (default: 1) |
| `prefix` | ID prefix (default: "P") |
| `discovery_mode` | How to discover patients |
Discovery Modes:
- `per_top_folder` - Each top-level folder is one patient
- `one_per_study` - Each StudyInstanceUID is one patient
- `all` - Scan all DICOM files to discover unique PatientIDs
Example output: P001, P002, P003, ...
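A minimal sketch of sequential assignment follows. The function name `assign_sequential_ids` and the zero-padding `width` argument are assumptions for illustration (three-digit padding is inferred from the example output above); `prefix` and `starting_number` mirror the documented options.

```python
def assign_sequential_ids(original_ids, prefix="P", starting_number=1, width=3):
    """Map each distinct original ID to prefix + zero-padded counter, in first-seen order."""
    mapping = {}
    next_number = starting_number
    for original in original_ids:
        if original not in mapping:
            mapping[original] = f"{prefix}{next_number:0{width}d}"
            next_number += 1
    return mapping


# Hypothetical discovered IDs; the repeated ID keeps its first assignment.
discovered = ["MRN-8841", "MRN-1207", "MRN-8841", "MRN-5530"]
mapping = assign_sequential_ids(discovered)
print(mapping)
```

Because the numbering depends on discovery order, the resulting mapping has to be stored if the same assignment is needed again later.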
3. FOLDER¶
Extract ID from folder path using regex.
| Option | Description |
|---|---|
| `regex_pattern` | Pattern to extract ID |
| `folder_depth` | Directory depth to match |
| `fallback_template` | Template for unmatched folders |
Example:
- Path: `/data/SUBJECT_001/session1/`
- Pattern: `SUBJECT_(\d+)`
- Result: Patient ID = "001"
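The example above can be reproduced with Python's `re` module. In this sketch, `id_from_folder` is a hypothetical helper; counting `folder_depth` from 0 at the first component below the root and the `{name}` placeholder in `fallback_template` are both assumptions, since the documentation does not specify either.

```python
import re
from pathlib import PurePosixPath


def id_from_folder(path, regex_pattern, folder_depth, fallback_template="UNMATCHED_{name}"):
    """Apply regex_pattern to the folder at folder_depth and return capture group 1."""
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    # Assumed convention: depth 0 is the first component below the root.
    folder = parts[folder_depth] if folder_depth < len(parts) else ""
    match = re.search(regex_pattern, folder)
    if match:
        return match.group(1)
    return fallback_template.format(name=folder)


matched = id_from_folder("/data/SUBJECT_001/session1/", r"SUBJECT_(\d+)", folder_depth=1)
unmatched = id_from_folder("/data/misc/session1/", r"SUBJECT_(\d+)", folder_depth=1)
print(matched)
print(unmatched)
```

Note that the captured group is kept as a string, so leading zeros such as "001" are preserved.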
4. DETERMINISTIC¶
Hash-based consistent ID mapping.
| Option | Description |
|---|---|
| `salt` | Salt for hash computation |
| `prefix` | ID prefix |
How it works:
- Uses blake2b hash with configurable salt
- Same original ID always produces same new ID
- Reproducible across runs with same salt
Example: Original "PAT001" → Hash "A3F7B2C1"
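The idea can be sketched with `hashlib.blake2b` from the Python standard library. How NILS feeds the salt into the hash and how many characters it keeps are not documented here, so passing the salt through the `salt=` parameter and using `digest_size=4` (8 hex characters, matching the length of the example above) are assumptions; `deterministic_id` is a hypothetical name.

```python
import hashlib


def deterministic_id(original_id, salt, prefix="", digest_size=4):
    """Hash original_id with BLAKE2b; digest_size=4 yields 8 hex characters."""
    salt_bytes = salt.encode("utf-8")
    # hashlib's BLAKE2b accepts a salt of at most SALT_SIZE (16) bytes.
    if len(salt_bytes) > hashlib.blake2b.SALT_SIZE:
        raise ValueError("salt must be at most 16 bytes")
    digest = hashlib.blake2b(
        original_id.encode("utf-8"), digest_size=digest_size, salt=salt_bytes
    ).hexdigest()
    return f"{prefix}{digest.upper()}"


first = deterministic_id("PAT001", "study-salt")
second = deterministic_id("PAT001", "study-salt")
other_salt = deterministic_id("PAT001", "other-salt")
print(first, second, other_salt)
```

Truncating the digest to 32 bits keeps IDs short but makes collisions possible in large cohorts; a larger `digest_size` reduces that risk.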
5. CSV¶
Load mapping from external CSV file.
| Option | Description |
|---|---|
| `csv_path` | Path to mapping CSV |
| `source_column` | Column with original IDs |
| `target_column` | Column with new IDs |
| `missing_mode` | How to handle unmapped IDs |
Missing Modes:
- `SEQUENTIAL` - Assign sequential ID to unmapped
- `HASH` - Use deterministic hash for unmapped
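The sketch below shows one way a CSV mapping with a fallback for unmapped IDs could work, using only the standard library. The helper names, the column names, the `UNMAPPED` prefix, and the unsalted 8-character hash are all assumptions for illustration; an in-memory string stands in for the file at `csv_path`.

```python
import csv
import hashlib
import io
import itertools


def load_mapping(csv_file, source_column, target_column):
    """Read original-ID -> new-ID pairs from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return {row[source_column]: row[target_column] for row in reader}


def resolve_id(original_id, mapping, missing_mode, counter):
    """Return the mapped ID, falling back per missing_mode for unmapped IDs."""
    if original_id in mapping:
        return mapping[original_id]
    if missing_mode == "SEQUENTIAL":
        new_id = f"UNMAPPED{next(counter):03d}"
    elif missing_mode == "HASH":
        digest = hashlib.blake2b(original_id.encode("utf-8"), digest_size=4)
        new_id = digest.hexdigest().upper()
    else:
        raise ValueError(f"unknown missing_mode: {missing_mode}")
    mapping[original_id] = new_id  # remember it so repeats stay consistent
    return new_id


csv_text = "original_id,new_id\nPAT001,SUBJ-A\nPAT002,SUBJ-B\n"
mapping = load_mapping(io.StringIO(csv_text), "original_id", "new_id")
counter = itertools.count(1)
mapped = resolve_id("PAT001", mapping, "SEQUENTIAL", counter)
fallback = resolve_id("PAT999", mapping, "SEQUENTIAL", counter)
print(mapped, fallback)
```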
Study Date Mapping¶
NILS can map actual dates to relative timepoints:
| Original Date | Mapped Value | Meaning |
|---|---|---|
| 2020-01-15 | M00 | Baseline |
| 2020-07-20 | M06 | 6 months |
| 2021-01-18 | M12 | 12 months |
| 2022-01-22 | M24 | 24 months |
How it works:
- First scan date becomes anchor (M00)
- Subsequent dates mapped to nearest timepoint
- Actual dates removed from DICOM
This enables longitudinal tracking without exposing actual dates.
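The mapping in the table can be approximated as follows. The timepoint grid `(0, 6, 12, 24)`, the average month length of 365.25 / 12 days, and the function name `map_study_dates` are assumptions chosen to reproduce the table; the actual rounding rule NILS uses may differ.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # average month length (assumption)


def map_study_dates(study_dates, timepoints=(0, 6, 12, 24)):
    """Label each date with the timepoint nearest its offset from the earliest date."""
    anchor = min(study_dates)  # first scan date becomes M00
    labels = {}
    for d in study_dates:
        months = (d - anchor).days / DAYS_PER_MONTH
        nearest = min(timepoints, key=lambda t: abs(t - months))
        labels[d] = f"M{nearest:02d}"
    return labels


dates = [date(2020, 1, 15), date(2020, 7, 20), date(2021, 1, 18), date(2022, 1, 22)]
labels = map_study_dates(dates)
for d in dates:
    print(d.isoformat(), labels[d])
```

With a fixed grid, two scans that fall close to the same timepoint receive the same label, so a real implementation would need a rule for that case.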
Running Anonymization¶
From Web Interface¶
- Navigate to the cohort
- Click Anonymize
- Configure options:
- Select tag categories to scrub
- Choose ID strategy
- Enable/disable date mapping
- Click Start
Configuration Options¶
| Option | Description | Default |
|---|---|---|
| `patient_id_strategy` | ID remapping method | SEQUENTIAL |
| `map_study_dates` | Map to relative timepoints | true |
| `categories` | Tag categories to scrub | all |
| `output_dir` | Where to write anonymized files | required |
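For readers who think in code, the options and defaults in the table can be summarized as a small data class. This is only a sketch: `AnonymizationConfig` is a hypothetical name, representing `all` as the list `["all"]` is an assumption, and the way NILS actually accepts configuration is not described in this section.

```python
from dataclasses import dataclass, field


@dataclass
class AnonymizationConfig:
    """Hypothetical container mirroring the documented options and defaults."""

    output_dir: str  # required, so it has no default
    patient_id_strategy: str = "SEQUENTIAL"
    map_study_dates: bool = True
    categories: list = field(default_factory=lambda: ["all"])


# Placeholder output path for illustration only.
config = AnonymizationConfig(output_dir="/data/anonymized")
print(config)
```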
Audit System¶
NILS records audit information at two levels, per file and per cohort:
Per-File Tracking¶
- Tags removed
- Tags modified
- Original → New values
- Processing timestamp
Per-Cohort Summary¶
- Total files processed
- Total tags modified
- Files with errors
- Processing duration
Export Formats¶
- CSV - Plain text audit log
- Excel - Formatted with encryption option
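To make the per-file tracking concrete, the sketch below writes audit records to CSV with the standard `csv` module. The column names in `AUDIT_FIELDS`, the record layout, and `write_audit_csv` are assumptions; the actual NILS export schema is not shown in this section.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical column layout covering the per-file items listed above.
AUDIT_FIELDS = ["file", "tag", "action", "original_value", "new_value", "timestamp"]


def write_audit_csv(records, out_file):
    """Write one row per tag change to an open text file object."""
    writer = csv.DictWriter(out_file, fieldnames=AUDIT_FIELDS)
    writer.writeheader()
    writer.writerows(records)


timestamp = datetime.now(timezone.utc).isoformat()
records = [
    {"file": "img001.dcm", "tag": "PatientName", "action": "removed",
     "original_value": "DOE^JANE", "new_value": "", "timestamp": timestamp},
    {"file": "img001.dcm", "tag": "PatientID", "action": "modified",
     "original_value": "PAT001", "new_value": "P001", "timestamp": timestamp},
]
buffer = io.StringIO()  # stand-in for an open output file
write_audit_csv(records, buffer)
print(buffer.getvalue())
```

Because such a log holds original values next to their replacements, it is itself identifying and should be protected like the ID mapping.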
Resume Capability¶
Anonymization supports resumable processing:
- Tracks completed StudyInstanceUIDs
- Skips already-processed studies on restart
- Enables recovery from interruptions
Leaf-Level Granularity¶
Each "leaf" (unique StudyInstanceUID) is tracked:
- `files_written` - Count of files processed
- `files_reused` - Count of files skipped (already done)
- `errors` - Processing errors
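Resumable processing with per-leaf counters might look like the sketch below. `process_studies`, the in-memory `completed` set, and the `write_file` callback are hypothetical; a real run would persist the completed StudyInstanceUIDs between restarts.

```python
def process_studies(studies, completed, write_file):
    """Process each study unless its StudyInstanceUID is already completed.

    studies maps StudyInstanceUID -> list of file names.
    completed is a set of UIDs finished in earlier runs; it is updated in place.
    """
    report = {}
    for uid, files in studies.items():
        leaf = {"files_written": 0, "files_reused": 0, "errors": []}
        if uid in completed:
            leaf["files_reused"] = len(files)  # skip, already done
        else:
            for name in files:
                try:
                    write_file(name)
                    leaf["files_written"] += 1
                except OSError as exc:
                    leaf["errors"].append(f"{name}: {exc}")
            if not leaf["errors"]:
                completed.add(uid)  # only mark clean leaves as done
        report[uid] = leaf
    return report


studies = {"1.2.3": ["a.dcm", "b.dcm"], "1.2.4": ["c.dcm"]}
completed = {"1.2.3"}  # pretend an earlier run finished this study
written = []
report = process_studies(studies, completed, written.append)
print(report)
```

Marking a leaf as completed only when it finished without errors means a failed study is retried on the next run rather than silently skipped.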
Output Structure¶
Anonymized files are written with new structure:
Input:
Output (Sequential IDs):
Output (Folder-based):
Best Practices¶
Before Anonymization¶
- Back up original data - Anonymization modifies files
- Verify extraction - Ensure all data is in database
- Review ID strategy - Choose appropriate method for your study
During Anonymization¶
- Monitor progress - Check Jobs tab for status
- Check errors - Review any failed files
- Verify output - Spot-check anonymized files
After Anonymization¶
- Review audit log - Verify all expected tags were removed
- Validate output - Check files can still be read
- Secure mapping file - Protect ID mapping for re-identification
Security Considerations¶
Protect the Mapping
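The ID mapping file (for the SEQUENTIAL and CSV strategies) can be used to re-identify subjects. Store it securely and separately from anonymized data. Treat the DETERMINISTIC salt with the same care, since anyone who holds it can recompute the hash of a candidate ID.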
Pseudo-anonymization
NILS performs pseudo-anonymization, not full anonymization. Data can be re-linked using the mapping file if properly authorized.
What's NOT Removed¶
- Imaging pixel data
- Acquisition parameters (TR, TE, etc.)
- Scanner information (can be identifying in small studies)
- Burned-in annotations (if present in pixel data)
For stricter anonymization needs, consider:
- Additional pixel-level processing (face removal)
- Scanner model obfuscation
- Custom tag handling