Usage
The DICOMetaExtractor_v32.py
script is designed to be used from the command line interface (CLI). The basic usage pattern is outlined below:
Arguments
<path_to_dicom_directory>
: This is the path to the root directory containing your DICOM files. The script will recursively search this directory and its subdirectories for DICOM files to process.-o <output_csv_file_path>
: (Optional) Path where the extracted metadata CSV file will be saved. If not specified, the script defaults todicom_data.csv
in the current directory.
Example
To extract metadata from DICOM files located in /path/to/dicom/files
and save the output to extracted_metadata.csv
in the current working directory, run:
Info
- The script handles large datasets by processing files in parallel and managing memory usage efficiently. It creates temporary files during processing, which are automatically cleaned up upon completion.
- An internet connection is required for the initial installation of dependencies but not for running the script on local DICOM files.
Customizing Number of Workers
The DICOMetaExtractor_v32.py
script uses parallel processing to improve efficiency, especially when working with large datasets. By default, the script dynamically allocates a certain number of worker processes to optimize performance based on your system's capabilities. However, you might find it necessary to adjust the number of workers manually to better match your system's resources or to optimize the script's performance for your specific dataset.
Adjusting Worker Processes
To customize the number of worker processes used by the script, you will need to modify the source code slightly. This involves changing the max_workers
parameter in the ProcessPoolExecutor
and potentially the ThreadPoolExecutor
, depending on where you want to adjust the parallelism.
-
Open the script in your preferred text editor or Integrated Development Environment (IDE).
-
Find the
ProcessPoolExecutor
instantiation. Look for the following line in thecollect_and_process_dicom_data
function:
- Modify the
max_workers
parameter to reflect the number of worker processes you wish to use. For example, to use 8 workers, change the line to:
- (Optional) Adjust ThreadPoolExecutor: If you also wish to change the number of threads used for directory scanning, find the
ThreadPoolExecutor
instantiation in thefind_dcm_folders
function and adjust themax_workers
parameter similarly.
- Save your changes and close the file.
Guidelines for Choosing the Number of Workers
-
CPU Resources: The optimal number of worker processes usually correlates with the number of CPU cores available on your system. Setting
max_workers
to the number of cores or logical processors can maximize your CPU usage. -
Memory Constraints: Be mindful of your system's memory (RAM). Increasing the number of workers increases memory usage. Monitor your system's memory usage and adjust the number of workers to prevent exhausting system resources.
-
Disk I/O: For disk-bound tasks, such as reading DICOM files from a slow disk, increasing the number of workers might not lead to performance improvements. In such cases, disk speed is the limiting factor.
-
Trial and Error: Finding the optimal setting may require some experimentation. Start with a number close to your system's CPU core count and adjust based on observed performance and system resource usage.
After adjusting the number of workers, run the script as usual to process your DICOM files with the new configuration. This customization allows you to tailor the script's performance to your specific system and dataset characteristics, optimizing efficiency and resource utilization.