Skip to content

Copying Large Data

Transferring large volumes of data, such as terabytes (TB) of information, requires efficient methods to ensure speed, reliability, and security. This document analyzes three primary methods: using cp, utilizing rsync, and employing tar, providing code examples for each and discussing how to run these commands in the background to prevent interruption during SSH or shell disconnections.

Utilizing cp

The cp command is a basic Linux utility for copying files and directories. It's simple to use but may not be the most efficient for large data volumes.

cp -R /path/to/SOURCE_FOLDER/* /path/to/DESTINATION_FOLDER/
  • Pros: Simple and straightforward.
  • Cons: Not efficient for large data volumes; lacks incremental copying.

Utilizing rsync

rsync is designed for synchronizing files and directories efficiently, supporting incremental backups and file synchronization.

rsync -a /path/to/SOURCE_FOLDER/ /path/to/DESTINATION_FOLDER/
  • Pros: Incremental backups, efficient synchronization.
  • Cons: Slightly complex.

Utilizing tar

Using tar involves archiving the source directory and extracting it to the destination, potentially optimizing the transfer process.

cd /path/to/SOURCE_FOLDER; tar cf - . | (cd /path/to/DESTINATION_FOLDER; tar xvf -)
  • Pros: Super fast for large datasets.
  • Cons: Complex command, not intuitive.