Data compression is the process of storing data in a format that uses less space than the original representation would use. Compressing data can be very useful particularly in the field of communications as it enables devices to transmit or store data in fewer bits. Besides reducing transmission bandwidth, compression increases the amount of information that can be stored on a hard disk drive or other storage device.
There are two main types of compression. Lossy compression is a data encoding method which reduces a file by discarding certain information. When the file is uncompressed, not all of the original information will be recovered. Lossy compression is typically used to compress video, audio and images, as well as internet telephony. The fact that information is lost during compression will often be unnoticeable to most users. Lossy compression techniques are used in all DVDs, Blu-ray discs, and most multimedia available on the internet.
However, lossy compression is unsuitable where the original and the decompressed data must be identical. In such cases, the user needs lossless compression. This type of compression is employed in compressing software applications, files, and text articles. Lossless compression is also popular for archiving music. This article focuses on lossless compression tools.
Popular lossless compression tools include gzip, bzip2, and xz. These tools use a single core when compressing and decompressing files. But these days, most people run machines with multi-core processors, and the traditional tools can't exploit the speed advantage those processors offer. Step forward modern compression tools that use all the cores present on your system when compressing files, offering massive speed advantages.
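As a quick sketch of the difference (assuming pigz is installed; the file name and size are placeholders), the multi-core tools are drop-in replacements for their single-core counterparts:

```shell
# Create a 10MB sample file to compress (the name and size are arbitrary).
dd if=/dev/zero of=sample.tar bs=1M count=10 status=none

# Single-core: gzip uses one core, however many are available.
gzip -c sample.tar > single.gz

# Multi-core: pigz spreads the work across all cores by default;
# -p caps the thread count. Skip gracefully if pigz isn't installed.
if command -v pigz >/dev/null 2>&1; then
    pigz -c -p 4 sample.tar > multi.gz
fi
```

Because pigz writes standard gzip output, the resulting file can be decompressed by plain gzip on any machine.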
Some of the tools covered in this article don’t provide significant acceleration when decompressing compressed files. The ones that do offer significant improvement, using multiple cores, when decompressing files are pbzip2, lbzip2, plzip, and lrzip.
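A minimal round-trip sketch (assuming pbzip2 is installed; it mirrors bzip2's flags, so -d decompresses, but across every core):

```shell
# Compress then decompress a small file. Falls back to plain
# single-core bzip2 if pbzip2 isn't installed.
printf 'hello multi-core\n' > notes.txt

if command -v pbzip2 >/dev/null 2>&1; then
    pbzip2 notes.txt        # produces notes.txt.bz2, removes notes.txt
    pbzip2 -d notes.txt.bz2 # parallel decompression restores notes.txt
else
    bzip2 notes.txt
    bzip2 -d notes.txt.bz2
fi
```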
Let’s check out the multi-core compression tools. See our time and size charts. And at the end of each page, there’s a table with links to a dedicated page for each of the multi-core tools setting out, in detail, their respective features.
Multi-Core Compression Tools

| Tool | Description |
|---|---|
| pigz | Parallel implementation of gzip. It’s a fully functional replacement for gzip |
| pbzip2 | Parallel implementation of the bzip2 block-sorting file compressor |
| pxz | Runs LZMA compression on multiple cores and processors |
| lbzip2 | Parallel bzip2 compression utility, suited for serial and parallel processing |
| plzip | Massively parallel (multi-threaded) lossless data compressor based on lzlib |
| lrzip | Compression utility that excels at compressing large files |
| pixz | Parallel indexing xz compressor, fully compatible with xz; supports LZMA and LZMA2 |
With Default Compression
Default compression refers to running the compression tool without any compression flag being applied.
pigz compressed our 537MB tarball on our quad-core machine in the quickest time of all the tools, completing the test in a swift 4 seconds. To put the result into some context, we also ran the same test using gzip, which compressed the file in 14.7 seconds. pigz therefore fell a bit short of being 4x quicker.
You’ll notice lbzip2 and pbzip2 bars are colored red. This is because these tools use the best compression as their default.
pigz compressed the 537MB tarball down to 110MB. lrzip offers the best compression ratio for the tarball, squeezing it down to a frugal 64MB, although there isn’t much difference between lrzip, pxz, pixz, or plzip.
Again lbzip2 and pbzip2 bars are colored red. This is because these tools use the best compression as their default.
Methodology used for the tests
We took a 537MB tarball of a popular source package. The tarball was copied to RAM (/dev/shm), and the tests ran in RAM on a quad-core CPU without hyper-threading (Core i5-2500K), with no X server running, and under negligible load.
Each test was run three times with the latest version (at the time of writing) of each multi-core compression tool. The average results are recorded in the charts above. The tests show the relative difference between the multi-core compression tools. They are for indicative purposes only.
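The methodology above can be sketched as a small harness (file names are placeholders, and a small stand-in file is used here rather than a real 537MB tarball):

```shell
# Stand-in for the source tarball used in the tests.
TARBALL=source.tar
dd if=/dev/zero of="$TARBALL" bs=1M count=5 status=none

# Work in RAM if /dev/shm is available, otherwise a normal temp dir.
WORK=$(mktemp -d /dev/shm/bench.XXXXXX 2>/dev/null || mktemp -d)
cp "$TARBALL" "$WORK/"

# Three timed runs; report milliseconds per run, then average by hand.
for i in 1 2 3; do
    start=$(date +%s%N)
    gzip -c "$WORK/$TARBALL" > "$WORK/$TARBALL.gz"
    end=$(date +%s%N)
    echo "run $i: $(( (end - start) / 1000000 )) ms"
done
rm -rf "$WORK"
```

Running from /dev/shm keeps disk I/O out of the measurements, so the timings reflect compression work rather than storage speed.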
With Fastest Compression
Most of the tools provide a flag to set the level of compression on a scale from 1 to 9. pxz and plzip and pixz scale from 0 to 9. This test uses the lowest available compression option.
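A short sketch of the level flags, shown with gzip itself since the same convention carries over to the multi-core tools:

```shell
printf 'sample data for level tests\n' > data.txt

# -1 is the fastest level, -9 the best compression.
gzip -1 -c data.txt > fast.gz
gzip -9 -c data.txt > best.gz

# Equivalent invocations for the multi-core tools (if installed):
#   pigz -1 data.txt       # fastest
#   pbzip2 -1 data.txt
#   plzip -0 data.txt      # plzip, pxz and pixz start at 0
```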
All of the multi-core tools made fairly light work of compressing the tarball with their fastest compression option.
If you need to compress large files on a low-powered multi-core machine, the fastest compression might be suitable. pigz compressed the 537MB tarball to 134MB in a whisker under 1.7 seconds. Most of the other tools shaved the tarball to around 100MB, and lrzip compressed the file to a mere 90MB.
With Best Compression
The time taken to complete this test using the best compression option varies significantly within our group of tools. The fastest tools are lbzip2 and pigz, both completing the task in under 9 seconds.
plzip is the slowest of the group taking nearly 100 seconds.
If space is paramount, pxz, plzip, pixz and lrzip offer impressive compression ratios, with pxz and pixz completing more quickly than the other two.
Recent versions of pigz include the Zopfli engine, which achieves higher compression than other zlib implementations but takes much longer to complete. pigz invokes Zopfli with the -11 flag.
Using Zopfli, pigz takes a whopping 14 minutes 25 seconds to complete the test. And while the compression ratio is better, the compressed file weighs in at 104MB (as opposed to 109MB with the -9 flag). That’s still larger than the output from most of the multi-core tools with their fastest compression option.
pxz also has an extreme option which is triggered with the -e flag. The extreme option is designed to improve the compression ratio at the cost of more CPU time. But the compression ratio wasn’t improved in our tests. With the -9 flag, the tarball is 62MB; using the -e flag, it was 65MB. We’ll need to run more tests to determine whether this is just an anomaly.
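A sketch of both flags (both tools are optional here; pigz's -11 and the xz family's -e are documented options, and xz stands in for pxz since it accepts the same flag):

```shell
printf 'zopfli and extreme mode demo\n' > sample.txt

# pigz -11 selects the Zopfli engine: much slower, slightly smaller output.
if command -v pigz >/dev/null 2>&1; then
    pigz -11 -c sample.txt > sample.zopfli.gz
fi

# The xz family layers -e (extreme) on top of a level, e.g. -9 -e.
if command -v xz >/dev/null 2>&1; then
    xz -9 -e -c sample.txt > sample.xz
fi
```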
Long Range Zip (Lrzip)
Lrzip uses an extended version of rzip, which performs a first-pass, long-distance redundancy reduction. It uniquely offers a good range of compression methods:
- LZMA (the default algorithm) – this is the Lempel–Ziv–Markov chain algorithm.
- ZPAQ – designed for user-level backups.
- LZO – Lempel–Ziv–Oberhumer.
- gzip – based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding.
- bzip2 – compression program that uses the Burrows–Wheeler algorithm.
When it comes to the size of the compressed tarball, zpaq offers the best compression.
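A minimal sketch of choosing lrzip's backend (assumes lrzip is installed; the flags are from its man page: LZMA by default, -z for ZPAQ, -l for LZO, -g for gzip, -b for bzip2):

```shell
printf 'lrzip backend demo\n' > demo.txt

if command -v lrzip >/dev/null 2>&1; then
    lrzip -o demo.lzma.lrz demo.txt      # default LZMA backend
    lrzip -z -o demo.zpaq.lrz demo.txt   # ZPAQ: best ratio, slowest
else
    echo 'lrzip not installed; flags shown in comments above'
fi
```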