Linux Today – 62 Benchmarks, 12 Systems, 4 Compilers: Our Most Extensive Benchmarks Yet Of GCC vs. Clang Performance

After nearly two weeks of benchmarking, here is a look at our most extensive Linux x86_64 compiler comparison yet between the latest stable and development releases of the GCC and LLVM Clang C/C++ compilers. GCC 8, GCC 9.0.1 development, LLVM Clang 7.0.1, and LLVM Clang 8.0 SVN were tested on 12 distinct 64-bit systems, with a total of 62 benchmarks run on each system with each of the four compilers… Here’s a look at this massive data set for seeing the current GCC vs. Clang performance.

With the GCC 9 and Clang 8 releases coming up soon, I’ve spent the past two weeks running this plethora of compiler benchmarks on a range of new and old, low and high-end systems within the labs. The 12 chosen systems aren’t meant for comparing performance between processors but rather for providing a diverse look at how Clang and GCC perform on varying Intel/AMD microarchitectures. For those curious about AArch64 and POWER9 compiler performance, that will come in a separate article; this testing looks only at Linux x86_64 compiler performance.

The 12 systems tested featured the following processors:

– AMD FX-8370E (Bulldozer)
– AMD A10-7870K (Godavari)
– AMD Ryzen 7 2700X (Zen+)
– AMD Ryzen Threadripper 2950X (Zen+)
– AMD Ryzen Threadripper 2990WX (Zen+)
– AMD EPYC 7601 (Zen)
– Intel Core i5 2500K (Sandy Bridge)
– Intel Core i7 4960X (Ivy Bridge)
– Intel Core i9 7980XE (Skylake-X)
– Intel Core i7 8700K (Coffee Lake)
– Intel Xeon E5-2687Wv3 (Haswell)
– Intel Xeon Silver 4108 (Skylake-SP)

The selection was based upon systems in the server room that weren’t pre-occupied with other tests, were of interest for a diverse look across several generations of Intel/AMD processors, and obviously upon the hardware I have available. The storage and RAM varied between the systems, but again the focus isn’t on comparing these CPUs but rather on seeing how GCC 8, GCC 9, Clang 7, and Clang 8 compare. Ubuntu 18.10 was running on these systems with the Linux 4.18 kernel. All of the compilers were built in their release/optimized (non-debug) configurations. During the benchmarking process on all of the systems, the CFLAGS/CXXFLAGS were kept at “-O3 -march=native” throughout.
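As a rough illustration of holding the flags constant while swapping compilers, here is a minimal Python sketch that builds one set of build environment variables per compiler. Only the “-O3 -march=native” flags come from the testing described above; the compiler binary names (gcc-8, clang++-8, and so on) and the build_env helper are hypothetical, since how each compiler was installed and invoked isn’t covered here.

```python
# Flags reported for this testing; identical for every compiler.
FLAGS = "-O3 -march=native"

# Hypothetical binary names, chosen only for illustration.
# The real install paths and names used in the testing are not stated.
COMPILERS = {
    "GCC 8": ("gcc-8", "g++-8"),
    "GCC 9": ("gcc-9", "g++-9"),
    "Clang 7": ("clang-7", "clang++-7"),
    "Clang 8": ("clang-8", "clang++-8"),
}


def build_env(label):
    """Return the environment variables for one compiler configuration."""
    cc, cxx = COMPILERS[label]
    # Only CC/CXX change between runs; the optimization flags never do.
    return {"CC": cc, "CXX": cxx, "CFLAGS": FLAGS, "CXXFLAGS": FLAGS}


if __name__ == "__main__":
    for label in COMPILERS:
        print(label, build_env(label))
```

Keeping the flags in a single constant makes it hard to accidentally benchmark one compiler at a different optimization level than the others.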

These compiler benchmarks are mostly focused on the raw performance of the resulting binaries but also include a few tests looking at compile-time performance. For those short on time and wanting a comparison at the macro level, here is an immediate look at the four-way compiler performance across the dozen systems, based on the geometric mean of all 62 compiler benchmarks carried out in each configuration.

On the AMD side, the Clang vs. GCC performance has reached the stage that in many instances they now deliver similar performance… But in select instances GCC was still faster: about 2% faster on the FX-8370E system and just a hair faster on the Threadripper 2990WX, though Clang 8.0 and GCC 9.0 came in just shy of their stable predecessors. Overall, these new compiler releases didn’t offer any breakthrough performance changes for the AMD processors benchmarked, from Bulldozer to Zen.

On the Intel side, the Core i5 2500K interestingly had slightly better performance with Clang than with GCC. With the Haswell and Ivy Bridge era systems, the GCC vs. Clang performance was the same. The newer Intel CPUs like the Xeon Silver 4108, Core i7 8700K, and Core i9 7980XE sided with the GCC 8/9 compilers over Clang by a few percent.
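For readers less familiar with the geometric mean used for the overall comparison above, the following Python sketch shows the calculation and how a figure like “about 2% faster” falls out of it. The scores are made-up illustrative numbers, not results from this testing, and they are assumed to be already normalized so that higher is better; how the real results were normalized isn’t described here.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, and all must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def percent_advantage(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


if __name__ == "__main__":
    # Illustrative, made-up normalized scores (higher is better).
    compiler_a = [1.00, 1.10, 0.95, 1.05]
    compiler_b = [1.00, 1.00, 1.00, 1.00]

    gm_a = geometric_mean(compiler_a)
    gm_b = geometric_mean(compiler_b)
    print(f"A: {gm_a:.4f}  B: {gm_b:.4f}")
    print(f"A vs. B: {percent_advantage(gm_a, gm_b):+.2f}%")
```

The geometric mean is a common choice for this kind of summary because a single benchmark with very large raw numbers cannot dominate the result the way it would with an arithmetic mean.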

Now onward to the interesting individual data points from these 2019 compiler benchmark results.

With the PolyBench-C polyhedral benchmark, for the most part the Clang and GCC performance across this diverse range of systems was almost identical… But the interesting bit is the Intel Xeon Silver 4108 and Core i9 7980XE CPUs both performing noticeably better with GCC than with Clang. A potential explanation is that those two CPUs support AVX-512, which is perhaps currently better utilized on the GCC side.

Of interest with the FFTW benchmark was seeing GCC 8.2 doing much better on the 2700X / 2990WX / EPYC 7601 Zen systems but the performance dropping back with GCC 9.0. On the Intel side, both AVX-512 Core i9 / Xeon Scalable systems saw nice performance improvements over Clang with GCC 8.2 and now more so with the upcoming GCC 9.1.

The HMMer molecular biology benchmark was interesting in that with a number of systems the Clang performance was better than GCC, but for the older AMD systems and select Intel systems, GCC was still faster. So this case was a mixed bag between the compilers.

MAFFT performs better across the range of systems tested with GCC 9 than with the current GCC 8 release, but that largely just brings GCC in line with Clang.

The BLAKE2 crypto benchmark was one of the cases where Clang was easily beating out GCC on nearly all of the configurations.

The SciMark2 benchmarks always tend to be quite susceptible to compiler changes and in some cases like Jacobi, GCC is performing much faster than Clang.

Clang was generating faster code than GCC on the twelve systems with the TSCP chess benchmark.

On the AMD Zen systems, the Clang-generated binary for VP9 vpxenc video encoding was slightly faster while the Intel performance was close between these compilers. The exception on the Intel side was the Core i9 7980XE, which saw measurably better performance using GCC.

With the H.264/H.265 video encode tests, among other video coding benchmarks, there isn’t too much change since most of the programs/libraries already rely upon hand-tuned Assembly code. But in the case of the x265 benchmark, the AVX-512-enabled Xeon Silver and Core i9 Skylake-X processors were yielding better performance on GCC.

The OpenMP performance in LLVM Clang has come a long way in recent years and for many situations yields performance comparable to the GCC OpenMP implementation. In the case of GraphicsMagick, which makes use of OpenMP, whether GCC still carried a lofty lead or it was a neck-and-neck race depended upon the operations being carried out.

With the Himeno pressure solver, on the AMD side GCC performed noticeably better than Clang with the old Bulldozer era FX-8370E. On the Intel side, GCC tended to outperform Clang particularly with the newer generations of processors.

As for compiler performance in building out some sample projects, compiling Apache was quite close between GCC and Clang but sided in favor of the LLVM-based compiler. When it came to building the ImageMagick program, using Clang led to much quicker build times than GCC. GCC 9 is building slower than GCC 8, which isn’t much of a surprise considering that newer compilers tend to tack on additional optimization passes and other work in trying to yield faster binaries at the cost of slower build times.

When it came to the time needed to build LLVM, the Clang compiler was still faster, though on the newer Intel CPUs it was quite a tight race.

There were also cases where GCC did compile faster than Clang: building out PHP was quicker on GCC than Clang across all of the systems tested.

The C-Ray multi-threaded ray-tracer remains much faster with GCC than with Clang on all of the systems tested.

The AOBench ambient occlusion renderer was also faster with GCC.

The dav1d AV1 video decoder was quicker with Clang on the older AMD systems as well as the older Intel (Sandy Bridge / Ivy Bridge) systems while on the newer CPUs the performance between the compilers yielded similar video decode speed.

LAME MP3 audio encoding was faster when built under GCC.

In very common workloads like OpenSSL, its performance has already been studied and well-tuned by all of the compilers for the past number of years.

Redis was faster on the newer (AVX-512) CPUs with GCC, whereas on the other systems the performance was similar.

Interestingly in the case of Sysbench, the AMD performance was faster when built by the GCC compiler while the Intel systems performed much better with the Clang compiler.

Broadly, it’s a very competitive race these days between GCC and Clang on Linux x86_64. As shown by the geometric means for all these tests, the race is neck-and-neck with GCC in some cases just having a ~2% advantage. Depending upon the particular code-base, in some cases the differences were more pronounced. One area where GCC seemed to do better on average was with the newer Core i9 7980XE and Xeon Silver systems that have AVX-512, where the GNU Compiler Collection most often outperformed Clang. In the tests looking at compile times, Clang still beat GCC in some cases, but in some of the build tests the performance was close and in the case of compiling PHP it was actually faster to build with GCC.

Those wishing to dig through the 62 benchmarks across the dozen systems and the four compilers can find all of the raw performance data via this OpenBenchmarking.org result file.
