main/gcc: add missed -O2 into CXX/CPP FLAGS

Ghost User requested to merge (removed):gcc-o2 into master

adding the missing -O2 flags makes the resulting gcc compile around 10% faster on x86_64

some light benchmarking:

# kcbench rev 095a98308c415bc62e2a8c4a0179ce92f80b5b6a
# https://gitlab.com/knurd42/kcbench
# + apk add -t .kcbench bash coreutils time util-linux-misc
# + linux-lts abuild deps
# + maybe missed one or two deps
# nld5-dev1.alpinelinux.org
# 2x Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
# kcbench -i 3 -s 5.15.55 --add-make-args "AWK=mawk"
  • gcc-11.2.1_git20220219-r4 (repo, aports/64e502ae)
Processor:           Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz [48 CPUs]
Cpufreq; Memory:     schedutil [intel_cpufreq]; 257848 MiB
Linux running:       5.15.11-0-lts [x86_64]
Compiler:            gcc (Alpine 11.2.1_git20220219) 11.2.1 20220219
Linux compiled:      5.15.55 [/home/demon/.cache/kcbench/linux-5.15.55/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command:       make AWK=mawk vmlinux
Filling caches:      This might take a while... Done
Run 1 (-j 48):       133.40 seconds / 26.99 kernels/hour [P:3687%, 1604 maj. pagefaults]
Run 2 (-j 48):       130.63 seconds / 27.56 kernels/hour [P:3753%, 1514 maj. pagefaults]
Run 3 (-j 48):       128.27 seconds / 28.07 kernels/hour [P:3816%, 1567 maj. pagefaults]
Run 4 (-j 54):       129.01 seconds / 27.90 kernels/hour [P:3804%, 1565 maj. pagefaults]
Run 5 (-j 54):       128.35 seconds / 28.05 kernels/hour [P:3822%, 1608 maj. pagefaults]
Run 6 (-j 54):       129.65 seconds / 27.77 kernels/hour [P:3787%, 1681 maj. pagefaults]
Run 7 (-j 24):       161.68 seconds / 22.27 kernels/hour [P:2020%, 1425 maj. pagefaults]
Run 8 (-j 24):       159.91 seconds / 22.51 kernels/hour [P:2025%, 1306 maj. pagefaults]
Run 9 (-j 24):       154.73 seconds / 23.27 kernels/hour [P:2044%, 1350 maj. pagefaults]
Run 10 (-j 29):      149.97 seconds / 24.00 kernels/hour [P:2416%, 1481 maj. pagefaults]
Run 11 (-j 29):      148.50 seconds / 24.24 kernels/hour [P:2440%, 1329 maj. pagefaults]
Run 12 (-j 29):      149.97 seconds / 24.00 kernels/hour [P:2428%, 1323 maj. pagefaults]
  • gcc-11.2.1_git20220219-r5 (adding missing -O2 CPP/CXX FLAGS)
Processor:           Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz [48 CPUs]
Cpufreq; Memory:     schedutil [intel_cpufreq]; 257848 MiB
Linux running:       5.15.11-0-lts [x86_64]
Compiler:            gcc (Alpine 11.2.1_git20220219) 11.2.1 20220219
Linux compiled:      5.15.55 [/home/demon/.cache/kcbench/linux-5.15.55/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command:       make AWK=mawk vmlinux
Filling caches:      This might take a while... Done
Run 1 (-j 48):       117.99 seconds / 30.51 kernels/hour [P:3728%, 1555 maj. pagefaults]
Run 2 (-j 48):       121.80 seconds / 29.56 kernels/hour [P:3624%, 1645 maj. pagefaults]
Run 3 (-j 48):       120.24 seconds / 29.94 kernels/hour [P:3667%, 1362 maj. pagefaults]
Run 4 (-j 54):       118.63 seconds / 30.35 kernels/hour [P:3725%, 1701 maj. pagefaults]
Run 5 (-j 54):       117.88 seconds / 30.54 kernels/hour [P:3743%, 1680 maj. pagefaults]
Run 6 (-j 54):       118.00 seconds / 30.51 kernels/hour [P:3737%, 1587 maj. pagefaults]
Run 7 (-j 24):       141.66 seconds / 25.41 kernels/hour [P:2028%, 1500 maj. pagefaults]
Run 8 (-j 24):       142.17 seconds / 25.32 kernels/hour [P:2019%, 1593 maj. pagefaults]
Run 9 (-j 24):       147.63 seconds / 24.39 kernels/hour [P:2023%, 1337 maj. pagefaults]
Run 10 (-j 29):      138.18 seconds / 26.05 kernels/hour [P:2403%, 1471 maj. pagefaults]
Run 11 (-j 29):      136.01 seconds / 26.47 kernels/hour [P:2411%, 1379 maj. pagefaults]
Run 12 (-j 29):      136.85 seconds / 26.31 kernels/hour [P:2398%, 1347 maj. pagefaults]
  • gcc-11.2.1_git20220219-r6 (above with --with-build-config=bootstrap-lto to use lto for compiler itself)
Processor:           Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz [48 CPUs]
Cpufreq; Memory:     schedutil [intel_cpufreq]; 257848 MiB
Linux running:       5.15.11-0-lts [x86_64]
Compiler:            gcc (Alpine 11.2.1_git20220219) 11.2.1 20220219
Linux compiled:      5.15.55 [/home/demon/.cache/kcbench/linux-5.15.55/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command:       make AWK=mawk vmlinux
Filling caches:      This might take a while... Done
Run 1 (-j 48):       115.78 seconds / 31.09 kernels/hour [P:3668%, 1584 maj. pagefaults]
Run 2 (-j 48):       115.67 seconds / 31.12 kernels/hour [P:3670%, 1517 maj. pagefaults]
Run 3 (-j 48):       116.38 seconds / 30.93 kernels/hour [P:3648%, 1664 maj. pagefaults]
Run 4 (-j 54):       113.78 seconds / 31.64 kernels/hour [P:3735%, 1700 maj. pagefaults]
Run 5 (-j 54):       114.07 seconds / 31.56 kernels/hour [P:3721%, 1676 maj. pagefaults]
Run 6 (-j 54):       114.27 seconds / 31.50 kernels/hour [P:3716%, 1666 maj. pagefaults]
Run 7 (-j 24):       137.75 seconds / 26.13 kernels/hour [P:1996%, 1324 maj. pagefaults]
Run 8 (-j 24):       141.43 seconds / 25.45 kernels/hour [P:2007%, 1287 maj. pagefaults]
Run 9 (-j 24):       142.14 seconds / 25.33 kernels/hour [P:2004%, 1360 maj. pagefaults]
Run 10 (-j 29):      131.34 seconds / 27.41 kernels/hour [P:2407%, 1510 maj. pagefaults]
Run 11 (-j 29):      131.43 seconds / 27.39 kernels/hour [P:2389%, 1361 maj. pagefaults]
Run 12 (-j 29):      131.95 seconds / 27.28 kernels/hour [P:2381%, 1431 maj. pagefaults]

note that the measured time includes more than just cc running, so the speedup of cc alone is larger than the overall percentage.

lto is also a little faster and most likely quite safe here, so for simplicity it could be enabled in the general toolchain whenever we are not bootstrapping or cross-compiling. the gains are not that big for x86_64 in this test, though; maybe the other arches tell a different story.

-j54 runs specifically:

-r4: =(129.01+128.35+129.65)/3 -> 129.00
-r5: =(118.63+117.88+118.00)/3 -> 118.17 (-8.4%)
-r6: =(113.78+114.07+114.27)/3 -> 114.04 (-11.6%)
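the averages and percentages above can be re-derived with a short script. this is only a sketch: the timings are copied from the -j 54 runs in the kcbench output, and the names `runs_j54` and `percent_change` are made up for illustration.

```python
from statistics import mean

# Wall-clock seconds of the three -j 54 runs per package revision,
# copied from the kcbench output in this description.
runs_j54 = {
    "r4": [129.01, 128.35, 129.65],
    "r5": [118.63, 117.88, 118.00],
    "r6": [113.78, 114.07, 114.27],
}

# Mean build time per revision.
averages = {rev: mean(times) for rev, times in runs_j54.items()}
baseline = averages["r4"]


def percent_change(value, reference):
    """Relative change of value against reference, in percent."""
    return (value - reference) / reference * 100.0


for rev, avg in averages.items():
    print(f"-{rev}: {avg:.2f} s ({percent_change(avg, baseline):+.1f}%)")
```

the same dictionary can be filled with the -j 24, -j 29 or -j 48 runs to compare the other job counts.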

assuming ~90% of the time is spent in cc, the 8.4% overall change works out to roughly 9-10% for cc alone (8.4 / 0.9 ≈ 9.3). not very scientific, but close enough :)
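a rough sketch of that estimate, under the same assumptions: a fixed share of the wall time is spent in cc and the rest of the build is unaffected by the change. the 90% share is the guess from the text, not a measurement, and `cc_only_reduction` is a hypothetical helper name.

```python
def cc_only_reduction(overall_reduction, cc_fraction):
    """Estimate how much faster cc itself got.

    If cc accounts for cc_fraction of the original wall time and everything
    else is unchanged, an overall reduction R corresponds to a reduction of
    R / cc_fraction in the cc part alone.
    """
    if not 0 < cc_fraction <= 1:
        raise ValueError("cc_fraction must be in (0, 1]")
    return overall_reduction / cc_fraction


# 8.4% overall reduction (-r4 -> -r5, -j 54), assumed 90% of time in cc.
estimate = cc_only_reduction(0.084, 0.90)
print(f"estimated cc-only reduction: {estimate:.1%}")
```

a smaller assumed cc share gives a larger cc-only estimate, so the result is only as good as the 90% guess.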
