After migrating from the docker-library image to the python3 package bundled in Alpine 3.12, an automated performance test shows a clear degradation.
The test consists of serializing complex JSON messages with orjson and sending them over plain TCP.
The receiving side runs outside the container and is always the same.
Performance degrades from ~200,000 messages per second with python3.7:alpine3.10 (or python3.8:alpine3.12) to ~180,000 with alpine:3.12 + apk add python3.
Profiling with py-spy, all the flame charts look very similar.
The hot spots in the code are:
- ~37% on TCP socket.sendall() (non-asyncio) => ~38% on the faster version
- ~30% on JSON serialization (with orjson) => 32% on the faster version
In both cases I used the same orjson 3.3.1 binary from the manylinux wheel, and I also tried building different versions from scratch with the same Rust version and commands.
Since the profile shapes are so similar in both cases, I am wondering if there is a general performance drop in the interpreter itself...
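For reference, here is a minimal sketch of the kind of sender loop being measured; the receiver address and the sample message are placeholders, and the real test uses more complex JSON messages:

```python
# Minimal sketch of the measured hot path: orjson serialization plus a blocking
# socket.sendall(), with the receiver running outside the container.
# HOST, PORT, MESSAGE and N are placeholders, not the real test setup.
import socket
import time

import orjson

HOST, PORT = "192.0.2.10", 9000   # hypothetical receiver address
MESSAGE = {"id": 1, "values": list(range(50)), "tags": {"env": "test"}}
N = 1_000_000

with socket.create_connection((HOST, PORT)) as sock:
    start = time.perf_counter()
    for _ in range(N):
        payload = orjson.dumps(MESSAGE)   # serialize each message
        sock.sendall(payload + b"\n")     # send over plain TCP
    elapsed = time.perf_counter() - start
    print(f"{N / elapsed:,.0f} messages/second")
```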
gatopeich changed title from ~10 performance degradation in Python 3.8 compared to the one from docker-library (python3.8:alpine3.12) to ~10% performance degradation in Python 3.8 compared to the one from docker-library (python3.8:alpine3.12)
@gatopeich please reword the title to remove that tilde combination. "Around 10%" perhaps?
gatopeich changed title from ~10% performance degradation in Python 3.8 compared to the one from docker-library (python3.8:alpine3.12) to Performance degradation in Python 3.8 compared to the one from docker-library (python3.8:alpine3.12)
3462e07c is in 3.12 and later, so you can compare the binaries in 3.11 and 3.12 to see if it has an effect. It was fixed for me when I built the binary locally (!6945 (comment 83258)).
What about the idea of adding a python3-optimised subpackage, so that the usual python3 package keeps being compiled with "-Os" (and installed by default via alpine-base), while python3-optimised is compiled with "-O2" and can be installed to replace python3 if wished?
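If both variants existed, a quick way to confirm which build is installed would be to inspect the interpreter's build-time flags. A minimal sketch using only the standard sysconfig module (exactly which config variables are populated can vary between builds):

```python
# Report which optimization flags the running CPython was configured with.
# This reflects build-time configuration only, not runtime behaviour.
import sysconfig

flags = " ".join(
    sysconfig.get_config_var(name) or ""
    for name in ("CFLAGS", "OPT", "CONFIGURE_CFLAGS")
).split()

for opt in ("-Os", "-O2", "-O3"):
    print(f"{opt}: {'present' if opt in flags else 'absent'}")
```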
Has anybody checked the alleged size savings of -Os vs -O2 or -O3? It might turn out small or even negligible...
I just built the Python3 package from Edge locally, with the only change being replacing "-Os" with "-O2".
```
$ ls -l --block-size=MB
-rw-r--r-- 1 dermot users 13MB Dec 6 18:04 python3-3.8.6-r1.apk
-rw-r--r-- 1 dermot users  6MB Dec 6 18:03 python3-dbg-3.8.6-r1.apk
-rw-r--r-- 1 dermot users 25MB Dec 6 18:03 python3-dev-3.8.6-r1.apk
-rw-r--r-- 1 dermot users  1MB Dec 6 18:03 python3-doc-3.8.6-r1.apk
-rw-r--r-- 1 dermot users 17MB Dec 6 18:04 python3-tests-3.8.6-r1.apk
-rw-r--r-- 1 dermot users  2MB Dec 6 18:04 python3-wininst-3.8.6-r1.apk

$ ls -l
-rw-r--r-- 1 dermot users 12939721 Dec 6 18:04 python3-3.8.6-r1.apk
-rw-r--r-- 1 dermot users  5207834 Dec 6 18:03 python3-dbg-3.8.6-r1.apk
-rw-r--r-- 1 dermot users 24894471 Dec 6 18:03 python3-dev-3.8.6-r1.apk
-rw-r--r-- 1 dermot users    13074 Dec 6 18:03 python3-doc-3.8.6-r1.apk
-rw-r--r-- 1 dermot users 16930975 Dec 6 18:04 python3-tests-3.8.6-r1.apk
-rw-r--r-- 1 dermot users  1024512 Dec 6 18:04 python3-wininst-3.8.6-r1.apk
```
and after unpacking the APK locally to get installed size:
```
$ du -c -BKB
46502kB total
```
So the main python3 APK is 13MB packaged / 46.5MB unpacked, versus the current package's figures of 12.59MB / 44.84MB shown here. That is an increase of only a few percent, so it's not much of a size increase.
Note: I haven't tested the built packages, only built them. Also, by making the "-Os" to "-O2" change and nothing else, the existing "--enable-optimizations" configure flag kicked back into life:
```
checking for --enable-optimizations... yes
```
which meant that Python was built twice: a full set of tests is run for profiling, and then the second build uses the profiling information.
However, it appears the set of tests run for profiling can be tweaked.
In my case, on an 8-core laptop, the package build took approximately 20 minutes.
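For anyone wanting to sanity-check the two builds beyond package size, here is a rough micro-benchmark sketch. The interpreter paths are assumptions (point them at wherever the -Os and -O2 builds are installed), and this is only a coarse interpreter-loop check, not a substitute for the original orjson/TCP test:

```python
# Rough comparison of two locally installed interpreters via the stdlib timeit
# module. Paths below are hypothetical; adjust them to the actual installs.
import subprocess

INTERPRETERS = {
    "-Os build": "/usr/bin/python3",          # hypothetical path
    "-O2 build": "/usr/local/bin/python3.8",  # hypothetical path
}

# A CPU-bound snippet that exercises the interpreter loop rather than I/O.
STMT = "sum(i * i for i in range(10_000))"

for label, path in INTERPRETERS.items():
    result = subprocess.run(
        [path, "-m", "timeit", "-n", "1000", STMT],
        capture_output=True, text=True, check=True,
    )
    print(f"{label}: {result.stdout.strip()}")
```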
As a side note regarding the "-fno-semantic-interposition" CFLAGS option that was added by @J0WI in !6945 (merged): I was reading Fedora's discussion from when they decided to use it and noticed:
> Especially in embedded systems like IoT, CPU power is more expensive than memory. It is often the limiting factor. I speak from 20 years of professional experience.
This issue has turned into a bit of a mess. I don't think it should be a catch-all for "optimization flags"; instead, each change should be made as a separate issue/MR. Since I think the original issue of Python -Os vs -O2/-O3 has been resolved, I'm closing this now.