I ran a very simple benchmark script (link) with the count increased by 100 to get larger values. On the same system, running CentOS 7 and Alpine 3.16 as containers, both with PHP 8.0.26, I got 28.478 sec for CentOS and 39.605 sec for Alpine 3.16 (the actual run time is not important - the values are for an older system, and on a modern Xeon based system it takes about 17 sec. The important value is the ratio between the run times - in this case Alpine was close to 40% slower).
I reran the tests a number of times and the numbers stayed in the same range - this is a very large difference. My speculation is that either the CentOS gcc produces better code or some optimization flags were not set when compiling PHP on Alpine. I hope somebody will be able to reproduce my results and that the maintainer of PHP will be able to take some action, if that is possible at all.
@bluesky please provide more details about the enabled PHP extensions and which specific tests from the benchmark script bring this variation (math, string, loops, if-else)
in 3.16 PHP is built using -O2 (changed from -Os), so I'm curious which flags are used in CentOS
The differences are in Math (7.505 sec vs 10.923 sec) and String (18.200 sec vs 26.360 sec), the others are on par. phpinfo() on CentOS does not give the compiler options used (as it does on Alpine) - the only thing I have there is gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3), which was used to compile it (it comes from Remi's RPM repository).
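For anyone who wants to check the percentages quoted in this thread, here is a minimal sketch of the arithmetic. Python is used only as a calculator here; the timings are the ones reported above, and `percent_slower` is just an illustrative helper name, not part of the benchmark script.

```python
# Timings in seconds as reported in this thread: (CentOS 7, Alpine 3.16), PHP 8.0.26.
timings = {
    "total": (28.478, 39.605),
    "math": (7.505, 10.923),
    "string": (18.200, 26.360),
}


def percent_slower(baseline: float, other: float) -> float:
    """Return how much longer `other` takes than `baseline`, in percent."""
    return (other / baseline - 1.0) * 100.0


if __name__ == "__main__":
    for name, (centos, alpine) in timings.items():
        print(f"{name:>6}: Alpine is {percent_slower(centos, alpine):.1f}% slower")
```

By this measure the overall run is about 39% slower on Alpine, while the Math and String tests on their own are each roughly 45% slower.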
Here is an interesting site I found that has been running the same type of test for many versions of PHP - does not help with the issue here, but good info to have - link
Thank you. Math in PHP uses assembly, so it should not depend on the compiler (otoh GCC 8 vs 12 could have an effect). Strings are not clear, but that could surely be caused by musl vs glibc
I checked with edge 8.1 and 8.2 (using clang) and do not see much difference in the numbers, so external dependencies can probably have an effect too
I have run a few more tests on a newer system with Debian 11, Rocky 9, Ubuntu 22.04, Alpine 3.16 and a few other containers (all containers are based on lxc code from http://images.linuxcontainers.org). I installed PHP 8.0 and 8.1, and in all cases Alpine was slower by about 30-40%. For the glibc based systems the numbers were very close (to each other and between 8.0 and 8.1), as we would expect when running on the same hardware.
In the end, unless a benchmark is very comprehensive, it is not a true measure of performance. I have seen benchmarks of WordPress where PHP 8.x makes a good difference - what I have not seen is a comparison of WordPress running on Debian, CentOS, Ubuntu or another glibc based system vs an Alpine based WordPress install on the same hardware.
One suggestion I would have is to increase the test run times (by at least a factor of 10). Even if you have a system with many cores and it can be dedicated, there are still things happening in the background - for example Intel CPUs have Hyper-Threading, which itself has overheads. I know it is a pain to wait for the numbers, but with longer runs you will get values that are more representative, and have a better chance of factoring out other variables. The simplest check is to run your test a number of times - given that the computations are deterministic, your runs should all be about the same. That said, I have not looked at the PHP microtime() function used to time the tests, so maybe this is not a factor.
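The "run it a number of times and compare" idea can be sketched roughly as below. This is only an illustration in Python, not the PHP benchmark script itself: `sample_workload` is a hypothetical stand-in for one pass of the benchmark, and the helper names are made up for this example.

```python
import statistics
import time


def time_runs(workload, repeats=5):
    """Run `workload` several times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return the minimum, the median, and the relative spread (max - min) / min."""
    fastest = min(durations)
    return {
        "min": fastest,
        "median": statistics.median(durations),
        "spread": (max(durations) - fastest) / fastest,
    }


def sample_workload():
    # Hypothetical stand-in for one pass of the benchmark.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


if __name__ == "__main__":
    print(summarize(time_runs(sample_workload)))
```

If the spread between runs is small compared with the 30-40% gap being discussed, background noise is unlikely to explain the difference between the systems.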
I am happy that you have been able to confirm my findings - using your numbers, it looks like Alpine is 48% slower for the overall test. It would be interesting to take @ncopa's counts and multiply them by the times from Alpine and Deb., compare the two, and then compare that to the requests/sec numbers. As I mentioned before, there are many of those posted, but I have not seen one that compares OS to OS - in most cases it is PHP version to version for many different applications. Good luck.
@andypost isn't musl also weak on Python execution? What is the exact reason for that? Could it be related? Afaik the compilation time of the kernel is also slow due to the lack of compiler optimizations in BusyBox. I've read that PHP's JIT got a lot of improvements in version 8, which could further cause these issues.
Afaik the compilation time of the kernel is also slow due to the lack of compiler optimizations in BusyBox.
i'm not sure how those two are related at all. do you mean busybox awk being slow (noticeably for x86-only kernel builds)? gawk/mawk don't have that issue
Unfortunately, I don't know. I just learned about kernel compilation and Python being way slower from YouTube reviews. PHP has "improved" their JIT portion on v8, so it could be related, but who am I?
I'm not deep enough into this topic, but I want to make use of PHP. Maybe I will set up a benchmark before and after installing glibc and the coreutils to see if there are major improvements or if there are issues in general.
(Disclaimer: I don't work for the PHPF and I'm not a maintainer, I only contribute as a hobby and I don't speak for the maintainers)
For the metaphone function I was able to optimize it because there were a lot of redundant libc calls. Removing this redundancy improved the performance for both glibc-based and musl-based systems, and makes the performance gap between musl and glibc smaller (although there will likely still be a small gap).
I looked at optimizing some of the other functions listed here. The string functions very often make calls to memcpy & memchr. Under glibc these are accelerated using SSE, AVX, or other specialised instructions which make them very fast. The musl implementation does not do that and is therefore quite a bit slower. I can clearly see on Alpine that for example for str_replace: 18% of the time is spent in memchr and 22% is spent in memcpy.
I just did a simple experiment by changing the memcpy in str_replace to an inline assembly routine using "rep movsb" (specialised instruction which is very fast on my CPU because my CPU is newer than Ivy Bridge).
On Alpine 32-bit I got an average of 0.354s with my changes versus 0.591s without changes. A big speedup.
In theory PHP could provide certain optimized memory functions on some system configurations in order to achieve this kind of speedup. However, I think that implementing these accelerations into musl would be more beneficial as more applications can benefit from this.
Also note that while using "rep movsb" instead of musl's memcpy was a lot faster, using "rep movsb" instead of memset on my glibc-based distro was quite a bit slower. If we ended up making these kinds of changes to PHP, we'd have to take the platform into consideration.
this most likely wouldn't be implemented (going by feeling), though i sadly don't have a link to mailing thread handy for the rationale.
aside from that, there isn't something specifically bottlenecked here. applications don't exactly spend all their runtime in str_replace and so on, though across the board there is then still some percentile improvement when you add everything up.
which leads to the real issue: that there is no issue. this is not a "i ran php on alpine, and it got an expected runtime of until the heat death of the universe" or "it consumed all my cpu for 6 hours" or "php deadlocked doing xyz" or anything else. it's "i ran some completely synthetic benchmarks not representative of any real-world application, that call a function in a loop, and the functions are X% (for small values of X, not by some huge 10x factors) slower than glibc". that is to say, there is nothing to fix here, this is not a bug report or a feature request, or.. anything. none of the numbers are even particularly shocking or interesting, except perhaps str_replace itself indeed (or for).
if you want to "improve synthetic benchmark performance compared to glibc when calling musl functions", then you can report that and/or implement it in musl itself, on the mailing list: https://www.openwall.com/lists/musl/ . (or put in other words- there is nothing alpine as a distro can do here).
if you think there is something miscompiled (e.g. completely wrong configuration, etc) for php itself here in alpine (that causes some specific, actionable, issues), then open an issue for that, and perhaps it can be worked on.
@psykose I respect your decision to close this, because I also believe this is a fundamental issue in musl and not Alpine. However, I think Alpine can still do something about it.
He then tested malloc-ng, which will eventually be part of musl and which improves things, but only a little. But then he compiled and used mimalloc together with musl, and now his benchmarks perform better than with glibc.
I'd welcome seeing mimalloc as part of the main Alpine repository, as it would give a lot of people an easy option to improve performance noticeably.