Alpine Linux has gained popularity as a Linux distribution that is especially good for Docker images.
What’s one of the biggest benefits of Docker? Reproducibility, including deterministic, reproducible Dockerfile builds.
You should be able to make a Dockerfile deterministic by pinning package version numbers, so that your image is not dependent on the point in time when it was built.
Unfortunately, the Alpine package repo drops packages, even packages on "stable" branches.
Example: on 2020 March 10th, I found gcc 9.2.0-r3 on the Alpine package repository (web UI) under branch 3.11. On 2020 March 23rd, just 13 days later, my Dockerfile failed to build because gcc 9.2.0-r3 had been removed from the 3.11 branch of the package repository and replaced with gcc 9.2.0-r4.
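For concreteness, the pinning described above can be sketched as a minimal Dockerfile (the pinned version is the one from the example, which was later removed; this build only succeeds while that version is still in the v3.11 repository):

```dockerfile
FROM alpine:3.11
# Pin an exact package version. This is deterministic only as long as
# 9.2.0-r3 remains available in the v3.11 package repository.
RUN apk add --no-cache gcc=9.2.0-r3
```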
This makes Alpine Linux unsuitable for use in Docker images. Either your Dockerfile with pinning will "expire", or you are forced to avoid pinning package versions, which may cause unexpected behavior: when package maintainers release a new version, that unexpected version will be installed automatically the next time you rebuild your image.
Compare this to PyPI or npm: No version is dropped, so version pinning works perfectly fine, no matter when you build or use your stuff.
There is a similar thread, apk-tools#10661 (closed), which Timo Teräs (@fabled) closed based on the unconfirmed assumption that the OP was mixing an Alpine image with Alpine packages from two different branches.
However, in my example, both the Alpine version and the package version were on the 3.11 branch. There is no mixing.
EDIT:
It seems that the official recommendation for how to ensure your Dockerfile is deterministic is to keep your own mirror / repository with all the specific packages and their versions that you need to keep indefinitely.
Alternatively, you can push/pull your Docker images to/from a binary repo, such as Artifactory. Then, the only time you ever have to run your Dockerfile is when the Dockerfile and Docker image are being modified anyway, which makes a deterministic Dockerfile moot.
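As a sketch, a Dockerfile built against such a private mirror might look like this (the mirror URL is hypothetical; the pinned version is the one from the example above):

```dockerfile
FROM alpine:3.11
# Hypothetical internal mirror that retains old package versions indefinitely.
# Replacing /etc/apk/repositories makes apk install only from this mirror.
RUN echo "https://mirror.example.com/alpine/v3.11/main" > /etc/apk/repositories \
 && apk add --no-cache gcc=9.2.0-r3
```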
Not sure what to say here. You seem to have a different expectation of what the current policy is. We don't at the moment have the resources to store all built packages indefinitely in our infra. Thus we currently keep only the latest for each stable branch, and it has always been like that.
PyPI and npm just keep source versions. We have to do binary builds, which are considerably larger and may need to be redone when one of the dependencies changes. So as a whole this is a magnitude more difficult and storage-hungry problem. Of course it is unfortunate that the same rules apply to all packages, even the python/npm ones that would not need to be rebuilt as often, since they are source distributions.
There has been discussion of keeping all packages of tagged Alpine releases in the future. However, this is still "in progress". The official recommendation is to keep your own mirror / repository with all the specific packages and versions that you may want to use.
Moving to abuild, which is where the package deletion currently happens.
Wouldn't the apk indexer (I don't know the client's name) have the same ability to search for specific versions on every branch? At the same time, one could use a combined index to search all packages' versions at once. I think that would save space rather than take it, as you wouldn't need to repeat any version shared among different branches; you would just list it in the index targeting that branch version.
Also, I have listed packages ($pkg) in three ways above. I would suggest the first one, as it separates them, and the package name before the apk extension would help spot any wrongly placed package file. The name could also contain the branch before the apk extension.
I think this suggestion would make it possible to support many versions, and a version could simply be deleted when no index has referenced it for a certain time.
The only drawback I can think of is that old apk clients wouldn't understand the new layout; however, that could be avoided by having your server rewrite old paths to the new ones and serve a versioned, latest, or full index as you wish.
Maybe this is also applicable within repositories by just flagging these packages inside them; however, that may not be the way you plan to develop Alpine. It's of course your policy, it's all up to you, and these are just suggestions.
I know I'm 4 years late, but I just found this issue now. I hope you tell me what you think about my suggestion.
This will not work because the builds between branches are completely different, and you should not mix branches. Each branch often has very different package contents even if the package version looks the same.
Each branch has separate builders with a different compiler toolchain (binutils, gcc version, etc.). For many packages, the dependencies are formed at build time. That is, when a main package is built against a different version of a library, the end result is different. Thus even the exact same package from edge is often incompatible with stable branches if one of its dependencies was upgraded.
Alpine has also a strict policy of recompiling everything when stable branch is created. This ensures that we have working bootstrap, and that everything is compiled with the same and stable compiler/toolchain.
Because of the above, there would be no space reduction even if trying to combine the trees.
@fabled, thank you for looking into this. I understand that there is a space limitation issue, but I'm glad that it is still being looked at.
Is there any way to apply a new policy just to python/npm packages that are just source distributions?
Additionally, perhaps Alpine can simply provide one or more "LTS" branches that do not drop packages? This will still require storage space for binary builds, but not nearly so much as all built packages would require.
You said that the official recommendation is to keep one's own mirror/repo. Where is this recommendation? Is it in the documentation somewhere?
I know that there are organisations that have their own mirrors, where they don't delete the old packages. They only sync the one (or two) architectures they use.
We tried to set up a ZFS-based solution for storing old snapshots for this purpose, but we relatively quickly ran out of disk space. We have talked about maybe doing snapshots at release-tagging time, but we have not had the manpower to follow that up (in addition to getting a couple of servers with big disk storage).
The official recommendation is to keep your own mirror / repository with all the specific package and their versions that you may want to use.
Is there a guide on how to do this?
Most of us only face this error when a version we relied on disappears. Even if we find this recommendation then, I suppose we can't mirror the removed dependencies after the fact, so we're left with the effort of upgrading or struggling to install old versions some other way.
We don't at the moment have resources to store all built packages indefinitely in our infra.
Is this an effort problem (e.g. improving infra) or a money problem (e.g. storage / bandwidth)? I personally would be happy to pledge money towards the goal of keeping multiple versions of certain packages and I'm probably not alone. E.g. Node.js is one of the main candidates for pinning specific versions. Just the sheer number of SO / GH issues about this is daunting.
The official recommendation is to keep your own mirror / repository with all the specific package and their versions that you may want to use.
Is there a guide on how to do this?
What we ended up doing, instead of keeping a repo of Alpine packages, is keeping a repo of built Docker images on Artifactory. This way, the only time we ever have to run our Dockerfile is when the Dockerfile and Docker image are being modified anyway, which makes a deterministic Dockerfile moot.
This can be used to pin an Arch Linux install to a certain date, thus achieving reproducibility. (Currently this requires disabling package signature verification, because package signatures are not timestamped and the keys used to sign old packages expire; but Arch doesn't sign the repositories anyway, so for this use case this is moot.)
Very old packages get moved to an archive.org collection, and redirects are left in place.
Perhaps doing the same would be feasible for Alpine?
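For reference, pinning an Arch install to a date works by pointing pacman at a dated Arch Linux Archive snapshot (the date below is illustrative), plus disabling signature checks as noted above:

```
# /etc/pacman.d/mirrorlist -- point at a dated archive snapshot
Server = https://archive.archlinux.org/repos/2020/03/10/$repo/os/$arch

# /etc/pacman.conf -- signatures on old packages are made by expired keys:
# SigLevel = Never
```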
Not sure what to say here. You seem to have different expectation what the current policy is. We don't at the moment have resources to store all built packages indefinitely in our infra. Thus we currently keep only the latest for each stable branch, and has always been like that.
By the way, just monthly or even yearly snapshots would be very helpful already. At least for most use cases I've seen, you don't need some exactly specific package version or the very latest package versions, but rather any recent-ish package versions along with some guarantee that they will remain available in a reproducible way for the foreseeable future.
Thus we currently keep only the latest for each stable branch
This means that the stable branches are still a moving target and are not reproducible, as fixes still get applied to them. My goal, and I think the OP's, is complete reproducibility, excluding any changes. Does that sound correct?
I see 3.2 from your example is eight years old and likely too old to be useful for many applications.
Looking at https://www.alpinelinux.org/releases/ it's not immediately clear to me which branches are frozen and which not, and I'm guessing there is no guarantee attached so the support window may change in the future.
For what reason? What's the goal of doing that?
There are many possible reasons to want reproducibility. A container image that can be built in a reproducible manner provides guarantees for processes which require long-term reliability. Storing the built image is always an option, but it makes modifications difficult and is less flexible than building it as needed. I don't think I can convincingly explain all possible applications here, so I suggest looking at the motivations for projects which have reproducibility as a primary goal, such as NixOS and Reproducible Builds.
One specific goal that I am working towards right now is visual regression testing of websites using browser screenshots. The main complication is packages of complex software which do not have LTS branches and ship security fixes along with other changes, Chromium being one. Font rendering depends on a lot of components and is generally quite fragile as far as reproducibility is concerned, so pinning any single package version is not sufficient. For this goal, I think a reproducible Linux distribution which provides a guarantee that a certain version will remain installable for the foreseeable future would be an ideal solution.
The tags don't mean anything in light of everything above; the entire repository is effectively just a branch. You can visualise it as a git checkout that simply updates on commits, with the folders being repositories inside the same branch (release).
All the tags are for is generating ISOs/Docker images/etc. in response to e.g. a CVE in the base image (just a refresh); they're not related to what packages you actually get in the release repository.
Closing, as this is neither a build issue itself (sorry!) nor something anywhere on any "roadmap" (we have enough infrastructure-related issues as it is, and this is not an easy thing to provide). This feature just does not exist.