Intent
Goals:
ISO build process SHOULD continue to only include the packages it needs
Modular packages MUST install correctly from the SIMP ISO or the local mirror created from the SIMP ISO
Modular packages installed from the SIMP ISO MUST upgrade correctly
Circumstances:
If necessary, the ISO MAY mirror the entire base OS’s DVD AppStream/ repository in order to avoid red…
However, the ISO MUST NOT mirror an entire repository (like epel-modular) just to provide a few packages from a single module stream (389-directory-server:stable).
No matter what, the old createrepo command MUST NOT EVER be run on a repo with modular packages, because it will destroy the repo’s modulemd metadata and make the RPMs unable to install. Use createrepo_c instead, or something like createrepo_mod from modulemd-tools.
Circumstances:
The introduction of 389ds in 6.6.0 will require the 389-directory-server:stable stream from the epel-modular repository.
Conclusion:
To add modular RPMs to the SIMP ISO, we must rebuild their modular repos with only a “slim” subset of their original upstream repository’s module/streams.
The subsets must use the same name + stream + version + context + architecture (N:S:V:C:A)
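As a minimal sketch of what "same N:S:V:C:A" means in practice, the following Python parses an identifier into its five parts and compares an upstream stream with its slim subset. The class name and the sample version and context values are hypothetical and are not taken from any real repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nsvca:
    """Identity of one module stream build (N:S:V:C:A)."""

    name: str
    stream: str
    version: int
    context: str
    arch: str

    @classmethod
    def parse(cls, text: str) -> "Nsvca":
        # Drop an optional trailing "/profile"; the profile is not part of the identity.
        spec = text.split("/", 1)[0]
        parts = spec.split(":")
        if len(parts) != 5:
            raise ValueError(f"expected N:S:V:C:A, got {text!r}")
        name, stream, version, context, arch = parts
        return cls(name, stream, int(version), context, arch)

    def __str__(self) -> str:
        return f"{self.name}:{self.stream}:{self.version}:{self.context}:{self.arch}"


# Hypothetical values, for illustration only.
upstream = Nsvca.parse("389-directory-server:stable:820210101000000:9edba152:x86_64")
slim = Nsvca.parse("389-directory-server:stable:820210101000000:9edba152:x86_64/default")
print(upstream == slim)
```

A frozen dataclass is hashable, so these values can also be used as dictionary keys when grouping RPMs by stream.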
Approach
...
Creating “slim” stream mirrors of modular RPM packages enables the SIMP ISO to support modularity while still mix/matching specific RPMs from various sources. For instance, it permits adding a few epel-modular packages without distributing epel-modular's entire collection of modules x streams x packages.
See https://simp-project.atlassian.net/wiki/pages/resumedraft.action?draftId=2193326084 for a summary of motivating challenges and requirements.
Process
The “slim” mirroring process must happen during/before the ISO build process.
At the same time each modular RPM is acquired, save its source repo’s modulemd metadata.
Use unique N:S:V:C:A combinations from the resulting modular RPMs to determine which “slim” module streams to reconstruct. (We don’t care about /P for this.)
For each unique “slim” modular stream: generate modulemd metadata for all relevant RPMs
Combine all “slim” modules' modulemd data into a single modulemd data structure and write it to modules.yaml
Rebuild the modular repository using createrepo_c (or createrepo_mod, just not createrepo) with the new modules.yaml file
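The grouping step in this process can be sketched as follows. This is only an illustration under assumptions: the record layout (an RPM file name paired with an N:S:V:C:A tuple) and all sample values are hypothetical, and nothing here reads real modulemd metadata.

```python
from collections import defaultdict


def group_by_stream(records):
    """Map each unique (N, S, V, C, A) tuple to the sorted RPMs that belong to it."""
    streams = defaultdict(set)
    for rpm, nsvca in records:
        streams[nsvca].add(rpm)
    return {nsvca: sorted(rpms) for nsvca, rpms in streams.items()}


# Hypothetical identifiers and file names, for illustration only.
DS = ("389-directory-server", "stable", 1, "aaaa", "x86_64")
OTHER = ("example-module", "1.0", 7, "bbbb", "x86_64")

records = [
    ("389-ds-base-libs.rpm", DS),
    ("389-ds-base.rpm", DS),
    ("example-tool.rpm", OTHER),
]

for nsvca, rpms in group_by_stream(records).items():
    print(":".join(str(part) for part in nsvca), rpms)
```

Each key of the result corresponds to one “slim” module stream to reconstruct, and each value lists the RPMs its modulemd artifacts would need to name.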
Implementing slim modular repos
...
Note |
---|
repomd.xml XML root is namespaced; causes XPath trouble
|
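A minimal sketch of the XPath trouble and one way around it, using Python's standard xml.etree.ElementTree. The sample document is a trimmed, hypothetical repomd.xml (the href values are made up); the namespace URI is the one repomd.xml declares on its root element.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map passed to find(); the prefix name "repo" is arbitrary.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Trimmed, hypothetical sample; real files carry checksums, sizes, and timestamps.
SAMPLE_REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <revision>1</revision>
  <data type="primary">
    <location href="repodata/aaaa-primary.xml.gz"/>
  </data>
  <data type="modules">
    <location href="repodata/bbbb-modules.yaml.gz"/>
  </data>
</repomd>
"""


def modules_href(repomd_xml):
    """Return the href of the modules metadata, or None if the repo has none."""
    root = ET.fromstring(repomd_xml)
    # An un-prefixed path such as "data/location" matches nothing here,
    # because every element lives in the root's default namespace.
    location = root.find("repo:data[@type='modules']/repo:location", REPO_NS)
    return None if location is None else location.get("href")


print(modules_href(SAMPLE_REPOMD))
```

The href is relative to the repository root, so it still has to be joined with the repository's base URL before fetching.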
Modular RPM data/metadata to get/record/cache
At a minimum, a new field (only required for modular RPMs) that specifies the N:S (module:stream) for modular packages should be added to the build’s packages.yaml.
...
Problems that are probably solved
yumdownloader can’t see RPMs in modules/streams that aren’t enabled
...
Add an optional field to packages.yaml entries to specify the N:S: for each modular RPM (Status: TODO)
Identify and enable all unique N: from packages.yaml (fail if there are conflicting S:) (Status: TODO)
dnf module enable each N:S: before beginning to use yumdownloader (Status: TODO)
Individual yumdownloader runs can change repository mirrors, which may be out of sync with each other and have different modulemd data. (Status: TODO)
(When using yumdownloader) the modulemd metadata must be fetched at the same time as the RPM is downloaded, in order to preserve the precise state of that RPM’s modular metadata. (Status: unsolved?)
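The "identify unique N:, fail on conflicting S:" step might look like the sketch below. The field name module_stream and the shape of the entries mapping are assumptions made for illustration; they are not the existing packages.yaml schema. The commands are only built as argument lists, not executed.

```python
def streams_to_enable(entries):
    """Return sorted "N:S" specs, failing if one module is pinned to two streams."""
    chosen = {}
    for package, entry in entries.items():
        spec = entry.get("module_stream")  # hypothetical field name
        if spec is None:
            continue  # non-modular RPM: nothing to enable
        name, _, stream = spec.partition(":")
        if not name or not stream:
            raise ValueError(f"{package}: expected N:S, got {spec!r}")
        previous = chosen.setdefault(name, stream)
        if previous != stream:
            raise ValueError(
                f"{package}: module {name!r} is pinned to both {previous!r} and {stream!r}"
            )
    return [f"{name}:{stream}" for name, stream in sorted(chosen.items())]


def enable_commands(specs):
    """Build one `dnf module enable` argument list per N:S spec."""
    return [["dnf", "module", "enable", "-y", spec] for spec in specs]


# Hypothetical entries, as they might look after loading the YAML file.
entries = {
    "389-ds-base": {"module_stream": "389-directory-server:stable"},
    "389-ds-base-libs": {"module_stream": "389-directory-server:stable"},
    "rsync": {},
}

for command in enable_commands(streams_to_enable(entries)):
    print(" ".join(command))
```

Failing early on a conflict keeps the build from silently enabling whichever stream happens to be listed first.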
A single RPM could be part of multiple streams in an upstream repository
Nothing in the modulemd data prevents this, so we need a way to determine the correct stream.
packages.yaml
...
.
...
However: there is no way to hint streams in *pkglist.txt files for minimal BaseOS packages (unless we do something elaborate, like add comment keywords and a parser)
Most BaseOS EL8 modules have a default stream; use the default stream if it exists. (Status: TODO)
If there is only a single stream, we can also default to that only stream. (Status: TODO)
This is hacky, but it will work for EL8.3: Base OS (i.e., AppStream) modules without a default stream are currently very rare, and at the moment all of them have a single stream:

    # dnf module --disablerepo=\* --enablerepo=appstream list | grep -v '\[d\]'
    CentOS Linux 8 - AppStream
    Name               Stream  Profiles  Summary
    389-ds             1.4               389 Directory Server (base)
    libselinux-python  2.8     common    Python 2 bindings for libselinux
    mod_auth_openidc   2.3               Apache module suporting OpenID Connect authentication
    parfait            0.5     common    Parfait Module
    pki-core           10.6              PKI Core module for PKI 10.6 or later
    pki-deps           10.6              PKI Dependencies module for PKI 10.6 or later
This leaves open a rare potential edge case: if, in the future, we require an RPM from a Base OS module that ships with multiple streams but has no default stream (again, current population: 0), it will fail and there is no way to hint the correct stream. (Status: unsolved, NOT IN 6.6.0)
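The stream-selection heuristic described above (default stream if present, otherwise the only stream, otherwise fail) can be sketched in a few lines. The function name and argument layout are hypothetical; the stream lists would have to come from the repo's modulemd data.

```python
def pick_stream(module, streams, default=None):
    """Choose a stream for a Base OS module, or fail when the choice is ambiguous."""
    if default is not None:
        if default not in streams:
            raise LookupError(f"{module}: default stream {default!r} is not available")
        return default
    if len(streams) == 1:
        return streams[0]  # no default, but only one candidate
    # The unsolved edge case: several streams, no default, and no way to hint.
    raise LookupError(f"{module}: cannot choose between streams {sorted(streams)}")


# 389-ds has a single stream and no default in the listing above.
print(pick_stream("389-ds", ["1.4"]))
# Illustrative multi-stream module with a declared default.
print(pick_stream("perl", ["5.24", "5.26", "5.30"], default="5.26"))
```

Raising instead of guessing matches the "it will fail" behaviour described for the multi-stream, no-default case.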
We should probably have a way of formally declaring N:S for *pkglist.txt Base OS RPMs in the future. (Status: NOT IN 6.6.0) Some possibilities:
A separate *pkglist.modularity.txt file
N:S-declaring directives in the comments of *pkglist.txt
Could this be combined with packages.yaml? (not without a major rewrite)
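To make the "directives in the comments" possibility concrete, here is a rough sketch of such a parser. The directive syntax (# module: N:S) is invented purely for illustration, as are the sample package names; no such convention exists in *pkglist.txt today.

```python
import re

# Hypothetical directive syntax: a comment line of the form "# module: N:S".
DIRECTIVE = re.compile(r"^#\s*module:\s*(?P<name>[^:\s]+):(?P<stream>\S+)\s*$")


def parse_pkglist(text):
    """Return (package names, {module: stream}) from pkglist-style text."""
    packages, streams = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = DIRECTIVE.match(line)
        if match:
            streams[match["name"]] = match["stream"]
        elif not line.startswith("#"):
            packages.append(line)  # ordinary package entry
        # any other comment line is ignored, as it is today
    return packages, streams


SAMPLE = """\
# Minimal package list (hypothetical)
# module: 389-ds:1.4
389-ds-base
rsync
"""

print(parse_pkglist(SAMPLE))
```

Because existing tooling already skips comment lines, a comment-based directive would stay invisible to anything that does not know about it.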
Unsolved problems
...
What are the “Fetch RPM” flow differences between Base OS (prune_packages) & External (yumdownloader) packages?
https://simp-project.atlassian.net/browse/SIMP-9643
...
How can we know the URL/path to an RPM’s source repo’s repomd.xml file?
https://simp-project.atlassian.net/browse/SIMP-9644
This is simple enough to do by hand for an individual package, but I’m not sure how to automate it yet. Here are some ideas:
Option 1: see if yumdownloader can be convinced to display the repo root’s URL, like --urls does with the RPM (I haven’t found an option that does this)
Option 2: walk up the dir tree until we find metadata (hacky, expensive)
Option 3: (somehow) find/define the DNF cache that was used to download the RPM and (somehow) fish out the modulemd data that was used for that specific package
Option 4: Do everything the other way around: (Status: Current favorite)
Before getting packages, get each representative repo’s repomd.xml file first, then use it to find the xxx-modules.yaml.gz
read the modulemd data from the modules.yaml file
filter the modulemd data down to just the streams and packages you need
then run yumdownloader to acquire those exact packages
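The "filter the modulemd data down" step of Option 4 could be sketched as below. It assumes the modules YAML has already been parsed into a list of dictionaries (one per YAML document) that follow the modulemd v2 layout, with RPM artifacts listed under data.artifacts.rpms. The function name and all sample values are hypothetical, and documents of other types (such as modulemd-defaults) are simply skipped in this sketch.

```python
import copy


def slim_modulemd(documents, wanted_streams, wanted_rpms):
    """Keep only wanted (name, stream) documents, trimmed to the wanted RPM artifacts."""
    slim = []
    for doc in documents:
        if doc.get("document") != "modulemd":
            continue  # other document types are ignored in this sketch
        data = doc["data"]
        if (data["name"], data["stream"]) not in wanted_streams:
            continue
        rpms = data.get("artifacts", {}).get("rpms", [])
        kept = [rpm for rpm in rpms if rpm in wanted_rpms]
        if not kept:
            continue  # this build of the stream provides none of our RPMs
        trimmed = copy.deepcopy(doc)  # leave the upstream data untouched
        trimmed["data"]["artifacts"]["rpms"] = kept
        slim.append(trimmed)
    return slim


# Hypothetical parsed documents, for illustration only.
documents = [
    {"document": "modulemd", "version": 2, "data": {
        "name": "389-directory-server", "stream": "stable", "version": 1,
        "context": "aaaa", "arch": "x86_64",
        "artifacts": {"rpms": ["389-ds-base-0:1.0-1.x86_64", "cockpit-389-ds-0:1.0-1.noarch"]}}},
    {"document": "modulemd", "version": 2, "data": {
        "name": "389-directory-server", "stream": "testing", "version": 1,
        "context": "bbbb", "arch": "x86_64",
        "artifacts": {"rpms": ["389-ds-base-0:2.0-1.x86_64"]}}},
]

result = slim_modulemd(
    documents,
    wanted_streams={("389-directory-server", "stable")},
    wanted_rpms={"389-ds-base-0:1.0-1.x86_64"},
)
print(result[0]["data"]["artifacts"]["rpms"])
```

Everything except the artifact list is copied through unchanged, so the slim stream keeps the upstream N:S:V:C:A.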
Separate yumdownloader runs may result in RPMs for the same N:S having different N:S:V:C:A
Different RPMs could be sourced from different versions (V:) of the same module stream if yumdownloader pulls them from different repo mirrors that are out of sync with each other. Using the heuristic of a “slim” module stream per unique N:S:V:C:A, this would result in multiple module streams instead of one.
...
This is a rare edge case that V: is specifically intended to catch, and it seems correct to fail instead of building a “mirrored” stream subset using RPMs from different (stream) versions. However, I can’t demonstrate that the potential impact of this scenario is worth prioritizing its implementation.
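If failing does turn out to be the right behaviour, the check itself is small. The sketch below assumes each downloaded RPM has already been tagged with an (N, S, V, C, A) tuple; the function name and sample values are hypothetical.

```python
from collections import defaultdict


def check_single_version(nsvcas):
    """Raise if any N:S appears with more than one stream version (V)."""
    versions = defaultdict(set)
    for name, stream, version, _context, _arch in nsvcas:
        versions[(name, stream)].add(version)
    conflicts = {key: sorted(found) for key, found in versions.items() if len(found) > 1}
    if conflicts:
        details = "; ".join(
            f"{name}:{stream} has versions {found}"
            for (name, stream), found in sorted(conflicts.items())
        )
        raise RuntimeError(f"RPMs came from out-of-sync mirrors: {details}")


# Hypothetical tuples: the same N:S seen with two different V values.
mixed = [
    ("389-directory-server", "stable", 100, "aaaa", "x86_64"),
    ("389-directory-server", "stable", 101, "aaaa", "x86_64"),
]

try:
    check_single_version(mixed)
except RuntimeError as error:
    print(error)
```

Only V is compared here; context and arch are deliberately left out of the check.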
The strongest impacts I came up with so far rely on the fact that there’s a good chance that, between two stream versions, the combined set of downloaded RPMs won’t be a precise subset of either stream. But unless the mirrors were really out of sync, this probably wouldn’t matter. The stream version is a snapshot in time of all the modulemd (modular) metadata for the stream—it doesn’t actually affect its RPMs' resolutions.
(I honestly don’t know many details of how/when V: is used other than “highest wins”, but it might lead to weird edge cases:)
There’s a (staggeringly) remote chance that the newer stream version dropped package(s) or one of its packages has a new dependency
The slim repo will use one N:S:V or the other, but neither upstream precisely matches its RPMs. After re-integrating with the full upstream repo or mirror, DNF might resolve using the wrong stream version for some of the packages, resolve to the wrong versions, or miss an update by deciding it already knows the stream version.
There may be other reasons to do with inter-modular dependencies.
TL;DR: Not sure if failing is the best way forward; input welcome.
For the time being, I am treating this as a don’t-have-to-solve problem.
Unknown Unknowns
Are there conditions where streams don’t provide C:A information when packages are noarch?
No. By the time they are built, they will have a context and arch.
Undecided
[Should/how to] persist cached modulemd metadata for already-downloaded RPMs between builds?
The current yumdownloader
process
True or false: “Any mirrored “slim” module MUST NOT have multiple streams”
This sounds reasonable, but is it actually true?
It’s impossible to install multiple streams on a single SIMP server, but do we think we’d need to package multiple “slim” module streams for agents?
My current inclination is to assume “no.” Given our approach toward modularity in general, that seems like a really edgy edge case.