Intent

Goals:

  1. ISO build process SHOULD continue to only include the packages it needs

  2. Modular packages MUST install correctly from the SIMP ISO or the local mirror created from the SIMP ISO

  3. On systems with modular packages installed from the SIMP ISO that are configured to use the packages' original modular repositories or complete mirrors:

    1. ISO-installed modular packages MUST upgrade correctly

Circumstances:

    1. ISO-installed modules and streams MUST integrate seamlessly


Caveats:

  1. If necessary, the SIMP 6.6.0 ISO MAY mirror the entire base OS’s DVD AppStream/ repository if doing so would avoid delays to the SIMP 6.6.0 build process.

  2. However, the ISO MUST NOT mirror entire third-party modular repositories (like epel-modular) just to provide a few necessary packages from a single module stream (e.g., 389-directory-server:stable).

  3. No matter what, the old createrepo command MUST NOT EVER be run on a repo with modular packages, because it will destroy the repo’s modulemd metadata and make the RPMs unable to install. Use createrepo_c instead, or something like createrepo_mod from modulemd-tools.

Circumstances:

...

  1. The SIMP project’s RPMs/tarball MUST NOT require modularity for the SIMP 6.6.0 release (and probably never).

Constraints:

  • SIMP 6.6.0 will require RPMs from at least one module stream in the epel-modular repository: 389-directory-server:stable

Conclusion:

  • To add modular RPMs to the SIMP ISO, we must rebuild their modular repos with only a “slim” subset of their original upstream repository’s module/streams.

  • The subsets must use the same name + stream + version + context + architecture (N:S:V:C:A) as their upstream sources

Approach

  1. Download the modulemd metadata from the source repo of each modular RPM at the same time the RPM is acquired.

  2. Use unique N:S:V:C:A combinations from the resulting modular RPMs to determine which “slim” module streams to reconstruct. (We don’t care about /P for this.)

  3. For each unique “slim” modular stream: generate modulemd metadata for all relevant RPMs

  4. Combine all “slim” modules' modulemd data into a single modulemd.yaml data structure

  5. Create the modular repository using createrepo_c (or createrepo_mod; just not createrepo)
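Step 2 of the approach can be sketched as follows. This is a minimal illustration, not the build's actual code: the ModularRpm type and the group_by_nsvca function are hypothetical names, the sample label values are invented, and the sketch assumes each RPM's ModularityLabel header has the form N:S:V:C and that the caller supplies the architecture of the module build separately (a noarch RPM still belongs to a module built for a specific arch).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

# (name, stream, version, context, arch)
Nsvca = Tuple[str, str, str, str, str]


class ModularRpm(NamedTuple):
    """Hypothetical record for one downloaded modular RPM."""
    filename: str          # RPM file name
    modularity_label: str  # ModularityLabel header, assumed to be "N:S:V:C"
    arch: str              # arch of the module build (assumed known to the caller)


def group_by_nsvca(rpms: Iterable[ModularRpm]) -> Dict[Nsvca, List[str]]:
    """Group RPM file names by their unique N:S:V:C:A combination."""
    groups: Dict[Nsvca, List[str]] = defaultdict(list)
    for rpm in rpms:
        parts = rpm.modularity_label.split(":")
        if len(parts) != 4:
            raise ValueError(
                f"{rpm.filename}: unexpected ModularityLabel {rpm.modularity_label!r}"
            )
        name, stream, version, context = parts
        groups[(name, stream, version, context, rpm.arch)].append(rpm.filename)
    return dict(groups)


# Invented sample data: two RPMs from the same module stream build.
sample = [
    ModularRpm("389-ds-base.rpm", "389-directory-server:stable:1:abc", "x86_64"),
    ModularRpm("389-ds-base-libs.rpm", "389-directory-server:stable:1:abc", "x86_64"),
]
slim_streams = group_by_nsvca(sample)
```

Each key of the resulting dictionary identifies one “slim” module stream to reconstruct, and the value lists the RPMs that belong in it.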

RPM data

...

Implementing slim modular repos

...

Problems that are probably solved

...

yumdownloader can’t see RPMs in modules/streams that aren’t enabled

  1. (plus) Add a field to packages.yaml to specify N:S: for each modular RPM

  2. (plus) Identify and enable all unique N: from packages.yaml (fail if there are conflicting S:)

  3. (plus) dnf module enable each N:S: before beginning to use yumdownloader

...

Individual yumdownloader runs can change repository mirrors, which may be out of sync with each other and have different modulemd data.

  1. (plus) (When using yumdownloader) the modulemd metadata must be fetched at the same time as the RPM is downloaded, in order to preserve the precise state of that RPM’s modular metadata.

In upstream repositories, it’s possible that a single RPM could be part of multiple streams (nothing in the modulemd data prevents this). We need a way to decide which stream to use.

...

(plus) We will need to add an explicit N:S: to packages.yaml anyway, to decide which modules/streams to enable. This will explicitly set the stream.

...

However: there is no way to hint streams in the simple *pkglist.txt files for minimal BaseOS packages (unless we do something elaborate, like add comment keywords)

...

(plus) Most BaseOS EL8 modules have a default stream; use that if it exists

(plus) We can also default to the only stream.
This is hacky, but it will work for EL8.3—Base OS (i.e., AppStream) modules without a default stream are currently very rare, and at the moment all of them have a single stream:

...

  • , in order to support 389ds.

  • DNF will refuse to install an RPM with a ModularityLabel header unless it can be associated with modular metadata from an available repo.

  • Building/rebuilding repos with the old createrepo command will destroy a repo’s modular metadata and make any modular RPMs unable to install

Conclusions:

In order to build SIMP ISOs that distribute modular RPMs:

  1. The ISO build process MUST distribute modular RPMs as a “slim” subset of their original upstream repository’s module/streams.

    • If feasible, the “slimming” process SHOULD be generalized enough to apply to packages from external sources like epel-modular AND the base OS’s DVD AppStream/ repository.

    • Each “slim” modular stream subset MUST use the same name + stream + version + context + architecture (N:S:V:C:A) as its upstream source, in order to maintain seamless interoperability.

  2. The old createrepo command MUST NOT EVER be run on a repo with modular packages/modulemd metadata. This applies to both the ISO’s build process and post-installation local tooling on SIMP systems.

    1. [TODO] The ISO’s modular repositories MUST be built with modulemd-aware tools, like createrepo_c or createrepo_mod from modulemd-tools.

  3. Once built, SIMP ISO “slim” repos' modulemd metadata MUST remain available to install their modular packages. The repos MUST NOT be mirrored or rebuilt without it:

    1. [TODO] Local SIMP tooling that mirrors modular repositories MUST preserve modulemd metadata (e.g., dnf reposync --download-metadata)

    2. [TODO] Local SIMP tooling MUST be changed to NEVER rebuild modular repositories with createrepo. This includes:

A “slim” subset for a repo enables the ISO to distribute installable epel-modular packages without distributing epel-modular's entire collection of modules x streams x packages.


Creating a modular repository with “slim” stream mirrors

Overview

  1. At the same time each modular RPM is acquired, save its source repo’s modulemd metadata.

  2. Use unique N:S:V:C:A combinations from the resulting modular RPMs to determine which “slim” module streams to reconstruct.

  3. For each unique “slim” modular stream: generate modulemd metadata for all relevant RPMs

  4. Combine all “slim” modules' modulemd data into a single data structure and write it to modules.yaml

  5. Rebuild the modular repository using createrepo_c (or createrepo_mod) with the new modules.yaml file
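Step 5 might look like the sketch below. It only builds the command lines and does not run them. The function name slim_repo_commands and the example paths are hypothetical, and the sketch assumes that createrepo_c and modifyrepo_c are installed and that modules.yaml is injected into the repodata with modifyrepo_c --mdtype=modules, which is one common way to attach modulemd metadata to a repository.

```python
from pathlib import Path
from typing import List, Union


def slim_repo_commands(
    repo_dir: Union[str, Path], modules_yaml: Union[str, Path]
) -> List[List[str]]:
    """Return the commands that rebuild repo_dir and attach modules_yaml.

    Deliberately never emits the old `createrepo`, which would drop the
    modulemd metadata.
    """
    repo = Path(repo_dir)
    return [
        # Regenerate the base repodata with a modulemd-aware tool.
        ["createrepo_c", str(repo)],
        # Inject the combined modulemd document as the "modules" metadata type.
        ["modifyrepo_c", "--mdtype=modules", str(modules_yaml), str(repo / "repodata")],
    ]


# Hypothetical paths, for illustration only.
commands = slim_repo_commands("/tmp/slim-epel-modular", "/tmp/modules.yaml")
```

Each returned list can be passed to subprocess.run(cmd, check=True) so that a failure in either step aborts the build.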

Implementing slim modular repos

...

Modular RPM data/metadata to get/record/cache

At a minimum, a new field (only required for modular RPMs) that specifies the N:S (module:stream) for modular packages should be added to the build’s packages.yaml.

...

Problems that are probably solved

yumdownloader can’t see RPMs in modules/streams that aren’t enabled

  1. [TODO] Add an optional field to packages.yaml entries to specify the N:S: for each modular RPM

  2. [TODO] Identify and enable all unique N: from packages.yaml (fail if there are conflicting S:)

  3. [TODO] dnf module enable each N:S: before beginning to use yumdownloader

  4. [TODO] Individual yumdownloader runs can change repository mirrors, which may be out of sync with each other and have different modulemd data.

    1. [TODO] [unsolved?] (When using yumdownloader) the modulemd metadata must be fetched at the same time as the RPM is downloaded, in order to preserve the precise state of that RPM’s modular metadata.
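The first three items could be sketched like this. It is an illustration under stated assumptions: the packages.yaml field name module_stream is hypothetical (the page only says that a field specifying N:S should be added), the sample entries are invented, and the functions streams_to_enable and enable_commands are not part of any existing SIMP tooling.

```python
from typing import Dict, List, Mapping, Optional


def streams_to_enable(
    packages: Mapping[str, Optional[Mapping[str, str]]]
) -> Dict[str, str]:
    """Map each module name (N) to the single stream (S) requested for it.

    Raises ValueError if two packages ask for different streams of one module.
    """
    enabled: Dict[str, str] = {}
    for pkg, meta in packages.items():
        ns = (meta or {}).get("module_stream")  # hypothetical field name
        if not ns:
            continue  # non-modular package
        name, sep, stream = ns.partition(":")
        if not (sep and name and stream):
            raise ValueError(f"{pkg}: expected 'N:S', got {ns!r}")
        previous = enabled.setdefault(name, stream)
        if previous != stream:
            raise ValueError(
                f"{pkg}: module {name} requested with conflicting streams "
                f"{previous!r} and {stream!r}"
            )
    return enabled


def enable_commands(enabled: Mapping[str, str]) -> List[List[str]]:
    """Build one `dnf module enable` command per module stream."""
    return [
        ["dnf", "module", "enable", "-y", f"{name}:{stream}"]
        for name, stream in sorted(enabled.items())
    ]


# Invented packages.yaml content, already parsed into a dict.
packages = {
    "389-ds-base": {"module_stream": "389-directory-server:stable"},
    "389-ds-base-libs": {"module_stream": "389-directory-server:stable"},
    "rsync": {},
}
to_run = enable_commands(streams_to_enable(packages))
```

The commands would be run before the first yumdownloader invocation, so that RPMs in those streams are visible to it.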

A single RPM could be part of multiple streams in an upstream repository

Nothing in the modulemd data prevents this, so we need a way to determine the correct stream.

[TODO] This isn’t a problem for External packages, because we already need to add a field to packages.yaml that explicitly sets the N:S:.

However: there is no way to hint streams in *pkglist.txt files for minimal BaseOS packages (unless we do something elaborate, like add comment keywords and a parser)

  1. [TODO] Most BaseOS EL8 modules have a default stream; use that if it exists

  2. [TODO] We could also default to the only stream.
    This is hacky, but it will work for EL8.3—Base OS (i.e., AppStream) modules without a default stream are currently very rare, and at the moment all of them have a single stream:

    Code Block
    # dnf module --disablerepo=\* --enablerepo=appstream list | grep -v '\[d\]'
    CentOS Linux 8 - AppStream
    Name                 Stream       Profiles       Summary
    389-ds               1.4                         389 Directory Server (base)
    libselinux-python    2.8          common         Python 2 bindings for libselinux
    mod_auth_openidc     2.3                         Apache module suporting OpenID Connect authentication
    parfait              0.5          common         Parfait Module
    pki-core             10.6                        PKI Core module for PKI 10.6 or later
    pki-deps             10.6                        PKI Dependencies module for PKI 10.6 or later

  3. (question) This leaves a rare edge case (current population: 0) that will fail where Base OS modules with multiple streams don’t have a default stream.

    • We should probably have a way of formally declaring N:S for *pkglist.txt Base OS RPMs in the future. Some possibilities:

      • A separate *pkglist.modularity.txt file?

      • N:S-declaring directives in the comments of *pkglist.txt?

      • Could this be combined with packages.yaml? (not easy to see how)

Unsolved problems

  1. 🎗 What are the specific flow differences in “Fetch RPM” between Base OS (AppStream) and External yumdownloader (epel-modular)?

  2. (minus) With yumdownloader, how can we get the source repo’s modulemd data for each RPM?

    1. Option 1: see if yumdownloader can be convinced to display it, like --urls (haven’t found it yet)

    2. Option 2: walk up the dir tree until we find metadata (hacky, expensive)

    3. Option 3: find/define the DNF cache and somehow fish out the modulemd data for that specific package

  3. (minus) With yumdownloader, different RPMs could be sourced from different versions (V:) of the same module stream if yumdownloader pulls them from different repo mirrors that are out of sync with each other. Using the heuristic of a “slim” module stream per unique N:S:V:C:A, this would result in multiple module streams instead of one.

    (question) This is a rare edge case that V: is specifically intended to catch. We should probably fail and refuse to proceed with the build using RPMs from multiple different versions of the same module stream; there’s a small chance that not all the RPMs are still intended to be part of the module.
                        
  4. (question) [NOT IN 6.6.0] [unsolved] This leaves open a future edge case: the build will fail if we ever need an RPM from a Base OS module that has multiple streams but no default stream (current population: 0).

    • [NOT IN 6.6.0] We should probably have a way of formally declaring N:S for *pkglist.txt Base OS RPMs in the future. Some possibilities:

      • A separate *pkglist.modularity.txt file

      • N:S-declaring directives in the comments of *pkglist.txt

      • Could this be combined with packages.yaml? (not without a major rewrite)
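The fallback order described above (explicit default stream first, then the only stream, otherwise fail) can be written down as a small sketch. The function name choose_stream and its arguments are hypothetical. How the list of streams and the default are obtained (for example, from the repository's modulemd data) is left out.

```python
from typing import Optional, Sequence


def choose_stream(
    module: str, streams: Sequence[str], default: Optional[str] = None
) -> str:
    """Pick the stream to use for a Base OS module with no explicit N:S hint."""
    if not streams:
        raise LookupError(f"module {module} has no streams")
    if default is not None:
        if default not in streams:
            raise ValueError(
                f"module {module}: default {default!r} is not among its streams"
            )
        return default  # rule 1: use the default stream if one exists
    if len(streams) == 1:
        return streams[0]  # rule 2: fall back to the only stream
    # The rare edge case: several streams and no default. Fail loudly.
    raise LookupError(
        f"module {module} has streams {list(streams)} but no default; "
        "an explicit N:S declaration is required"
    )


# pki-core has a single stream and no default in the listing above.
picked = choose_stream("pki-core", ["10.6"])
```

Failing on the ambiguous case keeps the build from silently picking a stream until a formal way of declaring N:S for *pkglist.txt packages exists.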

Unsolved problems

What are the “Fetch RPM” flow differences between Base OS (prune_packages) & External (yumdownloader) packages?

(warning)(minus) When running yumdownloader, how can we get the RPM’s source repo’s modulemd data?

This is easy to do by hand for an individual package, but I’m not sure how to automate it yet. Here are some options:

  1. Option 1: see if yumdownloader can be convinced to display it, like --urls (haven’t found it yet)

  2. Option 2: walk up the dir tree until we find metadata (hacky, expensive)

  3. Option 3: find/define the DNF cache and somehow fish out the modulemd data for that specific package

(minus) Separate yumdownloader runs may result in RPMs for the same N:S having different N:S:V:C:A

Different RPMs could be sourced from different versions (V:) of the same module stream if yumdownloader pulls them from different repo mirrors that are out of sync with each other. Using the heuristic of a “slim” module stream per unique N:S:V:C:A , this would result in multiple module streams instead of one.

(question) This is a rare edge case that V: is specifically intended to catch, and it seems correct to fail rather than rebuild a modular stream using RPMs from different (stream) versions. However, I can’t really demonstrate that the potential impact of this is worth prioritizing its implementation.
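If the build does choose to fail, the check itself is small. The sketch below is hypothetical (the function find_version_conflicts and the sample tuples are invented) and assumes the build already has the N:S:V:C:A tuple for every downloaded modular RPM. It treats any N:S that appears with more than one V as a conflict, which may be stricter than necessary.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Nsvca = Tuple[str, str, str, str, str]  # (name, stream, version, context, arch)


def find_version_conflicts(keys: Iterable[Nsvca]) -> Dict[Tuple[str, str], List[str]]:
    """Return every (N, S) that was seen with more than one version (V)."""
    versions: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for name, stream, version, _context, _arch in keys:
        versions[(name, stream)].add(version)
    return {ns: sorted(vs) for ns, vs in versions.items() if len(vs) > 1}


# Invented example: two RPMs of one stream came from out-of-sync mirrors.
seen = [
    ("389-directory-server", "stable", "1", "abc", "x86_64"),
    ("389-directory-server", "stable", "2", "abc", "x86_64"),
    ("other-module", "1.0", "7", "def", "x86_64"),
]
conflicts = find_version_conflicts(seen)
```

A non-empty result would abort the build with a message that names the affected module streams and the versions found.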

The strongest impacts I came up with so far rely on the fact that there’s a good chance that, between two stream versions, the combined set of RPMs won’t match either stream exactly. But unless your mirrors were really out of sync, this probably wouldn’t matter much. The stream version is a snapshot in time of all the modulemd metadata for the stream; it doesn’t actually affect the RPM’s resolution.

I honestly don’t know many details of how/when V: is used other than “highest wins”, but it might lead to weird edge cases.

  1. There’s a (staggeringly) remote chance that the newer stream version dropped package(s) or that one of its packages has a new dependency

  2. The slim repo will use one N:S:V or the other, but neither upstream version precisely matches its RPMs. After re-integrating with the full upstream repo or mirror, DNF might decide it already knows the stream version and miss an update, or it might resolve some of the packages to the wrong versions by using the wrong stream version

  3. There may be other reasons to do with inter-modular dependencies.

  4. TL;DR: Not sure if failing is the best way forward; input welcome.

Unknown Unknowns

Are there conditions where streams don’t provide C:A information when packages are noarch?

No. By the time they are built, they will have a context and arch.

Undecided

[Should/how to] persist cached modulemd metadata for already-downloaded RPMs between builds?

The current yumdownloader process

True or false: “Any mirrored “slim” module MUST NOT have multiple streams”

  1. (question) This sounds reasonable, but is it actually true?

  2. It’s impossible to install multiple streams on a single SIMP server, but do we think we’d need to package multiple “slim” module streams for agents?

    1. My current inclination is to assume “no.” Given our approach toward modularity in general, that seems like a really edgy edge case.