Sieve pipeline unit type
Note
💊 Check sieve prescription pipeline unit for a higher-level abstraction.
The next pipeline unit type triggered after pseudonym type pipeline
units is called “sieve”. The
main purpose of this pipeline unit is to filter out (hence “sieve”) packages
that should not occur in the resulting stack. It’s called on each and every
package that is resolved based on direct or transitive dependencies of the
application stack supplied.
The pipeline unit of type sieve accepts a
generator of resolved package-versions (see PackageVersion abstraction in
thoth-python library) and decides which of these package versions can be
included in the resulting stack. The generator of package-versions supplied is
sorted based on Python’s version specification starting from the latest release
down to the oldest one (respecting version string, not release date). The list
will be shrunk based on limit_latest_versions (if supplied to the
adviser) after the pipeline sieves run; this option reduces the state space
considered. If sieves accept more than limit_latest_versions package
versions, the result is trimmed to limit_latest_versions entries. Be aware
of the issues that can arise when providing the “limit latest versions”
parameter; usually this parameter is not needed.
It’s guaranteed that the list will contain package-versions in a specific (locked) version, together with information about the Python package index from which the given dependency came (the triple “package name”, “locked package version” and “index url” uniquely identifies any Python package; see the compatibility section for additional info on Python package index specific resolution). It’s also guaranteed that the generator will contain packages of the same type (same package name).
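The ordering and trimming described above can be sketched in plain Python. This is a minimal illustration, not the adviser's implementation: the tuple-based package representation, the helper names `version_key`, `apply_sieve` and `drop_bad_release`, and the package name `example-package` are all assumptions made for this example, and the version parsing only handles simple dotted numeric versions.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

# Hypothetical stand-in for a locked package version: (name, version, index_url).
PackageTuple = Tuple[str, str, str]
SieveFn = Callable[[Iterator[PackageTuple]], Iterator[PackageTuple]]


def version_key(package: PackageTuple) -> Tuple[int, ...]:
    """Turn a simple dotted version such as "1.10.0" into a sortable tuple."""
    return tuple(int(part) for part in package[1].split("."))


def apply_sieve(
    packages: Iterable[PackageTuple],
    sieve: SieveFn,
    limit_latest_versions: Optional[int] = None,
) -> List[PackageTuple]:
    """Sort newest first, run the sieve, then trim to the configured limit."""
    newest_first = sorted(packages, key=version_key, reverse=True)
    accepted = sieve(iter(newest_first))
    if limit_latest_versions is not None:
        # Trimming happens only after the sieve has made its decisions.
        accepted = islice(accepted, limit_latest_versions)
    return list(accepted)


def drop_bad_release(package_versions: Iterator[PackageTuple]) -> Iterator[PackageTuple]:
    """Toy sieve: discard one release that is assumed to be broken."""
    for package in package_versions:
        if package[1] != "2.1.0":
            yield package


INDEX = "https://pypi.org/simple"
resolved = [
    ("example-package", version, INDEX)
    for version in ("1.9.0", "2.0.0", "1.10.0", "2.1.0", "0.9.0")
]
kept = apply_sieve(resolved, drop_bad_release, limit_latest_versions=2)
print([version for _, version, _ in kept])
```

Note that the sort compares versions numerically, so `1.10.0` ranks above `1.9.0`, and that the limit is applied to what the sieve accepted rather than to its input.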
Note
Each sieve can be run multiple times during the resolution. It can be run
multiple times even on packages of the same type, depending on how the
dependency graph is resolved. An example is the package six, which is a
dependency of many packages in the Python ecosystem, each of which can have
different version range requirements on six.
Main usage
Filter out packages (more precisely, package-versions) that should not occur in the resulting software stack
Returning an empty list discards all the resolved versions
Raising the NotAcceptable exception has the same effect as returning an empty list (compatibility with the step pipeline unit)
Prematurely end the resolution based on the sieve reached
Raising the EagerStopPipeline exception stops the whole resolver run and causes the resolver to return the products computed so far
Remove a library from a stack even though it is stated as a dependency (directly or transitively) by raising SkipPackage
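The difference between the three exceptions can be illustrated with a small self-contained sketch. The exception classes below are local stand-ins that only share their names with the adviser exceptions, and `run_sieve` is a hypothetical driver written for this example; the real resolver logic is more involved.

```python
from typing import Callable, Iterator, List, Optional


# Local stand-ins for the exceptions named above, so the sketch runs on its own.
class NotAcceptable(Exception):
    """Signals that none of the versions is acceptable."""


class SkipPackage(Exception):
    """Signals that the package should be left out of the stack."""


class EagerStopPipeline(Exception):
    """Signals that the whole resolution should stop."""


SieveFn = Callable[[Iterator[str]], Iterator[str]]


def run_sieve(sieve: SieveFn, versions: List[str]) -> Optional[List[str]]:
    """Hypothetical driver: return accepted versions, or None to skip the package.

    EagerStopPipeline is deliberately not caught here, so it propagates to the
    caller and ends the whole run.
    """
    try:
        return list(sieve(iter(versions)))
    except NotAcceptable:
        return []  # Same outcome as a sieve that yields nothing.
    except SkipPackage:
        return None  # The package is removed from the stack entirely.


def reject_all(versions: Iterator[str]) -> Iterator[str]:
    raise NotAcceptable("no version is acceptable")
    yield  # Unreachable; keeps this function a generator.


def skip(versions: Iterator[str]) -> Iterator[str]:
    raise SkipPackage("package is not needed")
    yield  # Unreachable; keeps this function a generator.


def stop(versions: Iterator[str]) -> Iterator[str]:
    raise EagerStopPipeline("stop the resolver")
    yield  # Unreachable; keeps this function a generator.


print(run_sieve(reject_all, ["1.0.0"]))
print(run_sieve(skip, ["1.0.0"]))
try:
    run_sieve(stop, ["1.0.0"])
except EagerStopPipeline:
    print("resolution stopped early")
```

In this sketch an empty list means "no candidate versions", `None` means "drop the package from the stack", and only `EagerStopPipeline` escapes the per-package handling.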
Note
Even if pipeline sieves discard all the versions for a certain package, the
resolution can still be successful. An example is discarding the dependency
tensorboard from a TensorFlow stack: tensorboard is present as a dependency
only in some releases of the tensorflow package.
Real world examples
Filter out packages like enum34 from the resolved software stack that will not install into the given software environment (enum34 is a backport of Enum to older Python releases, so it will not be installed for Python 3.4+ if environment markers are present and applied)
Filter out packages that have installation issues in the requested software environment; an example can be legacy Python 2 packages that fail installation in Python 3 environments due to syntax errors in setup.py
Filter out packages that have runtime issues (a package installs but fails during application start, e.g. a bad release)
Filter out Python packages that use a Python package index that is not allowed (restricted environments)
Filter out packages that require native packages or an ABI provided by a native package that are not present in the software environment used (see Thoth’s analyses of container images that are aggregated into Thoth’s knowledge base and available for Thoth’s adviser)
Filter out packages that are nightly builds or pre-releases in case of the STABLE recommendation type or a disabled pre-releases configuration option in Pipfile
A library maintainer added the enum34 package as a library dependency but did not restrict the requirement to older Python versions with an environment marker such as:
enum34>=1.0; python_version < '3.4'
The resolver can skip this package based on a pipeline sieve specific to the library, which would raise the SkipPackage exception if enum34 would be used with a newer Python version.
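The enum34 case can be sketched as a small generator function. This is an illustration under stated assumptions: `SkipPackage` is a local stand-in sharing only its name with the adviser exception, and passing the target Python version as a `python_version` tuple is a simplification invented for this example (a real sieve would obtain it from the pipeline context).

```python
from typing import Iterator, Tuple


class SkipPackage(Exception):
    """Local stand-in for the adviser exception of the same name."""


def enum34_sieve(
    package_versions: Iterator[str], python_version: Tuple[int, int]
) -> Iterator[str]:
    """Skip enum34 on Python 3.4+, where the standard library ships enum."""
    if python_version >= (3, 4):
        raise SkipPackage("enum34 is a backport and is not needed on Python 3.4+")
    # On older interpreters the backport is still required, so keep everything.
    yield from package_versions


print(list(enum34_sieve(iter(["1.1.10"]), (2, 7))))
try:
    list(enum34_sieve(iter(["1.1.10"]), (3, 8)))
except SkipPackage as exc:
    print(f"skipped: {exc}")
```

Because the function is a generator, the exception is raised when the result is first iterated, not when the function is called.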
Triggering unit for a specific package
To help with scaling the recommendation engine as the number of registered
pipeline units grows, it is good practice to state which package a given unit
corresponds to. To run the pipeline unit only for a specific package, reflect
this in the pipeline unit configuration by stating the package_name
configuration option. An example is a pipeline unit specific to TensorFlow
packages, which should state package_name: "tensorflow" in the pipeline
configuration.
If the pipeline unit is generic for any package, the package_name
configuration has to default to None.
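The effect of package_name can be pictured as a simple dispatch rule: generic units run for every package, while package-specific units run only for their package. The sketch below is a toy model of that rule; `ExampleUnit`, `units_for` and the unit names are hypothetical and do not reflect how the adviser registers or looks up units internally.

```python
from typing import Any, Dict, List, Optional


class ExampleUnit:
    """Hypothetical stand-in for a pipeline unit carrying a configuration."""

    # Generic by default: the unit is not tied to any package.
    CONFIGURATION_DEFAULT: Dict[str, Any] = {"package_name": None}

    def __init__(self, name: str, package_name: Optional[str] = None) -> None:
        self.name = name
        self.configuration = dict(self.CONFIGURATION_DEFAULT, package_name=package_name)


def units_for(package_name: str, units: List[ExampleUnit]) -> List[ExampleUnit]:
    """Select generic units plus the units registered for this package."""
    return [
        unit
        for unit in units
        if unit.configuration["package_name"] in (None, package_name)
    ]


units = [
    ExampleUnit("cut-pre-releases"),
    ExampleUnit("tensorflow-only", package_name="tensorflow"),
]
print([unit.name for unit in units_for("tensorflow", units)])
print([unit.name for unit in units_for("six", units)])
```

Stating the package up front means a package-specific unit never has to be invoked, and then bail out, for the many packages it does not care about.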
Justifications in the recommended software stacks
An example implementation
import logging
from typing import Any
from typing import Dict
from typing import Generator

from thoth.python import PackageVersion
from thoth.adviser import Sieve

_LOGGER = logging.getLogger(__name__)


class ExampleSieve(Sieve):
    """An example sieve implementation to demonstrate sieve purpose."""

    # The pipeline unit is not specific to any package.
    CONFIGURATION_DEFAULT: Dict[str, Any] = {"package_name": None}

    def run(self, package_versions: Generator[PackageVersion, None, None]) -> Generator[PackageVersion, None, None]:
        """Cut off pre-releases unless the project explicitly allows them."""
        if self.context.project.prereleases_allowed:
            _LOGGER.info(
                "Project accepts pre-releases, skipping cutting pre-releases step"
            )
            # Pass every version through exactly once and stop here.
            yield from package_versions
            return

        for package_version in package_versions:
            if package_version.semantic_version.is_prerelease:
                _LOGGER.debug(
                    "Removing package %s - pre-releases are disabled",
                    package_version.to_tuple(),
                )
                continue

            yield package_version
The implementation can also provide other methods, such as Unit.pre_run, Unit.post_run or Unit.post_run_report and pipeline unit configuration adjustment.
See unit documentation for more info.