Step prescription pipeline unit

Declaring pipeline units of type step.

Note

The pipeline unit is registered based on should_include directive - see documentation for more info.

The following example shows all the configuration options that can be applied for a step pipeline unit type. See respective sections described below for more info. Also note, the example shows all the options that can be supplied and is not semantically valid (not all options can be supplied at the same time):

name: StepUnit
type: step
should_include:
  # See should_include section for more info.
match:                                            # Criteria to trigger run of this pipeline unit.
  package_version:                                # Matching criteria for a package that is about to be included to the resolver's state.
    name: flask                                   # Optional name of the package for which the unit should be registered. If name is not provided, the unit is run for any package matching also other directives.
    version: '>1.0,<=1.1.0'                       # Version specifier for which the unit should be run. If not provided, defaults to any version.
    index_url: 'https://pypi.org/simple'          # Package source index for which the unit should be run. If not provided, defaults to any index. Can be negated using "not".
    develop: false                                # If specified, match development or not development dependencies.
  state:                                          # Optional, resolver internal state to match for the given resolution step.
    resolved_dependencies:
    - name: werkzeug                              # Dependencies that have to be present in the resolved state. The semantics for each entry is same as for package_version directive.
      version: '==1.0.0'
      index_url: 'https://pypi.org/simple'
      develop: false
run:
  score: 0.42                                     # Optional score assigned to the step performed in the resolution.
  justification:
  - type: INFO
    message: Hello, Thoth!
    link: 'https://thoth-station.ninja'

  not_acceptable: Bad package inclusion           # Block including certain package during the resolution.

  # Configuration of prematurely terminating the resolution process.
  eager_stop_pipeline: Stop pipeline

  multi_package_resolution: false                 # Run this pipeline multiple times when matched mutliple times. Defaults to false if not provided.

  log:                                            # Optional text printed to logs when the unit gets called.
    message: Some text printed to log on pipeline unit run
    type: WARNING

  stack_info:                                     # Information printed to the recommended stack report.
  - type: WARNING
    message: Hello, world!
    link: 'https://thoth-station.ninja'           # A link to justifications or a link to a web page.

Step match

Match the given step performed in the resolution process. A step is described by state stating required resolved dependencies so far and package that is about to be resolved (added to the resolved dependencies listing):

  • package_version - package that is about to be resolved by adding it to the resolver’s state

    • name - optional, name of the package

    • version - optional, version in a form of version specifier

    • index_url - optional, Python package index URL, can be negated using not

    • develop - optional, if provided it additionally specifies if the dependency should or should not be a development dependency

  • state - internal resolver’s state with resolved dependencies

A state that needs to be met to trigger the given step pipeline. The state can state:

  • resolved_dependencies - optional listing of resolved dependencies:

    • name - optional package name that has to be stated in the resolved dependency listing

    • version - optional package version in a form of version specifier that has to be stated in the resolved dependency listing

    • index_url - optional package index from which the given package is consumed, can be negated using not

    • develop - optional boolean stating whether the package is or is not in the development dependency listing

  • package_version_from where each entry describes packages that introduced the matched package. Each entry can state directives as in package_version. If multiple entries are stated, each entry has to introduce matched package_version as a dependency.

To run the given step, all the packages in the resolved dependency listing need to be present in the resolved software stack. Also both state and package_version need to be matched.

It is possible to provide a listing of matching criteria to run the given pipeline unit multiple times - the run part will be reused for each match entry stated.

Note

Example:

# Match when torch (not dev) in a 1.9.0 compatible release from PyPI is
# about to be included into resolver's state with torchvision==0.9.0 from PyPI.
match:
  package_version:
    name: torch
    version: "~=1.9.0"
    index_url: "https://pypi.org/simple"
    develop: false
  state:
    resolved_dependencies:
    - name: torchvision
      version: "==0.9.0"
      index_url: "https://pypi.org/simple"
# Match when torch in a 1.9.0 compatible release *not* from PyPI is about to
# be included into resolver's state with torchvision==0.9.0 *not* from PyPI.
match:
  package_version:
    name: torch
    version: "~=1.9.0"
    index_url:
      not: "https://pypi.org/simple"
  state:
    resolved_dependencies:
    - name: torchvision
      version: "==0.9.0"
      index_url:
        not: "https://pypi.org/simple"
match:
# Match when resolving tensorflow as a dependency of seldon and
# flask is already in the resolved dependency listing:
- package_version:
    name: tensorflow
  state:
  - resolved_dependencies:
      name: flask
    package_version_from:
      name: seldon
# Or match when resolving tensorflow as a dependency of seldon and
# connexion is already in the resolved dependency listing:
- package_version:
    name: tensorflow
  state:
  - resolved_dependencies:
      name: connexion
    package_version_from:
      name: seldon

Step run.log

Print the given message to the resolution log if the pipeline unit is included and run.

See boot’s log that has shared semantics.

Step run.stack_info

See stack info which semantics is shared with this unit.

Note the stack info is added only once even if the pipeline unit is run multiple times during the resolution process.

Step run.multi_package_resolution

A boolean stating whether the given unit should be run if criteria match multiple times per resolution run. Defaults to false.

See multi package resolution flag in steps.

Step run.justification

Optional justification added to the resolved stack when the pipeline unit is run. This justification is added only if no not_acceptable and no eager_stop_pipeline are supplied - if the given step is a valid step in the resolution process. See Adding justifications to the recommended software for more info on how to write justifications and their semantics.

Each entry in the list is specified by three attributes:

  • type - any of INFO, WARNING, and ERROR specifying severity of the produced justification

  • message - a message in a free text form printed to users

  • link - a link to a document describing more information in detail

The link can be in a form of a valid HTTP or HTTPS URL or a string which references justifications available at thoth-station.ninja/justifications.

Note

Example:

name: StepUnit
type: step
should_include:
  times: 1
  adviser_pipeline: true
match:
  package_version:
    index_url: 'https://thoth-station.ninja/simple'
run:
  score: +0.1
  justification:
  - type: INFO
    message: Builds available on index thoth-station.ninja/simple take precedence
    link: 'https://thoth-station.ninja/'

Step run.score

Optional score to penalize or prioritize resolving the given stack. Score has to be from interval -1.0 to +1.0 inclusively. This score corresponds to the reward signal.

Step run.not_acceptable

Make the given step not acceptable in the resolution process. This option is suitable to avoiding resolution of certain combination of packages - resolver will try to find another resolution path to satisfy requirements.

Note

Example:

A pipeline unit that filters out any tensorflow~=2.4.0 when numpy==1.19.1 is in already resolved dependencies.

name: StepUnit
type: step
should_include:
  adviser_pipeline: true
match:
  package_version:
    name: numpy
    version: "==1.19.1"
    index_url: 'https://pypi.org/simple'
  state:
    resolved_dependencies:
    - name: tensorflow
      version: '~=2.4.0'
run:
  multi_package_resolution: true
  not_acceptable: "NumPy==1.19.5 is causing issues when used with TensorFlow 2.4"
  stack_info:
  - type: WARNING
    message: "NumPy==1.19.5 is causing issues when used with TensorFlow 2.4"
    link: "https://thoth-station.ninja/j/tf_24_np.html"

Step run.eager_stop_pipeline

If the given pipeline unit is registered and matched, it will cause the whole resolution process to halt and report back results computed, if any. If no results are available, the resolution process will fail as no software stack is produced.

Group step prescription pipeline unit

A group step prescribes a step unit in the resolution process that can match based on dependencies that would be present in the resolved software stack together. Unlike step, group step is agnostic to order in which dependencies are resolved in the resolution process.

name: TensorFlow26Keras27Sieve
type: step.Group
should_include:
  adviser_pipeline: true
  # See should_include section for more options.
match:
# Run this prescription if keras>=2.7.0 is resolved together with tensorflow>2.6.0,<2.7.0
#   OR
# if keras>=2.7.0 is resolved together with tensorflow-cpu>2.6.0,<2.7.0
- group:
  - package_version:
      name: keras
      version: ">=2.7.0"
  - package_version:
      name: tensorflow
      version: ">=2.6.0,<2.7.0"
- group:
  - package_version:
      name: keras
      version: ">=2.7.0"
  - package_version:
      name: tensorflow-cpu
      version: ">=2.6.0,<2.7.0"
run:
  stack_info:
  - type: WARNING
    message: >-
      TensorFlow in versions >=2.6.0,<2.7.0 overpinned Keras dependency, compatible releases are ~=2.7.0
    link: https://github.com/tensorflow/tensorflow/issues/52922
  not_acceptable: >-
    TensorFlow in versions >=2.6.0,<2.7.0 overpinned Keras dependency, compatible releases are ~=2.7.0

GroupStep match.group[*].package_version

Each entry in match’ group requires two or more Python packages that need be present together in the resolution process. Each package_version is specified using:

  • name - required, name of the package

  • version - optional, version in a form of version specifier

  • index_url - optional, Python package index URL, can be negated using not

  • develop - optional, if provided it additionally specifies if the dependency should or should not be a development dependency

GroupStep run

Group step’s run entry can specify the following directives:

  • stack_info

  • not_acceptable

  • score

  • justification

  • eager_stop_pipeline

  • log

The logic is shared with step. Mind missing multi_package_resolution which configuration is implicit for this prescription.

SkipPackage step prescription pipeline unit

This pipeline unit skips including the given package in the resolved stack considering also state of the resolver. If the state is not relevant for skipping the matched package, use SkipPackage sieve instead.

Running this pipeline unit will make sure that the matched package_version and all its dependencies will be removed from the dependency graph. In other words, sub-graph introduced by the matched package_version will be completely removed.

The pipeline unit considers what packages introduced the package that is supposed to be removed and optionally other packages that are already present in the resolved dependencies listing. Use this unit if you wish to exclude accidentally included dependencies.

Note

Example:

A pipeline unit that removes SciPy package from the stack if SciPy was introduced by the given TensorFlow version.

name: SkipPackageStepUnit
type: step.SkipPackage
should_include:
  adviser_pipeline: true
match:
  package_version:
    name: scipy
  state:
    package_version_from:
    - name: tensorflow
      version: '>=2.1,<=2.3'
      index_url: https://pypi.org/simple
      develop: false
run:
  stack_info:
  - type: WARNING
    message: TensorFlow in versions >=2.1<=2.3 stated SciPy as a dependency but it is not used in the codebase
    link: 'https://github.com/tensorflow/tensorflow/issues/35709'

The described pipeline unit shares most of the directives with the step prescription pipeline unit. However, it does not allow declaring:

  • run.not_acceptable

  • run.score

  • run.justification

  • run.eager_stop_pipeline

AddPackage step prescription pipeline unit

This pipeline unit allows adding packages to the dependency graph even though they were not stated in requirements. The unit is suitable for fixing underpinning issues.

Note

Example:

A pipeline unit that adds pandas package to the stack if SciPy was introduced by the given TensorFlow version and Matplotlib is already resolved.

name: AddPackageStepUnit
type: step.AddPackage
should_include:
  adviser_pipeline: true
  # See should_include section for more options.
match:
  package_version:
    name: scipy
    version: '~1.7.1'
    index_url: 'https://pypi.org/simple'
  state:
    package_version_from:
    - name: tensorflow
      version: '>=2.1,<=2.3'
      index_url: https://pypi.org/simple
      develop: false
    resolved_dependencies:
    - name: matplotlib
run:
  stack_info:
  - type: INFO
    message: Injecting Pandas to the dependency graph
    link: 'https://thoth-station.ninja'
  log:
    type: INFO
    message: Injecting Pandas to the dependency graph
  package_version:
    name: pandas
    locked_version: ==1.3.2
    index_url: 'https://pypi.org/simple'
    develop: false

AddPackageStep match

See Step match that has shared semantics.

AddPackageStep run.stack_info

See boot’s stack info which semantics is shared with this unit.

Note the stack info is added only once even if the pipeline unit is run multiple times during the resolution process.

AddPackageStep run.log

Print the given message to the resolution log if the pipeline unit is included and run.

See boot’s log that has shared semantics.

AddPackageStep run.package_version

Specification of a package that should be added to the dependency graph. All the fields are mandatory:

  • name - name of the package

  • locked_version - locked package version (must start with ==)

  • index_url - Python package index URL from where the package is supposed to be installed

  • develop - add the given package to default or development dependencies of the project

Note the given package in the specified version has to be already analyzed by the system so that resolver can inject this dependency and possibly all its dependencies into the dependency graph. The Python package index has to be also know and enabled on the deployment side. If these conditions are not met, the pipeline unit will not register the requested package to the dependency graph.