Stride pipeline unit type

Note

💊 Check stride prescription pipeline unit for a higher-level abstraction.

Once there are no more unresolved dependencies in the resolver state (no more steps to be performed), such state becomes a “final state” (see Introduction to Thoth’s adviser principles for theoretical background). Pipeline units called “strides” are then called on the given final state to check whether the given final state should be accepted and treated as one of the pipeline products.

Main usage

  • Decide whether the given fully resolved software stack should be accepted and be treated as one of the pipeline products

    • Raising exception NotAcceptable will prevent from adding fully resolved (final) state to the pipeline products

  • Prematurely end resolution based on the the final state reached

    • Raising exception EagerStopPipeline will cause stopping the whole resolver run and causing resolver to return products computed so far

Real world examples

  • Filter out software stacks with same score in recommendations - most probably they include package-versions that do not differentiate resolved software stack quality

  • To test TensorFlow with different versions of numpy in Dependency Monkey a stride implementation can prevent from generating stacks that have same numpy version

  • Do not accept stacks for Dependency Monkey runs for which Thoth has observation already (for example performance related testing using performance indicators)

  • Randomly sample state space and run Amun inspection jobs to gather observations about Python ecosystem and packages present

Triggering unit for a specific package

To help with scaling the recommendation engine when it comes to number of pipeline units possibly registered, it is a good practice to state to which package the given unit corresponds. To run the pipeline unit for a specific package, this fact should be reflected in the pipeline unit configuration by stating package_name configuration option. An example can be a pipeline unit specific for TensorFlow packages, which should state package_name: "tensorflow" in the pipeline configuration.

If the pipeline unit is generic for any package, the package_name configuration has to default to None.

An example implementation

from typing import Dict
from typing import List
from typing import Optional
from typing import Tuple
import random

from thoth.adviser.exceptions import NotAcceptable
from thoth.adviser import Stride
from thoth.python import PackageVersion


class StrideExample(Stride):
    """Flip a coin, heads discard the given state."""

    CONFIGURATION_DEFAULT: Dict[str, Any] = {"package_name": None}  # The pipeline unit is not specific to any package.

    def run(self, state: State) -> None:
        """The main entry-point for stride implementation demonstration."""
        if bool(random.getrandbits(1)):
            raise NotAcceptable(
                f"State with score {state.score!r} was randomly discarded by flipping a coin"
            )

The implementation can also provide other methods, such as Unit.pre_run, Unit.post_run or Unit.post_run_report and pipeline unit configuration adjustment. See unit documentation for more info.