Introduction to Thoth’s adviser principles

In the upcoming section you can find discussion and intuition behind Thoth’s adviser logic and nomenclature used in Thoth repositories, sources and documentation.

Intuition behind Thoth’s recommendations

Let’s consider an application that has two dependencies - a package called simplelib and anotherlib. These dependencies can be installed in different versions and do not require any additional packages to be installed. Let’s have a function that scores an application based on “how good it is” when different versions of simplelib and anotherlib are used. The semantics for “how good” can be different, but one can imagine scoring based on performance, security vulnerabilities present and/or application misbehaviour (or any other scoring you can come up with).

If we create all the combinations of simplelib and anotherlib that can be produced and we score these combinations using our scoring function we get a discrete results as shown on the following figure:

A discrete state space with with results of the scoring function.

By interpolating these values we get a surface as shown on the next figure, this visualization is also more intuitive:

Interpolated discrete values in the state space forming a surface.

As can be seen, we get different score based on different versions of simplelib and anotherlib combinations. Thoth’s adviser, when computing recommendations, tries to find the highest values of the scoring function - the highest spikes present (assuming the higher number of score, the better software stack is).

Spikes present in the state space.

Thoth’s resolver can produce all the combinations of packages, considering Python ecosystem resolution, and can be guided using predictor to find high values of score faster:

Guiding resolver using predictor to resolver higher rated scores faster.

Considering real-world applications, software stacks can be formed out of multiple packages coming in different versions. Packages are introduced into a software stack based on dependency requirements of direct or indirect dependencies that can be restricted using version range specifications (making some of the combinations invalid). The shown 3D figures above show scoring function for two different packages in a software stack. This scoring function can be generalized to n-dimensional space when a software stack is made out of n-1 packages (at most) and n-th dimension is always result of the scoring function used.

Thoth’s resolver approach of resolving software stacks is to expand dependency graphs (lazily), instead of directly implementing 3SAT problem or backtracking as in case of other resolvers. See resolver documentation for more info.

Prediction-based resolution using reinforcement learning

As stated above, Thoth’s adviser does not implement 3SAT problem as many resolvers do. Instead, the whole resolution is seen as a Markov Decision Process (MDP). Intuitively, any (not necessary fully resolved) software stack can be seen as a state in an MDP and satisfies Markov property:

Any future resolution of resolver’s state depends only upon the present state, not on the sequence of events (resolution of dependencies) that preceded it.

A fully resolved software stack that satisfy all the requirements of all the dependencies present in the software stack then represents a terminal state in an MDP. Finding the best possible software stack (a terminal state with the highest score) then corresponds to solving the given MDP.

The whole design of adviser’s resolution can be seen as a framework for writing predictors for prediction based resolution.

More formally:

  • let S be a finite set of states in which the resolution can be in - partially but also fully resolved software stacks

  • let A be a finite set of actions, As be a finite state of actions from a state s

  • let Ra(s, s’) be expected immediate reward received after transitioning from state s to state s’ by taking action a following some policy π.


The state space as described and shown in the previous section corresponds to the score in a terminal state (cumulative score) - when both libraries simplelib and anotherlib are resolved and present in fully pinned down software stacks and there are no more unresolved dependencies (no more requirements introduced by resolving these two libraries).

In otherwords, the function plotted corresponds to Bellman equation for all terminal states following a policy during resolution.

Each entry in the state transition function in the MDP corresponds to a step that is performed by resolver, by taking an unresolved dependency and resolving it.

Pipeline units called steps then act like a critic that scores the given action taken (compute immediate reward signal).

For a more illustrative example, let’s suppose that the resolver is taking an action in some state sn. Let this state be described by unresolved, resolved dependencies and its current score (based on Bellman equation).

Resolved dependencies:

  • flask==1.1.1

Unresolved dependencies:

  • click==6.7

  • click==7.0

  • itsdangerous==1.1.0

  • jinja2==2.10.2

  • jinja2==2.10.3

  • werkzeug==0.15.1

Score: 0.5

An illustrative MDP described in the text.

We resolve an unresolved dependency - let’s say we take action sn a2 and resolve itsdangerous==1.1.0, we retrieve an immediate reward 0.33 and we end up in state sn + 3. This action is scored by all pipeline units of type step - a sum of their scores - note that this transition can be also invalidated by any of the step pipeline unit present in the current pipeline configuration.

The new state created is described as follows:

Resolved dependencies:

  • flask==1.1.1

  • itsdangerous==1.1.0

Unresolved dependencies:

  • click==6.7

  • click==7.0

  • jinja2==2.10.2

  • jinja2==2.10.3

  • werkzeug==0.15.1

  • + all the direct dependencies of itsdangerous==1.1.0 respecting their version range specification.

Score: 0.83

Direct dependencies of itsdangerous==1.1.0 added to unresolved dependencies are filtered based on sieve pipeline units present in the current pipeline configuration. Note that sieves can make the given transition invalid if they remove all versions for a specific package. As an example, let’s say itsdangerous==1.1.0 depends on daiquiri in versions 1.0, 2.0 or 3.0. If pipeline sieves remove all the versions of daiquiri, dependency sub-graph of itsdangerous==1.1.0 cannot be satisfied - hence the action sn a2 is invalid.


If there would be any other version of itsdangerous in the unresolved dependencies listing, it would be removed (as well as its whole dependency sub-graph) as package of type itsdangerous is already present in the current state respecting requirements.

As can be seen, the main role of sieves is to filter out invalid future actions in the upcoming resolver rounds, without considering any possible state the resolver could end up with (state independent filtering).


If the given action from a state leads to invalid transition, the predictor instance receives reward signal equal to math.nan.

If a valid transition leads to a state that has no unresolved dependencies, the given state is final (terminal state in case of MDP terminology) and it represents a fully pinned down software stack.


If the given action from a state leads to a final state (terminal state), the predictor instance receives a reward signal equal to math.inf.

The following video demonstrates the resolution process more in-depth.


In adviser docs but also in other Thoth repositories, one can find the following terms:

  • Boolean satisfiability problem - 3SAT problem

  • initial state - state of resolution in resolver that is made out of resolved direct dependencies into a concrete version coming from a Python package index

  • state - generally speaking any resolver state

  • final state - a state that has no more packages left for resolution (resolved packages form fully resolved software stack) and can become a pipeline product

  • state space - a space formed out of all the possible resolver states

  • direct dependencies - declared direct dependencies of an application (directly used in the application)

  • transitive dependencies - all the direct and indirect dependencies of an application - see transitive relation for more info

  • library usage - result of a static source code analysis done by Thoth’s Invectio which keeps track of libraries and library symbols used in the user’s source code

  • runtime environment - hardware and software environment

  • software environment - native packages, Python interpreter version and other software available when running an application (might be seen as a container image)

  • hardware environment - hardware used to run an application - for example information about CPU/GPU used

  • lockfile - a file containing all the packages resolved to a specific version - e.g. Pipfile.lock

  • project - an abstraction used to describe user’s application with direct dependencies, optional lockfile and information about hardware and software environments used

  • resolver - an abstraction that can resolve software stacks based on resolution as defined in the Python ecosystem and based on stack generation pipeline

  • predictor - an abstraction that helps resolver resolve software stacks faster by guiding during resolution - see predictor for more info

  • pipeline - in Thoth’s context, a stack resolution pipeline is used to generate and score Python software stacks for certain quality - see pipeline for more info

  • pipeline units - boot, sieve, step, stride, wrap

  • dependency monkey - one of Thoth’s components - Dependency Monkey can generate all the combinations of a software stacks and, optionally, submit them to Amun for additional verification, testing and observation aggregation

  • Bellman equation - see Bellman equation

  • Markov decision process - Markov decision process

  • Markov property - Markov property

  • Thoth - one of the ancient Egyptian deities

  • Thoth-Station - see Thoth Station

  • Amun - an executor used in Thoth to verify, install and run applications - see Amun repository for more info

  • performance indicator - a test that is performed on a part of a library to aggregate performance characteristics - see performance repo for more info

  • provenance checks - checks for provenance of installed packages - checks on their integrity and source

  • (Python) software stack - a fully pinned down (resolved) software stack made out of Python packages (direct and transitive ones) needed to run a Python application

  • Argo - workflow management used in Thoth to run workflows in an OpenShift cluster

  • adviser - one of the main components in Thoth that can resolve software stacks - see adviser repository

  • solver - one of the main components in Thoth that pre-computes information about dependencies and other metadata for Thoth’s recommendation engine - see solver repo

  • OpenShift s2i (source-to-image) - a build process defined in OpenShift for building applications - see Source-to-Image (S2I) Build

  • Jupyter Notebooks - see and also Thoth related Jupyter Notebooks with experiments

  • Thamos - a CLI for integrating with Thoth - see integration and Thamos repository on GitHub

  • pip - see pip

  • Pipenv - see Pipenv docs

  • Adaptive Simulated Annealing - see Simulated Annealing and Adaptive Simulated Annealing

  • Python triplet - a triplet made out of package name, package version (locked down) and a URL to Python package index from where the Python package came from

  • Python package index - a repository of Python packages that is compliant with PEP-503 - an example can be PyPI or AICoE index