Predictor in Thoth’s adviser¶
Two main components in Thoth’s adviser are Resolver and Predictor. This section discusses the latter. The Predictor abstraction was introduced to guide the resolver in expanding states (performing steps until a final state is reached). This guidance can serve two main purposes:
Expand states that are the most promising ones for users - used for recommending software stacks in adviser
Expand states for which Thoth has no observations - used for filling Thoth’s knowledge base using Dependency Monkey and Amun
Note
The introductory section discusses the intuition behind Thoth’s adviser resolver, which is based on two core components - Predictor and Resolver. The resolution is treated as a Markov Decision Process (MDP). See the Introduction section for more info and the intuition behind MDP in the resolver’s implementation.
The two main purposes above make Thoth a self-learning system.
Implementing a predictor¶
To implement a predictor, you need to derive from the Predictor class and implement at least the run method:
from typing import Tuple

import attr

from thoth.adviser import Beam
from thoth.adviser import Context
from thoth.adviser import Predictor
from thoth.adviser import State


@attr.s(slots=True)
class MyPredictor(Predictor):
    """An example predictor implementation."""

    def run(self, context: Context, beam: Beam) -> Tuple[State, str]:
        """Main entry-point for predictor implementation."""
        # Pick the first state in the beam and one of its unresolved dependencies.
        state = next(beam.iter_states())
        return state, next(iter(state.unresolved_dependencies))
The main method - run - accepts two parameters: context (adviser’s context) and a beam. The beam is used as a pool of (not yet final) states that are about to be resolved. The main goal of the predictor is to return a state present in the beam together with a package that should be resolved from the returned state. The state will be expanded in the next resolver round by resolving the returned package. The package is resolved by retrieving all of its direct dependencies in different versions; new states are generated out of all the combinations of packages in different versions that can occur - if such a transition is valid based on Thoth’s judgement (based on dependency specification in Python packages and on pseudonyms) and the dependencies are accepted by pipeline sieves and steps.
Warning
A predictor must not adjust any properties stored in the context or beam! The state and package considered for the next resolution have to stay in the beam.
The example implementation above always expands the first state in the beam by resolving direct dependencies of the first package stored in State.unresolved_dependencies. Note there is no guarantee on the order of states in the beam, unless sorted states are requested. The beam will always hold at least one state, with at least one unresolved dependency.
Note
Raising an EagerStopPipeline exception will stop the resolution process. Raising any other exception has undefined behaviour.
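For illustration, a predictor might decide to give up after some number of runs. In the sketch below the _runs counter is a hypothetical attribute (initialized, for example, in pre_run) and the import path of EagerStopPipeline is an assumption:

from thoth.adviser.exceptions import EagerStopPipeline  # import path assumed


def run(self, context: Context, beam: Beam) -> Tuple[State, str]:
    """Expand the first state, but give up after too many predictor runs."""
    self._runs += 1  # hypothetical counter reset in pre_run
    if self._runs > 100_000:
        raise EagerStopPipeline("Giving up after too many predictor runs")

    state = next(beam.iter_states())
    return state, next(iter(state.unresolved_dependencies))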
Another example shows expansion of a random state and iteration over all the states present in the beam:
import random


def run(self, context: Context, beam: Beam) -> Tuple[State, str]:
    """Expand a random state from the beam."""
    for state in beam.iter_states():
        if random.choice((True, False)):
            # Pick this state and one of its unresolved dependencies at random.
            return state, random.choice(list(state.unresolved_dependencies))

    # Fallback to the first state if no state was randomly selected.
    state = beam.get(0)
    return state, next(iter(state.unresolved_dependencies))
The predictor can keep already computed results in its internal state, but note there is no guarantee on index preservation or on the order in which states are stored in the beam. It’s also recommended to use Beam.iter_new_added_states to check states newly added between predictor runs. Note the state returned is always removed from the beam.
Note
The order of states in the beam can change across predictor invocations. Use id for checking identity and for possible hashing of states in the predictor’s internal structures to optimize time spent in the predictor.
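As a sketch of this idea, a predictor could keep per-state bookkeeping keyed by id(state) and refresh it using Beam.iter_new_added_states between runs; the _scores attribute and the random scoring below are purely illustrative:

import random
from typing import Dict, Tuple

import attr

from thoth.adviser import Beam
from thoth.adviser import Context
from thoth.adviser import Predictor
from thoth.adviser import State


@attr.s(slots=True)
class CachingPredictor(Predictor):
    """An illustrative predictor keeping per-state bookkeeping between runs."""

    # Hypothetical internal cache keyed by id(state).
    _scores = attr.ib(type=Dict[int, float], factory=dict, init=False)

    def pre_run(self, context: Context) -> None:
        """Reset the internal cache before each resolution."""
        self._scores.clear()

    def run(self, context: Context, beam: Beam) -> Tuple[State, str]:
        """Pick the state with the highest cached score."""
        # Score only states added to the beam since the last predictor run.
        for state in beam.iter_new_added_states():
            self._scores[id(state)] = random.random()  # placeholder scoring

        best_state = max(beam.iter_states(), key=lambda s: self._scores.get(id(s), 0.0))
        # The returned state gets removed from the beam, so drop its cache entry.
        self._scores.pop(id(best_state), None)
        return best_state, next(iter(best_state.unresolved_dependencies))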
Predictor attributes and methods¶
A predictor can accept parameters that can be supplied from the CLI or directly when instantiating the predictor programmatically. If any adjustment is desired before running the resolution pipeline, a user can implement the Predictor.pre_run method, which is called with the initialized adviser context before the stack generation pipeline is triggered:
def pre_run(self, context: Context) -> None:
    """Implement any pre-run initialization here."""
The predictor is instantiated only once per resolver - if resolution is run multiple times on the same resolver instance, it reuses the already instantiated pipeline units and predictor. A proper implementation of pipeline units and of the predictor uses the pre_run method to initialize any internal state before each resolution.
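For example, a minimal sketch of a parametrized predictor that resets its internal state in pre_run (the limit parameter and the _calls counter are illustrative, not part of the Predictor API):

import attr

from thoth.adviser import Context
from thoth.adviser import Predictor


@attr.s(slots=True)
class ParametrizedPredictor(Predictor):
    """An illustrative predictor accepting a parameter on instantiation."""

    limit = attr.ib(type=int, default=1000, kw_only=True)  # supplied from CLI or code
    _calls = attr.ib(type=int, default=0, init=False)      # internal state

    def pre_run(self, context: Context) -> None:
        """Reset internal state so the same instance can be reused across resolutions."""
        self._calls = 0

Such a predictor could then be instantiated as ParametrizedPredictor(limit=500) before being handed to the resolver.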
Additional methods that can be provided are:
Predictor.post_run - run after the stack generation pipeline is finished to tear down the predictor
Predictor.post_run_report - run after the stack generation pipeline is finished and a report is constructed as per user request (see resolver for more info)
Predictor.plot - used to plot predictor’s history
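A rough sketch of these hooks follows; the exact signatures should be checked against the Predictor base class, and the report parameter type as well as the plotting return value are assumptions here:

def post_run(self) -> None:
    """Tear down the predictor once the stack generation pipeline is finished."""
    # E.g. release resources or clear internal caches kept during the run.

def post_run_report(self, report) -> None:  # report type as defined by the base class
    """Inspect the constructed report, e.g. to log predictor statistics."""

def plot(self):
    """Plot the predictor's history, e.g. returning a matplotlib figure."""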
See Adaptive Simulated Annealing as an example of a predictor that samples the state space and subsequently performs hill climbing as the temperature decreases.