Predictor in Thoth’s adviser
Two main components in Thoth’s adviser are Resolver and Predictor. This section discusses the latter. The Predictor abstraction was introduced to guide the resolver in the expansion of states (performing steps until a final state is reached). This guidance can have two main purposes:
Expand states that are the most promising ones to be used by users - used for recommending software stacks in adviser
The introductory section discusses the intuition behind Thoth’s adviser resolver, which is based on two core components - Predictor and Resolver. The resolution is treated as a Markov Decision Process (MDP). See the Introduction section for more information and the intuition behind MDP in the resolver’s implementation.
The two main purposes above make Thoth a self-learning system.
Implementing a predictor
    from typing import Tuple

    import attr

    from thoth.adviser import Beam
    from thoth.adviser import Context
    from thoth.adviser import Predictor
    from thoth.adviser import State


    @attr.s(slots=True)
    class MyPredictor(Predictor):
        """An example predictor implementation."""

        def run(self, context: Context, beam: Beam) -> Tuple[State, str]:
            """Main entry-point for predictor implementation."""
            # Take the first state yielded by the beam.
            state = next(beam.iter_states())
            # Resolve the first unresolved dependency of that state.
            return state, next(iter(state.unresolved_dependencies))
The main method, run, accepts two parameters: context (adviser’s context) and
beam. The beam is used as a pool of (not final) states that are about to be
resolved. The main goal of a predictor is to return a state present in the beam
and a package from that state that should be resolved. The state will be
expanded in the next resolver round by resolving the returned package. The
package is resolved by retrieving all of its direct dependencies in different
versions, and new states are generated out of all the combinations of packages
in different versions that can occur, provided that such a transition is valid
based on Thoth’s judgement (based on dependency specification in Python
packages and on pseudonyms) and the dependencies are accepted by pipeline
sieves and steps.
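The combinatorial expansion described above can be illustrated in isolation. The following is a minimal sketch, not adviser's actual implementation: the function name `expand`, the plain-dict representation of a state, and the package versions are assumptions made for illustration, and the `is_valid` callback merely stands in for the checks performed by Thoth and by pipeline sieves and steps.

```python
import itertools
from typing import Callable, Dict, Iterator, List

# Hypothetical stand-in: a state maps package names to pinned versions.
StateDict = Dict[str, str]


def expand(
    state: StateDict,
    direct_dependencies: Dict[str, List[str]],
    is_valid: Callable[[StateDict], bool],
) -> Iterator[StateDict]:
    """Yield one new state per valid combination of dependency versions."""
    names = sorted(direct_dependencies)
    version_lists = (direct_dependencies[name] for name in names)
    for versions in itertools.product(*version_lists):
        candidate = dict(state)
        candidate.update(zip(names, versions))
        if is_valid(candidate):
            yield candidate


new_states = list(
    expand(
        {"flask": "1.1.1"},
        {"click": ["7.0", "7.1"], "jinja2": ["2.10", "2.11"]},
        # Reject one combination to mimic a sieve discarding a transition.
        is_valid=lambda s: not (s["click"] == "7.0" and s["jinja2"] == "2.11"),
    )
)
```

Two dependencies with two candidate versions each give four combinations; one is rejected by the validity check, so three new states are generated from the single expanded state.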
A predictor should not adjust any properties stored in the context or beam!
The state and package considered for the next resolution have to stay in the beam.
The example implementation above always expands the first state in the beam by
resolving direct dependencies of the first package stored in
State.unresolved_dependencies. Note there is no guarantee on the order of
states in the beam, unless sorted states are requested.
The beam will always hold at least one state with at least one unresolved dependency.
Raising EagerStopPipeline will stop the resolution process.
Raising any other exception has undefined behaviour.
Another example shows expansion of a random state and iteration over all the states present in the beam:
    import random
    from typing import Tuple


    def run(self, context: Context, beam: Beam) -> Tuple[State, str]:
        # Walk over the states and pick each one with probability 1/2.
        for state in beam.iter_states():
            if random.choice((True, False)):
                return state, random.choice(list(state.unresolved_dependencies))

        # Fallback to the first state if no state was picked above.
        state = beam.get(0)
        return state, random.choice(list(state.unresolved_dependencies))
The predictor can keep already computed results in its state, but note there is
no guarantee that indexes are preserved or that the order in which states are
stored in the beam stays the same. It’s also recommended to use
Beam.iter_new_added_states to check newly added states
between predictor runs. Note the state returned is always removed from the
Order of states in the beam can change across predictor invocations. Use
id for checking identity and possible hashing of states in predictor’s
internal structures to optimize time spent in predictor.
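A minimal sketch of such an internal structure follows. `FakeState` and `ScoreCache` are hypothetical names introduced only for this illustration, and the doubled score stands in for whatever expensive computation a predictor would want to avoid repeating; the sketch relies only on Python's built-in `id`.

```python
from typing import Dict, Iterable


class FakeState:
    """Hypothetical stand-in for a state; only a score is modelled."""

    def __init__(self, score: float) -> None:
        self.score = score


class ScoreCache:
    """Remember a per-state computation between predictor runs."""

    def __init__(self) -> None:
        self._cache: Dict[int, float] = {}
        self.computed = 0  # How many times the computation actually ran.

    def update(self, new_states: Iterable[FakeState]) -> None:
        """Compute and store a value for each state not seen before."""
        for state in new_states:
            key = id(state)  # Identity of the object, not its position.
            if key not in self._cache:
                self._cache[key] = state.score * 2
                self.computed += 1

    def get(self, state: FakeState) -> float:
        return self._cache[id(state)]


first, second = FakeState(1.0), FakeState(2.5)
cache = ScoreCache()
cache.update([first, second])
cache.update([first])  # Already known, nothing is recomputed.
```

Because the key is the object's identity, lookups keep working when the position of a state in the beam changes. In CPython an `id` value can be reused once the object is garbage collected, so entries should be dropped when a state is no longer alive.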
Predictor attributes and methods
A predictor can accept parameters that can be supplied from the CLI or directly
when instantiating the predictor programmatically. If any adjustment is desired
before running the resolution pipeline, a user can implement the
Predictor.pre_run method, which is called with the
initialized adviser context before the stack generation pipeline is triggered:
    def pre_run(self, context: Context) -> None:
        """Implement any pre-run initialization here."""
Predictor is instantiated only once per resolver - if resolution is run
multiple times on the same resolver instance, it reuses already instantiated
pipeline units and predictor. A proper implementation of pipeline units and
resolver use the
pre_run method to initialize any internal state before
Additional methods that can be provided are:
Predictor.post_run - run after the stack generation pipeline is finished to tear down the predictor
Predictor.plot - used to plot predictor’s history
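Since one predictor instance is reused across resolutions, the hooks above matter mostly for resetting state. The sketch below shows the idea with a hypothetical stand-in class, `CountingPredictor`, that does not derive from adviser's Predictor and uses simplified signatures (no context, and a beam size instead of a beam); the loop only imitates how a resolver might call the hooks.

```python
from typing import List


class CountingPredictor:
    """Hypothetical stand-in mirroring the pre_run/run/post_run hooks."""

    def __init__(self) -> None:
        self.history: List[int] = []
        self.finished_runs = 0

    def pre_run(self) -> None:
        # Reset per-resolution state: the same instance is reused.
        self.history = []

    def run(self, beam_size: int) -> None:
        # Record something about each round, e.g. for later plotting.
        self.history.append(beam_size)

    def post_run(self) -> None:
        # Tear down after the pipeline has finished.
        self.finished_runs += 1


predictor = CountingPredictor()
# Two resolutions performed with the same predictor instance.
for beam_sizes in ([1, 3, 2], [1, 2]):
    predictor.pre_run()
    for size in beam_sizes:
        predictor.run(size)
    predictor.post_run()
```

Without the reset in `pre_run`, the history of the first resolution would leak into the second one.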
See Adaptive Simulated Annealing as an example of a predictor that samples state space and subsequently performs hill climbing as the temperature decreases.
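The interplay between sampling and hill climbing can be sketched with the generic textbook form of simulated annealing. This is an illustration under stated assumptions, not necessarily what adviser's predictor implements: the function names `acceptance_probability` and `pick_index`, and the use of a plain list of scores in place of a beam, are hypothetical.

```python
import math
import random
from typing import List


def acceptance_probability(
    top_score: float, neighbour_score: float, temperature: float
) -> float:
    """Probability of expanding a neighbour instead of the top state."""
    if neighbour_score > top_score:
        return 1.0
    if temperature <= 0.0:
        return 0.0
    return math.exp((neighbour_score - top_score) / temperature)


def pick_index(scores: List[float], temperature: float, rng: random.Random) -> int:
    """Pick the best state, or a random one accepted at this temperature."""
    top = max(range(len(scores)), key=scores.__getitem__)
    neighbour = rng.randrange(len(scores))
    probability = acceptance_probability(
        scores[top], scores[neighbour], temperature
    )
    return neighbour if rng.random() < probability else top


rng = random.Random(0)
# With the temperature at zero only the highest-scoring state is picked.
cold_picks = {pick_index([0.1, 0.9, 0.5], 0.0, rng) for _ in range(50)}
```

At a high temperature worse states are accepted often, so the state space is sampled broadly; as the temperature approaches zero the choice degenerates into always picking the highest-scoring state, which corresponds to hill climbing.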