thoth.build_analyzers package


thoth.build_analyzers.analysis module

Build log analysis logic.

thoth.build_analyzers.analysis.build_breaker_analyze(log: str, *, colorize: bool = True) → Tuple[str, pandas.core.frame.DataFrame]

Analyze raw build log.

thoth.build_analyzers.analysis.build_breaker_format_report(report: dict, indentation_level: int = 4) → str

Format the report produced by the build_breaker_report function into a string.

thoth.build_analyzers.analysis.build_breaker_identify(dep_table: pandas.core.frame.DataFrame, error_messages: List[str]) → Optional[str]

Identify build breaker package name.

thoth.build_analyzers.analysis.build_breaker_predict(log_messages: Iterable[str], patterns: Iterable[str], reverse_scores: bool = False) → numpy.ndarray

Predict scores and candidate pattern indices for each log message in log.

The function compares each log message with the candidate patterns in patterns and outputs a similarity score based on a bag-of-words (BoW) approach, penalized by the length of the log message.

  • log_messages – Iterable[str], an iterable of log messages

  • patterns – Iterable[str], patterns to compare to log messages


Returns: np.ndarray of shape (2, n), where n is the number of log messages. The two rows hold the message similarity score and the candidate pattern index, respectively.
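The scoring idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the helper names bow_score and predict are hypothetical, the exact length penalty used by build_breaker_predict is not documented here (dividing by the message length is an assumption), and plain lists stand in for the returned np.ndarray.

```python
from typing import Iterable, List, Tuple


def bow_score(message: str, pattern: str) -> float:
    """Share of the message's words that also occur in the pattern."""
    message_words = message.lower().split()
    pattern_words = set(pattern.lower().split())
    if not message_words:
        return 0.0
    common = sum(1 for word in message_words if word in pattern_words)
    # Dividing by the message length penalizes long messages (assumed penalty).
    return common / len(message_words)


def predict(
    log_messages: Iterable[str], patterns: Iterable[str]
) -> Tuple[List[float], List[int]]:
    """Return (scores, pattern_indices) with one entry per log message."""
    patterns = list(patterns)
    scores: List[float] = []
    indices: List[int] = []
    for message in log_messages:
        best_index, best_score = -1, 0.0  # -1 means no pattern matched
        for index, pattern in enumerate(patterns):
            score = bow_score(message, pattern)
            if score > best_score:
                best_index, best_score = index, score
        scores.append(best_score)
        indices.append(best_index)
    return scores, indices
```

The two returned lists correspond to the two rows of the documented (2, n) array: similarity scores first, candidate pattern indices second.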

thoth.build_analyzers.analysis.build_breaker_report(log: Union[str, pandas.core.frame.DataFrame], *, handler: str = None, top: int = 5, colorize: bool = False) → dict

Analyze raw build log and produce a report.

  • log – Union[str, pd.DataFrame], raw build log to be analyzed or result of build_log_analyze

  • handler – str, handler to be used; only required if log is the result of an analysis. Currently supported handlers are pip and pipenv.

  • top – int, maximum number of candidates to report

  • colorize – bool, whether to map scores to colors (only valid if log is instance of str)


Returns: dict of the following schema:

{
    "build_breaker": {
        "already_satisfied": bool,
        "source": str,
        "target": str,
        "version_installed": str,
        "version_specified": str
    },
    "reason": {
        "ln": str,
        "msg": str
    },
    "candidates": List[dict#reason]
}
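The report schema can also be written down as typed dictionaries, which makes the nesting explicit. This is a sketch only: the class names BuildBreaker, Reason, and Report are hypothetical and not part of the library, and every value in example_report is made up purely to show the shape.

```python
from typing import List, TypedDict


class BuildBreaker(TypedDict):
    already_satisfied: bool
    source: str
    target: str
    version_installed: str
    version_specified: str


class Reason(TypedDict):
    ln: str
    msg: str


class Report(TypedDict):
    build_breaker: BuildBreaker
    reason: Reason
    candidates: List[Reason]  # each candidate follows the "reason" schema


# Made-up values, only to illustrate the structure of a report.
example_report: Report = {
    "build_breaker": {
        "already_satisfied": False,
        "source": "example-app",
        "target": "example-lib",
        "version_installed": "1.0.0",
        "version_specified": ">=1.0.0",
    },
    "reason": {"ln": "42", "msg": "example error message"},
    "candidates": [{"ln": "42", "msg": "example error message"}],
}
```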


thoth.build_analyzers.analysis.get_failed_branch(dep_table: pandas.core.frame.DataFrame, build_breaker: str)

Traverse the dependency table depth-first and output installed packages.

thoth.build_analyzers.analysis.get_succesfully_installed_packages(dep_table: pandas.core.frame.DataFrame, build_breaker: str = None)

Traverse the dependency table depth-first and output installed packages.
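A depth-first traversal of a dependency table might look like the following sketch. The real functions take a pandas DataFrame whose columns are not documented here, so a list of (source, target) pairs stands in for it; the function name dfs_packages and the stop_at parameter are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def dfs_packages(
    edges: List[Tuple[str, str]], root: str, stop_at: Optional[str] = None
) -> List[str]:
    """Visit packages depth-first from root, skipping stop_at and its branch."""
    children: Dict[str, List[str]] = {}
    for source, target in edges:
        children.setdefault(source, []).append(target)

    visited: List[str] = []
    stack = [root]
    while stack:
        package = stack.pop()
        if package in visited or package == stop_at:
            continue
        visited.append(package)
        # Push children in reverse so they are visited in table order.
        stack.extend(reversed(children.get(package, [])))
    return visited
```

Passing the build breaker as stop_at leaves that package, and anything reachable only through it, out of the result.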

thoth.build_analyzers.analysis.retrieve_build_log_patterns(log_messages: List[str]) → Tuple[str, pandas.core.series.Series]

Retrieve build log patterns based on the given log file.

This function detects whether the log file has been produced by ‘pip’ or ‘pipenv’ and retrieves appropriate resources.

thoth.build_analyzers.analysis.simple_bow_similarity(matcher: str, matchee: str) → Tuple[float, List[str]]

Compare two sentences and count number of common words.


Returns: float, score representing sentence similarity
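Counting common words between two sentences can be sketched as follows. This is a minimal illustration, not the library's code: the function name bow_similarity is hypothetical, and lowercasing, whitespace tokenization, and normalizing by the matcher length are all assumptions.

```python
from typing import List, Tuple


def bow_similarity(matcher: str, matchee: str) -> Tuple[float, List[str]]:
    """Return the share of matcher words found in matchee, plus those words."""
    matcher_words = matcher.lower().split()
    matchee_words = set(matchee.lower().split())
    common = [word for word in matcher_words if word in matchee_words]
    # Normalize by the matcher length so the score stays within [0, 1].
    score = len(common) / len(matcher_words) if matcher_words else 0.0
    return score, common
```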

thoth.build_analyzers.analysis.simple_bow_similarity_with_replacement(matcher: str, matchee: str, reformat=False) → Tuple[float, List[str]]

Compare two strings while respecting matcher string formatting syntax.

This function checks for string formatting syntax in the matcher pattern and replaces it with regexp-based syntax. The size of the matched span is then computed and transformed into a similarity score.


Returns: float, score representing sentence similarity
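The placeholder-to-regexp idea can be sketched like this. It is an illustration under assumptions, not the library's implementation: the names PLACEHOLDER, pattern_to_regex, and span_similarity are hypothetical, only a few simple placeholder forms are recognized, and dividing the span size by the matchee length is an assumed way of turning the span into a score.

```python
import re

# Matches str.format fields such as {} or {name}, and printf-style codes
# such as %s or %d. Deliberately simplified.
PLACEHOLDER = re.compile(r"\{[^{}]*\}|%[sdrif]")


def pattern_to_regex(matcher: str) -> str:
    """Escape the literal parts of matcher and join them with a wildcard."""
    parts = PLACEHOLDER.split(matcher)
    return ".+".join(re.escape(part) for part in parts)


def span_similarity(matcher: str, matchee: str) -> float:
    """Fraction of matchee covered by the span that the pattern matches."""
    if not matchee:
        return 0.0
    match = re.search(pattern_to_regex(matcher), matchee)
    if match is None:
        return 0.0
    start, end = match.span()
    return (end - start) / len(matchee)
```

A pattern that covers the whole message scores 1.0, while leading or trailing text that the pattern does not account for lowers the score.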

thoth.build_analyzers.cli module

Command line interface for Thoth build-analyzers library.

class thoth.build_analyzers.cli.AliasedGroup(name=None, commands=None, **attrs)

Bases: click.core.Group

Command group to handle command aliases.

get_command(ctx, cmd_name)

Get Click command by its name.

thoth.build_analyzers.preprocessing module

Build log preprocessing and feature gathering.

thoth.build_analyzers.preprocessing.ast_search_expressions(entrypoint: Union[str, pathlib.PosixPath], expressions: Union[str, List[str]] = None, glob: str = '**/*.py', verbose: bool = False)

Glob through the source AST and extract AST elements and patterns.
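Globbing through source files and inspecting their AST can be sketched with the standard library alone. This is an assumption-laden illustration rather than what ast_search_expressions does: the function name find_string_literals is hypothetical, and treating "string literal passed as the first argument of a call" as the element of interest is an assumption chosen because log messages are typically emitted that way. It assumes Python 3.8 or later, where string literals parse to ast.Constant.

```python
import ast
from pathlib import Path
from typing import List


def find_string_literals(entrypoint: str, glob: str = "**/*.py") -> List[str]:
    """Collect string literals passed as the first argument of any call."""
    found: List[str] = []
    for path in sorted(Path(entrypoint).glob(glob)):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # Skip files that cannot be read or parsed.
        for node in ast.walk(tree):
            if isinstance(node, ast.Call) and node.args:
                first = node.args[0]
                if isinstance(first, ast.Constant) and isinstance(first.value, str):
                    found.append(first.value)
    return found
```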

thoth.build_analyzers.preprocessing.ast_search_pip(entrypoint: str)

Search through the source AST and extract patterns for pip.

thoth.build_analyzers.preprocessing.ast_search_pipenv(entrypoint: str)

Search through the source AST and extract patterns for pipenv.

thoth.build_analyzers.preprocessing.ast_to_pattern_dataframe(elements: list, patterns: List[str]) → pandas.core.frame.DataFrame

Convert AST matches into a pattern dataframe.


  • elements – list of AST elements corresponding to the patterns

thoth.build_analyzers.preprocessing.build_log_prepare(log: str) → List[str]

Process raw build log by lines and output list of log messages.


Returns: List[str], list of log messages

thoth.build_analyzers.preprocessing.build_log_to_dependency_table(log: str, handlers: List[str] = None) → pandas.core.frame.DataFrame

Parse raw build log to find software stack and create dependency table.

thoth.build_analyzers.preprocessing.clean_pattern_dataframe(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Clean the DataFrame by removing unwanted patterns and reformatting.

thoth.build_analyzers.preprocessing.reconstruct_string(format_pattern: str, format_string: str) → str

Attempt to reconstruct string based on a format pattern.

thoth.build_analyzers.preprocessing.reformat(string: str) → str

Reformat format codes defined by PEP 461 and PEP 3101 to the formatting style defined by the parse library.
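Converting printf-style codes into brace-style fields can be sketched as below. This is not the library's reformat function: the name printf_to_braces is hypothetical, only the %s, %d, and %f codes (optionally with a mapping key such as %(name)s) are handled, escaped %% sequences are ignored, and the chosen mapping to brace fields is an assumption about what a parse-style pattern would look like.

```python
import re

# A printf-style code with an optional mapping key, e.g. %s or %(name)d.
_PRINTF_CODE = re.compile(r"%(?:\((?P<name>\w+)\))?(?P<type>[sdf])")


def printf_to_braces(pattern: str) -> str:
    """Rewrite printf-style codes as brace fields; leave other text unchanged."""

    def replace(match: "re.Match") -> str:
        name = match.group("name") or ""
        type_code = match.group("type")
        # %s carries no type constraint; numeric codes keep theirs.
        spec = "" if type_code == "s" else ":" + type_code
        return "{" + name + spec + "}"

    return _PRINTF_CODE.sub(replace, pattern)
```

Fields that already use brace syntax pass through untouched, so mixed patterns end up in a single style.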

Module contents

Build analysis library and tools to handle and process build logs.