thoth.lab package

Submodules

thoth.lab.adviser module

Adviser results processing and analysis.

thoth.lab.adviser.aggregate_adviser_results(adviser_version: str, limit_results: bool = False, max_ids: int = 5) → pandas.core.frame.DataFrame[source]

Aggregate adviser results from jsons stored in Ceph.

Parameters
  • adviser_version – minimum adviser version considered for the analysis of adviser runs

  • limit_results – reduce the number of inspection batch ids considered to max_ids to test analysis

  • max_ids – maximum number of inspection batch ids considered

thoth.lab.adviser.create_adviser_heatmap(adviser_justification_df: pandas.core.frame.DataFrame, save_result: bool = False, output_dir: str = '')[source]

Create adviser justifications heatmap plot.

Parameters
  • adviser_justification_df – data frame as returned by `create_final_dataframe` per identifier.

  • save_result – resulting plots created are stored in output_dir.

  • output_dir – output directory where plots are stored if save_result is set to True.

thoth.lab.adviser.create_adviser_results_histogram(plot_df: pandas.core.frame.DataFrame)[source]

Create histogram plot of adviser results.

Parameters

plot_df – dataframe for plot of adviser results

thoth.lab.adviser.create_final_dataframe(adviser_dataframe: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]

Create final dataframe with all information required for plots.

Parameters

adviser_dataframe – data frame as returned by aggregate_adviser_results method.

thoth.lab.adviser.extract_adviser_justifications(report: Dict[str, Any], adviser_dict: Dict[str, Any], ids: str) → Dict[str, Any][source]

Retrieve justifications from adviser results.

thoth.lab.adviser.extract_justifications_from_products(products: List[Dict[str, Any]], adviser_dict: Dict[str, Any], ids: str) → Dict[str, Any][source]

Extract justifications from products in adviser results.

thoth.lab.convert module

Utilities to work with package dependencies.

thoth.lab.dependency_monkey module

Dependency Monkey results processing and analysis.

thoth.lab.dependency_monkey.aggregate_dm_results_per_identifier(identifiers_inspection: List[str], limit_results: bool = False, max_batch_identifiers_ids: int = 5) → Union[dict, List[str]][source]

Aggregate inspection batch ids and specification from dm documents stored in Ceph.

Parameters
  • identifiers_inspection – list of identifier/s to filter inspection batch ids

  • limit_results – limit inspection batch ids considered to max_batch_identifiers_ids to test analysis

  • max_batch_identifiers_ids – maximum number of inspection batch ids considered

thoth.lab.exception module

Exceptions for thoth-lab methods.

exception thoth.lab.exception.NotUniqueValues[source]

Bases: Exception

An exception raised when the dataframe unique method cannot return results.

thoth.lab.graph module

Various helpers and utils for interaction with the graph database.

class thoth.lab.graph.DependencyGraph(incoming_graph_data=None, **attr)[source]

Bases: networkx.classes.ordered.OrderedDiGraph

Construct a dependency graph by extending nx.OrderedDiGraph.

adjlist_dict_factory

alias of collections.OrderedDict

static get_root(tree)[source]

Return root of the current graph, if any.

By default, tree topology is considered as input, so if there are multiple roots, only the first one is returned.

node_dict_factory

alias of collections.OrderedDict

class thoth.lab.graph.GraphQueryResult(result)[source]

Bases: object

Wrap results of graph database queries.

plot_bar()[source]

Plot histogram of results obtained.

plot_pie()[source]

Plot a pie of results into Jupyter notebook.

serialize()[source]

Serialize the output of graph query.

to_dataframe()[source]

Construct a pandas DataFrame from the results.

thoth.lab.graph.get_root(tree)

Return root of the current graph, if any.

By default, tree topology is considered as input, so if there are multiple roots, only the first one is returned.
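
The "first root" behaviour described above can be illustrated with a minimal sketch. It uses a plain adjacency mapping instead of networkx, and the helper name `first_root` and the sample package names are assumptions made for illustration, not the library's implementation.

```python
from typing import Dict, Hashable, List, Optional


def first_root(edges: Dict[Hashable, List[Hashable]]) -> Optional[Hashable]:
    """Return the first node with no incoming edges, or None if there is none."""
    # Collect every node that appears as the target of some edge.
    targets = {child for children in edges.values() for child in children}
    # Dicts preserve insertion order, so "first" means first inserted.
    for node in edges:
        if node not in targets:
            return node
    return None


# Hypothetical dependency tree: "app" depends on two packages.
graph = {"app": ["flask", "requests"], "flask": ["jinja2"], "requests": [], "jinja2": []}
print(first_root(graph))
```

With several candidate roots, only the first one in insertion order is returned, which mirrors the tree-topology assumption stated in the description.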

thoth.lab.inspection module

Inspection results processing and analysis.

thoth.lab.inspection.aggregate_inspection_results_per_identifier(inspection_ids: List[str], identifier_inspection: List[str], inspection_batch_data: Dict[str, dict]) → dict[source]

Aggregate inspection results per identifier from inspection documents stored in Ceph.

Parameters
  • inspection_ids – list of inspection ids

  • identifier_inspection – list of identifier/s to filter inspection ids

  • inspection_batch_data – info to be added to each inspection (e.g. specification)

thoth.lab.inspection.columns_to_analyze(df: pandas.core.frame.DataFrame, low: int = 0, display_clusters: bool = False, cluster_by_hue: bool = False) → pandas.core.frame.DataFrame[source]

Print all columns within dataframe and count of unique column values within limit.

Parameters
  • df – data frame to analyze as returned by `process_inspection_results`

  • low – the lower limit (0 if not specified) of distinct value counts

  • display_clusters – if true, displays grouped counts of parameter and parameter sort_values

  • cluster_by_hue – if true, displays distribution of parameters to analyze sorted by hues

thoth.lab.inspection.concatenated_df(dfs: List[pandas.core.frame.DataFrame], column: str)[source]

Reorganize dataframe to show the distribution of jobs in a category across different subsets of data.

Parameters
  • dfs – list of inspection result dataframes which can be different datasets or subset of datasets

  • column – column name or category for grouping to see the distribution of results

thoth.lab.inspection.create_duration_box(data: pandas.core.frame.DataFrame, columns: Union[str, List[str]] = None, **kwargs)[source]

Create duration Box plot.

thoth.lab.inspection.create_duration_dataframe(inspection_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]

Compute statistics and duration DataFrame.

thoth.lab.inspection.create_duration_histogram(data: pandas.core.frame.DataFrame, columns: Union[str, List[str]] = None, bins: int = None, **kwargs)[source]

Create duration Histogram plot.

thoth.lab.inspection.create_duration_scatter(data: pandas.core.frame.DataFrame, columns: Union[str, List[str]] = None, **kwargs)[source]

Create duration Scatter plot.

thoth.lab.inspection.create_duration_scatter_with_bounds(data: pandas.core.frame.DataFrame, col: str, index: Union[list, pandas.core.indexes.base.Index] = None, **kwargs)[source]

Create duration Scatter plot with upper and lower bounds.

thoth.lab.inspection.create_filtered_df(df: pandas.core.frame.DataFrame, pi_name: Optional[str] = None, pi_component: Optional[str] = None, runtime_environment: Optional[str] = None, packages: Optional[List[Tuple[str, str, str]]] = None) → pandas.core.frame.DataFrame[source]

Create dataframe using the filters selected for plots.

thoth.lab.inspection.create_final_dataframe(packages_versions: dict, python_packages_dataframe: pandas.core.frame.DataFrame, inspection_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]

Create final dataframe with all information required for plots.

Parameters
  • packages_versions – dict as returned by create_python_package_df method.

  • python_packages_dataframe – data frame as returned by create_python_package_df method.

  • inspection_df – data frame containing data of inspections results.

thoth.lab.inspection.create_inspection_2d_plot(plot_df: pandas.core.frame.DataFrame, quantity: str, identifiers_inspections: List[str])[source]

Create inspection performance parameters plot in 2D.

Parameters

plot_df – dataframe for plot of inspections results

thoth.lab.inspection.create_inspection_3d_plot(plot_df: pandas.core.frame.DataFrame, quantity: str, identifiers_inspections: List[str])[source]

Create inspection performance parameters plot in 3D.

Parameters

plot_df – dataframe for plot of inspections results

thoth.lab.inspection.create_inspection_analysis_plots(inspection_df: pandas.core.frame.DataFrame)[source]

Create inspection analysis plots for the inspection pd.Dataframe.

Parameters

inspection_df – data frame as returned by `process_inspection_results` for a specific inspection identifier

thoth.lab.inspection.create_inspection_dataframes(inspection_results_dict: dict, duration_info: bool = False) → dict[source]

Create dictionary with data frame as returned by `process_inspection_results` for each inspection identifier.

Parameters

inspection_results_dict – dictionary containing inspection results per inspection identifier.

thoth.lab.inspection.create_inspection_parameters_dataframes(parameters: List[str], inspection_df_dict: dict) → Dict[str, pandas.core.frame.DataFrame][source]

Create pd.DataFrame of selected parameters from inspections results to be used for statistics and error analysis.

It also outputs the map of batches and parameters that is needed for plots.

Parameters
  • parameters – inspection parameters used in the analysis

  • inspection_df_dict – dictionary with data frame as returned by `process_inspection_results` per identifier.

thoth.lab.inspection.create_inspection_time_dataframe()[source]

Create pd.Dataframe of time of inspections for build and job.

thoth.lab.inspection.create_multiple_violin_plot(data: pandas.core.frame.DataFrame, quantity: str, x_label: str = '', y_label: str = '', save_result: bool = False, project_folder: str = '', folder_name: str = '', linewidth: int = 1)[source]

Create violin plot.

thoth.lab.inspection.create_plot_from_df(data: pandas.core.frame.DataFrame, columns: Union[str, List[str]] = None, title_plot: str = ' ', x_label: str = ' ', y_label: str = ' ', static: str = True, save_result: bool = False, project_folder: str = '', folder_name: str = '', scatter: bool = False)[source]

Create plot using two columns of the DataFrame.

thoth.lab.inspection.create_plot_multiple_batches(data: pandas.core.frame.DataFrame, quantity: str, plot_type: str = 'box', x_label: str = '', y_label: str = '', static: str = True, save_result: bool = False, project_folder: str = '', folder_name: str = '')[source]

Create a histogram or box plot using several columns of the dataframe (static by default).

thoth.lab.inspection.create_python_package_df(inspection_df: pandas.core.frame.DataFrame) → Union[pandas.core.frame.DataFrame, dict][source]

Create DataFrame with only python packages present in software stacks.

thoth.lab.inspection.create_scatter_and_correlation(data: pandas.core.frame.DataFrame, columns: Union[str, List[str]] = None, title_scatter: str = 'Scatter plot')[source]

Create Scatter plot and evaluate correlation coefficients.

thoth.lab.inspection.create_scatter_plots_for_multiple_batches(inspection_df_dict: Dict[str, pandas.core.frame.DataFrame], list_batches: List[str], columns: Union[str, List[str]] = None, title_scatter: str = ' ', x_label: str = ' ', y_label: str = ' ')[source]

Create Scatter plots for multiple batches.

Parameters
  • inspection_df_dict – dictionary with data frame as returned by `process_inspection_results` per identifier

  • list_batches – list of batches to be used for correlation analysis

  • columns – parameters to be considered, taken from data frame as returned by `process_inspection_results`

  • title_scatter – scatter plot name

  • x_label – x label name

  • y_label – y label name

thoth.lab.inspection.dataframe_statistics(inspection_df: pandas.core.frame.DataFrame, plot_title: str)[source]

Output a data frame with relevant statistics on job duration, build duration and time elapsed.

Parameters
  • inspection_df – data frame to analyze as returned by `process_inspection_results` (duration [ms])

  • plot_title – title of fit plot

thoth.lab.inspection.display_jobs_by_subcategories(df: pandas.core.frame.DataFrame)[source]

Create dataframe with job counts for each subcategory for every column in the data frame.

Parameters

df – dataframe with columns of unique value counts as returned by columns_to_analyze

thoth.lab.inspection.duration_plots(df: pandas.core.frame.DataFrame)[source]

Create plots for job and build duration, elapsed time, and lead time.

Parameters

df – data frame with duration information as returned by process_inspection_results

thoth.lab.inspection.evaluate_inspection_statistics(parameters: list, inspection_df_dict: dict) → dict[source]

Aggregate statistical quantities per inspection parameter for inspection batches.

Parameters
  • parameters – inspection parameters used in the analysis

  • inspection_df_dict – dictionary with data frame as returned by `process_inspection_results` per identifier

thoth.lab.inspection.evaluate_statistics(inspection_df: pandas.core.frame.DataFrame, inspection_parameter: str) → Dict[source]

Evaluate statistical quantities of a specific parameter of inspection results.
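
As a rough illustration of evaluating statistical quantities for a single parameter, the sketch below uses only the standard library. The quantities actually returned by the function are not listed in this reference, so the helper name `summarize`, the dictionary keys, and the sample durations are all assumptions.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Compute a few common summary statistics for one parameter."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two data points.
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


# Hypothetical job durations in milliseconds.
job_durations_ms = [100.0, 120.0, 110.0, 130.0]
stats = summarize(job_durations_ms)
print(stats["mean"], stats["median"])
```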

thoth.lab.inspection.evaluate_statistics_on_inspection_df(df: pandas.core.frame.DataFrame, column_names: List[str]) → pandas.core.frame.DataFrame[source]

Evaluate statistics on performance values selected from Dataframe columns.

thoth.lab.inspection.extract_keys_from_dataframe(df: pandas.core.frame.DataFrame, key: str)[source]

Filter the specific dataframe created for a certain key, combination of keys or for a tree depth.

thoth.lab.inspection.extract_specification(inspection_batch_result: Dict[str, Any], inspection_id: str)[source]

Extract specification info for the inspection.

thoth.lab.inspection.extract_structure_json(input_json: dict, upper_key: str, depth: int, json_structure)[source]

Convert a json file structure into a list with rows showing tree depths, keys and values.

Parameters
  • input_json – inspection result json taken from Ceph

  • upper_key – key starting point to recursively traverse all tree

  • depth – depth in the tree

  • json_structure – recurrent list to store results while traversing the tree
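
The recursive traversal described by these parameters can be sketched as follows. The row layout `[depth, upper_key, key, value]`, the `__` separator, and the helper name `walk_json` are assumptions chosen for illustration and may differ from what the function actually produces.

```python
from typing import List


def walk_json(node: dict, upper_key: str, depth: int, rows: List[list]) -> List[list]:
    """Append one [depth, upper_key, key, value] row per key, recursing into dicts."""
    for key, value in node.items():
        if isinstance(value, dict):
            # Record the branch itself, then descend one level deeper.
            rows.append([depth, upper_key, key, None])
            walk_json(value, f"{upper_key}__{key}", depth + 1, rows)
        else:
            rows.append([depth, upper_key, key, value])
    return rows


# Hypothetical, heavily simplified inspection document.
doc = {"hardware": {"cpu": {"cores": 4}}, "status": "ok"}
rows = walk_json(doc, "root", 0, [])
for row in rows:
    print(row)
```

Passing the accumulator list explicitly, as the `json_structure` parameter suggests, lets every recursive call append to the same result.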

thoth.lab.inspection.filter_df(df, *args)[source]

Filter Dataframe.

thoth.lab.inspection.filter_document_ids(inspection_store, inspection_identifiers: List[str]) → Dict[str, List][source]

Filter inspection document ids list according to the inspection identifiers selected.

Parameters

inspection_identifiers – list of identifier/s to filter inspection ids

thoth.lab.inspection.filter_inspection_ids(inspection_identifiers: List[str]) → dict[source]

Filter inspection ids list according to the inspection identifier selected.

Parameters

inspection_identifiers – list of identifier/s to filter inspection ids

thoth.lab.inspection.make_subplots(data: pandas.core.frame.DataFrame, columns: List[str] = None, *, kind: str = 'box', **kwargs)[source]

Make subplots and arrange them in an optimized grid layout.
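
This reference does not say how the grid layout is optimized. One common heuristic, shown here purely as an assumption, is to pick a near-square grid large enough for all plots; the helper name `grid_shape` is hypothetical.

```python
import math
from typing import Tuple


def grid_shape(n_plots: int) -> Tuple[int, int]:
    """Return (rows, cols) of a near-square grid with room for n_plots."""
    if n_plots < 1:
        raise ValueError("n_plots must be positive")
    # Take the smallest column count whose square covers all plots,
    # then use only as many rows as are actually needed.
    cols = math.ceil(math.sqrt(n_plots))
    rows = math.ceil(n_plots / cols)
    return rows, cols


for n in (1, 4, 5, 7):
    print(n, grid_shape(n))
```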

thoth.lab.inspection.map_column_to_feature_class(column_name: str)[source]

Helper function that maps a column in the original dataframe to a feature class.

Parameters

column_name – column name in the inspection_df dataframe obtained by process_inspection_results with no columns dropped (drop=False)

thoth.lab.inspection.plot_distribution_of_jobs_combined_categories(df_hardware_category: pandas.core.frame.DataFrame, df_duration: pandas.core.frame.DataFrame, df_analyze: pandas.core.frame.DataFrame)[source]

Plot the job duration distribution for each unique hardware combination/configuration of data.

Parameters
  • df_hardware_category – dataframe of parameters to analyze grouped by distinct rows

  • df_duration – dataframe with duration information as returned by process_inspection_results

  • df_analyze – dataframe of parameters that show variation across the clusters

thoth.lab.inspection.plot_interpolated_statistics_of_inspection_parameters(statistical_results_dict: dict, identifier_inspection_list: dict, inspection_parameters: List[str], colour_list: List[str], statistical_quantities: List[str], title_plot: str = ' ', title_xlabel: str = ' ', title_ylabel: str = ' ', save_result: bool = False, project_folder: str = '', folder_name: str = '')[source]

Plot interpolated statistical quantity/ies of inspection parameter/s from different inspection batches.

thoth.lab.inspection.plot_subcategories_by_hues(df_cat: pandas.core.frame.DataFrame, df: pandas.core.frame.DataFrame, column)[source]

Create scatter plots with parameter categories separated by hues.

Parameters
  • df_cat – filtered dataframe with columns to analyze as returned by columns_to_analyze

  • df – data frame with duration information as returned by process_inspection_results

  • column – job duration/build duration columns from ‘df’

thoth.lab.inspection.process_empty_or_mutable_parameters(inspection_df: pandas.core.frame.DataFrame)[source]

Process empty or mutable parameters in dataframe.

These values will not work with further processing using the groupby function. Prints the unique value count of all columns that are unhashable (all such columns are constant). Drops these columns and returns a new dataframe.

Parameters

inspection_df – data frame as returned by process_inspection_results with no columns dropped (drop=False)
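
To show why unhashable values get in the way of grouping, here is a small standard-library sketch that finds and drops such columns from rows stored as dictionaries. It is not the pandas-based implementation; the helper name `unhashable_columns` and the column names are assumptions for illustration.

```python
from typing import Any, Dict, List


def unhashable_columns(rows: List[Dict[str, Any]]) -> List[str]:
    """Return names of columns that contain at least one unhashable value."""
    if not rows:
        return []
    found = []
    for column in rows[0]:
        try:
            # Building a set hashes every value, much as grouping would.
            {row[column] for row in rows}
        except TypeError:
            found.append(column)
    return found


# Hypothetical inspection rows: dicts and lists cannot be hashed.
rows = [
    {"job_duration": 12, "requirements": {"numpy": "*"}, "files": []},
    {"job_duration": 15, "requirements": {"numpy": "*"}, "files": []},
]
drop = unhashable_columns(rows)
cleaned = [{k: v for k, v in row.items() if k not in drop} for row in rows]
print(drop, cleaned)
```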

thoth.lab.inspection.process_inspection_results(inspection_results: List[dict], exclude: Union[list, set] = None, apply: List[Tuple] = None, drop: bool = True, verbose: bool = False, duration_info: bool = False) → pandas.core.frame.DataFrame[source]

Process inspection result into pd.DataFrame.

thoth.lab.inspection.query_inspection_dataframe(inspection_df: pandas.core.frame.DataFrame, *args, **kwargs) → pandas.core.frame.DataFrame[source]

Wrapper around the _.query method which always includes duration columns in the filter expression.

thoth.lab.inspection.show_categories(inspection_df: pandas.core.frame.DataFrame)[source]

List categories in the given inspection pd.DataFrame.

thoth.lab.inspection.show_inspection_inputs(filtered_inspection_ids: List[str], inspection_batch_ids: List[str], filtered_inspection_batch_ids: List[str])[source]

Show inspections inputs for the analysis.

Parameters
  • filtered_inspection_ids – list of inspection ids after filtering

  • inspection_batch_ids – list of inspection batch ids

  • filtered_inspection_batch_ids – list of inspection batch ids after filtering

thoth.lab.inspection.show_unique_value_count_by_feature_class(processed_df: pandas.core.frame.DataFrame)[source]

Show unique count values per feature/class.

Show results per feature/class that are subdivided in subclasses that map to it.

Parameters

processed_df – processed dataframe as returned by the process_empty_or_mutable_parameters

thoth.lab.inspection.summary_bar_plot(df: pandas.core.frame.DataFrame, df_categories: pandas.core.frame.DataFrame, clusters: List[pandas.core.frame.DataFrame])[source]

Create trace stacked plot scaled by total jobs of each parameter within clusters (if any).

Parameters
  • df – data frame with duration information as returned by process_inspection_results

  • df_categories – filtered dataframe with columns to analyze as returned by columns_to_analyze

  • clusters – list of subset dataframes with the last value in list being the entire data set

thoth.lab.inspection.summary_trace_plot(df: pandas.core.frame.DataFrame, df_categories: pandas.core.frame.DataFrame, dfs: Optional[List[pandas.core.frame.DataFrame]] = None)[source]

Create trace plot scaled by percentage of compositions of each parameter separated by hues.

Parameters
  • df – data frame with duration information as returned by process_inspection_results

  • df_categories – filtered dataframe with columns to analyze as returned by columns_to_analyze

  • dfs – dataframes of clustered data (if any) appended to dataframe of entire dataset (i.e. [df_left_cluster, df_right_cluster, df_duration])

thoth.lab.inspection_report module

Inspection report generation and visualization.

thoth.lab.inspection_report.create_df_report(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]

Show unique values for each column in the dataframe.

thoth.lab.inspection_report.create_dfs_inspection_classes(inspection_df: pandas.core.frame.DataFrame) → dict[source]

Create all inspection dataframes per class with unique values and complete values.

thoth.lab.inspection_report.multi_table(table_dict)[source]

Accept a list of IpyTable objects and return a table which contains each IpyTable in a cell.

thoth.lab.underscore module

Pandas common operations and utilities.

thoth.lab.underscore.rget(obj: Any, attr: str, default: Any = <object object>) → Any

Recursively retrieve nested attributes of an object.

Parameters
  • f – callable, function to be used as getattr

  • obj – Any, object to check

  • attr – str, attribute to find declared by dot notation accessor

  • default – default attribute, similar to getattr’s default

Returns

Any, retrieved attribute
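
Dot-notation lookup of nested attributes can be sketched with `functools.reduce`. This version always uses the built-in `getattr`, whereas the parameter list above mentions a configurable getter, so treat the helper name `rget_sketch` and its error behaviour as assumptions rather than the library's implementation.

```python
from functools import reduce
from types import SimpleNamespace
from typing import Any

_MISSING = object()  # sentinel so that None can be a legitimate default


def rget_sketch(obj: Any, attr: str, default: Any = _MISSING) -> Any:
    """Follow a dotted attribute path such as 'a.b.c' one step at a time."""
    try:
        return reduce(getattr, attr.split("."), obj)
    except AttributeError:
        if default is _MISSING:
            raise
        return default


# Hypothetical nested object.
cfg = SimpleNamespace(hardware=SimpleNamespace(cpu=SimpleNamespace(cores=4)))
print(rget_sketch(cfg, "hardware.cpu.cores"))
print(rget_sketch(cfg, "hardware.gpu.model", None))
```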

thoth.lab.utils module

Various utilities for notebooks.

thoth.lab.utils.display_page(location: str, verify: bool = True, no_obtain_location: bool = False, width: int = 980, height: int = 900)[source]

Display the given page in notebook as iframe.

thoth.lab.utils.get(obj, attr, *, default: Any = <object object>)[source]

Combine both getattr and dict.get into universal get.
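
A universal getter of this kind might look like the sketch below. The lookup order (mapping key for mappings, attribute otherwise), the `KeyError` raised when nothing is found, and the helper name `universal_get` are assumptions; the actual function may behave differently.

```python
from collections.abc import Mapping
from types import SimpleNamespace
from typing import Any

_MISSING = object()  # sentinel distinguishing "no default" from default=None


def universal_get(obj: Any, attr: str, *, default: Any = _MISSING) -> Any:
    """Look up attr as a key when obj is a mapping, else as an attribute."""
    if isinstance(obj, Mapping):
        value = obj.get(attr, _MISSING)
    else:
        value = getattr(obj, attr, _MISSING)
    if value is _MISSING:
        if default is _MISSING:
            raise KeyError(attr)
        return default
    return value


print(universal_get({"name": "numpy"}, "name"))
print(universal_get(SimpleNamespace(name="pandas"), "name"))
print(universal_get({}, "version", default="unknown"))
```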

thoth.lab.utils.get_column_group(df: pandas.core.frame.DataFrame, columns: Union[List[Union[str, int]], pandas.core.indexes.base.Index] = None, label: str = None) → pandas.core.series.Series[source]

Group columns of the DataFrame into a single column group.

thoth.lab.utils.get_index_group(df: pandas.core.frame.DataFrame, names: List[Union[str, int]] = None, label: str = None) → pandas.core.series.Series[source]

Group multiple index levels into single index group.

thoth.lab.utils.group_columns(df: pandas.core.frame.DataFrame, columns: Union[List[Union[str, int]], pandas.core.indexes.base.Index] = None, label: str = None, inplace: bool = False) → pandas.core.series.Series[source]

Group columns of the DataFrame into a single column group and set it to the DataFrame.

thoth.lab.utils.group_index(df: pandas.core.frame.DataFrame, names: List[Union[str, int]] = None, label: str = None, inplace: bool = False) → pandas.core.frame.DataFrame[source]

Group multiple index levels into single index group and set it as index to the DataFrame.

thoth.lab.utils.has(obj, attr)[source]

Combine both hasattr and in into universal has.

thoth.lab.utils.highlight(df: pandas.core.frame.DataFrame, content: str = None, column_class: str = None, colours: Union[list, str] = None)[source]

Highlight rows of content column of a given DataFrame.

Highlight can be based on column_class or custom colours provided.

thoth.lab.utils.obtain_location(name: str, verify: bool = False, only_netloc: bool = False) → str[source]

Obtain location of a service based on its name in Red Hat’s internal network.

This function checks the redirect of a URL registered in Red Hat’s internal network, which avoids exposing internal URLs. It queries https://url.corp.redhat.com for redirects.

>>> obtain_location('thoth-sbu', verify=False)

thoth.lab.utils.packages_info(thoth_packages: bool = True) → pandas.core.frame.DataFrame[source]

Display information about versions of packages available in the installation.

thoth.lab.utils.resolve_query(query: str, context: pandas.core.frame.DataFrame = None, resolvers: tuple = None, engine: str = None, parser: str = 'pandas')[source]

Resolve query in the given context.

thoth.lab.utils.rget(obj: Any, attr: str, default: Any = <object object>) → Any

Recursively retrieve nested attributes of an object.

Parameters
  • f – callable, function to be used as getattr

  • obj – Any, object to check

  • attr – str, attribute to find declared by dot notation accessor

  • default – default attribute, similar to getattr’s default

Returns

Any, retrieved attribute

thoth.lab.utils.rgetattr(obj: Any, attr: str, default: Any = <object object>) → Any

Recursively retrieve nested attributes of an object.

Parameters
  • f – callable, function to be used as getattr

  • obj – Any, object to check

  • attr – str, attribute to find declared by dot notation accessor

  • default – default attribute, similar to getattr’s default

Returns

Any, retrieved attribute

thoth.lab.utils.rhas(obj: Any, attr: str) → bool

Recursively check nested attributes of an object.

Parameters
  • fhas – callable, function to be used as hasattr

  • fget – callable, function to be used as getattr

  • obj – Any, object to check

  • attr – str, attribute to find declared by dot notation accessor

Returns

bool, whether the object has the given attribute
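
The recursive check can be sketched by walking the dotted path and stopping at the first missing step. This version uses the built-in `hasattr` and `getattr` directly, while the parameter list above mentions configurable `fhas` and `fget` callables, so the helper name `rhas_sketch` and its details are assumptions.

```python
from types import SimpleNamespace
from typing import Any


def rhas_sketch(obj: Any, attr: str) -> bool:
    """Return True only if every step of the dotted path exists."""
    current = obj
    for name in attr.split("."):
        if not hasattr(current, name):
            return False
        current = getattr(current, name)
    return True


# Hypothetical nested object.
spec = SimpleNamespace(build=SimpleNamespace(requests=SimpleNamespace(cpu="1")))
print(rhas_sketch(spec, "build.requests.cpu"))
print(rhas_sketch(spec, "build.limits.cpu"))
```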

thoth.lab.utils.rhasattr(obj: Any, attr: str) → bool

Recursively check nested attributes of an object.

Parameters
  • fhas – callable, function to be used as hasattr

  • fget – callable, function to be used as getattr

  • obj – Any, object to check

  • attr – str, attribute to find declared by dot notation accessor

Returns

bool, whether the object has the given attribute

thoth.lab.utils.scale_colour_continuous(arr: Iterable, colour_palette=None, n_colours: int = 10, norm=False)[source]

Scale given arrays into colour array by specific palette.

The default number of colours is 10, which translates to dividing an array on a scale from 0 to 1 into 0.1 colour bins.
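
The binning described above can be illustrated with a short sketch that maps values already scaled to [0, 1] onto a palette of equal-width bins. The helper name `bin_to_colours` and the placeholder palette are assumptions; the real function accepts a colour palette and a normalization option whose details are not covered here.

```python
from typing import List, Sequence


def bin_to_colours(values: Sequence[float], palette: Sequence[str]) -> List[str]:
    """Map values in [0, 1] to palette entries using equal-width bins."""
    n = len(palette)
    colours = []
    for v in values:
        # Clamp to [0, 1], then pick the bin; 1.0 falls into the last bin.
        clamped = min(max(v, 0.0), 1.0)
        colours.append(palette[min(int(clamped * n), n - 1)])
    return colours


# Hypothetical 10-entry palette, so each bin is 0.1 wide.
palette = [f"grey{i}" for i in range(10)]
result = bin_to_colours([0.0, 0.05, 0.55, 1.0], palette)
print(result)
```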

Module contents

Routines for experiments in Thoth not only for Jupyter notebooks.