thoth.lab package

Subpackages

Submodules

thoth.lab.adviser module

Adviser results processing and analysis.

thoth.lab.adviser.aggregate_adviser_results(adviser_version: str, limit_results: bool = False, max_ids: int = 5) pandas.core.frame.DataFrame[source]

Aggregate adviser results from jsons stored in Ceph.

Parameters
  • adviser_version – minimum adviser version considered for the analysis of adviser runs

  • limit_results – reduce the number of adviser runs ids considered to max_ids to test analysis

  • max_ids – maximum number of adviser runs ids considered

thoth.lab.adviser.create_adviser_heatmap(adviser_justification_df: pandas.core.frame.DataFrame, file_name: Optional[str] = None, save_result: bool = False, output_dir: Optional[str] = None)[source]

Create adviser justifications heatmap plot.

Parameters
  • adviser_justification_df – data frame as returned by create_final_dataframe per identifier.

  • file_name – file name used in the name of files saved

  • save_result – resulting plots created are stored in output_dir.

  • output_dir – output directory where plots are stored if save_result is set to True.

thoth.lab.adviser.create_adviser_results_histogram(plot_df: pandas.core.frame.DataFrame)[source]

Create adviser results histogram plot.

Parameters

plot_df – data frame used to plot adviser results

thoth.lab.adviser.create_final_dataframe(adviser_dataframe: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Create final dataframe with all information required for plots.

Parameters

adviser_dataframe – data frame as returned by aggregate_adviser_results method.

thoth.lab.adviser.extract_adviser_justifications(report: Dict[str, Any], adviser_dict: Dict[str, Any], ids: str) Dict[str, Any][source]

Retrieve justifications from adviser results.

thoth.lab.adviser.extract_justifications_from_products(products: List[Dict[str, Any]], adviser_dict: Dict[str, Any], ids: str) Dict[str, Any][source]

Extract justifications from products in adviser results.

thoth.lab.common module

Common methods for thoth-lab.

thoth.lab.common.aggregate_thoth_results(limit_results: bool = False, max_ids: int = 5, is_local: bool = True, repo_path: Optional[pathlib.Path] = None, store_name: Optional[str] = None, is_inspection: Optional[str] = None) Union[list, dict][source]

Aggregate results from jsons stored in Ceph for Thoth or locally from repo.

Parameters
  • limit_results – reduce the number of reports ids considered to max_ids to test analysis

  • max_ids – maximum number of reports ids considered

  • is_local – flag to retrieve the dataset locally or from S3 (credentials are required)

  • repo_path – path to the local repository; required when is_local is set to True

  • store_name – name of the ResultStorageBase type, depending on Thoth data (e.g. solver, performance, adviser, etc.)

  • is_inspection – flag used only for InspectionResultStore as we store results in batches
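The interplay of limit_results and max_ids can be pictured with a minimal sketch. It assumes that limiting simply keeps the first max_ids document ids; the helper name limit_ids and the ids are made up for illustration and are not part of thoth.lab.

```python
from typing import List


def limit_ids(ids: List[str], limit_results: bool = False, max_ids: int = 5) -> List[str]:
    """Return at most max_ids ids when limit_results is set, otherwise all ids."""
    if limit_results:
        # Assumption: the first max_ids entries are kept.
        return ids[:max_ids]
    return list(ids)


document_ids = [f"report-{n}" for n in range(8)]  # made-up ids
print(limit_ids(document_ids, limit_results=True, max_ids=3))
```

With limit_results left at False, max_ids has no effect and every id is kept.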

thoth.lab.common.aggregate_thoth_results_from_ceph(store_name: str, files: Union[dict, list], limit_results: bool = False, max_ids: int = 5) Tuple[Union[dict, list], int][source]

Aggregate Thoth results from Ceph.

thoth.lab.common.extract_zip_file(file_path: pathlib.Path)[source]

Extract files from zip files.
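As a rough illustration of zip extraction with the standard library, the sketch below unpacks an archive into a directory next to it. The target directory layout and the helper name extract_zip are assumptions, not the behaviour of extract_zip_file itself.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_zip(file_path: Path) -> Path:
    """Extract a zip archive into a sibling directory and return that directory."""
    target_dir = file_path.with_suffix("")  # results.zip -> results
    with zipfile.ZipFile(file_path) as archive:
        archive.extractall(target_dir)
    return target_dir


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive so the example is self-contained.
    archive_path = Path(tmp) / "results.zip"
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("report.json", "{}")
    extracted = extract_zip(archive_path)
    extracted_names = sorted(p.name for p in extracted.iterdir())
    print(extracted_names)
```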

thoth.lab.convert module

Utilities to work with package dependencies.

thoth.lab.dependency_monkey module

Dependency Monkey results processing and analysis.

thoth.lab.dependency_monkey.aggregate_dm_results_per_identifier(identifiers_inspection: List[str], limit_results: bool = False, max_batch_identifiers_ids: int = 5) Union[dict, List[str]][source]

Aggregate inspection batch ids and specification from dm documents stored in Ceph.

Parameters
  • identifiers_inspection – list of identifier/s to filter inspection batch ids

  • limit_results – limit inspection batch ids considered to max_batch_identifiers_ids to test analysis

  • max_batch_identifiers_ids – maximum number of inspection batch ids considered

thoth.lab.exception module

Exceptions for thoth-lab methods.

exception thoth.lab.exception.NotUniqueValues[source]

Bases: Exception

An exception raised when the dataframe unique method cannot return results.

thoth.lab.graph module

Various helpers and utils for interaction with the graph database.

class thoth.lab.graph.DependencyGraph(incoming_graph_data=None, **attr)[source]

Bases: networkx.classes.ordered.OrderedDiGraph

Construct a dependency graph by extending nx.OrderedDiGraph.

adjlist_dict_factory

alias of collections.OrderedDict

static get_root(tree)[source]

Return root of the current graph, if any.

By default, tree topology is considered as input, so if there are multiple roots, only the first one is returned.

node_dict_factory

alias of collections.OrderedDict

class thoth.lab.graph.GraphQueryResult(result)[source]

Bases: object

Wrap results of graph database queries.

plot_bar()[source]

Plot histogram of results obtained.

plot_pie()[source]

Plot a pie of results into Jupyter notebook.

serialize()[source]

Serialize the output of graph query.

to_dataframe()[source]

Construct a pandas DataFrame from the results.

thoth.lab.graph.get_root(tree)

Return root of the current graph, if any.

By default, tree topology is considered as input, so if there are multiple roots, only the first one is returned.
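One way to read "only the first one is returned" is sketched below on a plain adjacency dict rather than a networkx graph. The rule used here (the first node, in insertion order, with no incoming edges) is an assumption about the behaviour, and find_root is a hypothetical helper.

```python
from typing import Dict, Hashable, List, Optional


def find_root(edges: Dict[Hashable, List[Hashable]]) -> Optional[Hashable]:
    """Return the first node with no incoming edges, or None if there is none."""
    children = {child for targets in edges.values() for child in targets}
    for node in edges:  # dicts preserve insertion order
        if node not in children:
            return node
    return None


# Made-up dependency tree: app depends on flask and requests.
tree = {"app": ["flask", "requests"], "flask": ["werkzeug"], "requests": [], "werkzeug": []}
print(find_root(tree))
```

A graph in which every node has an incoming edge (for example a cycle) has no root, so the sketch returns None.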

thoth.lab.inspection module

Inspection results processing and analysis.

thoth.lab.inspection.aggregate_inspection_results(limit_results: bool = False, max_ids: int = 5, is_local: bool = True, inspection_repo_path: pathlib.Path = PosixPath('performance')) list[source]

Aggregate inspection results from jsons stored in Ceph or locally from performance repo.

Parameters
  • limit_results – reduce the number of inspection reports ids considered to max_ids to test analysis

  • max_ids – maximum number of inspection reports ids considered

  • is_local – flag to retrieve the dataset locally or from S3 (credentials are required)

  • inspection_repo_path – path to the local performance dataset; required when is_local is set to True

thoth.lab.inspection.aggregate_inspection_results_per_identifier(inspection_ids: List[str], identifier_inspection: List[str], inspection_batch_data: Dict[str, dict]) dict[source]

Aggregate inspection results per identifier from inspection documents stored in Ceph.

Parameters
  • inspection_ids – list of inspection ids

  • identifier_inspection – list of identifier/s to filter inspection ids

  • inspection_batch_data – info to be added to each inspection (e.g. specification)

thoth.lab.inspection.columns_to_analyze(df: pandas.core.frame.DataFrame, low: int = 0, display_clusters: bool = False, cluster_by_hue: bool = False) pandas.core.frame.DataFrame[source]

Print all columns within dataframe and count of unique column values within limit.

Parameters
  • df – data frame to analyze as returned by process_inspection_results

  • low – the lower limit (0 if not specified) of distinct value counts

  • display_clusters – if true, displays grouped counts of parameter and parameter sort_values

  • cluster_by_hue – if true, displays distribution of parameters to analyze sorted by hues
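The role of low can be illustrated without pandas. The sketch counts distinct values per column over a list of row dicts and keeps the columns whose count exceeds low; the strict comparison and the helper name distinct_counts are assumptions.

```python
from typing import Any, Dict, List


def distinct_counts(rows: List[Dict[str, Any]], low: int = 0) -> Dict[str, int]:
    """Map each column to its number of distinct values, keeping counts above low."""
    columns = sorted({key for row in rows for key in row})
    counts = {col: len({row.get(col) for row in rows}) for col in columns}
    return {col: n for col, n in counts.items() if n > low}


# Made-up inspection rows.
rows = [{"cpu": "i7", "os": "fedora"}, {"cpu": "i9", "os": "fedora"}]
print(distinct_counts(rows))         # every column
print(distinct_counts(rows, low=1))  # only columns that actually vary
```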

thoth.lab.inspection.concatenated_df(dfs: List[pandas.core.frame.DataFrame], column: str)[source]

Reorganize dataframe to show the distribution of jobs in a category across different subsets of data.

Parameters
  • dfs – list of inspection result dataframes which can be different datasets or subset of datasets

  • column – column name or category for grouping to see the distribution of results

thoth.lab.inspection.create_duration_box(data: pandas.core.frame.DataFrame, columns: Optional[Union[str, List[str]]] = None, **kwargs)[source]

Create duration Box plot.

thoth.lab.inspection.create_duration_dataframe(inspection_df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Compute statistics and duration DataFrame.

thoth.lab.inspection.create_duration_histogram(data: pandas.core.frame.DataFrame, columns: Optional[Union[str, List[str]]] = None, bins: Optional[int] = None, **kwargs)[source]

Create duration Histogram plot.

thoth.lab.inspection.create_duration_scatter(data: pandas.core.frame.DataFrame, columns: Optional[Union[str, List[str]]] = None, **kwargs)[source]

Create duration Scatter plot.

thoth.lab.inspection.create_duration_scatter_with_bounds(data: pandas.core.frame.DataFrame, col: str, index: Optional[Union[list, pandas.core.indexes.base.Index, pandas.core.indexes.range.RangeIndex]] = None, **kwargs)[source]

Create duration Scatter plot with upper and lower bounds.

thoth.lab.inspection.create_filtered_df(df: pandas.core.frame.DataFrame, pi_name: Optional[str] = None, pi_component: Optional[str] = None, runtime_environment: Optional[str] = None, packages: Optional[List[Tuple[str, str, str]]] = None) pandas.core.frame.DataFrame[source]

Create dataframe using the filters selected for plots.

thoth.lab.inspection.create_final_dataframe(packages_versions: dict, python_packages_dataframe: pandas.core.frame.DataFrame, inspection_df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Create final dataframe with all information required for plots.

Parameters
  • packages_versions – dict as returned by create_python_package_df method.

  • python_packages_dataframe – data frame as returned by create_python_package_df method.

  • inspection_df – data frame containing data of inspections results.

thoth.lab.inspection.create_inspection_2d_plot(plot_df: pandas.core.frame.DataFrame, quantity: str, components: List[str], color_scales: List[str], identifiers_inspections: List[str], have_annotations: bool = False)[source]

Create inspection performance parameters plot in 2D.

Parameters

plot_df – data frame used to plot inspection results

thoth.lab.inspection.create_inspection_3d_plot(plot_df: pandas.core.frame.DataFrame, quantity: str, identifiers_inspections: List[str])[source]

Create inspection performance parameters plot in 3D.

Parameters

plot_df – data frame used to plot inspection results

thoth.lab.inspection.create_inspection_analysis_plots(inspection_df: pandas.core.frame.DataFrame)[source]

Create inspection analysis plots for the inspection pd.Dataframe.

Parameters

inspection_df – data frame as returned by process_inspection_results for a specific inspection identifier

thoth.lab.inspection.create_inspection_dataframes(inspection_results_dict: dict, duration_info: bool = False) dict[source]

Create dictionary with data frame as returned by process_inspection_results for each inspection identifier.

Parameters

inspection_results_dict – dictionary containing inspection results per inspection identifier.

thoth.lab.inspection.create_inspection_parameters_dataframes(parameters: List[str], inspection_df_dict: dict, component: Optional[str] = None) Dict[str, pandas.core.frame.DataFrame][source]

Create pd.DataFrame of selected parameters from inspections results to be used for statistics and error analysis.

It also outputs batches and parameters map that is necessary for plots.

Parameters
  • parameters – inspection parameters used in the analysis

  • inspection_df_dict – dictionary with data frame as returned by process_inspection_results per identifier.

  • component – PI component name (e.g. tensorflow, pytorch).

thoth.lab.inspection.create_inspection_time_dataframe()[source]

Create pd.Dataframe of time of inspections for build and job.

thoth.lab.inspection.create_multiple_violin_plot(data: pandas.core.frame.DataFrame, quantity: str, x_label: str = '', y_label: str = '', save_result: bool = False, project_folder: str = '', folder_name: str = '', linewidth: int = 1)[source]

Create violin plot.

thoth.lab.inspection.create_plot_from_df(data: pandas.core.frame.DataFrame, columns: Optional[Union[str, List[str]]] = None, title_plot: str = ' ', x_label: str = ' ', y_label: str = ' ', static: str = True, save_result: bool = False, project_folder: str = '', folder_name: str = '', scatter: bool = False)[source]

Create plot using two columns of the DataFrame.

thoth.lab.inspection.create_plot_multiple_batches(data: pandas.core.frame.DataFrame, quantity: str, plot_type: str = 'box', x_label: str = '', y_label: str = '', static: str = True, save_result: bool = False, project_folder: str = '', folder_name: str = '')[source]

Create (Histogram or Box) plot using several columns of the dataframe (static by default).

thoth.lab.inspection.create_python_package_df(inspection_df: pandas.core.frame.DataFrame) Union[pandas.core.frame.DataFrame, dict][source]

Create DataFrame with only python packages present in software stacks.

thoth.lab.inspection.create_scatter_and_correlation(data: pandas.core.frame.DataFrame, columns: Optional[Union[str, List[str]]] = None, title_scatter: str = 'Scatter plot')[source]

Create Scatter plot and evaluate correlation coefficients.

thoth.lab.inspection.create_scatter_plots_for_multiple_batches(inspection_df_dict: Dict[str, pandas.core.frame.DataFrame], list_batches: List[str], columns: Optional[Union[str, List[str]]] = None, title_scatter: str = ' ', x_label: str = ' ', y_label: str = ' ')[source]

Create Scatter plots for multiple batches.

Parameters
  • inspection_df_dict – dictionary with data frame as returned by process_inspection_results per identifier

  • list_batches – list of batches to be used for correlation analysis

  • columns – parameters to be considered, taken from data frame as returned by process_inspection_results

  • title_scatter – scatter plot name

  • x_label – x label name

  • y_label – y label name

thoth.lab.inspection.dataframe_statistics(inspection_df: pandas.core.frame.DataFrame, plot_title: str)[source]

Output a data frame with relevant statistics on job duration, build duration and time elapsed.

Parameters
  • inspection_df – data frame to analyze as returned by process_inspection_results (duration [ms])

  • plot_title – title of fit plot

thoth.lab.inspection.display_jobs_by_subcategories(df: pandas.core.frame.DataFrame)[source]

Create dataframe with job counts for each subcategory for every column in the data frame.

Parameters

df – dataframe with columns of unique value counts as returned by columns_to_analyze

thoth.lab.inspection.duration_plots(df: pandas.core.frame.DataFrame)[source]

Create plots for job and build duration, elapsed time, and lead time.

Parameters

df – data frame with duration information as returned by process_inspection_results

thoth.lab.inspection.evaluate_inspection_statistics(parameters: list, inspection_df_dict: dict, component: Optional[str] = None) dict[source]

Aggregate statistical quantities per inspection parameter for inspection batches.

Parameters
  • parameters – inspection parameters used in the analysis

  • inspection_df_dict – dictionary with data frame as returned by process_inspection_results per identifier

thoth.lab.inspection.evaluate_statistics(inspection_df: pandas.core.frame.DataFrame, inspection_parameter: str) Dict[source]

Evaluate statistical quantities of a specific parameter of inspection results.
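Which quantities evaluate_statistics returns is not stated here. As a sketch under that caveat, typical summary statistics for one parameter can be computed with the standard library; the keys and the helper name describe are illustrative only.

```python
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Compute a few common summary statistics for one parameter."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


job_durations = [10.0, 12.0, 14.0]  # made-up durations
summary = describe(job_durations)
print(summary)
```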

thoth.lab.inspection.evaluate_statistics_on_inspection_df(df: pandas.core.frame.DataFrame, column_names: List[str]) pandas.core.frame.DataFrame[source]

Evaluate statistics on performance values selected from Dataframe columns.

thoth.lab.inspection.extract_keys_from_dataframe(df: pandas.core.frame.DataFrame, key: str)[source]

Filter the specific dataframe created for a certain key, combination of keys or for a tree depth.

thoth.lab.inspection.extract_specification(inspection_batch_result: Dict[str, Any], inspection_id: str)[source]

Extract specification info for the inspection.

thoth.lab.inspection.extract_structure_json(input_json: dict, upper_key: str, depth: int, json_structure)[source]

Convert a json file structure into a list with rows showing tree depths, keys and values.

Parameters
  • input_json – inspection result json taken from Ceph

  • upper_key – key starting point to recursively traverse all tree

  • depth – depth in the tree

  • json_structure – recurrent list to store results while traversing the tree
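The recursive traversal described by these parameters might look like the sketch below. The row layout [depth, upper_key, key, value] and the name walk_json are assumptions made for illustration; the rows actually produced by extract_structure_json may differ.

```python
from typing import Any, List


def walk_json(node: Any, upper_key: str, depth: int, rows: List[list]) -> List[list]:
    """Append [depth, upper_key, key, value] rows for every key below node."""
    if isinstance(node, dict):
        for key, value in node.items():
            # Nested dicts are recorded by key only; their content is visited recursively.
            rows.append([depth, upper_key, key, None if isinstance(value, dict) else value])
            walk_json(value, key, depth + 1, rows)
    return rows


document = {"hardware": {"cpu": {"cores": 4}}, "status": "ok"}  # made-up result
structure = walk_json(document, "", 0, [])
for row in structure:
    print(row)
```

Passing the accumulator list explicitly mirrors the json_structure parameter: the same list is filled in place while the tree is traversed.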

thoth.lab.inspection.filter_df(df, *args)[source]

Filter Dataframe.

thoth.lab.inspection.filter_document_ids(inspection_store, inspection_identifiers: List[str]) Dict[str, List][source]

Filter inspection document ids list according to the inspection identifiers selected.

Parameters

inspection_identifiers – list of identifier/s to filter inspection ids

thoth.lab.inspection.filter_inspection_ids(inspection_identifiers: List[str]) dict[source]

Filter inspection ids list according to the inspection identifier selected.

Parameters

inspection_identifiers – list of identifier/s to filter inspection ids
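A minimal sketch of identifier-based filtering follows. It assumes that an inspection id contains its identifier as a substring; that id format, the sample ids, and the helper name group_ids_by_identifier are all hypothetical.

```python
from typing import Dict, List


def group_ids_by_identifier(ids: List[str], identifiers: List[str]) -> Dict[str, List[str]]:
    """Map each identifier to the ids that contain it."""
    return {ident: [i for i in ids if ident in i] for ident in identifiers}


# Made-up ids; real inspection ids may be structured differently.
all_ids = ["inspection-tf-conv2d-001", "inspection-tf-matmul-002", "inspection-pt-conv2d-003"]
filtered = group_ids_by_identifier(all_ids, ["conv2d", "matmul"])
print(filtered)
```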

thoth.lab.inspection.make_subplots(data: pandas.core.frame.DataFrame, columns: Optional[List[str]] = None, *, kind: str = 'box', **kwargs)[source]

Make subplots and arrange them in an optimized grid layout.

thoth.lab.inspection.map_column_to_feature_class(column_name: str)[source]

Helper function that maps a column in the original dataframe to a feature class.

Parameters

column_name – column name in the inspection_df dataframe obtained by process_inspection_results with no columns dropped (drop=False)

thoth.lab.inspection.plot_distribution_of_jobs_combined_categories(df_hardware_category: pandas.core.frame.DataFrame, df_duration: pandas.core.frame.DataFrame, df_analyze: pandas.core.frame.DataFrame)[source]

Plot the job duration distribution for each unique hardware combination/configuration of data.

Parameters
  • df_hardware_category – dataframe of parameters to analyze grouped by distinct rows

  • df_duration – dataframe with duration information as returned by process_inspection_results

  • df_analyze – dataframe of parameters that show variation across the clusters

thoth.lab.inspection.plot_interpolated_statistics_of_inspection_parameters(statistical_results_dict: dict, identifier_inspection_list: dict, inspection_parameters: List[str], colour_list: List[str], statistical_quantities: List[str], title_plot: Optional[str] = None, title_xlabel: Optional[str] = None, title_ylabel: Optional[str] = None, save_result: bool = False, project_folder: Optional[str] = None, folder_name: Optional[str] = None, componet: Optional[str] = None)[source]

Plot interpolated statistical quantity/ies of inspection parameter/s from different inspection batches.

thoth.lab.inspection.plot_subcategories_by_hues(df_cat: pandas.core.frame.DataFrame, df: pandas.core.frame.DataFrame, column)[source]

Create scatter plots with parameter categories separated by hues.

Parameters
  • df_cat – filtered dataframe with columns to analyze as returned by columns_to_analyze

  • df – data frame with duration information as returned by process_inspection_results

  • column – job duration/build duration columns from ‘df’

thoth.lab.inspection.process_empty_or_mutable_parameters(inspection_df: pandas.core.frame.DataFrame)[source]

Process empty or mutable parameters in dataframe.

These values will not work with further processing using the groupby function. Prints the unique value count of all columns that are unhashable (all such columns are constant). Drops these columns and returns a new dataframe.

Parameters

inspection_df – data frame as returned by process_inspection_results with no columns dropped (drop=False)
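Why unhashable values break grouping, and how such columns can be detected and dropped, is sketched below on plain row dicts instead of a DataFrame. The helper names are hypothetical and the sketch is not the library's implementation.

```python
from typing import Any, Dict, List


def is_hashable(value: Any) -> bool:
    """Return True if value can be hashed (and therefore used as a group key)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


def drop_unhashable_columns(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Report and drop every column that holds at least one unhashable value."""
    columns = sorted({key for row in rows for key in row})
    unhashable = [c for c in columns if any(not is_hashable(row.get(c)) for row in rows)]
    for col in unhashable:
        # repr() gives a hashable stand-in so distinct values can still be counted.
        distinct = {repr(row.get(col)) for row in rows}
        print(f"{col}: {len(distinct)} unique value(s)")
    return [{k: v for k, v in row.items() if k not in unhashable} for row in rows]


rows = [{"cpu": "i7", "files": []}, {"cpu": "i9", "files": []}]  # made-up rows
cleaned = drop_unhashable_columns(rows)
print(cleaned)
```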

thoth.lab.inspection.process_inspection_results(inspection_results: List[dict], exclude: Optional[Union[list, set]] = None, apply: Optional[List[Tuple]] = None, drop: bool = True, verbose: bool = False, duration_info: bool = False) pandas.core.frame.DataFrame[source]

Process inspection result into pd.DataFrame.

thoth.lab.inspection.query_inspection_dataframe(inspection_df: pandas.core.frame.DataFrame, *args, **kwargs) pandas.core.frame.DataFrame[source]

Wrapper around the _.query method that always includes duration columns in the filter expression.

thoth.lab.inspection.show_categories(inspection_df: pandas.core.frame.DataFrame)[source]

List categories in the given inspection pd.DataFrame.

thoth.lab.inspection.show_inspection_inputs(filtered_inspection_ids: List[str], inspection_batch_ids: List[str], filtered_inspection_batch_ids: List[str])[source]

Show inspections inputs for the analysis.

Parameters
  • filtered_inspection_ids – list of inspection ids after filtering

  • inspection_batch_ids – list of inspection batch ids

  • filtered_inspection_batch_ids – list of inspection batch ids after filtering

thoth.lab.inspection.show_unique_value_count_by_feature_class(processed_df: pandas.core.frame.DataFrame)[source]

Show unique count values per feature/class.

Show results per feature/class that are subdivided in subclasses that map to it.

Parameters

processed_df – processed dataframe as returned by the process_empty_or_mutable_parameters

thoth.lab.inspection.summary_bar_plot(df: pandas.core.frame.DataFrame, df_categories: pandas.core.frame.DataFrame, clusters: List[pandas.core.frame.DataFrame])[source]

Create trace stacked plot scaled by total jobs of each parameter within clusters (if any).

Parameters
  • df – data frame with duration information as returned by process_inspection_results

  • df_categories – filtered dataframe with columns to analyze as returned by columns_to_analyze

  • clusters – list of subset dataframes with the last value in list being the entire data set

thoth.lab.inspection.summary_trace_plot(df: pandas.core.frame.DataFrame, df_categories: pandas.core.frame.DataFrame, dfs: Optional[List[pandas.core.frame.DataFrame]] = None)[source]

Create trace plot scaled by percentage of compositions of each parameter separated by hues.

Parameters
  • df – data frame with duration information as returned by process_inspection_results

  • df_categories – filtered dataframe with columns to analyze as returned by columns_to_analyze

  • dfs – dataframes of clustered data (if any) appended to dataframe of entire dataset (i.e. [df_left_cluster, df_right_cluster, df_duration])

thoth.lab.inspection_report module

Inspection report generation and visualization.

thoth.lab.inspection_report.create_df_report(df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Show unique values for each column in the dataframe.

thoth.lab.inspection_report.create_dfs_inspection_classes(inspection_df: pandas.core.frame.DataFrame) dict[source]

Create all inspection dataframes per class with unique values and complete values.

thoth.lab.inspection_report.multi_table(table_dict)[source]

Accept a list of IpyTable objects and return a table which contains each IpyTable in a cell.

thoth.lab.security module

Security results processing and analysis.

class thoth.lab.security.SecurityIndicators[source]

Bases: object

Class of methods used to analyze Security Indicators (SI).

static add_release_date(metadata_df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Add release date to metadata.

static aggregate_security_indicator_bandit_results(limit_results: bool = False, max_ids: int = 5, is_local: bool = True, security_indicator_bandit_repo_path: pathlib.Path = PosixPath('security/si-bandit')) list[source]

Aggregate si_bandit results from jsons stored in Ceph or locally from si_bandit repo.

Parameters
  • limit_results – reduce the number of si_bandit reports ids considered to max_ids to test analysis

  • max_ids – maximum number of si_bandit reports ids considered

  • is_local – flag to retrieve the dataset locally or from S3 (credentials are required)

  • security_indicator_bandit_repo_path – path to the local si_bandit dataset, used when is_local is set to True

static aggregate_security_indicator_cloc_results(limit_results: bool = False, max_ids: int = 5, is_local: bool = True, security_indicator_cloc_repo_path: pathlib.Path = PosixPath('security/si-cloc')) list[source]

Aggregate si_cloc results from jsons stored in Ceph or locally from si_cloc repo.

Parameters
  • limit_results – reduce the number of si_cloc reports ids considered to max_ids to test analysis

  • max_ids – maximum number of si_cloc reports ids considered

  • is_local – flag to retrieve the dataset locally or from S3 (credentials are required)

  • security_indicator_cloc_repo_path – path to the local si_cloc dataset, used when is_local is set to True

static create_package_releases_vulnerabilities_trend(si_bandit_df: pandas.core.frame.DataFrame, package_name: str, package_index: str, security_infos: Optional[List[str]] = None, show_vulnerability_data: bool = False)[source]

Plot vulnerabilities trend for a Python package from a certain index.

Parameters
  • si_bandit_df – pandas dataframe given by ‘create_si_bandit_final_dataframe’ method with use_external_source_data set to True.

  • package_name – Python package name filter

  • package_index – Python package index filter

  • security_infos – list of info to be visualized in the plot

  • show_vulnerability_data – show all data regarding vulnerabilities if set to True

create_security_confidence_dataframe(si_bandit_report: dict, filters_files: Optional[List[str]] = None) Tuple[pandas.core.frame.DataFrame, Dict[str, int]][source]

Create Security/Confidence dataframe for si-bandit report.

create_si_bandit_final_dataframe(si_bandit_reports: List[dict], use_external_source_data: bool = False, filters_files: Optional[List[str]] = None) pandas.core.frame.DataFrame[source]

Create final si-bandit dataframe.

create_si_bandit_metadata_dataframe(si_bandit_report: dict) pandas.core.frame.DataFrame[source]

Create si-bandit report metadata dataframe.

create_si_cloc_final_dataframe(si_cloc_reports: list) pandas.core.frame.DataFrame[source]

Create final si-cloc dataframe.

create_si_cloc_metadata_dataframe(si_cloc_report: dict) pandas.core.frame.DataFrame[source]

Create si-cloc report metadata dataframe.

create_si_cloc_results_dataframe(si_cloc_report: dict) pandas.core.frame.DataFrame[source]

Create si-cloc report results dataframe.

static create_vulnerabilities_plot(security_df: pandas.core.frame.DataFrame, security_infos: Optional[List[str]] = None, show_vulnerability_data: bool = False) None[source]

Plot vulnerabilities trend for a Python package from a certain index.

Parameters
  • security_df – pandas dataframe given by ‘create_si_bandit_final_dataframe’ method with use_external_source_data set to True.

  • security_infos – list of info to be visualized in the plot

  • show_vulnerability_data – show all data regarding vulnerabilities if set to True

static define_si_scores()[source]

Define security scores from si bandit outputs.

WARNING: It depends on all data considered.

static extract_data_from_si_bandit_metadata(report_metadata: dict) dict[source]

Extract data from si-bandit report metadata.

static extract_data_from_si_cloc_metadata(report_metadata: dict) dict[source]

Extract data from si-cloc report metadata.

static extract_severity_confidence_info(si_bandit_report: dict, filters_files: Optional[List[str]] = None) Tuple[List[dict], Dict[str, int]][source]

Extract severity and confidence from result metrics.

static produce_si_bandit_report_summary_dataframe(metadata_df: pandas.core.frame.DataFrame, si_bandit_sec_conf_df: pandas.core.frame.DataFrame, summary_files: Dict[str, int]) pandas.core.frame.DataFrame[source]

Create si-bandit report summary dataframe.

static produce_si_cloc_report_summary_dataframe(metadata_df: pandas.core.frame.DataFrame, cloc_results_df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Create si-cloc report summary dataframe.

thoth.lab.solver module

Solver results processing and analysis.

thoth.lab.solver.aggregate_solver_results(limit_results: bool = False, max_ids: int = 5, is_local: bool = True, solver_repo_path: pathlib.Path = PosixPath('solver')) list[source]

Aggregate solver results from jsons stored in Ceph or locally from solver repo.

Parameters
  • limit_results – reduce the number of solver reports ids considered to max_ids to test analysis

  • max_ids – maximum number of solver reports ids considered

  • is_local – flag to retrieve the dataset locally or from S3 (credentials are required)

  • solver_repo_path – required if you want to retrieve the solver dataset locally and is_local is set to True

thoth.lab.solver.construct_solver_from_metadata(solver_report_metadata: dict) str[source]

Construct solver from solver report metadata.

thoth.lab.solver.extract_data_from_solver_metadata(solver_report_metadata: dict) dict[source]

Extract data from solver report metadata.

thoth.lab.solver.extract_errors_from_solver_result(solver_report_result_errors: list) list[source]

Extract all errors from solver report (if any).

thoth.lab.solver.extract_tree_from_solver_result(solver_report_result: dict) list[source]

Extract data from solver report result.

thoth.lab.underscore module

Pandas common operations and utilities.

thoth.lab.utils module

Various utilities for notebooks.

thoth.lab.utils.display_page(location: str, verify: bool = True, no_obtain_location: bool = False, width: int = 980, height: int = 900)[source]

Display the given page in notebook as iframe.

thoth.lab.utils.get(obj, attr, *, default: Any = <object object>)[source]

Combine both getattr and dict.get into universal get.
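The idea of a single lookup that works on both mappings and objects can be sketched as follows. The lookup order, the KeyError raised when nothing is found, and the name universal_get are assumptions; the real get may behave differently.

```python
from types import SimpleNamespace
from typing import Any

_MISSING = object()  # sentinel so that None can be a legitimate default


def universal_get(obj: Any, attr: str, *, default: Any = _MISSING) -> Any:
    """Look attr up as a key for dicts and as an attribute for other objects."""
    if isinstance(obj, dict):
        if attr in obj:
            return obj[attr]
    elif hasattr(obj, attr):
        return getattr(obj, attr)
    if default is _MISSING:
        raise KeyError(attr)
    return default


print(universal_get({"name": "thoth"}, "name"))
print(universal_get(SimpleNamespace(name="lab"), "name"))
print(universal_get({}, "name", default=None))
```

The sentinel default mirrors the `<object object>` shown in the signature: it lets a caller pass None as an explicit default.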

thoth.lab.utils.get_column_group(df: pandas.core.frame.DataFrame, columns: Optional[Union[List[Union[str, int]], pandas.core.indexes.base.Index]] = None, label: Optional[str] = None) pandas.core.series.Series[source]

Group columns of the DataFrame into a single column group.

thoth.lab.utils.get_index_group(df: pandas.core.frame.DataFrame, names: Optional[List[Union[str, int]]] = None, label: Optional[str] = None) pandas.core.series.Series[source]

Group multiple index levels into single index group.

thoth.lab.utils.group_columns(df: pandas.core.frame.DataFrame, columns: Optional[Union[List[Union[str, int]], pandas.core.indexes.base.Index]] = None, label: Optional[str] = None, inplace: bool = False) pandas.core.series.Series[source]

Group columns of the DataFrame into a single column group and set it to the DataFrame.

thoth.lab.utils.group_index(df: pandas.core.frame.DataFrame, names: Optional[List[Union[str, int]]] = None, label: Optional[str] = None, inplace: bool = False) pandas.core.frame.DataFrame[source]

Group multiple index levels into single index group and set it as index to the DataFrame.

thoth.lab.utils.has(obj, attr)[source]

Combine both hasattr and in into universal has.

thoth.lab.utils.highlight(df: pandas.core.frame.DataFrame, content: Optional[str] = None, column_class: Optional[str] = None, colours: Optional[Union[list, str]] = None)[source]

Highlight rows of content column of a given DataFrame.

Highlight can be based on column_class or custom colours provided.

thoth.lab.utils.obtain_location(name: str, verify: bool = False, only_netloc: bool = False) str[source]

Obtain location of a service based on its name in Red Hat’s internal network.

This function checks the redirect of a URL registered in Red Hat’s internal network, which avoids exposing internal URLs. Redirects are queried at https://url.corp.redhat.com.

>>> obtain_location('thoth-sbu', verify=False)

thoth.lab.utils.packages_info(thoth_packages: bool = True) pandas.core.frame.DataFrame[source]

Display information about versions of packages available in the installation.

thoth.lab.utils.resolve_query(query: str, context: Optional[pandas.core.frame.DataFrame] = None, resolvers: Optional[tuple] = None, engine: Optional[str] = None, parser: str = 'pandas')[source]

Resolve query in the given context.

thoth.lab.utils.rget(obj: Any, attr: str, default: Any = <object object>) Any

Recursively retrieve nested attributes of an object.

Parameters
  • f – callable, function to be used as getattr

  • obj – Any, object to check

  • attr – str, attribute to find declared by dot notation accessor

  • default – default attribute, similar to getattr’s default

Returns

Any, retrieved attribute
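Dot-notation access to nested attributes can be sketched with functools.reduce. The step function below handles both dict keys and object attributes, which is an assumption about how rget differs from rgetattr; recursive_get is a hypothetical name.

```python
from functools import reduce
from types import SimpleNamespace
from typing import Any

_MISSING = object()  # sentinel: distinguishes "no default" from default=None


def _step(obj: Any, name: str) -> Any:
    """Resolve one path component as a dict key or as an attribute."""
    if isinstance(obj, dict):
        return obj[name]
    return getattr(obj, name)


def recursive_get(obj: Any, attr: str, default: Any = _MISSING) -> Any:
    """Follow a dotted path such as 'a.b.c' through nested objects."""
    try:
        return reduce(_step, attr.split("."), obj)
    except (AttributeError, KeyError):
        if default is _MISSING:
            raise
        return default


nested = {"report": {"metadata": SimpleNamespace(version="1.0")}}  # made-up data
print(recursive_get(nested, "report.metadata.version"))
print(recursive_get(nested, "report.missing.version", default="unknown"))
```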

thoth.lab.utils.rgetattr(obj: Any, attr: str, default: Any = <object object>) Any

Recursively retrieve nested attributes of an object.

Parameters
  • f – callable, function to be used as getattr

  • obj – Any, object to check

  • attr – str, attribute to find declared by dot notation accessor

  • default – default attribute, similar to getattr’s default

Returns

Any, retrieved attribute

thoth.lab.utils.rhas(obj: Any, attr: str) bool

Recursively check nested attributes of an object.

Parameters
  • fhas – callable, function to be used as hasattr

  • fget – callable, function to be used as getattr

  • obj – Any, object to check

  • attr – str, attribute to find declared by dot notation accessor

Returns

bool, whether the object has the given attribute

thoth.lab.utils.rhasattr(obj: Any, attr: str) bool

Recursively check nested attributes of an object.

Parameters
  • fhas – callable, function to be used as hasattr

  • fget – callable, function to be used as getattr

  • obj – Any, object to check

  • attr – str, attribute to find declared by dot notation accessor

Returns

bool, whether the object has the given attribute

thoth.lab.utils.scale_colour_continuous(arr: Iterable, colour_palette=None, n_colours: int = 10, norm=False)[source]

Scale given arrays into colour array by specific palette.

The default number of colours is 10, which translates to dividing an array on a scale from 0 to 1 into 0.1 colour bins.
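The binning arithmetic can be made concrete with a small sketch. It assumes the input is already on a 0 to 1 scale and returns bin indices rather than colours; colour_bins is a hypothetical helper, not the function documented here.

```python
from typing import Iterable, List


def colour_bins(arr: Iterable[float], n_colours: int = 10) -> List[int]:
    """Map values in [0, 1] to bin indices 0 .. n_colours - 1."""
    # The upper edge 1.0 is clamped into the last bin.
    return [min(int(value * n_colours), n_colours - 1) for value in arr]


bins = colour_bins([0.0, 0.05, 0.15, 0.5, 1.0])
print(bins)
```

Each index would then select one colour from a palette of n_colours entries.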

Module contents

Routines for experiments in Thoth not only for Jupyter notebooks.

class thoth.lab.GraphQueryResult(result)[source]

Bases: object

Wrap results of graph database queries.

plot_bar()[source]

Plot histogram of results obtained.

plot_pie()[source]

Plot a pie of results into Jupyter notebook.

serialize()[source]

Serialize the output of graph query.

to_dataframe()[source]

Construct a pandas DataFrame from the results.

thoth.lab.obtain_location(name: str, verify: bool = False, only_netloc: bool = False) str[source]

Obtain location of a service based on its name in Red Hat’s internal network.

This function checks the redirect of a URL registered in Red Hat’s internal network, which avoids exposing internal URLs. Redirects are queried at https://url.corp.redhat.com.

>>> obtain_location('thoth-sbu', verify=False)

thoth.lab.packages_info(thoth_packages: bool = True) pandas.core.frame.DataFrame[source]

Display information about versions of packages available in the installation.