Thoth’s architecture

This section gives an overview of Thoth’s architecture, the requirements for deployment, and Thoth’s main components.

The whole deployment is divided into multiple namespaces (or OpenShift projects):

  • thoth-frontend

  • thoth-middletier

  • thoth-backend

  • thoth-graph

  • thoth-infra

  • amun-api (optional, used on Amun)

  • amun-inspection (optional, used on Amun)

The main reason for splitting the application into multiple namespaces is workload management. Thoth runs different types of one-time workloads based on a trigger; for example, a single adviser instance is created per user request. The workload is then scheduled into a separate namespace (thoth-backend, in the case of adviser), and that namespace acts as a pool of resources available for such workloads. Other namespaces, for example thoth-frontend, can still be used to scale, build, re-deploy, or manage components.
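To make the routing concrete, here is a minimal Python sketch of how a triggered, one-time workload could be mapped to the namespace whose resource pool runs it. The `WORKLOAD_NAMESPACES` table and the `namespace_for` function are hypothetical; the mapping only mirrors the component lists in this section and is not taken from Thoth’s code.

```python
# Hypothetical lookup table: triggered workload type -> namespace whose
# resource pool runs it. Mirrors the component lists in this document.
WORKLOAD_NAMESPACES = {
    "adviser": "thoth-backend",
    "provenance-checker": "thoth-backend",
    "package-extract": "thoth-middletier",
    "solver": "thoth-middletier",
}


def namespace_for(workload: str) -> str:
    """Return the namespace a one-time workload of this type is scheduled into."""
    try:
        return WORKLOAD_NAMESPACES[workload]
    except KeyError:
        raise ValueError(f"unknown workload type: {workload!r}") from None


# One user request -> one adviser run, placed into the backend resource pool.
print(namespace_for("adviser"))  # thoth-backend
```

Note that no triggered workload maps to thoth-frontend: keeping these pods out of the management namespace is what lets it be scaled or re-deployed without competing with adviser or solver pods for resources.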

Thoth is deployed from the thoth-station/thoth-application repository using Argo CD. Deployment can also be accomplished using kustomize.

See Requirements for AICoE-CI & Thoth deployment for more detailed information.

Infra Namespace

This namespace is reserved for “infrastructure” related components.

Components running in this namespace:

Frontend Namespace

The thoth-frontend namespace is used as a management namespace. Services running in this namespace usually have a service account assigned for running and managing pods in other namespaces.

Users and bots interact with Thoth through the user-facing API, which is their key interaction point. The user-facing API specifies its endpoints using the Swagger/OpenAPI specification. For more information, see the Thamos repository and documentation (Thamos is a library and CLI for interacting with Thoth) and the user API service repository itself. More details are also available in the integration section.
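Because the endpoints are described in an OpenAPI document, a client can discover them programmatically. The Python sketch below walks the standard OpenAPI layout (paths, then HTTP methods, then operation objects) and lists every operation. The spec fragment, its paths, and its operation IDs are made up for illustration and do not describe the real user API; for actual interaction with Thoth, Thamos is the intended client.

```python
import json

# Hypothetical, heavily trimmed OpenAPI document; the real user API spec has
# different and many more paths. Only the standard structure is relied upon.
SPEC_JSON = """
{
  "openapi": "3.0.0",
  "info": {"title": "example", "version": "0.1"},
  "paths": {
    "/example/advise": {
      "post": {"operationId": "post_example_advise"},
      "get": {"operationId": "list_example_advise"}
    },
    "/example/advise/{analysis_id}": {
      "get": {"operationId": "get_example_advise"}
    }
  }
}
"""

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: dict) -> list:
    """Return sorted (METHOD, path, operationId) tuples for every operation."""
    operations = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            # Path items may also hold keys such as "parameters"; skip them.
            if method in HTTP_METHODS:
                operations.append((method.upper(), path, operation.get("operationId", "")))
    return sorted(operations)


for method, path, op_id in list_operations(json.loads(SPEC_JSON)):
    print(f"{method:6} {path}  ({op_id})")
```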

Components running in this namespace:

  • graph-refresh-job - a periodic job responsible for scheduling analyses of packages that were not yet analyzed

  • package-releases-job - a periodic job responsible for tracking new releases on Python’s package index (the public one is PyPI.org, see also AICoE index)

  • cve-update-job - a periodic job responsible for gathering CVE information about packages

  • package-update-job - a periodic job responsible for checking the availability of packages, along with their hashes, on Python’s package index

  • pulp-pypi-sync-job - a periodic job responsible for registering Python package indexes available on pulp-python

  • document-sync-job - a periodic job responsible for distributing computed data across deployments
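The core of graph-refresh-job’s task, finding packages that are known but not yet analyzed, amounts to a set difference. The Python sketch below assumes a hypothetical `Package` record and a per-run `limit`; the batch limit is an assumption added for illustration, not a documented parameter of the job.

```python
from typing import NamedTuple


class Package(NamedTuple):
    """Hypothetical identity of a package release on a given index."""

    name: str
    version: str
    index_url: str


def packages_to_analyze(known: set, analyzed: set, limit: int) -> list:
    """Return at most `limit` known packages that have no analysis yet.

    Sorting keeps the selection deterministic between periodic runs; the
    `limit` (an assumption) spreads the work over several runs.
    """
    return sorted(known - analyzed)[:limit]


PYPI = "https://pypi.org/simple"
known = {
    Package("flask", "2.0.0", PYPI),
    Package("requests", "2.26.0", PYPI),
    Package("six", "1.16.0", PYPI),
}
analyzed = {Package("six", "1.16.0", PYPI)}

for pkg in packages_to_analyze(known, analyzed, limit=10):
    print("schedule analysis of", pkg.name, pkg.version)
```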

Middletier Namespace

The middletier namespace is used for analyses and other resource-hungry tasks that compute results for Thoth’s knowledge graph. This namespace was separated from the frontend namespace to guarantee application responsiveness. All the tasks that compute results for the knowledge graph are scheduled in this namespace, which has an allocated pool of resources for the unpredictable number of computational pods these tasks need. This way, such pods are not scheduled alongside the running user API, where they could make the user API unresponsive.
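The effect of a dedicated resource pool can be sketched in Python with a toy model of a namespace budget. `NamespacePool` is hypothetical and only loosely modeled on a Kubernetes ResourceQuota (a real quota rejects pod creation requests that would exceed the remaining budget); the CPU and memory figures are made up.

```python
from dataclasses import dataclass, field


@dataclass
class NamespacePool:
    """Toy model of a namespace with a fixed CPU/memory budget (hypothetical)."""

    name: str
    cpu_millicores: int
    memory_mib: int
    running: dict = field(default_factory=dict)  # pod name -> (cpu, memory)

    def used(self) -> tuple:
        cpu = sum(c for c, _ in self.running.values())
        mem = sum(m for _, m in self.running.values())
        return cpu, mem

    def try_admit(self, pod: str, cpu: int, mem: int) -> bool:
        """Admit the pod only if its requests fit into the remaining budget."""
        used_cpu, used_mem = self.used()
        if used_cpu + cpu > self.cpu_millicores or used_mem + mem > self.memory_mib:
            return False  # over budget: not admitted until other pods finish
        self.running[pod] = (cpu, mem)
        return True

    def finish(self, pod: str) -> None:
        """Release the resources of a completed pod."""
        self.running.pop(pod, None)


# Budgets and requests are made-up numbers.
middletier = NamespacePool("thoth-middletier", cpu_millicores=2000, memory_mib=4096)
frontend = NamespacePool("thoth-frontend", cpu_millicores=1000, memory_mib=2048)

# A burst of six solver pods exhausts only the middletier budget...
admitted = [middletier.try_admit(f"solver-{i}", cpu=500, mem=1024) for i in range(6)]
print(admitted)  # [True, True, True, True, False, False]

# ...while the user API in the frontend namespace is unaffected.
print(frontend.try_admit("user-api", cpu=500, mem=512))  # True
```

Whatever the burst size, the solver pods can only consume the middletier budget, so the user API keeps its own resources.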

Components running in this namespace:

  • package-extract - an analyzer responsible for extracting packages from runtime/buildtime environments (container images)

  • solver - an analyzer that gathers information about dependencies between packages (which packages does the given package depend on? which versions satisfy the version ranges?) and records observations such as whether the given package is installable into the given environment and whether it is present on a Python package index

  • graph-sync-job [1] - a job responsible for syncing data in a JSON format persisted on Ceph to the Thoth’s knowledge graph database

  • prescriptions-refresh-job - a periodic job responsible for keeping Thoth’s prescriptions up to date

All the components are scheduled using Argo workflows. Additional logic used when executing workflows comes from the thoth-station/workflow-helpers repository.

Backend Namespace

The backend part of the application executes code that computes results for actual Thoth users (bots or humans), based on information gathered by the analyzers run in the middletier namespace.

As with the middletier namespace, this namespace has an allocated pool of resources. Each time a user requests a recommendation, pods are dynamically created in this namespace to compute the results.

Components running in this namespace:

  • adviser - a recommendation engine that computes stack-level recommendations for a user for the given runtime environment

  • provenance-checker - an analyzer that checks for provenance (origin) of packages so that a user uses correct packages from correct package sources (Python indexes); the implementation now lies in the adviser repo

  • graph-sync-job [1] - a job responsible for syncing data in a JSON format persisted on Ceph to the Thoth’s knowledge graph database

All the components are scheduled using Argo workflows. Additional logic used when executing workflows comes from the thoth-station/workflow-helpers repository.

Graph Namespace

A separate namespace for database related deployments.

Components running in this namespace:

  • Thoth’s knowledge graph - a PostgreSQL database

  • pgbouncer - recycles and manages connections to the database; all the components talk to pgbouncer rather than directly to PostgreSQL

  • pgweb (optional) - a UI for interacting with Thoth’s knowledge graph

  • postgresql-metrics-exporter - exports PostgreSQL-related metrics for database observability

  • graph-backup-job - a periodic job that creates database backups

  • graph-metrics-exporter - a periodic job that exports metrics out of the main database asynchronously
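pgbouncer is a standalone PostgreSQL connection pooler, so there is no Python API to call here; the pooling idea itself is easy to sketch, though. The Python sketch below keeps a fixed number of connections open and lends them to clients in turn, so many short-lived clients share a few database connections. `ConnectionPool` is a hypothetical class, and in-memory `sqlite3` connections stand in for PostgreSQL.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool of reusable connections (hypothetical, illustrating pooling).

    In-memory sqlite3 connections stand in for PostgreSQL connections.
    """

    def __init__(self, size: int) -> None:
        self._idle = queue.Queue(maxsize=size)
        self.created = 0
        for _ in range(size):
            self._idle.put(sqlite3.connect(":memory:", check_same_thread=False))
            self.created += 1

    @contextmanager
    def connection(self, timeout: float = 5.0):
        """Borrow an idle connection, waiting up to `timeout` seconds for one."""
        conn = self._idle.get(timeout=timeout)  # raises queue.Empty on timeout
        try:
            yield conn
        finally:
            self._idle.put(conn)  # recycle it for the next client


pool = ConnectionPool(size=2)
for client in range(10):  # ten short-lived clients...
    with pool.connection() as conn:
        conn.execute("SELECT 1").fetchone()
print(pool.created)  # 2, i.e. all clients were served by two connections
```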

[1] graph-sync-job runs in several namespaces, as its purpose is to sync the results of other components (executing in different namespaces) to the PostgreSQL database.
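The sync step can be sketched in Python as loading JSON documents and inserting them into a relational table, keyed so that re-running the job is harmless. This is an illustration under assumptions: the document fields, the `solved` table, and `sync_documents` are hypothetical, the documents are inline strings rather than objects read from Ceph, and `sqlite3` stands in for PostgreSQL.

```python
import json
import sqlite3

# Hypothetical analyzer output documents (in Thoth they would be read from Ceph);
# field names are illustrative only.
DOCUMENTS = [
    '{"document_id": "solver-1", "package": "flask", "version": "2.0.0", "installable": true}',
    '{"document_id": "solver-2", "package": "six", "version": "1.16.0", "installable": false}',
]


def sync_documents(conn: sqlite3.Connection, raw_documents: list) -> int:
    """Insert JSON documents into the database and return how many rows were new.

    document_id is the primary key and duplicates are ignored, so re-running
    the sync over the same documents changes nothing.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS solved ("
        "document_id TEXT PRIMARY KEY, package TEXT, version TEXT, installable INTEGER)"
    )
    before = conn.total_changes
    with conn:  # one transaction: committed on success, rolled back on error
        for raw in raw_documents:
            doc = json.loads(raw)
            conn.execute(
                "INSERT OR IGNORE INTO solved VALUES (?, ?, ?, ?)",
                (doc["document_id"], doc["package"], doc["version"], int(doc["installable"])),
            )
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
print(sync_documents(conn, DOCUMENTS))  # 2
print(sync_documents(conn, DOCUMENTS))  # 0 (already synced)
```

Keying rows on a document identifier is one way to let the same sync job run repeatedly, possibly from several namespaces, without duplicating results.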

Grafana dashboards

To provide application observability, Grafana dashboards are available in the thoth-station/thoth-application repository.

Argo Workflows and Kafka

The whole Thoth deployment relies on Argo Workflows and Kafka. kafdrop can be used as a Kafka web UI (see the thoth-messaging repository), and Argo Workflows provides the Argo UI to check and visualize workflows.
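Kafka messages are one kind of trigger for the one-time workloads described at the beginning of this section. The following Python sketch shows the message-driven pattern with an in-process queue standing in for Kafka: each topic is mapped to a handler that would submit an Argo workflow. The topic name, the `handles` decorator, and the handler are hypothetical and do not reflect the actual topics defined in thoth-messaging; no Kafka or Argo client is used.

```python
from collections import deque
from typing import Callable, Dict

# Hypothetical registry: topic name -> function producing a workflow submission.
HANDLERS: Dict[str, Callable[[dict], str]] = {}


def handles(topic: str):
    """Register a function that turns a message on `topic` into a workflow submission."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        HANDLERS[topic] = func
        return func
    return register


@handles("example.package-released")
def on_package_released(message: dict) -> str:
    # A real consumer would submit an Argo workflow here; this only describes it.
    return f"solver workflow for {message['package']}=={message['version']}"


def consume(messages: deque) -> list:
    """Drain the queue, dispatching each (topic, payload) pair to its handler."""
    submitted = []
    while messages:
        topic, payload = messages.popleft()
        handler = HANDLERS.get(topic)
        if handler is None:
            continue  # no consumer registered for this topic
        submitted.append(handler(payload))
    return submitted


inbox = deque([
    ("example.package-released", {"package": "flask", "version": "2.0.0"}),
    ("example.unknown-topic", {}),
])
print(consume(inbox))  # ['solver workflow for flask==2.0.0']
```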

Amun

See Amun API for more info. Amun also uses Kafka and Argo Workflows as listed above.

Amun API namespace

  • Amun API - API for the execution engine for inspecting quality, performance, and usability of software and software stacks in a cluster

Amun inspection namespace

For more information, see Amun API repository and autogenerated Amun client. See also the performance repo for scripts used for performance related inspections.