Thoth’s architecture

This section gives the reader an overview of Thoth’s architecture, the requirements for its deployment, and its main components.

The whole deployment is divided into multiple namespaces (or OpenShift projects) - thoth-frontend, thoth-middletier, thoth-backend, inspection-api, inspection, thoth-graph and thoth-infra.

Architecture overview diagram.

Some of the components are deployed multiple times. They serve the same purpose, but are parametrized - for example, based on the namespace they operate on. The cleanup-job, responsible for cleaning a namespace, is deployed into thoth-frontend-stage two times to clean resources in thoth-middletier and thoth-backend. This is shown in the architecture overview as a rectangle with curly braces denoting the namespace.

The main reason behind splitting the application into multiple namespaces is workloads. Thoth runs different types of one-time workloads based on a trigger - for example, a single adviser instance is created per user request. The workload is then scheduled into a separate namespace (backend, in the case of adviser) and the given namespace acts as a pool of resources available for workloads. Other namespaces, for example frontend, can still be used to scale, build, re-deploy or manage components.

Infra Namespace

This namespace is reserved for “infrastructure” related bits - for example, all the OpenShift templates are provisioned into this namespace. Also, the thoth-ops container (the link might point to a private repo available only to Thoth developers), responsible for managing a Thoth deployment, is created in this namespace.

Frontend Namespace

The thoth-frontend namespace is used as a management namespace. Services running in this namespace usually have a service account assigned for running and managing pods inside the thoth-middletier and thoth-backend namespaces.

Users interact with the user-facing API, which is the key interaction point for users and bots. The user-facing API specifies its endpoints using a Swagger/OpenAPI specification. See the Thamos repo and documentation - a library and CLI for interacting with Thoth - and the user API service repo itself for more info. You can also find more info in the integration section.
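
As a small illustration, the OpenAPI specification can be fetched from a running user API instance; the host below is a placeholder and the spec path ("/openapi.json") is an assumption that may differ between deployments:

    # Minimal sketch: fetch the user API's OpenAPI specification and list its endpoints.
    # The host is a placeholder and the spec path is an assumption - adjust both for
    # your Thoth deployment.
    import requests

    USER_API = "https://thoth-user-api.example.com"  # placeholder host

    spec = requests.get(f"{USER_API}/openapi.json", timeout=10).json()
    for path, methods in spec.get("paths", {}).items():
        print(path, "->", ", ".join(method.upper() for method in methods))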

Besides the user API, there are periodically run CronJobs and operators that keep the application in sync and operational (a minimal sketch of the cleanup idea is shown after this list):

  • cleanup-job - a job responsible for cleaning up resources left in the cluster

  • graph-refresh-job - a job responsible for scheduling analyses of packages that were not yet analyzed

  • graph-sync-job - a job responsible for syncing data in a JSON format persisted on Ceph to Thoth’s knowledge graph database

  • package-releases-job - a job responsible for tracking new releases on Python’s package index (the public one is PyPI.org, see also AICoE index)

  • cve-update-job - a job responsible for gathering CVE information about packages

  • workload-operator - an OpenShift operator responsible for scheduling jobs into namespaces; it respects the resources allocated to the namespace in which the jobs run

  • graph-sync-operator - an OpenShift operator responsible for scheduling graph-sync-jobs that sync results of analyzer job runs into Thoth’s knowledge graph
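
To illustrate the idea behind cleanup-job (this is not its actual implementation), the sketch below removes finished pods older than a configured TTL from a managed namespace using the kubernetes Python client:

    # A minimal sketch of the cleanup idea, not the actual cleanup-job implementation:
    # remove finished pods older than a given TTL from a managed namespace.
    # Assumes in-cluster credentials and the official kubernetes Python client.
    from datetime import datetime, timedelta, timezone

    from kubernetes import client, config

    TTL = timedelta(hours=2)
    NAMESPACE = "thoth-middletier"  # parametrized per managed namespace

    config.load_incluster_config()
    v1 = client.CoreV1Api()

    now = datetime.now(timezone.utc)
    for pod in v1.list_namespaced_pod(NAMESPACE).items:
        if pod.status.phase in ("Succeeded", "Failed"):
            # use the pod's start time as a rough proxy for its age
            started = pod.status.start_time or now
            if now - started > TTL:
                v1.delete_namespaced_pod(pod.metadata.name, NAMESPACE)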

Middletier Namespace

The middletier namespace is used for analyses and the actual resource-hungry tasks that compute results for Thoth’s knowledge graph. This namespace was separated from the frontend namespace to guarantee application responsiveness. All the pods that compute results for the knowledge graph are scheduled in this namespace. The namespace has an allocated pool of resources for the unpredictable number of computational pods needed for this purpose (so these pods are not scheduled next to the running user API, possibly making the user API non-responsive).

A special service - the result API - abstracts away any database operations (which could be dangerous when executing untrusted code in Thoth’s solvers). Each analyzer run in the middletier namespace serializes its results to a structured JSON format, and these results are submitted to the user API service, which stores them in Ceph object storage. All results computed by Thoth are first stored in JSON format for later analyses, making the graph instance fully recoverable and reconfigurable based on previous results.
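
For illustration, an analyzer hand-off could look roughly like the sketch below; the endpoint path and document layout are assumptions, not the actual result API schema:

    # Illustrative only: serialize an analyzer's results to JSON and submit them.
    # The endpoint path and the document layout are assumptions.
    import requests

    RESULT_API = "https://thoth-result-api.example.com"  # placeholder host

    document = {
        "metadata": {"analyzer": "package-extract", "analyzer_version": "0.1.0"},
        "result": {"packages": [{"name": "requests", "version": "2.28.1"}]},
    }

    response = requests.post(
        f"{RESULT_API}/api/v1/analysis-result",  # hypothetical endpoint
        json=document,
        timeout=30,
    )
    response.raise_for_status()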

Currently, the following analyzers are run in the middletier namespace:

  • package-extract - an analyzer responsible for extracting packages from runtime/buildtime environments (container images)

  • solver - an analyzer run to gather information about dependencies between packages (which packages does the given package depend on? which versions satisfy the version ranges?) and to gather observations such as whether the given package is installable into the given environment and whether it is present on a Python package index (see the version-range sketch after this list)

  • dependency-monkey - an analyzer that dynamically constructs package stacks and submits them to Amun for dynamic analysis (can the given stack be installed? what are the runtime observations - e.g. a performance index?) (this is currently WIP)
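
The version-range part of the solver analyzer’s job can be illustrated with the packaging library (the version list below is made up):

    # Which released versions satisfy a given version range? The version list is
    # made up for illustration; solver gathers the real data from package indexes.
    from packaging.specifiers import SpecifierSet
    from packaging.version import Version

    specifier = SpecifierSet(">=1.2,<2.0")
    available = ["1.0.0", "1.2.3", "1.9.1", "2.0.0"]

    satisfying = [v for v in available if Version(v) in specifier]
    print(satisfying)  # ['1.2.3', '1.9.1']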

Backend Namespace

The backend part of the application is used for executing code that, based on information gathered from analyzers run in the middletier namespace, computes results for actual Thoth users (bots or humans).

This namespace has, as in the case of the middletier namespace, an allocated pool of resources. Each time a user requests a recommendation, pods are dynamically created in this namespace to compute results.

As of now, the following analyzers are run to compute recommendations for a user:

  • adviser - an analyzer that computes advises (recommendations) for the user’s application stack based on knowledge stored in Thoth’s knowledge graph

  • provenance-checker - an analyzer that performs provenance checks, verifying that packages in the user’s application stack come from the expected Python package indexes

Amun

Amun is a standalone project within Thoth - its aim is to act as an execution engine. Based on requests coming from Thoth itself (dependency-monkey jobs), it can build the requested application (create builds and image streams) on the requested runtime environment (a container base image, optionally with additional native packages installed) and execute the supplied testsuite to verify whether the given application stack works on the targeted hardware (also part of the dependency-monkey request). The results of the Amun API are “observations” from inspection jobs (build and run inspections). These observations are subsequently synced into the knowledge graph as part of graph-sync-job.
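
A dependency-monkey style request to Amun could look roughly like the sketch below; the endpoint path and payload field names are assumptions based on the description above, not the actual Amun API schema:

    # Illustrative sketch of an inspection request: which base image to use, which
    # native packages to add, which Python stack to build and which script to run.
    # The endpoint path and field names are assumptions.
    import requests

    AMUN_API = "https://amun-api.example.com"  # placeholder host

    inspection_request = {
        "base": "registry.access.redhat.com/ubi8/python-38",  # runtime environment
        "packages": ["gcc"],                                   # extra native packages
        "python": {"requirements": "tensorflow==2.3.0\n"},     # stack to build
        "script": "#!/usr/bin/env python3\nimport tensorflow as tf\nprint(tf.__version__)\n",
    }

    response = requests.post(f"{AMUN_API}/api/v1/inspect", json=inspection_request, timeout=30)
    response.raise_for_status()
    print(response.json())  # e.g. an inspection id used to retrieve build/run observations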

For more information, see Amun API repository and autogenerated Amun client. See also the performance repo for scripts used for performance related inspections.

Graph Namespace

A separate namespace was created for database-related deployments. The following components run in this namespace (a connection sketch through pgbouncer follows the list):

  • Thoth’s knowledge graph - a PostgreSQL database

  • pgbouncer - a PostgreSQL bouncer to recycle and manage connections to the database - all the components talk to this component when a query is performed

  • pgweb - a web interface to interact with Thoth’s knowledge graph

  • postgresql-metrics-exporter - exports PostgreSQL related metrics (see the Grafana dashboards section) for observability of the main database
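
A connection through pgbouncer is plain PostgreSQL from a component’s point of view; the sketch below uses psycopg2 with placeholder host, database and credentials (in the deployment these come from the component’s environment and secrets):

    # Minimal sketch: connect to Thoth's knowledge graph through pgbouncer.
    # Host, database name and credentials are placeholders.
    import psycopg2

    connection = psycopg2.connect(
        host="pgbouncer",  # components talk to the bouncer, not to PostgreSQL directly
        port=5432,
        dbname="thoth",
        user="thoth",
        password="<from-secret>",
    )

    with connection, connection.cursor() as cursor:
        cursor.execute("SELECT version();")
        print(cursor.fetchone())

    connection.close()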

Thamos

Thamos is a CLI tool created for end-users of Thoth. It offers a simple command line interface to consume Thoth’s advises (recommendations) and Thoth’s provenance checks, both done against data stored in the knowledge graph. The tool packs in a Python client library automatically generated from the User API OpenAPI specification; the library can be used to integrate with Thoth. See the Thamos repository for documentation and example usage.
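
As a small usage sketch, Thamos can also be driven from a script through its CLI; the sub-command follows the description above, but check `thamos --help` for the options available in your installed version:

    # Ask Thoth for a recommendation for the Python project in the current directory
    # by invoking the Thamos CLI. A .thoth.yaml configuration is expected to be
    # present; output handling here is simplified.
    import subprocess

    result = subprocess.run(
        ["thamos", "advise"],
        capture_output=True,
        text=True,
        check=False,
    )
    print(result.stdout or result.stderr)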

Kebechet

Another consumer of Thoth’s data is a bot called Kebechet that operates directly on repositories hosted on GitHub or GitLab and opens pull requests or issues automatically for users. See the integration documentation for more info.

Build Watcher

This component was designed to automatically watch for cluster events - wait for OpenShift builds to finish and submit the resulting container image to Thoth for analysis. As the container image registry is usually cluster-internal, without any route exposed to the outside world, build-watcher can also cooperate with external registries. Documentation can be found in the README file in the GitHub project.
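
The watching part can be sketched with the OpenShift Python client’s dynamic API as below; this is a simplified stand-in for build-watcher, not its actual implementation, and import paths may differ between client versions:

    # Watch Build objects in a namespace and react when a build completes.
    # Simplified stand-in for build-watcher; the namespace is a placeholder.
    from kubernetes import client, config
    from openshift.dynamic import DynamicClient

    config.load_incluster_config()
    dyn_client = DynamicClient(client.ApiClient())
    builds = dyn_client.resources.get(api_version="build.openshift.io/v1", kind="Build")

    for event in builds.watch(namespace="my-project"):
        build = event["object"]
        if build.status.phase == "Complete":
            image = build.status.outputDockerImageReference
            print(f"build {build.metadata.name} finished, submitting {image} for analysis")
            # ...here build-watcher would submit the image to Thoth for analysis...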

Grafana dashboards

To guarantee application observability, Grafana dashboards were created. Thoth-related metrics are aggregated by a component called “metrics exporter” and visualized in multiple dashboards. To observe the knowledge graph database, the postgresql-metrics-exporter component was deployed.
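
A metrics exporter of this kind can be sketched with the prometheus_client library; the metric name below is illustrative, not one of the exporter’s actual metrics:

    # Expose a Thoth-related metric for Prometheus/Grafana. The metric name and the
    # value source are illustrative; a real exporter would query the database or APIs.
    import random
    import time

    from prometheus_client import Gauge, start_http_server

    unsolved_packages = Gauge(
        "thoth_example_unsolved_packages",  # hypothetical metric name
        "Number of packages waiting to be solved",
    )

    start_http_server(8000)  # metrics are served at http://localhost:8000/metrics

    while True:
        unsolved_packages.set(random.randint(0, 100))  # stand-in for a real query
        time.sleep(30)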

A brief history of Thoth

Thoth started as an experiment in the AICoE - the “AI Center of Excellence” team in Red Hat’s office of the CTO. During development, the main database was exchanged multiple times. Initially, the JanusGraph database was used. Even though JanusGraph served its purpose (storing graph-related data), it did not scale well given the number of requests performed by the application. Also, maintainability of the JanusGraph instance was not satisfying given the team size (2 developers). The next database used, Dgraph, addressed most of the issues and led to a cooperation between the Thoth team and the AIOps team - you can find more details in this article. Unfortunately, we reached scalability issues with the Dgraph instance and were forced to look for another solution to serve as the main database.

Note

The term “graph” is used all over Thoth’s sources. The term was adopted because a “graph database” was used initially. Even though the main database is not a graph database anymore, Thoth’s knowledge base can still be seen as a graph knowledge base - a database storing graph data, not necessarily in a graph form.

Another reason behind switching away from graph databases was the size of the Python ecosystem. The main query for obtaining dependencies (in the adviser and Dependency Monkey implementation) initially retrieved the whole dependency graph in one shot - this solution did not scale well for larger software stacks (such as TensorFlow), and we encountered serialization issues with both JanusGraph and Dgraph. The query had to be broken down into multiple requests to Thoth’s knowledge base. In the end, we arrived at a query that could easily be transformed into an SQL query with 3 joins.

Another part of Thoth that was rewritten multiple times is the core recommendation algorithm. Initially, adviser and Dependency Monkey loaded the whole dependency graph into memory and then performed operations on a weighted N-ary dependency graph. These operations guaranteed faster resolution of higher-scored packages during the dependency graph traversals made to resolve software stacks. This implementation was then rewritten into C/C++ to gain performance. Unfortunately, the approach had to be abandoned given the number of requests needed to obtain large software stacks (in the case of a TensorFlow stack, 2.5k requests were made just to obtain the dependency graph, which required a lot of time just for data transfer). Considering how the Python ecosystem is growing, this solution would not scale well in the future.

All of the above led to a transition to a more stochastic approach - the implementation design of Thoth’s adviser reflects this with its resolver and predictor abstractions and a stack resolution pipeline used for scoring and producing highly optimized software stacks.
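
The split can be illustrated with a toy sketch (this is not Thoth’s actual adviser API): the predictor decides which partially resolved state to expand next, the resolver expands it, and pipeline units score fully resolved stacks:

    # Toy illustration of the resolver/predictor split, not Thoth's actual adviser API.
    # States are dicts; "unresolved" holds packages still to be pinned down.
    import random

    def toy_resolve(initial_state, expand, score, rounds=1000):
        """Stochastic search over partially resolved states."""
        frontier = [initial_state]
        best, best_score = None, float("-inf")
        for _ in range(rounds):
            if not frontier:
                break
            state = random.choice(frontier)  # "predictor": here, a plain random walk
            frontier.remove(state)
            for candidate in expand(state):  # "resolver": pin down one more package
                if candidate["unresolved"]:
                    frontier.append(candidate)
                else:
                    s = score(candidate)     # pipeline units score the resolved stack
                    if s > best_score:
                        best, best_score = candidate, s
        return best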

A note on Argo workflows

There are ongoing efforts to port the deployment to Argo workflows. This way, some of the components will be replaced by Argo capabilities, notably:

  • result-api - Argo has S3 support for managing produced artifact uploads

  • workload-operator

  • graph-sync-scheduler

Requirements for Thoth deployment

All the related Ansible playbooks for provisioning Thoth are present in the thoth-ops repository (might require developer’s access).

Main requirements for a proper cluster deployment:

  • the middletier, backend and inspection namespaces should have restricted resources allocated for CPU, memory and the number of pods that can run in the namespace (see the ResourceQuota sketch after this list)

  • the network of the graph namespace should be joined with the frontend, middletier and backend namespaces, as pods running in these namespaces require access to Thoth’s knowledge graph instance

  • custom resources related to Argo are required for running Argo workflows

  • optional but recommended: solver pods can have a network policy set up to restrict network access - this restriction exists because solver jobs execute possibly untrusted software coming from the Internet (PyPI packages)
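
The resource restriction for a namespace can be sketched with a ResourceQuota created through the kubernetes Python client; the quota values below are placeholders to be tuned per deployment:

    # Minimal sketch: restrict CPU, memory and pod count in a workload namespace.
    # Quota values and the namespace name are placeholders.
    from kubernetes import client, config

    config.load_kube_config()  # or load_incluster_config() inside the cluster
    v1 = client.CoreV1Api()

    quota = client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="thoth-workload-quota"),
        spec=client.V1ResourceQuotaSpec(
            hard={"limits.cpu": "16", "limits.memory": "32Gi", "pods": "20"},
        ),
    )

    v1.create_namespaced_resource_quota(namespace="thoth-middletier", body=quota)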

Follow the documentation and Ansible playbooks in the thoth-ops repository, as well as the core repository, for more info.