Developer’s guide to Thoth¶
The main goal of this document is to give developers a first introduction to running, developing and using Thoth.
The following documents are a prerequisite for this guide:
Basics of OpenShift - see for example Basic Walkthrough
Preparing Developer’s Environment¶
You can clone repositories from the thoth-station organization to your local directory. It is preferred to place the repositories one next to another, as this simplifies the import adjustments described later:
$ ls -A1 thoth-station/
adviser
amun-api
amun-client
amun-hwinfo
analyzer
...
user-api
workload-operator
zuul-test-config
zuul-test-jobs
Using Pipenv and Thamos CLI¶
All of the Thoth packages use Pipenv to create a separate and reproducible environment in which the given component can run. Almost every repository has its own Pipfile and Pipfile.lock files. The Pipfile states the direct dependencies of a project, while Pipfile.lock pins all the dependencies (including the transitive ones) to specific versions.
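For illustration, a minimal Pipfile might look like this (the package names and versions are only an example, not taken from any particular Thoth repository):
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
thoth-common = "*"

[dev-packages]
pytest = "*"

[requires]
python_version = "3.8"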
If you have cloned the repositories, you can run the following command to prepare a separate virtual environment with all the dependencies (including the transitive ones):
$ pipenv install --dev
# Alternatively you can use Thamos CLI:
$ thamos install --dev
As each repository has its own separate environment, you can switch between environments that have different versions of packages installed.
If you would like to install some additional libraries, just issue:
$ pipenv install <name-of-a-package> # Add --dev if it is a devel dependency.
# Alternatively, you can use Thamos CLI:
$ thamos add <name-of-a-package>
The Pipfile and Pipfile.lock files get updated.
If you would like to run a CLI provided by a repository, issue the following command:
# Run adviser CLI inside adviser/ repository:
$ cd adviser/
$ pipenv run python3 ./thoth-adviser --help
The command above automatically activates the separate virtual environment created for thoth-adviser and uses packages from it.
To activate virtual environment permanently, issue:
$ pipenv shell
(adviser)$
Your shell prompt will change (showing that you are inside a virtual environment) and you can, for example, run the Python interpreter to execute some of the Python code provided:
(adviser)$ python3
>>> from thoth.adviser import __version__
>>> print(__version__)
Developing cross-library features¶
As Thoth consists of multiple libraries that depend on each other, it is often desirable to test functionality provided by one library inside another.
Suppose you would like to run adviser with a different version of the thoth-python package (present in the python/ directory one level up from the adviser's directory). To do so, the only thing you need to do is run the thoth-adviser CLI (in the adviser repo) in the following way:
$ cd adviser/
$ PYTHONPATH=../python pipenv run ./thoth-adviser provenance --requirements ./Pipfile --requirements-locked ./Pipfile.lock --files
The PYTHONPATH environment variable tells the Python interpreter to search for sources first in the ../python directory. This makes the following code:
from thoth.python import __version__
first check the sources present in ../python and run the code from there (instead of running the installed thoth-python package from PyPI inside the virtual environment).
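To quickly verify which sources are actually being used, you can inspect the module's __file__ attribute (the path shown is illustrative):
$ cd adviser/
$ PYTHONPATH=../python pipenv run python3
>>> import thoth.python
>>> thoth.python.__file__  # should point into ../python, not into the virtual environment
'../python/thoth/python/__init__.py'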
If you would like to run multiple libraries this way, you need to delimit them using a colon:
$ cd adviser/
$ PYTHONPATH=../python:../common pipenv run ./thoth-adviser --help
Running components locally¶
To improve developer efficiency, all the components can be run locally. If a component talks to a remote Ceph, it is possible to instrument the component so that it talks to Ceph based on the configuration supplied. Similarly, if a component talks to the database, it is possible to instrument the component to talk to the desired database instance. This way, developers can test their changes locally and, once changes are done, the code can be pushed to deployments.
If a component uses Ceph, export the environment variables required for a Ceph connection. See the relevant section in the thoth-station/storages README file.
If a component uses PostgreSQL, it will try to connect to a local PostgreSQL instance by default. Follow the instructions in the thoth-station/storages README file for more info on how to set up a local PostgreSQL instance from a database dump.
The following command will use Ceph based on the supplied configuration, connect to a local PostgreSQL instance (if any is used) and use the local version of thoth-common to support the cross-library feature development mentioned above:
PYTHONPATH=../common THOTH_CEPH_BUCKET_PREFIX=data THOTH_S3_ENDPOINT_URL='https://s3.redhat.com' THOTH_CEPH_KEY_ID=AAAAAAAAAAAAAAAAAAAA THOTH_CEPH_SECRET_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX THOTH_CEPH_BUCKET=thoth THOTH_DEPLOYMENT_NAME=ocp4-stage python3 ./app.py
Note
You can use a .env file together with Pipenv. See the docs for more info. Some of the repositories have an .env.template file ready for use.
Debugging application and logging¶
All Thoth components use logging that is implemented in the thoth-common package and is initialized in the init_logging() function (defined in the thoth-common library). This library sets up all the routines needed for logging (including sending logs to external monitoring systems such as Sentry).
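As a minimal sketch (the logger name and message are illustrative), a component typically initializes logging at its entrypoint like this:
import logging

from thoth.common import init_logging

init_logging()  # sets up log handlers, log levels and Sentry integration
_LOGGER = logging.getLogger(__name__)
_LOGGER.info("Logging has been initialized")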
Besides the functionality stated above, the logging configuration can be adjusted based on environment variables. If you are debugging some parts of the Thoth application and would like to get debug messages for a library, just set the environment variable THOTH_LOG_<library name> to DEBUG (or any other log level you would like to see; suppressing logs is also possible by setting the log level to a higher value such as ERROR or EXCEPTION). An example run:
$ cd adviser/
$ THOTH_LOG_STORAGES=DEBUG THOTH_LOG_ADVISER=WARNING PYTHONPATH=../python pipenv run ./thoth-adviser provenance --requirements ./Pipfile --requirements-locked ./Pipfile.lock --files
The command above will suppress any debug and info messages in thoth-adviser (only warnings, errors and exceptions will be logged) and increase the verbosity of the thoth-storages package to DEBUG. Additionally, you can set up logging only for a specific module inside a package, for example:
$ cd adviser/
$ THOTH_LOG_STORAGES_GRAPH_POSTGRES=DEBUG THOTH_LOG_ADVISER=WARNING PYTHONPATH=../python pipenv run ./thoth-adviser provenance --requirements ./Pipfile --requirements-locked ./Pipfile.lock --files
By exporting the THOTH_LOG_STORAGES_GRAPH_POSTGRES environment variable, you set the debug log level for the file thoth/storages/graph/postgres.py provided by the thoth-storages package. This way you can debug and inspect behavior only for certain parts of the application. If a module has an underscore in its name, the environment variable has to use double underscores to explicitly escape it (so that the name is not treated as a logger defined in a sub-package).
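The mapping follows Python's standard logger naming: each module obtains its logger from its dotted module name, and the environment variable is derived from that name. An illustrative sketch:
# Inside thoth/storages/graph/postgres.py, __name__ is "thoth.storages.graph.postgres":
import logging

_LOGGER = logging.getLogger(__name__)
# The matching environment variable is THOTH_LOG_STORAGES_GRAPH_POSTGRES:
# the "thoth." prefix maps to THOTH_LOG_ and the remaining dots become underscores.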
The default log level is set to INFO for all Thoth components.
See the thoth-common library documentation for more info.
Testing application against Ceph and a knowledge graph database¶
If you would like to test changes in your application against data stored inside Ceph, you can use the following command (if you have your gopass set up):
$ eval $(gopass show aicoe/thoth/ceph.sh)
This will inject into your environment the Ceph configuration needed for the adapters available in the thoth-storages package so that you can talk to a Ceph instance. In most cases you will need to set the THOTH_DEPLOYMENT_NAME environment variable, which distinguishes different deployments. We follow the pattern (ClusterName)-(DeploymentName) when assigning the THOTH_DEPLOYMENT_NAME environment variable, for example ocp4-stage. Names can be found in the corresponding Ceph bucket.
$ export THOTH_DEPLOYMENT_NAME=ocp4-stage
To browse data stored on Ceph, you can use the awscli utility from PyPI, which provides the aws command (use aws s3, as Ceph exposes an S3 compatible API).
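If you prefer Python, a short sketch using boto3 (not part of the documented workflow; the environment variables are the ones exported for Thoth components above) can list objects in the bucket:
import os

import boto3

# Credentials and endpoint as exported for Thoth components above.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["THOTH_S3_ENDPOINT_URL"],
    aws_access_key_id=os.environ["THOTH_CEPH_KEY_ID"],
    aws_secret_access_key=os.environ["THOTH_CEPH_SECRET_KEY"],
)
response = s3.list_objects_v2(
    Bucket=os.environ["THOTH_CEPH_BUCKET"],
    Prefix=os.environ["THOTH_CEPH_BUCKET_PREFIX"],
)
for obj in response.get("Contents", []):
    print(obj["Key"])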
To run applications against Thoth's knowledge graph database, see the documentation of the thoth-storages library, which describes how to connect, run, dump or recreate Thoth's knowledge graph from a knowledge graph backup.
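As a quick sketch (assuming the environment is configured as described in the thoth-storages documentation), connecting to the knowledge graph from Python looks roughly like this:
from thoth.storages import GraphDatabase

graph = GraphDatabase()
graph.connect()  # connection parameters are taken from the environment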
Running application inside OpenShift vs local development¶
All the libraries are designed to run locally (for a fast development experience, iterating over features as quickly as possible) as well as inside a cluster.
If a library uses OpenShift's API (as all the operators do), the OpenShift class implemented in the thoth-common library takes care of transparently discovering whether you run in the cluster or locally. If you would like to run applications against an OpenShift cluster from your local development environment, use the oc command to log in to the cluster and switch to the project in which you would like to operate:
$ oc login <openshift-cluster-url>
...
$ oc project thoth-test-core
Then run your applications (the configuration for talking to the cluster is picked up from the OpenShift/Kubernetes config). You should see a courtesy warning from thoth-common that you are running your application locally.
To run an application from sources present in the local directory (for example with changes you have made), you can open a pull request and issue the /deploy command as a comment on the opened pull request.
If you would like to test an application with unreleased packages inside an OpenShift cluster, you can do so by installing the package from a Git repo and running the /deploy command on the opened pull request:
# To install thoth-common package from the master branch (you can adjust GitHub organization to point to your fork):
$ pipenv install 'git+https://github.com/thoth-station/common.git@master#egg=thoth-common'
After that, you can open a pull request with the adjusted dependencies. Note that Git dependencies must not be merged into the repository: Thoth will fail to produce recommendations if it spots a VCS dependency in the application (it is a bad practice to use such dependencies in production-like deployments):
thamos.swagger_client.rest.ApiException: (400)
Reason: BAD REQUEST
HTTP response headers: HTTPHeaderDict({'Server': 'gunicorn/19.9.0', 'Date': 'Tue, 13 Aug 2019 06:28:21 GMT', 'Content-Type': 'application/json', 'Content-Length': '45257', 'Set-Cookie': 'ae5b4faaab1fe6375d62dbc3b1efaf0d=3db7db180ab06210797424ca9ff3b586; path=/; HttpOnly'})
HTTP response body: {
"error": "Invalid application stack supplied: Package thoth-storages uses a version control system instead of package index: {'git': 'https://github.com/thoth-station/storages' }",
}
Note
If you use an S2I build process with recommendations turned on, you can bypass the error by turning off recommendations; just set THOTH_ADVISE to 0 in the corresponding build config.
Disclaimer: Please do NOT commit such changes into repositories. We always rely on versioned packages with proper release management.
Scheduling workload in the cluster¶
You can use your computer to talk directly to the cluster and schedule workload there. An example use case is scheduling syncs of solver documents present on Ceph. To do so, go to the user-api repo and run a Python 3 interpreter once your Python environment is set up:
$ # Go to a repo which has thoth-common and thoth-storages installed:
$ cd thoth-station/user-api
$ pipenv install --dev
$ # Log in to cluster - your credentials will be used to schedule workload:
$ oc login <cluster-url>
$ # Make sure you adjust secrets before running Python interpreter in storages environment - you can obtain them from gopass:
$ PYTHONPATH=. THOTH_MIDDLETIER_NAMESPACE=thoth-middletier-stage THOTH_INFRA_NAMESPACE=thoth-infra-stage KUBERNETES_VERIFY_TLS=0 THOTH_CEPH_SECRET_KEY="***" THOTH_CEPH_KEY_ID="***" THOTH_S3_ENDPOINT_URL=https://s3.url.redhat.com THOTH_CEPH_BUCKET_PREFIX=data THOTH_CEPH_BUCKET=thoth THOTH_DEPLOYMENT_NAME=ocp-stage pipenv run python3
After running the commands above, you should see the Python interpreter's prompt. Run the following sequence of commands (you can use the built-in help to see more information from the function documentation):
>>> from thoth.storages import SolverResultsStore
>>> solver_store = SolverResultsStore()
>>> solver_store.connect()
>>> from thoth.common import OpenShift
>>> os = OpenShift()
Failed to load in cluster configuration, fallback to a local development setup: Service host/port is not set.
TLS verification when communicating with k8s/okd master is disabled
>>> all_solver_document_ids = solver_store.get_document_listing()
>>> [os.schedule_graph_sync_solver(solver_document_id, namespace="thoth-middletier-stage") for solver_document_id in all_solver_document_ids]
Once all the adapters are imported and instantiated, you can schedule workload using the OpenShift abstraction, which talks directly to OpenShift to schedule workload in the cluster.