# ACS Fleet Manager
This repository started as a fork of the Fleet Manager Golang Template. Its original README is preserved in its own section below.
TODO: Clean up and make this ACS Fleet Manager specific.
## Quickstart
### Contributing
- Develop on top of the `main` branch
- Add `// TODO(create-ticket): some explanation` near your work, so we can come back and refine it
- Merge your PRs to the `main` branch
Context: We want to fix the e2e flow quickly and in parallel across many engineers, but we don't want to push untested, potentially minimal / simplified code to `release`. So we will develop in `main` and later clean things up and bring them into `release`.
### Rough map
Here are the key directories to know about:
- `docs/` Documentation
- `internal/` ACS Fleet Management specific logic
- `openapi/` Public, private (admin), and fleet synchronizer APIs
- `pkg/` Non-ACS specific Fleet Management logic
  - Examples include authentication, error handling, database connections, and more
- `templates/` These are actually OpenShift templates for deploying jobs to OpenShift clusters
### Commands
```bash
# Install the prereqs:
# - Golang 1.17+
# - Docker
# - ocm cli: https://github.com/openshift-online/ocm-cli (brew/dnf)
# - Node.js v12.20+ (brew/dnf)

make binary

# Generate the necessary secret files (empty placeholders)
make secrets/touch

# make db/teardown   # If necessary, tear down the existing db first
make db/setup && make db/migrate
make db/login

# Postgresql commands:
\dt                          # List tables in the postgresql database
select * from migrations;    # Run a query to view the migrations
quit
```
```bash
# By default: web (no TLS) at localhost:8000, metrics at localhost:8080, healthcheck (no TLS) at localhost:8083
./fleet-manager serve

# Debugging:
# I0308 13:36:58.977437 29044 leader_election_mgr.go:115] failed to acquire leader lease: failed to retrieve leader leases: failed to connect to `host=localhost user=fleet_manager database=serviceapitests`: dial error (dial tcp 127.0.0.1:5432: connect: connection refused); failed to connect to `host=localhost user=fleet_manager database=serviceapitests`: dial error (dial tcp 127.0.0.1:5432: connect: connection refused)
# => Check that the fleet-manager-db docker container is running:
# docker ps --all
# docker restart fleet-manager-db
```
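To quickly check that the default endpoints respond, you can probe them with curl. The ports come from the defaults above; the `/metrics` and `/healthcheck` paths are assumptions, so adjust them if your build serves these elsewhere:

```bash
# Prometheus-style metrics (path is an assumption)
curl -s http://localhost:8080/metrics | head

# Health check (path is an assumption)
curl -s http://localhost:8083/healthcheck
```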
```bash
# Run some commands against the API:
# See ./docs/populating-configuration.md#interacting-with-the-fleet-manager-api
# TL;DR: Sign in to https://cloud.redhat.com, get a token at https://console.redhat.com/openshift/token, then log in:
ocm login --token <ocm-offline-token>

# Generate a new OCM token (will expire, unlike the ocm-offline-token):
OCM_TOKEN=$(ocm token)

# Use the token in an API request, for example:
curl -H "Authorization: Bearer ${OCM_TOKEN}" http://127.0.0.1:8000/api/dinosaurs_mgmt
```
```bash
# Setting up a local CRC cluster:
crc setup   # Takes some time to uncompress (12 GiB?!)

# Increase CRC resources (4 CPU and 9 GiB RAM seems to be too little, never comes up)
crc config set cpus 10
crc config set memory 18432

crc start   # Requires a pull secret from https://cloud.redhat.com/openshift/create/local
crc console --credentials   # (Optional) Get your login credentials, use them to log in, e.g.:

# CRC includes a cached OpenShift `oc` client binary; this will set up the environment to use it:
eval $(crc oc-env)

# Login as a developer to test:
oc login -u developer -p developer https://api.crc.testing:6443
```
```bash
# OpenShift clusters have the Operator Lifecycle Manager (OLM) installed by default.
# If running with a non-OpenShift Kubernetes cluster, you'll need to install
# OLM yourself for the ACS Operator installation to work.
# Instructions: https://sdk.operatorframework.io/docs/installation/
# TL;DR:
brew install operator-sdk    # Install the Operator SDK
operator-sdk olm install     # Install OLM on your cluster
kubectl -n olm get pods -w   # Verify the OLM installation
```
## Fleet Manager Golang Template
This project is an example fleet management service. Fleet managers govern service instances across a range of cloud provider infrastructure and regions. They are responsible for service placement, service lifecycle including blast-radius-aware upgrades, control of the operators handling each service instance, DNS management, infrastructure scaling, and pre-flight checks such as quota entitlement, export control, terms acceptance, and authorization. They also provide the public APIs of our platform for provisioning and managing service instances.
To help you while reading the code, the example service implements a simple collection of dinosaurs and their provisioning, so you can immediately tell whether something is infrastructure or business logic. Anything that talks about dinosaurs is business logic, which you will want to replace with your own concepts. The rest is infrastructure, which you will probably want to preserve unchanged.
For a real service written using the same fleet management pattern see the kas-fleet-manager.
To contact the people who created this template, go to zulip.
### Prerequisites
- Golang 1.17+
- Docker - to create the database
- ocm cli - ocm command line tool
- Node.js v12.20+ and npm
### Using the template for the first time
The implementation documentation describes the main components of this template. To bootstrap your application after cloning the repository:

- Replace the dinosaurs placeholder with your own business entity / objects
- Implement the code that has `// TODO` comments (see the sketch below for one way to find them)
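One way to locate those placeholders and TODO markers, assuming a standard checkout (the grep invocations are only illustrative):

```bash
# Find the dinosaur business-logic placeholder to replace
grep -rni --include="*.go" "dinosaur" internal/ | head

# Find the TODO markers left to implement
grep -rn --include="*.go" "// TODO" . | head
```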
### Running Fleet Manager for the first time in your local environment
Please make sure you have followed all of the prerequisites above first.
1. Follow the populating configuration guide to prepare Fleet Manager with its needed configurations

2. Compile the Fleet Manager binary

   ```bash
   make binary
   ```

3. Create and set up the Fleet Manager database

   - Create and set up the database container and the initial database schemas

     ```bash
     make db/setup && make db/migrate
     ```

   - Optional - Verify that tables and records are created

     ```bash
     # Login to the database to get a SQL prompt
     make db/login

     # List all the tables
     serviceapitests# \dt

     # Verify that the `migrations` table contains multiple records
     serviceapitests# select * from migrations;
     ```

4. Start the Fleet Manager service in your local environment

   ```bash
   ./fleet-manager serve
   ```

   This will start the Fleet Manager server and expose its API on port 8000 by default.

   NOTE: The service has numerous feature flags which can be used to enable/disable certain features of the service. Please see the feature flag documentation for more information (and the example after this list).

5. Verify the local service is working

   ```bash
   curl -H "Authorization: Bearer $(ocm token)" http://localhost:8000/api/dinosaurs_mgmt/v1/dinosaurs
   {"kind":"DinosaurRequestList","page":1,"size":0,"total":0,"items":[]}
   ```

   NOTE: Change dinosaur to your own REST resource

   NOTE: Make sure you are logged in to OCM through the CLI before running this command. Details on that can be found here
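To see which flags (including feature flags) the `serve` command accepts, ask the binary itself. A quick sketch, assuming the usual `--help` convention; the `--api-server-bindaddress` flag is one that appears later in this README, and any other flag names should be taken from the help output:

```bash
# List all flags accepted by the serve command
./fleet-manager serve --help

# Example: override the API bind address
./fleet-manager serve --api-server-bindaddress 0.0.0.0:8000
```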
### Using the Fleet Manager service
#### Interacting with Fleet Manager's API
See the Interacting with the Fleet Manager API subsection in the Populating Configuration documentation
#### Viewing the API docs
```bash
# Start Swagger UI container
make run/docs

# Launch Swagger UI and verify from a browser: http://localhost:8082

# Remove Swagger UI container
make run/docs/teardown
```
#### Running additional CLI commands
In addition to starting and running a Fleet Manager server, the Fleet Manager binary provides additional commands to interact with the service (i.e. cluster creation/scaling, Dinosaur creation, Errors list, etc.) without having to use a REST API client.

To use these commands, run `make binary` to create the `./fleet-manager` binary. Then run `./fleet-manager -h` for information on the additional available commands.
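For example (a sketch; `cluster` is only an illustrative subcommand name, use whatever the top-level help actually lists):

```bash
# Show all available top-level commands
./fleet-manager -h

# Drill into a specific command group (the name `cluster` is hypothetical)
./fleet-manager cluster -h
```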
### Fleet Manager Environments
The service can be run in a number of different environments. Environments are essentially bespoke sets of configuration that the service uses to make it function differently. Environments can be set using the `OCM_ENV` environment variable. Below is the list of known environments and their details.
- `development` - The `staging` OCM environment is used. Sentry is disabled. Debugging utilities are enabled. This should be used in local development. This is the default environment used when directly running the Fleet Manager binary and the `OCM_ENV` variable has not been set.
- `testing` - The OCM API is mocked/stubbed out, meaning network calls to OCM will fail. The auth service is mocked. This should be used for unit testing.
- `integration` - Identical to `testing` but using an emulated OCM API server to respond to OCM API calls, instead of a basic mock. This can be used for integration testing to mock OCM behaviour.
- `production` - Debugging utilities are disabled, Sentry is enabled. This environment can be ignored in most development and is only used when the service is deployed.
The `OCM_ENV` environment variable should be set before running any Fleet Manager binary command or Makefile target.
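For example, to pick an environment for a local run (the environment names are the ones listed above; this is just the standard way of setting an environment variable):

```bash
# Run the server with the testing environment configuration
OCM_ENV=testing ./fleet-manager serve

# Or export it for the whole shell session so every subsequent
# fleet-manager command and Makefile target picks it up
export OCM_ENV=development
./fleet-manager serve
```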
### Running the fleet manager with an OSD cluster from infractl
Write a cloud provider configuration file that matches the cloud provider and region used for the cluster; see `dev/config/provider-configuration-infractl-osd.yaml` for an example OSD cluster running in GCP. See the cluster creation logs in https://infra.rox.systems/cluster/YOUR_CLUSTER to locate the provider and region, and see `internal/dinosaur/pkg/services/cloud_providers.go` for the provider constant.

Enable a cluster configuration file for the OSD cluster; see `dev/config/dataplane-cluster-configuration-infractl-osd.yaml` for an example OSD cluster running in GCP. Again, see the cluster creation logs for possibly missing required fields.
Download the kubeconfig for the cluster. Without this the fleet manager will refuse to use the cluster.

```bash
CLUSTER=...   # your cluster's name
infractl artifacts $CLUSTER --download-dir ~/infra/$CLUSTER
```
Launch the fleet manager using those configuration files:

```bash
make binary && ./fleet-manager serve \
    --dataplane-cluster-config-file=$(pwd)/dev/config/dataplane-cluster-configuration-infractl-osd.yaml \
    --providers-config-file=$(pwd)/dev/config/provider-configuration-infractl-osd.yaml \
    --kubeconfig=${HOME}/infra/${CLUSTER}/kubeconfig \
    2>&1 | tee fleet-manager-serve.log
```
### Running containerized fleet-manager and fleetshard-sync
The Makefile target `image/build` builds a combined image containing both applications, `fleet-manager` and `fleetshard-sync`.

So far only `fleet-manager` can be successfully spawned from this image, because `fleetshard-sync` tries to reach `fleet-manager` at `127.0.0.1` (hard-coded).

Using e.g. the Docker CLI, `fleet-manager` can be spawned as follows:
```bash
docker run -it --rm -p 8000:8000 \
    -v "$(git rev-parse --show-toplevel)/config":/config \
    -v "$(git rev-parse --show-toplevel)/secrets":/secrets \
    <IMAGE REFERENCE> \
    --db-host-file secrets/db.host.internal-docker \
    --api-server-bindaddress 0.0.0.0:8000
```
Using the above command, the `fleet-manager` application tries to access its database running on the host system, and its API server is reachable at `localhost` (host system).

In principle `fleetshard-sync` can be spawned using a command similar to the following:
```bash
docker run -it -e OCM_TOKEN --rm -p 8000:8000 \
    --entrypoint /usr/local/bin/fleetshard-sync \
    -v "$(git rev-parse --show-toplevel)/config":/config \
    -v "$(git rev-parse --show-toplevel)/secrets":/secrets \
    <IMAGE REFERENCE>
```
For this to work, `fleetshard-sync` has to be modified so that `fleet-manager`'s endpoint is configurable, and both containers have to run on a shared network so that they can access each other (TODO).
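A minimal sketch of what the shared-network part could look like once the endpoint is configurable; the network name is arbitrary and the endpoint wiring is still hypothetical:

```bash
# Create a user-defined bridge network so the containers can resolve each other by name
docker network create fleet-net

# Run fleet-manager on that network under a known name
docker run -d --rm --network fleet-net --name fleet-manager -p 8000:8000 \
    -v "$(git rev-parse --show-toplevel)/config":/config \
    -v "$(git rev-parse --show-toplevel)/secrets":/secrets \
    <IMAGE REFERENCE> \
    --db-host-file secrets/db.host.internal-docker \
    --api-server-bindaddress 0.0.0.0:8000

# fleetshard-sync could then reach fleet-manager at http://fleet-manager:8000
# once its fleet-manager endpoint is made configurable (currently hard-coded, see above)
```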
### Additional documentation
- Adding new endpoint
- Adding new CLI flag
- Automated testing
- Deploying fleet manager via Service Delivery
- Requesting credentials and accounts
- Data Plane Setup
- Access Control
- Quota Management
- Running the Service on an OpenShift cluster
- Explanation of JWT token claims used across the fleet-manager
### Contributing
See the contributing guide for general guidelines on how to contribute back to the template.