github.com/oscp/openshift-monitoring

# README

General idea

We at @SchweizerischeBundesbahnen run many production apps in our OpenShift environment, so we try hard to avoid any downtime. We therefore test new things (versions, config changes and so on) in our test environment first. Because the test environment runs far fewer pods and much less traffic, we created this tool to check all important OpenShift components under pressure, especially during a change.

Furthermore, the daemon now also has a standalone mode, in which it runs checks in response to an HTTP call. This lets you monitor all of these components from an external monitoring system.

Screenshot

Image of the UI

Components

  • UI: The UI to control everything
  • Hub: The backend of the UI and the daemons
  • Daemon: Deploy it as a DaemonSet and manually on masters & nodes

Modes & Daemon Types

Modes

  • HUB = Use the hub as control instance. Hub triggers checks on daemons asynchronously
  • STANDALONE = Daemon runs on its own and exposes a webserver to run the checks
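To make the "asynchronously" in HUB mode more concrete, the following Go sketch fans a check out to several daemons at once and collects the results. It only illustrates the pattern: the real hub talks to daemons over its RPC address, whereas here each daemon is modelled as a plain function, and the names `triggerAll` and `result` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// result is the outcome of one check on one daemon (hypothetical type).
type result struct {
	Daemon string
	OK     bool
}

// triggerAll runs every daemon's check in its own goroutine, waits for all
// of them, and returns the results sorted by daemon name.
func triggerAll(daemons map[string]func() bool) []result {
	results := make(chan result, len(daemons))
	var wg sync.WaitGroup
	for name, check := range daemons {
		wg.Add(1)
		go func(name string, check func() bool) {
			defer wg.Done()
			results <- result{Daemon: name, OK: check()}
		}(name, check)
	}
	wg.Wait()
	close(results)

	var out []result
	for r := range results {
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Daemon < out[j].Daemon })
	return out
}

func main() {
	// Each function stands in for a check executed by a remote daemon.
	daemons := map[string]func() bool{
		"master1": func() bool { return true },
		"node1":   func() bool { return true },
		"pod-a":   func() bool { return false },
	}
	for _, r := range triggerAll(daemons) {
		fmt.Printf("%s ok=%v\n", r.Daemon, r.OK)
	}
}
```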

Daemon-Types

  • NODE = On a Node as systemd-service
  • MASTER = On a master as systemd-service
  • STORAGE = On glusterfs server as systemd-service
  • POD = Runs inside a docker container

Checks

Hub mode

| TYPE | CHECK |
| --- | --- |
| MASTER | Master-API check |
| MASTER | ETCD health check |
| MASTER | DNS via kubernetes |
| MASTER | DNS via dnsmasq |
| MASTER | HTTP check via service |
| MASTER | HTTP check via ha-proxy |
| NODE | Master-API check |
| NODE | DNS via kubernetes |
| NODE | DNS via dnsmasq |
| NODE | HTTP check via service |
| NODE | HTTP check via ha-proxy |
| POD | Master-API check |
| POD | DNS via kubernetes |
| POD | DNS via Node > dnsmasq |
| POD | SDN over http via service check |
| POD | SDN over http via ha-proxy check |

Standalone mode

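| TYPE | URL | CHECK |
| --- | --- | --- |
| ALL | /fast | Fast endpoint for http-ping |
| ALL | /slow | Slow endpoint for slow http-ping |
| NODE | /checks/minor | Checks if the dockerpool is > 80% |
| NODE | /checks/minor | Checks ntpd synchronization status |
| NODE | /checks/minor | Checks if http access via service is ok |
| NODE | /checks/major | Checks if the dockerpool is > 90% |
| NODE | /checks/major | Checks if dns is ok via kubernetes & dnsmasq |
| MASTER | /checks/minor | Checks ntpd synchronization status |
| MASTER | /checks/minor | Checks if external system is reachable |
| MASTER | /checks/minor | Checks if Hawkular is healthy |
| MASTER | /checks/minor | Checks if ha-proxy has a high restart count |
| MASTER | /checks/minor | Checks if all projects have limits & quotas |
| MASTER | /checks/minor | Checks if logging pods are healthy |
| MASTER | /checks/minor | Checks if http access via service is ok |
| MASTER | /checks/major | Checks if output of 'oc get nodes' is fine |
| MASTER | /checks/major | Checks if etcd cluster is healthy |
| MASTER | /checks/major | Checks if docker registry is healthy |
| MASTER | /checks/major | Checks if all routers are healthy |
| MASTER | /checks/major | Checks if local master api is healthy |
| MASTER | /checks/major | Checks if dns is ok via kubernetes & dnsmasq |
| STORAGE | /checks/minor | Checks if open-files count is higher than 200'000 files |
| STORAGE | /checks/minor | Checks every lvs-pool size. Is the value above 80%? |
| STORAGE | /checks/minor | Checks every VG has at least 10% free storage |
| STORAGE | /checks/minor | Checks if every specified mount path has at least 15% free storage |
| STORAGE | /checks/major | Checks if output of gstatus is 'healthy' |
| STORAGE | /checks/major | Checks every lvs-pool size. Is the value above 90%? |
| STORAGE | /checks/major | Checks every VG has at least 5% free storage |
| STORAGE | /checks/major | Checks if every specified mount path has at least 10% free storage |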
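Several of the checks above are threshold pairs: a lower limit for the minor check and a stricter one for the major check. The Go sketch below shows that logic in isolation, using the percentages from the list. The function names, the `Severity` type, and the strict comparisons at the boundary are assumptions of this sketch. The daemon itself exposes minor and major checks as separate URLs rather than returning one combined severity.

```go
package main

import "fmt"

// Severity is a hypothetical result type for a threshold check.
type Severity string

const (
	OK    Severity = "ok"
	Minor Severity = "minor"
	Major Severity = "major"
)

// classifyUsage is for "is the value above X%?" checks, such as the
// dockerpool or lvs-pool size (minor above 80%, major above 90%).
func classifyUsage(usedPercent, minorAbove, majorAbove float64) Severity {
	switch {
	case usedPercent > majorAbove:
		return Major
	case usedPercent > minorAbove:
		return Minor
	default:
		return OK
	}
}

// classifyFree is for "at least X% free" checks, such as VG free storage
// (minor below 10%, major below 5%).
func classifyFree(freePercent, minorBelow, majorBelow float64) Severity {
	switch {
	case freePercent < majorBelow:
		return Major
	case freePercent < minorBelow:
		return Minor
	default:
		return OK
	}
}

func main() {
	fmt.Println(classifyUsage(85, 80, 90)) // dockerpool at 85%: minor
	fmt.Println(classifyFree(7, 10, 5))    // VG with 7% free: minor
	fmt.Println(classifyFree(20, 15, 10))  // mount path with 20% free: ok
}
```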

Config parameters

Hub

| NAME | DESCRIPTION | EXAMPLE |
| --- | --- | --- |
| UI_ADDR | The address & port where the UI should be hosted | 10.10.10.1:80 |
| RPC_ADDR | The address & port where the hub should be hosted | 10.10.10.1:2600 |
| MASTER_API_URLS | Names or IPs of your masters with the API port | https://master1:8443 |
| DAEMON_PUBLIC_URL | Public url of your daemon | http://daemon.yourdefault.route.com |
| ETCD_IPS | Names or IPs where to call your etcd hosts | https://localhost:2379 |
| ETCD_CERT_PATH | Optional config of alternative etcd certificates path. This is used during the certificate renewal process of OpenShift to do checks with the old certificates. If this fails, the default path is checked as well | /etc/etcd/old/ |

Daemon

Hub mode

| NAME | DESCRIPTION | EXAMPLE |
| --- | --- | --- |
| HUB_ADDRESS | Address & port of the hub | localhost:2600 |
| DAEMON_TYPE | Type of the daemon (see Daemon-Types), e.g. MASTER or NODE | NODE |
| POD_NAMESPACE | The namespace if the daemon runs inside a pod in OpenShift | ose-mon-a |

Standalone mode

| NAME | DAEMON TYPE | DESCRIPTION | EXAMPLE |
| --- | --- | --- | --- |
| WITH_HUB | ALL | Disable communication with hub | false |
| DAEMON_TYPE | ALL | Type of the daemon (see Daemon-Types), e.g. MASTER or NODE | NODE |
| SERVER_ADDRESS | ALL | The address & port where the webserver runs | localhost:2600 |
| POD_NAMESPACE | NODE | The namespace if the daemon runs inside a pod in OpenShift | ose-mon-a |
| EXTERNAL_SYSTEM_URL | MASTER | URL of an external system to call via http to check external connection | www.google.ch |
| HAWCULAR_SVC_IP | MASTER | IP of the Hawkular service | 10.10.10.1 |
| ETCD_IPS | MASTER | IPs of the etcd hosts with protocol & port | https://192.168.125.241:2379,https://192.168.125.244:2379 |
| REGISTRY_SVC_IP | MASTER | IP of the registry service | 10.10.10.1 |
| ROUTER_IPS | MASTER | IPs of the router services | 10.10.10.1,10.10.10.2 |
| PROJECTS_WITHOUT_LIMITS | MASTER | Number of system projects that have no limits | 4 |
| PROJECTS_WITHOUT_QUOTA | MASTER | Number of system projects that have no quotas | 4 |
| IS_GLUSTER_SERVER | STORAGE | Boolean value, whether the node is a gluster server | true/false |
| MOUNTPOINTS_TO_CHECK | | A list of mount points where free size should be checked | /gluster/registry/,/gluster/xxx |
| CHECK_CERTIFICATE_URLS | | A list of urls to check for validity of certificate | https://master-ip:8443 |
| CHECK_CERTIFICATE_PATHS | | A list of paths to check for validity of certificates. Filter is *.crt | /etc/origin/master,/etc/origin/node |
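Several parameters above take comma-separated lists (`ETCD_IPS`, `ROUTER_IPS`) or booleans (`WITH_HUB`). The Go sketch below shows one way such values could be parsed. It is not the daemon's actual code: the `standaloneConfig` struct, the `loadConfig` and `splitList` helpers, treating an unset `WITH_HUB` as `true`, and requiring `DAEMON_TYPE` are all assumptions made for illustration. The lookup function is passed in so the example can run on a fixed map instead of the real environment.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// standaloneConfig holds a subset of the parameters (hypothetical type).
type standaloneConfig struct {
	WithHub       bool
	DaemonType    string
	ServerAddress string
	EtcdIPs       []string
	RouterIPs     []string
}

// splitList splits a comma-separated value and drops empty entries.
func splitList(value string) []string {
	var out []string
	for _, part := range strings.Split(value, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// loadConfig reads parameters through getenv (for example os.Getenv).
func loadConfig(getenv func(string) string) (standaloneConfig, error) {
	cfg := standaloneConfig{
		WithHub:       true, // assumption: hub communication stays on unless disabled
		DaemonType:    getenv("DAEMON_TYPE"),
		ServerAddress: getenv("SERVER_ADDRESS"),
		EtcdIPs:       splitList(getenv("ETCD_IPS")),
		RouterIPs:     splitList(getenv("ROUTER_IPS")),
	}
	if raw := getenv("WITH_HUB"); raw != "" {
		withHub, err := strconv.ParseBool(raw)
		if err != nil {
			return cfg, fmt.Errorf("WITH_HUB: %w", err)
		}
		cfg.WithHub = withHub
	}
	if cfg.DaemonType == "" {
		return cfg, fmt.Errorf("DAEMON_TYPE is required")
	}
	return cfg, nil
}

func main() {
	// Example values taken from the table above.
	env := map[string]string{
		"WITH_HUB":       "false",
		"DAEMON_TYPE":    "MASTER",
		"SERVER_ADDRESS": "localhost:2600",
		"ETCD_IPS":       "https://192.168.125.241:2379,https://192.168.125.244:2379",
		"ROUTER_IPS":     "10.10.10.1,10.10.10.2",
	}
	cfg, err := loadConfig(func(key string) string { return env[key] })
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```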

Installation

OpenShift

oc new-project ose-mon-a
oc new-project ose-mon-b
oc new-project ose-mon-c

# Join projects a <> c
oc adm pod-network join-projects --to=ose-mon-a ose-mon-c

# Use the template install/ose-mon-template.yaml
# Do this for each project a,b,c
oc project ose-mon-a

# HUB-Mode: IMAGE_SPEC = If you want to use our image use "oscp/openshift-monitoring:version"
oc process -f ose-mon-template.yaml -p DAEMON_PUBLIC_ROUTE=xxx,DS_HUB_ADDRESS=xxx,IMAGE_SPEC=xxx | oc create -f -

# Standalone-Mode:
oc process -f ose-mon-standalone-template.yaml -p DAEMON_PUBLIC_ROUTE=daemon-ose-mon-b.your-route.com IMAGE_SPEC=oscp/openshift-monitoring:xxxx | oc create -f -

Master nodes

mkdir -p /opt/ose-mon

# Download and unpack from releases or build it yourself (https://github.com/oscp/openshift-monitoring/releases)

chmod +x /opt/ose-mon/hub /opt/ose-mon/daemon

# Add your params to the service definition files
cp /opt/ose-mon/ose-mon-hub.service  /etc/systemd/system/ose-mon-hub.service
cp /opt/ose-mon/ose-mon-daemon.service  /etc/systemd/system/ose-mon-daemon.service

systemctl start ose-mon-hub.service
systemctl enable ose-mon-hub.service

systemctl start ose-mon-daemon.service
systemctl enable ose-mon-daemon.service

Install the UI

cd /opt/ose-mon
mkdir static

# The UI is included in the download above

Worker / storage nodes

  • Do the same as above, just without the hub
