package
2.205.0+incompatible
Repository: https://github.com/buildbuddy-io/buildbuddy.git
Documentation: pkg.go.dev

# README

clickhouse_backup

The clickhouse_backup tool in this package allows creating and restoring ClickHouse database backups. It can execute BACKUP or RESTORE queries against a configured ClickHouse database.

Before using the tool, ClickHouse must first be configured with a backup disk, which can either be a local disk or an S3-compatible disk. See enterprise/tools/clickhouse_cluster/gcs_backup.xml for an example.
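
For a local disk, the ClickHouse side of this setup might look like the sketch below. It writes a config fragment that declares a disk and allows it as a backup target. The disk name backups, the path /backups/, and the file name backup_disk.xml are illustrative assumptions, not values taken from this repository. For GCS, refer to the gcs_backup.xml example mentioned above instead.

```shell
# Sketch: write a ClickHouse config fragment that declares a local backup disk.
# The disk name "backups", the path "/backups/", and the file name are hypothetical.
write_backup_disk_config() {
  local config_dir="$1"   # e.g. /etc/clickhouse-server/config.d
  cat > "${config_dir}/backup_disk.xml" <<'EOF'
<clickhouse>
    <storage_configuration>
        <disks>
            <backups>
                <type>local</type>
                <path>/backups/</path>
            </backups>
        </disks>
    </storage_configuration>
    <backups>
        <allowed_disk>backups</allowed_disk>
        <allowed_path>/backups/</allowed_path>
    </backups>
</clickhouse>
EOF
}
```

Calling write_backup_disk_config /etc/clickhouse-server/config.d and restarting the server would make the disk available. With this config, the tool would presumably be run with --backup_disk_name=backups.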

The tool must be configured with storage.* flags to match the configured backup disk (e.g. storage.disk.* for a local disk, or storage.gcs.* for a GCS disk). Be sure to also set storage.path_prefix if backups are stored in a subdirectory.

Creating a backup

The create subcommand creates a backup. By default, it takes a full backup on the first day of the month. On other days of the month, it takes an incremental backup based on the previous day's backup, if one exists.
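
The default schedule described above can be sketched in shell. This is not the tool's actual code. It rests on three assumptions: a backup "based on the previous day" corresponds to a ClickHouse incremental backup using the base_backup setting, the tool falls back to a full backup when no previous backup exists, and backups are named by date. The function names and the naming scheme are hypothetical.

```shell
# Sketch of the default schedule (assumptions noted in the text above).

# Prints "full" or "incremental" for a given day of the month.
backup_kind() {
  local day_of_month="$1"      # 1..31
  local previous_exists="$2"   # "yes" if the previous day's backup exists
  if [ "$day_of_month" -eq 1 ]; then
    echo full
  elif [ "$previous_exists" = "yes" ]; then
    echo incremental
  else
    echo full                  # assumed fallback, not confirmed by the source
  fi
}

# Prints a ClickHouse BACKUP query. An empty base name means a full backup.
backup_query() {
  local database="$1" disk="$2" name="$3" base="$4"
  local query="BACKUP DATABASE ${database} TO Disk('${disk}', '${name}')"
  if [ -n "$base" ]; then
    # base_backup makes ClickHouse store only the changes since the base backup.
    query="${query} SETTINGS base_backup = Disk('${disk}', '${base}')"
  fi
  printf '%s\n' "$query"
}
```

For example, backup_query my_database gcs_backup 2024-05-02 2024-05-01 prints a query that backs up my_database incrementally on top of the hypothetical 2024-05-01 backup.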

Example command:

bazel run -- //enterprise/tools/clickhouse_backup \
  --olap_database.data_source=clickhouse://user:password@host:port/my_database \
  --storage.gcs.credentials="$GCS_CREDENTIALS_JSON" \
  --storage.gcs.project_id=acme-inc \
  --storage.gcs.bucket=acme-inc-clickhouse-backups \
  --storage.path_prefix=v1 \
  create \
  --database=my_database \
  --backup_disk_name=gcs_backup

Restoring a backup

The restore subcommand restores data from an existing backup. There are a few ways to use it, depending on how you want to recover the lost data.

The simplest approach is to first drop the destination database (by running a ClickHouse query manually), and then restore the entire database from the most recent backup. Any rows written to the database after that backup was taken will be lost:

clickhouse-client --query 'DROP DATABASE my_database ON CLUSTER my_cluster'
bazel run -- //enterprise/tools/clickhouse_backup \
  --olap_database.data_source=clickhouse://user:password@host:port/my_database \
  --storage.gcs.credentials="$GCS_CREDENTIALS_JSON" \
  --storage.gcs.project_id=acme-inc \
  --storage.gcs.bucket=acme-inc-clickhouse-backups \
  --storage.path_prefix=v1 \
  restore \
  --backup_disk_name=gcs_backup \
  --backup_database=my_database \
  --destination_database=my_database \
  --all_tables

Another option is to restore a single table, which can be useful if you accidentally dropped or truncated a table. First drop the table, then run a restore command specifying a single --table to be restored:

clickhouse-client --query 'DROP TABLE my_database.MyTable ON CLUSTER my_cluster'
bazel run -- //enterprise/tools/clickhouse_backup \
  --olap_database.data_source=clickhouse://user:password@host:port/my_database \
  --storage.gcs.credentials="$GCS_CREDENTIALS_JSON" \
  --storage.gcs.project_id=acme-inc \
  --storage.gcs.bucket=acme-inc-clickhouse-backups \
  --storage.path_prefix=v1 \
  restore \
  --backup_disk_name=gcs_backup \
  --backup_database=my_database \
  --destination_database=my_database \
  --table=MyTable

In rare circumstances, it might make sense to append backup data to an existing table by using the --allow_non_empty_tables flag. For example, suppose a backup was taken 12 hours ago, a table was accidentally deleted 5 hours ago, and the deletion went unnoticed until now. The table now contains 5 hours of useful data that may be worth keeping, so restoring with --allow_non_empty_tables might be appropriate.
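
A restore of this kind might look like the sketch below. It reuses the single-table restore command and adds --allow_non_empty_tables. It assumes that this is a boolean flag and that it goes after the restore subcommand, like --table. The function name and the BAZEL variable are hypothetical; BAZEL is only there so the command can be dry-run.

```shell
# Sketch: append backed-up rows to an existing, non-empty table.
# BAZEL is a hypothetical override; set BAZEL=echo to print the arguments
# instead of running the tool.
restore_into_existing_table() {
  local table="$1"
  "${BAZEL:-bazel}" run -- //enterprise/tools/clickhouse_backup \
    --olap_database.data_source="clickhouse://user:password@host:port/my_database" \
    --storage.gcs.credentials="$GCS_CREDENTIALS_JSON" \
    --storage.gcs.project_id=acme-inc \
    --storage.gcs.bucket=acme-inc-clickhouse-backups \
    --storage.path_prefix=v1 \
    restore \
    --backup_disk_name=gcs_backup \
    --backup_database=my_database \
    --destination_database=my_database \
    --table="$table" \
    --allow_non_empty_tables
}
```

Unlike the single-table restore, the table is not dropped first. Rows written between the backup and the accidental deletion are not in either place and cannot be recovered this way.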