github.com/zcentric/vector-operator
module
0.0.0-20250202031945-e5653349673c
Repository: https://github.com/zcentric/vector-operator.git

# README

Vector Operator

A Kubernetes operator that simplifies the deployment and management of Vector observability pipelines in your Kubernetes cluster. This operator enables declarative configuration of Vector agents and data pipelines, making it easier to collect, transform, and forward observability data.

Overview

The Vector Operator provides three custom resources:

  • Vector: Manages the deployment of Vector agents (DaemonSet) in your cluster
  • VectorAggregator: Manages the deployment of Vector aggregators (Deployment) in your cluster
  • VectorPipeline: Defines observability data pipelines with sources, transforms, and sinks
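
Once the operator and CRDs are installed (see Quick Start below), the three kinds should be visible in the cluster's API. The API group below is taken from the manifests later in this README:

kubectl api-resources --api-group=vector.zcentric.com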

Key features:

  • Declarative configuration of Vector instances
  • Support for both agent (per-node) and aggregator (centralized) deployment types
  • Pipeline management with support for multiple sources, transforms, and sinks
  • Kubernetes-native deployment and management
  • Automatic configuration updates and reconciliation

Pipeline Validation

The operator includes a robust validation system to ensure Vector configurations are valid before deployment. See Pipeline Validation for details on:

  • How validation works
  • Checking validation status
  • Handling validation failures
  • Best practices
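
As a rough sketch of what checking status looks like, a pipeline's conditions can be read straight off the custom resource with kubectl; the exact fields the operator writes under .status are documented in Pipeline Validation, not in this README:

# Show the full resource, including the status block the operator maintains
kubectl get vectorpipeline kubernetes-logs -o yaml

# Or print only the status subtree
kubectl get vectorpipeline kubernetes-logs -o jsonpath='{.status}'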

Common Deployment Patterns

  1. Log Collection and Forwarding:

    • Deploy Vector agents (DaemonSet) to collect logs from all nodes
    • Deploy VectorAggregator instances (Deployment) to receive and process logs centrally
    • Configure agents to forward to the aggregators (a sketch of this wiring follows the list below)
  2. High Availability Aggregation:

    • Deploy multiple VectorAggregator replicas for redundancy
    • Use load balancing for even distribution of log processing
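
A minimal sketch of pattern 1, assuming the aggregator is exposed through a Service named vector-aggregator on port 6000 and that a VectorPipeline's vectorRef can name either kind of instance (neither assumption is spelled out in this README); Vector's native vector source/sink carry the traffic between the two tiers:

# Agent-side pipeline: ship collected logs to the aggregator Service
apiVersion: vector.zcentric.com/v1alpha1
kind: VectorPipeline
metadata:
  name: forward-to-aggregator
spec:
  vectorRef: vector-agent
  sources:
    k8s-logs:
      type: "kubernetes_logs"
  sinks:
    to-aggregator:
      type: "vector"
      inputs: ["k8s-logs"]
      address: "vector-aggregator.vector.svc.cluster.local:6000" # assumed Service name and port
---
# Aggregator-side pipeline: receive from agents and process centrally
apiVersion: vector.zcentric.com/v1alpha1
kind: VectorPipeline
metadata:
  name: aggregate-logs
spec:
  vectorRef: vector-aggregator
  sources:
    from-agents:
      type: "vector"
      address: "0.0.0.0:6000" # listen port for the agents' vector sink
  sinks:
    console:
      type: "console"
      inputs: ["from-agents"]
      encoding:
        codec: "json"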

Quick Start

Prerequisites

  • Kubernetes cluster v1.11.3+
  • kubectl v1.11.3+
  • go v1.21+ (for development)
  • docker v17.03+ (for development)

Installation

  1. Install the operator and CRDs:
kubectl apply -f https://raw.githubusercontent.com/zcentric/vector-operator/main/dist/install.yaml
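
Before creating any resources, it is worth confirming the install succeeded; the grep patterns below are only rough filters, since the operator's namespace and pod names are not stated in this README:

kubectl get crds | grep vector.zcentric.com
kubectl get pods --all-namespaces | grep vector-operator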

  2. Create Vector instances:

For an agent (runs on every node):

apiVersion: vector.zcentric.com/v1alpha1
kind: Vector
metadata:
  name: vector-agent
  namespace: vector
spec:
  image: "timberio/vector:0.38.0-distroless-libc"

For an aggregator (centralized processing):

apiVersion: vector.zcentric.com/v1alpha1
kind: VectorAggregator
metadata:
  name: vector-aggregator
  namespace: vector
spec:
  image: "timberio/vector:0.38.0-distroless-libc"
  replicas: 2 # optional, defaults to 1

  3. Define a pipeline:

apiVersion: vector.zcentric.com/v1alpha1
kind: VectorPipeline
metadata:
  name: kubernetes-logs
spec:
  vectorRef: vector-agent
  sources:
    k8s-logs:
      type: "kubernetes_logs"
  transforms:
    remap:
      type: "remap"
      inputs: ["k8s-logs"]
      source: |
        .timestamp = del(.timestamp)
        .environment = "production"
  sinks:
    console:
      type: "console"
      inputs: ["remap"]
      encoding:
        codec: "json"

Usage Examples

Vector Deployment Types

Agent Configuration (DaemonSet)

Use the Vector CRD when you need to collect logs and metrics from every node in your cluster:

apiVersion: vector.zcentric.com/v1alpha1
kind: Vector
metadata:
  name: vector-agent
spec:
  image: "timberio/vector:0.38.0-distroless-libc"
  api:
    enabled: true
    address: "0.0.0.0:8686"
  data_dir: "/vector-data"
  expire_metrics_secs: 30
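
Because the API is enabled on 0.0.0.0:8686, it can be reached from a workstation via a port-forward; the pod name is found at runtime, and the /health endpoint is part of Vector's own API rather than something this operator adds:

kubectl get pods                                    # find a vector-agent pod
kubectl port-forward pod/<vector-agent-pod> 8686:8686
curl http://localhost:8686/health                   # Vector's API health check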

Aggregator Configuration (Deployment)

Use the VectorAggregator CRD when you need centralized log processing and aggregation:

apiVersion: vector.zcentric.com/v1alpha1
kind: VectorAggregator
metadata:
  name: vector-aggregator
spec:
  image: "timberio/vector:0.38.0-distroless-libc"
  replicas: 2
  api:
    enabled: true
    address: "0.0.0.0:8686"
  data_dir: "/vector-data"
  expire_metrics_secs: 30
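
Since the replica count is part of the spec, one way to resize an existing aggregator is a generic merge patch against the custom resource (whether the CRD also exposes a scale subresource for kubectl scale is not stated in this README):

kubectl patch vectoraggregator vector-aggregator --type merge -p '{"spec":{"replicas":3}}'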

Pipeline with Multiple Sources and Transforms

apiVersion: vector.zcentric.com/v1alpha1
kind: VectorPipeline
metadata:
  name: multi-source-pipeline
spec:
  vectorRef: vector-agent
  sources:
    app-logs:
      type: "kubernetes_logs"
      extra_label_selector: "app=myapp"
    system-logs:
      type: "kubernetes_logs"
      extra_label_selector: "component=system"
  transforms:
    filter-errors:
      type: "filter"
      inputs: ["app-logs"]
      condition:
        type: "vrl"
        source: ".level == 'error'"
    add-metadata:
      type: "remap"
      inputs: ["system-logs"]
      source: |
        .metadata.cluster = "production"
  sinks:
    elasticsearch:
      type: "elasticsearch"
      inputs: ["filter-errors", "add-metadata"]

Contributing

Contributions are welcome! Here's how you can help:

  1. Fork the repository

  2. Create a feature branch:

    git checkout -b feature/my-new-feature
    
  3. Set up your development environment:

    # Install dependencies
    go mod download
    
    # Install CRDs
    make install
    
    # Run the operator locally
    make run
    
  4. Make your changes and add tests

  5. Run tests:

    make test
    
  6. Submit a pull request

Development Guidelines

  • Follow Go best practices and conventions
  • Add unit tests for new features
  • Update documentation as needed
  • Use meaningful commit messages
  • Run make lint before submitting PRs
  • All PRs are automatically tested using GitHub Actions

License

Copyright 2024.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
