Module: github.com/asserts/asserts-otel-processor/samplingprocessor
Version: 0.0.3
Repository: https://github.com/asserts/asserts-otel-processor.git
Documentation: pkg.go.dev

# README

Tail Sampling Processor

Status

  • Stability: beta
  • Supported pipeline types: traces
  • Distributions: contrib

The tail sampling processor samples traces based on a set of defined policies. All spans for a given trace MUST be received by the same collector instance for effective sampling decisions.

Please refer to config.go for the config spec.

The following configuration options are required:

  • policies (no default): Policies used to make a sampling decision

Multiple policies exist today, and it is straightforward to add more. These include:

  • always_sample: Sample all traces
  • latency: Sample based on the duration of the trace. The duration is determined by looking at the earliest start time and latest end time, without taking into consideration what happened in between.
  • numeric_attribute: Sample based on numeric attributes (resource and record)
  • probabilistic: Sample a percentage of traces. Read a comparison with the Probabilistic Sampling Processor.
  • status_code: Sample based upon the status code (OK, ERROR or UNSET)
  • string_attribute: Sample based on string attributes (resource and record) value matches, both exact and regex value matches are supported
  • trace_state: Sample based on TraceState value matches
  • rate_limiting: Sample based on rate
  • span_count: Sample based on the minimum number of spans within a batch. If all traces within the batch have fewer spans than the threshold, the batch will not be sampled.
  • and: Sample based on multiple policies, creates an AND policy
  • composite: Sample based on a combination of the above samplers, with ordering and rate allocation per sampler. Rate allocation assigns a percentage of the span budget to each policy, in policy order. For example, if max_total_spans_per_second is set to 100, rate_allocation can be set as follows (see the sketch after this list):
    1. test-composite-policy-1 = 50 % of max_total_spans_per_second = 50 spans_per_second
    2. test-composite-policy-2 = 25 % of max_total_spans_per_second = 25 spans_per_second
    3. To ensure the remaining capacity is filled, use always_sample as one of the policies
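As a rough sketch, assuming the illustrative policy names and sub-policies below, the 100 spans-per-second budget from the numbers above maps onto the configuration like this (a fuller composite example appears in the Examples section further below):

processors:
  tail_sampling:
    policies:
      [
        {
          name: composite-policy-example,
          type: composite,
          composite: {
            # total budget that the rate_allocation percentages are applied to
            max_total_spans_per_second: 100,
            policy_order: [test-composite-policy-1, test-composite-policy-2, test-composite-policy-3],
            composite_sub_policy: [
              {name: test-composite-policy-1, type: latency, latency: {threshold_ms: 5000}},
              {name: test-composite-policy-2, type: status_code, status_code: {status_codes: [ERROR]}},
              # always_sample fills whatever capacity the two policies above leave unused
              {name: test-composite-policy-3, type: always_sample}
            ],
            rate_allocation: [
              {policy: test-composite-policy-1, percent: 50},  # 50% of 100 = 50 spans_per_second
              {policy: test-composite-policy-2, percent: 25}   # 25% of 100 = 25 spans_per_second
            ]
          }
        }
      ]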

The following configuration options can also be modified:

  • decision_wait (default = 30s): Wait time since the first span of a trace before making a sampling decision
  • num_traces (default = 50000): Number of traces kept in memory
  • expected_new_traces_per_sec (default = 0): Expected number of new traces (helps in allocating data structures)
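Taken together, a minimal collector configuration that wires the processor into a traces pipeline could look like the sketch below (the otlp receiver/exporter and the backend endpoint are placeholders); the detailed per-policy examples follow in the next section:

receivers:
  otlp:
    protocols:
      grpc:

processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      [
        {
          name: errors-only,
          type: status_code,
          status_code: {status_codes: [ERROR]}
        }
      ]

exporters:
  otlp:
    endpoint: backend:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp]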

Each policy will result in a decision, and the processor will evaluate them to make a final decision:

  • When there's an "inverted not sample" decision, the trace is not sampled;
  • When there's a "sample" decision, the trace is sampled;
  • When there's an "inverted sample" decision and no "not sample" decisions, the trace is sampled;
  • In all other cases, the trace is NOT sampled.

An "inverted" decision is one made based on the "invert_match" attribute, such as the one from the string_attribute policy.
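For instance, with the two policies sketched below (the http.url attribute and the /health value are illustrative), a trace whose http.url matches /health gets an "inverted not sample" decision from the second policy and is dropped, even though always_sample votes to sample it:

processors:
  tail_sampling:
    policies:
      [
        {
          name: keep-everything,
          type: always_sample
        },
        {
          # invert_match turns a match into an "inverted not sample" decision,
          # which takes precedence over the "sample" decision from always_sample
          name: drop-health-checks,
          type: string_attribute,
          string_attribute: {key: http.url, values: [\/health], enabled_regex_matching: true, invert_match: true}
        }
      ]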

Examples:

processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      [
          {
            name: test-policy-1,
            type: always_sample
          },
          {
            name: test-policy-2,
            type: latency,
            latency: {threshold_ms: 5000}
          },
          {
            name: test-policy-3,
            type: numeric_attribute,
            numeric_attribute: {key: key1, min_value: 50, max_value: 100}
          },
          {
            name: test-policy-4,
            type: probabilistic,
            probabilistic: {sampling_percentage: 10}
          },
          {
            name: test-policy-5,
            type: status_code,
            status_code: {status_codes: [ERROR, UNSET]}
          },
          {
            name: test-policy-6,
            type: string_attribute,
            string_attribute: {key: key2, values: [value1, value2]}
          },
          {
            name: test-policy-7,
            type: string_attribute,
            string_attribute: {key: key2, values: [value1, val*], enabled_regex_matching: true, cache_max_size: 10}
          },
          {
            name: test-policy-8,
            type: rate_limiting,
            rate_limiting: {spans_per_second: 35}
         },
         {
            name: test-policy-9,
            type: string_attribute,
            string_attribute: {key: http.url, values: [\/health, \/metrics], enabled_regex_matching: true, invert_match: true}
         },
         {
            name: test-policy-10,
            type: span_count,
            span_count: {min_spans: 2}
         },
         {
             name: test-policy-11,
             type: trace_state,
             trace_state: { key: key3, values: [value1, value2] }
         },
         {
            name: and-policy-1,
            type: and,
            and: {
              and_sub_policy: 
              [
                {
                  name: test-and-policy-1,
                  type: numeric_attribute,
                  numeric_attribute: { key: key1, min_value: 50, max_value: 100 }
                },
                {
                    name: test-and-policy-2,
                    type: string_attribute,
                    string_attribute: { key: key2, values: [ value1, value2 ] }
                },
              ]
            }
         },
         {
            name: composite-policy-1,
            type: composite,
            composite:
              {
                max_total_spans_per_second: 1000,
                policy_order: [test-composite-policy-1, test-composite-policy-2, test-composite-policy-3],
                composite_sub_policy:
                  [
                    {
                      name: test-composite-policy-1,
                      type: numeric_attribute,
                      numeric_attribute: {key: key1, min_value: 50, max_value: 100}
                    },
                    {
                      name: test-composite-policy-2,
                      type: string_attribute,
                      string_attribute: {key: key2, values: [value1, value2]}
                    },
                    {
                      name: test-composite-policy-3,
                      type: always_sample
                    }
                  ],
                rate_allocation:
                  [
                    {
                      policy: test-composite-policy-1,
                      percent: 50
                    },
                    {
                      policy: test-composite-policy-2,
                      percent: 25
                    }
                  ]
              }
          },
        ]

Refer to tail_sampling_config.yaml for detailed examples on using the processor.

Scaling collectors with the tail sampling processor

This processor requires all spans for a given trace to be sent to the same collector instance for the correct sampling decision to be derived. When scaling the collector, you'll then need to ensure that all spans for the same trace are reaching the same collector. You can achieve this by having two layers of collectors in your infrastructure: one with the load balancing exporter, and one with the tail sampling processor.

While it's technically possible to have one layer of collectors with two pipelines on each instance, we recommend separating the layers in order to have better failure isolation.
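A rough sketch of that two-layer layout, assuming the loadbalancing exporter from the contrib distribution (hostnames, ports, and the final backend exporter are placeholders; see the loadbalancing exporter documentation for its exact options):

# layer-1-collector.yaml: receives all traffic and routes spans so that every
# span of a trace reaches the same layer-2 instance (routing is by trace ID).
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - sampling-collector-1:4317
          - sampling-collector-2:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]

# layer-2-collector.yaml would then be a regular collector configuration with the
# tail_sampling processor in its traces pipeline, as sketched earlier in this README.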

Probabilistic Sampling Processor compared to the Tail Sampling Processor with the Probabilistic policy

The probabilistic sampling processor and the probabilistic tail sampling processor policy work very similarly: based upon a configurable sampling percentage, they will sample a fixed ratio of received traces. But depending on the overall processing pipeline, you should prefer one over the other.

As a rule of thumb, if you want to add probabilistic sampling and...

...you are not using the tail sampling processor already: use the probabilistic sampling processor. Running the probabilistic sampling processor is more efficient than the tail sampling processor. The probabilistic sampling policy makes its decision based upon the trace ID, so waiting until more spans have arrived will not influence its decision.

...you are already using the tail sampling processor: add the probabilistic sampling policy. You are already incurring the cost of running the tail sampling processor, so the additional cost of the probabilistic policy will be negligible. Additionally, using the policy within the tail sampling processor ensures that traces sampled by other policies will not be dropped.
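A sketch of the two alternatives (the sampling percentage and policy names are illustrative; the two snippets are separate configurations, not meant to be combined):

# Not tail sampling otherwise: use the standalone probabilistic sampler processor.
processors:
  probabilistic_sampler:
    sampling_percentage: 10

# Already tail sampling: add a probabilistic policy next to the existing ones.
processors:
  tail_sampling:
    policies:
      [
        {name: errors-only, type: status_code, status_code: {status_codes: [ERROR]}},
        {name: ten-percent-baseline, type: probabilistic, probabilistic: {sampling_percentage: 10}}
      ]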

# Functions

New creates a Batcher that will hold numBatches in its pipeline, having a channel with batchChannelSize to receive new items.
NewAlwaysSample creates a policy evaluator that samples all traces.
No description provided by the author
NewComposite creates a policy evaluator that samples all subpolicies.
NewFactory returns a new factory for the Tail Sampling processor.
NewLatency creates a policy evaluator sampling traces with a duration higher than a configured threshold.
NewNumericAttributeFilter creates a policy evaluator that samples all traces with the given attribute in the given numeric range.
NewProbabilisticSampler creates a policy evaluator that samples a percentage of traces.
NewRateLimiting creates a policy evaluator that samples traces subject to the configured spans-per-second rate limit.
NewSpanCount creates a policy evaluator sampling traces with more than one span per trace.
NewStatusCodeFilter creates a policy evaluator that samples all traces with a given status code.
NewStringAttributeFilter creates a policy evaluator that samples all traces with the given string attribute matching one of the configured values.
NewTraceStateFilter creates a policy evaluator that samples all traces with the given value by the specific key in the trace_state.
SamplingProcessorMetricViews returns the metrics views according to the given telemetry level.

# Constants

AlwaysSample samples all traces, typically used for debugging.
AndEvaluator allows defining an AND policy, combining the other policies in one.
CompositeEvaluator allows defining a composite policy, combining the other policies in one.
Dropped is used when data needs to be purged before the sampling policy had a chance to evaluate it.
Error is used to indicate that policy evaluation did not succeed.
InvertNotSampled is used on the invert match flow and indicates to not sample the data.
InvertSampled is used on the invert match flow and indicates to sample the data.
Latency samples traces that are longer than a given threshold.
NotSampled is used to indicate that the decision was already taken to not sample the data.
NumericAttribute samples traces that have a given numeric attribute in a specified range, e.g.: attribute "http.status_code" >= 399 and <= 999.
Pending indicates that the policy was not evaluated yet.
Probabilistic samples a given percentage of traces.
RateLimiting allows all traces until the specified limits are satisfied.
Sampled is used to indicate that the decision was already taken to sample the data.
SpanCount samples traces that have more spans per trace than a given threshold.
StatusCode samples traces that have a given status code.
StringAttribute samples traces that have an attribute, of type string, matching one of the listed values.
TraceState samples traces with specified values for the given key.
Unspecified indicates that the status of the decision was not set yet.

# Variables

ErrInvalidBatchChannelSize occurs when an invalid batch channel size is specified.
ErrInvalidNumBatches occurs when an invalid number of batches is specified.

# Structs

No description provided by the author
No description provided by the author
AndSubPolicyCfg holds the common configuration for all policies under the and policy.
CompositeCfg holds the configurable settings to create a composite sampling policy evaluator.
CompositeEvaluator evaluator and its internal data.
CompositeSubPolicyCfg holds the common configuration for all policies under the composite policy.
Config holds the configuration for tail-based sampling.
LatencyCfg holds the configurable settings to create a latency filter sampling policy evaluator.
MonotonicClock provides monotonic real clock-based current Unix second.
NumericAttributeCfg holds the configurable settings to create a numeric attribute filter sampling policy evaluator.
PolicyCfg holds the common configuration to all policies.
Implements TTicker and abstracts underlying time ticker's functionality to make usage simpler.
ProbabilisticCfg holds the configurable settings to create a probabilistic sampling policy evaluator.
RateAllocationCfg used within composite policy.
RateLimitingCfg holds the configurable settings to create a rate limiting sampling policy evaluator.
SpanCountCfg holds the configurable settings to create a Span Count filter sampling policy evaluator.
StatusCodeCfg holds the configurable settings to create a status code filter sampling policy evaluator.
StringAttributeCfg holds the configurable settings to create a string attribute filter sampling policy evaluator.
SubPolicyEvalParams defines the evaluator and max rate for a sub-policy.
TraceData stores the sampling related trace data.
No description provided by the author

# Interfaces

Batcher behaves like a pipeline of batches that has a fixed number of batches in the pipe and a new batch being built outside of the pipe.
PolicyEvaluator implements a tail-based sampling policy evaluator, which makes a sampling decision for a given trace when requested.
TimeProvider allows getting the current Unix second.
TTicker interface allows easier testing of Ticker related functionality.

# Type aliases

Batch is the type of batches held by the Batcher.
Decision gives the status of sampling decision.
PolicyType indicates the type of sampling policy.