# README
stats
stats
provides a Starlark module for comprehensive statistics functions. It is a wrapper around the Go package: https://github.com/montanaflynn/stats
Functions
euclidean_distance(data1, data2) float
Calculates the straight line distance between two points in Euclidean space.
Parameters
name | type | description |
---|---|---|
data1 | list | First dataset of numerical values. |
data2 | list | Second dataset of numerical values. |
Examples
basic
Calculate Euclidean distance between two points.
load("stats", "euclidean_distance")
print(euclidean_distance([3, 4], [0, 0])) # Output: 5.0
manhattan_distance(data1, data2) float
Computes the sum of the absolute differences of their coordinates.
Parameters
name | type | description |
---|---|---|
data1 | list | First dataset of numerical values. |
data2 | list | Second dataset of numerical values. |
Examples
basic
Calculate Manhattan distance between two points.
load("stats", "manhattan_distance")
print(manhattan_distance([3, 4], [0, 0])) # Output: 7.0
softmax(data) list
Applies the Softmax function, useful for converting scores to probabilities.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Apply Softmax function to a dataset.
load("stats", "softmax")
print(softmax([1, 1, 1])) # Output: [0.3333333333333333, 0.3333333333333333, 0.3333333333333333]
sigmoid(data) list
Applies the Sigmoid function, often used for binary classification.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Apply Sigmoid function to a dataset.
load("stats", "sigmoid")
print(sigmoid([0, 2, 4])) # Output: [0.5, 0.8807970779778823, 0.9820137900379085]
mode(data) list
Determines the most frequently occurring data points in a dataset.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate mode of a dataset.
load("stats", "mode")
print(mode([1, 2, 2, 3, 4])) # Output: [2.0]
sum(data) float
Computes the sum of a series of numbers.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate sum of a dataset.
load("stats", "sum")
print(sum([1, 2, 3, 4])) # Output: 10.0
max(data) float
Finds the maximum value in a dataset.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Find maximum value in a dataset.
load("stats", "max")
print(max([1, 2, 3, 4])) # Output: 4.0
min(data) float
Finds the minimum value in a dataset.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Find minimum value in a dataset.
load("stats", "min")
print(min([1, 2, 3, 4])) # Output: 1.0
midrange(data) float
Calculates the midrange, the average of the maximum and minimum values.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate midrange of a dataset.
load("stats", "midrange")
print(midrange([1, 2, 3, 4])) # Output: 2.5
average(data) float
Alias for mean. Calculates the arithmetic mean of a dataset.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate average of a dataset.
load("stats", "average")
print(average([1, 2, 2, 3])) # Output: 2.0
mean(data) float
Calculates the arithmetic mean of a dataset.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate mean of a dataset.
load("stats", "mean")
print(mean([1, 2, 3, 4])) # Output: 2.5
geometric_mean(data) float
Computes the geometric mean, useful for datasets with exponential growth.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate geometric mean of a dataset.
load("stats", "geometric_mean")
print(geometric_mean([1, 2, 3, 4])) # Output: 2.213363839400643
harmonic_mean(data) float
Computes the harmonic mean, effective for rates and ratios.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate harmonic mean of a dataset.
load("stats", "harmonic_mean")
print(harmonic_mean([1, 2, 3, 6])) # Output: 2.0
trimean(data) float
Calculates the trimean, a measure of a dataset's tendency.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate trimean of a dataset.
load("stats", "trimean")
print(trimean([1, 2, 3, 4, 5])) # Output: 3.0
median(data) float
Finds the middle value of a dataset.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate median of a dataset.
load("stats", "median")
print(median([1, 2, 3, 4, 5])) # Output: 3.0
percentile(data, p) float
Determines the value below which a given percentage of observations in a group of observations falls.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
p | float | The percentile to compute. |
Examples
basic
Calculate 50th percentile of a dataset.
load("stats", "percentile")
print(percentile([1, 2, 3, 4, 5], 50)) # Output: 2.5
percentile_nearest_rank(data, p) float
Determines the value below which a given percentage of observations in a group of observations falls using the nearest rank method.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
p | float | The percentile to compute. |
Examples
basic
Calculate 50th percentile using nearest rank method.
load("stats", "percentile_nearest_rank")
print(percentile_nearest_rank([1, 2, 3, 4, 5], 50)) # Output: 3.0
variance(data) float
Calculates the variance of a dataset, a measure of dispersion.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate variance of a dataset.
load("stats", "variance")
print(variance([1, 2, 3, 4, 5])) # Output: 2.0
covariance(data1, data2) float
Measures how changes in one variable are associated with changes in another variable.
Parameters
name | type | description |
---|---|---|
data1 | list | First dataset of numerical values. |
data2 | list | Second dataset of numerical values. |
Examples
basic
Calculate covariance between two datasets.
load("stats", "covariance")
print(covariance([1, 2, 3], [4, 5, 6])) # Output: 1.0
covariance_population(data1, data2) float
Calculates the covariance for an entire population.
Parameters
name | type | description |
---|---|---|
data1 | list | First dataset of numerical values. |
data2 | list | Second dataset of numerical values. |
Examples
basic
Calculate population covariance between two datasets.
load("stats", "covariance_population")
print(covariance_population([1, 2, 3], [4, 5, 6])) # Output: 1.0
population_variance(data) float
Computes the variance of an entire population.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate population variance of a dataset.
load("stats", "population_variance")
print(population_variance([1, 2, 3, 4, 5])) # Output: 2.0
sample_variance(data) float
Computes the variance of a sample from the population.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate sample variance of a dataset.
load("stats", "sample_variance")
print(sample_variance([1, 2, 3, 4, 5])) # Output: 2.5
correlation(data1, data2) float
Computes the correlation coefficient between two datasets.
Parameters
name | type | description |
---|---|---|
data1 | list | First dataset of numerical values. |
data2 | list | Second dataset of numerical values. |
Examples
basic
Calculate correlation between two datasets.
load("stats", "correlation")
print(correlation([1, 2, 3], [1, 2, 3])) # Output: 1.0
pearson(data1, data2) float
Computes Pearson's correlation coefficient.
Parameters
name | type | description |
---|---|---|
data1 | list | First dataset of numerical values. |
data2 | list | Second dataset of numerical values. |
Examples
basic
Calculate Pearson's correlation coefficient.
load("stats", "pearson")
print(pearson([1, 2, 3], [1, 2, 3])) # Output: 1.0
standard_deviation(data) float
Calculates the standard deviation of a dataset.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate standard deviation of a dataset.
load("stats", "standard_deviation")
print(standard_deviation([1, 2, 3, 4, 5])) # Output: 1.4142135623730951
stddev(data) float
Alias for standard_deviation. Calculates the standard deviation of a dataset.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate stddev of a dataset.
load("stats", "stddev")
print(stddev([1, 2, 3, 4, 5])) # Output: 1.4142135623730951
stddev_sample(data) float
Calculates the standard deviation of a sample.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
Examples
basic
Calculate standard deviation of a sample.
load("stats", "stddev_sample")
print(stddev_sample([1, 2, 3, 4, 5])) # Output: 1.5811388300841898
sample(data, take, replace=False) list
Randomly samples elements from the dataset.
Parameters
name | type | description |
---|---|---|
data | list | Dataset of numerical values. |
take | int | Number of elements to sample. |
replace | bool | Optional. If True, sampling is done with replacement. |
Examples
basic
Sample elements from a dataset.
load("stats", "sample")
print(sample(data=[1, 2, 3, 4], take=3, replace=False))