The data file called time_series.csv has two columns, one the predicated value and one the observed value. These are continuously changing sets of data (floating point). You can calculate three values from this data set:

Mean-squared error
Mean Absolute error
R-squared

In order to compute:

  go run chapter3/mean_squared_error.go chapter3/time_series.csv

Alternatively, let's say the observed and the predicted sets of data are categories. The categories in labeled.csv are 0, 1 and 2. Then you can see the number of true positives and false positives (that is, where the observed category is the same as the predicted category) as follows:

  go run chapter3/category_accuracy.go chapter3/labeled.csv

Let's say now we are testing against the '0' category:

If the predicted value and observed value are both 0, then this is a true positive (TP)
If the predicted value is 0 but the observed value is not 0, then this is a false positive (FP)
If the predicted value is not 0 but observed is 0 this is a false negative (FN)
Finally, if the predicted value is not 0 and the observed vaoue is not 0, this is a true negative (TN)

Then we can create some metrics:

Accuracy: The ratio of true predictions vs false predictions: (TP+TN)/(FP+FN+TP+TN)
Precision: The ratio of true predictions over all predictions: TP/(TP+FP)
Recall

When evaluating data, you can create training and testing sets. See how this works with the following command, which subsamples one set of data into two distinct sets:

  go run chapter3/subsample.go chapter3/time_series.csv

# Packages

# README

MachineLearning

Chapter 3