Classifies input records into k clusters based on Euclidean distance.
- Names of the fields to use in the calculation, separated by commas (,). Field values must be numeric; any input record whose specified field value is not numeric is ignored. Up to 100,000 input records are allowed; if there are more than 100,000 valid input records, the records after the first 100,000 are ignored. The command classifies each record into cluster N (N starting from 1) and assigns N to the _cluster field.
- Number of clusters (default:
- Number of times to repeat kmeans (default:
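The behavior described in the list above can be sketched in Python. This is only an illustration under stated assumptions, not the command's actual implementation: the function name `kmeans_sketch`, the `seed` parameter, the random choice of initial centroids, and the decision to drop ignored records from the output are all hypothetical.

```python
import math
import random

MAX_RECORDS = 100_000  # cap on valid input records, as stated above


def numeric_point(record, fields):
    """Return the selected field values as floats, or None if any is not numeric."""
    point = []
    for name in fields:
        value = record.get(name)
        # Assumption: strings such as "5.1", booleans, NaN and infinity are non-numeric.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None
        if not math.isfinite(value):
            return None
        point.append(float(value))
    return point


def nearest(point, centroids):
    """Index of the centroid closest to point by Euclidean distance."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans_sketch(records, fields, k, iterations, seed=0):
    valid = []
    for record in records:
        point = numeric_point(record, fields)
        if point is None:
            continue  # non-numeric field value: the record is skipped
        valid.append((record, point))
        if len(valid) == MAX_RECORDS:
            break  # valid records beyond the cap are not read
    if not valid or k < 1:
        return []

    points = [p for _, p in valid]
    k = min(k, len(points))
    # Random initial centroids (assumption; the real command may choose differently).
    centroids = [list(p) for p in random.Random(seed).sample(points, k)]
    labels = [nearest(p, centroids) for p in points]

    for _ in range(iterations):
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so further passes change nothing
        labels = new_labels

    # Cluster numbers start from 1, matching the _cluster description above.
    return [dict(record, _cluster=label + 1) for (record, _), label in zip(valid, labels)]
```

Calling `kmeans_sketch(rows, ["sepal_length", "sepal_width"], k=4, iterations=100)` on a list of dicts returns copies of the valid rows with an added `_cluster` key. Because assignments stop changing once the centroids settle, a very large iteration count mostly acts as an upper bound in this sketch.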
You can try out the kmeans command with the iris data set, which is often cited in machine learning. Run the classification using sepal length and width, then compare the result with the actual species name (download: https://github.com/illinois-cse/data-fa14/blob/gh-pages/data/iris.csv).
csvfile /opt/logpresso/iris.csv | eval sepal_length = double(sepal_length), sepal_width = double(sepal_width) | kmeans k=4 iter=100000 sepal_length, sepal_width
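One way to compare the assigned clusters with the actual species is to count records for each (species, cluster) pair. The Python sketch below assumes the query output has been exported as a list of dicts that carry a `species` key alongside `_cluster`; the `species` field name, the `crosstab` helper, and the sample rows are hypothetical and are not real kmeans output.

```python
from collections import Counter


def crosstab(rows, label_field="species", cluster_field="_cluster"):
    """Count rows for each (label, cluster) pair."""
    return Counter((row[label_field], row[cluster_field]) for row in rows)


# Hypothetical rows for illustration only; real rows would come from the query result.
sample_rows = [
    {"species": "setosa", "_cluster": 1},
    {"species": "setosa", "_cluster": 1},
    {"species": "versicolor", "_cluster": 2},
    {"species": "virginica", "_cluster": 2},
]

for (species, cluster), count in sorted(crosstab(sample_rows).items()):
    print(species, cluster, count)
```

A species whose records fall mostly into a single cluster is well separated by the chosen fields; a species spread across several clusters is not.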