Isolation forest

seema · May 2022

I used anomaly detection from AutoML on my dataset to create a model. How should I interpret the metrics for isolation forest? For example, what do the values silhouette = 0.437 and inertia = 0 signify?

aggelitoo · July 2022

I would love an answer to this as well.

Googling how to interpret isolation forest anomaly scores in general suggests that values close to one are anomalous, but Dataiku's isolation forest for anomaly detection seems to return anomalies with values closer to zero. How does Dataiku's isolation forest method differ from the more general approach?


CoreyS · July 2022

These two metrics, silhouette and inertia, describe how far apart the clusters are and how compact each cluster is. They are designed for more traditional clustering algorithms (like k-means). You can read more details about them in the links above.
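To make the two metrics concrete, here is a sketch of how they are typically computed for k-means with scikit-learn on two synthetic blobs. The data and parameters are assumptions for illustration, and this does not show how Dataiku computes them for an isolation forest model. Inertia is the sum of squared distances from each point to its assigned centroid; the silhouette score compares, for each point, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), as (b - a) / max(a, b), averaged over all points.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic data (assumption): two well-separated blobs of 50 points each
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Inertia: sum of squared distances to the assigned centroid (lower = tighter clusters)
manual_inertia = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
print("inertia:", km.inertia_, "recomputed:", manual_inertia)

# Silhouette: mean of (b - a) / max(a, b), in [-1, 1] (higher = better separated)
sil = silhouette_score(X, km.labels_)
print("silhouette:", sil)
```

Both metrics reward compact, well-separated groups, which is not something an isolation forest tries to produce.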

In the case of isolation forest, which is used for anomaly detection, the idea is that anomalies are easier to isolate with random splits than normal points. Each sample is therefore scored by how many splits (the path length in the random trees) are needed to isolate it, and a threshold is then applied to decide whether or not it is an anomaly. We use the Isolation Forest implementation from scikit-learn, whose threshold is based on the contamination ratio (the expected proportion of anomalies in the data).
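A minimal sketch with scikit-learn's `IsolationForest` on synthetic data. The data, `contamination=0.025` and `random_state` are illustrative assumptions; this is not Dataiku's code, and which of these outputs Dataiku displays as its score is also an assumption. It shows the contamination-based threshold and one likely reason anomalies can appear with low values: `score_samples` returns the opposite of the anomaly score from the original isolation forest paper, so lower means more abnormal, while negating it recovers the paper's convention (a value between 0 and 1, where values close to 1 are anomalous).

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic data (assumption): 200 "normal" points plus 5 far-away outliers
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = rng.uniform(low=6.0, high=8.0, size=(5, 2))
X = np.vstack([normal, outliers])

# contamination = expected share of anomalies; fit() turns it into a threshold (offset_)
model = IsolationForest(n_estimators=100, contamination=0.025, random_state=0)
labels = model.fit_predict(X)  # -1 = anomaly, +1 = normal

scores = model.score_samples(X)        # lower = more abnormal (opposite of the paper's score)
decision = model.decision_function(X)  # scores - offset_; negative = flagged as anomaly
paper_score = -scores                  # paper convention: close to 1 = anomalous

print("threshold (offset_):", model.offset_)
print("flagged as anomalies:", int((labels == -1).sum()))
print("mean paper score, outliers vs normal:", paper_score[-5:].mean(), paper_score[:-5].mean())
```

With `contamination="auto"` (the scikit-learn default), `offset_` is fixed at -0.5 instead of being derived from the data.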

For this particular algorithm, the two metrics above are therefore not very helpful, as there is no real notion of "clusters" here. As for evaluating your results, as with any other unsupervised learning use case, you will need to visualise and manually examine the detected anomalies, using your domain knowledge to judge the quality of the predictions.
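A minimal sketch of such a manual review, assuming the scored data fits in a pandas DataFrame: sort rows by `score_samples` so the most abnormal ones come first, then inspect that short list with domain knowledge. The column names, the planted outliers and `contamination=0.01` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
features = ["feature_a", "feature_b", "feature_c"]  # hypothetical column names
df = pd.DataFrame(rng.normal(size=(300, 3)), columns=features)
df.loc[:2, "feature_a"] = [9.0, -8.5, 10.0]  # plant three obvious outliers in rows 0-2

model = IsolationForest(contamination=0.01, random_state=0).fit(df[features])
df["score"] = model.score_samples(df[features])       # lower = more abnormal
df["is_anomaly"] = model.predict(df[features]) == -1  # threshold from contamination

# Most abnormal rows first: this is the short list to examine by hand
review = df.sort_values("score").head(10)
print(review)
```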

I hope this is now clearer. Do not hesitate to reply if you have further questions.
