Anomaly detection | Need Expertise
I have a problem where I need to detect anomalies in a daily shipment report.
The shipment report contains planned shipments from location A to location B.
-- key fields --
shipment id
material id
shipment created date
shipment created time
quantity
amount
international/domestic
material type
carrier name
business unit
source location
destination location
distance
-- end key fields --
I then preprocessed the data, extracted date and time components, and created some new features:
price per unit
price per km
ratio of distance/quantity
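For reference, a minimal sketch of how features like these could be computed with pandas. The column names (`shipment_created_date`, `quantity`, `amount`, `distance`) and the sample rows are assumptions made up for illustration, not the real report schema. Zero quantities or distances are turned into NaN so the ratios do not become infinite.

```python
import numpy as np
import pandas as pd

def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add date parts and the three ratio features (column names are assumed)."""
    out = df.copy()
    # Date components from the shipment creation date
    created = pd.to_datetime(out["shipment_created_date"])
    out["created_dow"] = created.dt.dayofweek  # Monday = 0
    out["created_month"] = created.dt.month
    # Replace zero denominators with NaN so ratios come out NaN instead of inf
    qty = out["quantity"].replace(0, np.nan)
    dist = out["distance"].replace(0, np.nan)
    out["price_per_unit"] = out["amount"] / qty
    out["price_per_km"] = out["amount"] / dist
    out["distance_per_quantity"] = out["distance"] / qty
    return out

# Made-up rows purely for illustration
sample = pd.DataFrame({
    "shipment_created_date": ["2024-01-01", "2024-01-02", "2024-02-03"],
    "quantity": [10, 0, 5],
    "amount": [100.0, 50.0, 20.0],
    "distance": [200.0, 100.0, 0.0],
})
features = add_derived_features(sample)
```

Rows with NaN ratios would still need to be dropped or imputed before modeling, since Isolation Forest in scikit-learn does not accept missing values.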
Modeling:
I ran an Isolation Forest model with PCA enabled and with PCA disabled, and got the following silhouette scores:
1. PCA enabled : 0.56
2. PCA disabled : 0.13
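The comparison is roughly of the following shape. This is a sketch assuming scikit-learn, not my exact pipeline: the helper name `isolation_forest_silhouette`, the `contamination` value, the number of PCA components, and the synthetic data are all placeholders. The silhouette score is computed on the inlier/anomaly labels, in whichever feature space the model was fitted on.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def isolation_forest_silhouette(X, use_pca, n_components=2, contamination=0.05, seed=0):
    """Fit Isolation Forest (optionally after PCA) and score its labels."""
    X_model = StandardScaler().fit_transform(X)
    if use_pca:
        X_model = PCA(n_components=n_components, random_state=seed).fit_transform(X_model)
    model = IsolationForest(contamination=contamination, random_state=seed)
    labels = model.fit_predict(X_model)  # 1 = inlier, -1 = anomaly
    # Silhouette treats inliers and anomalies as two clusters
    return labels, silhouette_score(X_model, labels)

# Synthetic data for illustration only: a main cloud plus a small shifted group
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0, 1, size=(285, 6)),
    rng.normal(8, 1, size=(15, 6)),
])
labels_pca, score_pca = isolation_forest_silhouette(X_demo, use_pca=True)
labels_raw, score_raw = isolation_forest_silhouette(X_demo, use_pca=False)
print(f"PCA enabled: {score_pca:.2f}, PCA disabled: {score_raw:.2f}")
```

Note that in this setup the two scores are measured in different feature spaces (reduced versus full), so I am not sure how directly comparable they are.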
Additionally, materials are not shipped on a daily basis; shipments depend entirely on demand.
The input dataset for the model has 9,000 rows covering the last 13 months.
I am still thinking about what new features I can create, and whether aggregates or lags would be of any business value.
Also, when I train the model, should I exclude the base features (amount, quantity and distance), since I have created new features from them?
Any help or feedback would be appreciated.