I have a problem where I need to detect anomalies in a daily shipment report.
The shipment report contains planned shipments from location A to location B.
shipment created date
shipment created time
—- end key fields —
I then preprocessed my data, extracted date and time components, and created some new features, such as:
price per unit
price per km
ratio of distance/quantity
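For reference, the derived features above can be sketched roughly as follows. This is a minimal sketch, not my exact code: the column names `amount`, `quantity`, and `distance`, the helper `add_ratio_features`, and the sample rows are all assumed for illustration. Zeros in a denominator are turned into missing values so the ratios do not become infinite.

```python
import numpy as np
import pandas as pd


def add_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the three ratio features; column names are assumed, not from the real report."""
    out = df.copy()
    # Treat zero quantity/distance as missing to avoid division by zero.
    qty = out["quantity"].replace(0, np.nan)
    dist = out["distance"].replace(0, np.nan)
    out["price_per_unit"] = out["amount"] / qty
    out["price_per_km"] = out["amount"] / dist
    out["distance_per_quantity"] = out["distance"] / qty
    return out


# Made-up rows, only to show the shape of the result.
shipments = pd.DataFrame(
    {
        "amount": [100.0, 250.0, 80.0],
        "quantity": [10, 0, 4],
        "distance": [50.0, 125.0, 0.0],
    }
)
features = add_ratio_features(shipments)
print(features)
```

Rows with a missing ratio would need to be imputed or dropped before modelling, since scikit-learn's Isolation Forest does not accept NaN values.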
I ran my Isolation Forest model with PCA enabled and disabled, and my silhouette scores were as below:
1. PCA enabled : 0.56
2. PCA disabled : 0.13
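The comparison can be sketched roughly like this with scikit-learn. It is a sketch under assumptions: the data here is synthetic, and the helper name `isolation_forest_silhouette`, the number of PCA components, and the contamination rate are illustrative choices rather than my actual settings. Each silhouette score is computed in the same feature space the model was fitted on.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the shipment features (NOT the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
X[:10] += 6.0  # inject a few obvious outliers


def isolation_forest_silhouette(X, use_pca, n_components=2, contamination=0.05):
    """Fit an Isolation Forest and score the inlier/outlier split with silhouette."""
    Z = StandardScaler().fit_transform(X)
    if use_pca:
        # Optional dimensionality reduction before fitting the model.
        Z = PCA(n_components=n_components, random_state=0).fit_transform(Z)
    model = IsolationForest(contamination=contamination, random_state=0)
    labels = model.fit_predict(Z)  # 1 = inlier, -1 = outlier
    return labels, silhouette_score(Z, labels)


labels_pca, score_pca = isolation_forest_silhouette(X, use_pca=True)
labels_raw, score_raw = isolation_forest_silhouette(X, use_pca=False)
print(f"PCA enabled:  {score_pca:.2f}")
print(f"PCA disabled: {score_raw:.2f}")
```

Because the two scores are measured in different spaces (reduced versus full), they may not be directly comparable, which could explain part of the gap.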
Additionally, materials are not shipped on a daily basis; shipments depend entirely on demand.
My input dataset to the model has 9,000 rows, for the last 13 months of the data.
I am still thinking about what new features I could create, and whether aggregates or lags would be of any business value.
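To make the idea concrete, this is the kind of aggregate and lag feature I have in mind. It is only a sketch: the columns `origin`, `destination`, `created_date`, and `price_per_unit`, the helper `add_route_history_features`, and the sample rows are assumed names and made-up values. Since shipments are not daily, the lags here are per route and relative to the previous shipment rather than the previous calendar day.

```python
import pandas as pd


def add_route_history_features(df: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Add per-route lag and rolling features; all column names are assumed."""
    out = df.sort_values(["origin", "destination", "created_date"]).reset_index(drop=True)
    grouped = out.groupby(["origin", "destination"])
    # Lag: price per unit of the previous shipment on the same route.
    out["prev_price_per_unit"] = grouped["price_per_unit"].shift(1)
    # Gap in days since the previous shipment on the same route.
    prev_date = grouped["created_date"].shift(1)
    out["days_since_prev_shipment"] = (out["created_date"] - prev_date).dt.days
    # Aggregate: rolling mean over earlier shipments only (current row excluded).
    out["rolling_mean_price_per_unit"] = grouped["price_per_unit"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return out


# Made-up rows, only to show the shape of the result.
history = pd.DataFrame(
    {
        "origin": ["A", "A", "A", "A"],
        "destination": ["B", "C", "B", "B"],
        "created_date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-10"]
        ),
        "price_per_unit": [10.0, 20.0, 12.0, 30.0],
    }
)
route_features = add_route_history_features(history)
print(route_features)
```

The first shipment on each route has no history, so its lag and rolling values are missing and would need handling before training.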
Also, when I train my model, should I exclude the base features (amount, quantity, and distance), since I have created new features from them?
Any help or feedback would be appreciated.