Log analysis
Hello to all,
I'm glad to be part of the DSS community. I'm a beginner in machine learning and I work on anomaly detection.
My dataset consists of logs retrieved from a SIEM and has 7 fields (timestamp, source IP, destination IP, source port, destination port, id_community, protocol). I would like to find abnormal time slots.
I have done the first step in Dataiku, which is data preparation. The next step is to find a model that will help me detect abnormal times, such as people working on weekends or outside their assigned working hours.
I would like to know if someone can help me?
Thanks in advance
Answers
-
Hey @4dad,
There are a couple of ways to tackle this challenge, but a few out-of-the-box methods you might want to look into are the visual models Dataiku offers and their time series plugin. The AutoML Clustering visual model has an Anomaly Detection option that might be useful. Dataiku has a blog post here, and a summary paper here about the topic. There's also a time series forecasting plugin with documentation here that might be helpful.
Best of luck in your model creation journey! Hope these resources help you begin your exploration.
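Before fitting any model, it may help to turn the timestamp into explicit features for the cases mentioned in the question (weekend work, activity after working hours). Here is a minimal sketch in plain Python. The 08:00 to 18:00 schedule, the ISO 8601 timestamp format, and the function name `time_features` are all assumptions for illustration, not something taken from the dataset or from Dataiku.

```python
from datetime import datetime

# Assumed working schedule (hypothetical); adjust to the real one.
WORK_START_HOUR = 8   # inclusive
WORK_END_HOUR = 18    # exclusive


def time_features(timestamp: str) -> dict:
    """Derive simple numeric time features from an ISO 8601 timestamp string."""
    ts = datetime.fromisoformat(timestamp)
    # weekday() returns 0 for Monday, so 5 and 6 are Saturday and Sunday.
    is_weekend = ts.weekday() >= 5
    outside_hours = not (WORK_START_HOUR <= ts.hour < WORK_END_HOUR)
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),
        "is_weekend": int(is_weekend),
        # Flag anything on a weekend or outside the assumed schedule.
        "off_hours": int(is_weekend or outside_hours),
    }


if __name__ == "__main__":
    print(time_features("2024-01-06 10:30:00"))  # a Saturday morning
    print(time_features("2024-01-03 22:15:00"))  # a Wednesday night
```

These columns can then be fed to a clustering or anomaly detection model alongside the other fields, or used directly as a simple rule-based baseline.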
-
Hello @kathyqingyuxu
Thank you so much for the resources. I tried to train the model using KMeans, using the elbow plot to find the optimal number of clusters, but unfortunately I got a bad score. I'll try the resources you gave me. I discovered Isolation Forest and I'll try it with AutoML; if that doesn't work, I'll code it in custom Python.
-
Hi there,
I am stuck and facing another issue. I have trained the model using Isolation Forest and got a better score than the one I got with KMeans. But I'd like to improve it, which is why I want to convert all fields to numerical types. I don't know how to convert an IP address to a decimal number?
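One possible way to do the conversion, as a minimal sketch using Python's standard `ipaddress` module (for example in a custom Python step). The helper names `ip_to_int` and `ip_features` are hypothetical, and whether a single integer or separate octets works better for the model is an assumption to test, not a known result.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Convert an IPv4 or IPv6 address string to its integer value."""
    return int(ipaddress.ip_address(ip))


def ip_features(ip: str) -> dict:
    """Alternative encoding: the integer value plus a private-range flag."""
    addr = ipaddress.ip_address(ip)
    return {
        "ip_int": int(addr),
        # 1 for private ranges such as 10.0.0.0/8 or 192.168.0.0/16.
        "is_private": int(addr.is_private),
    }


if __name__ == "__main__":
    print(ip_to_int("192.168.1.10"))  # 3232235786
    print(ip_features("10.0.0.1"))
```

Note that the integer form imposes an ordering on addresses that may not mean much by itself, so it can be worth comparing it against other encodings.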