Contingency for Audit Logs (EventServer, Kafka, etc.)
Best Answer
-
daniel_adornes Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 30 ✭✭✭✭✭
Hey @HarizoR!
Thank you for the answer here. However, this wouldn't be a proper solution to our main need (bullet point #2).
If we considered only the EventHub/Kafka approach, it would certainly be better to create a consumption pipeline in DSS than to build (and maintain) an external service that reads from the topic and saves to the final log storage.
The reason we discarded this approach (a streaming data flow in DSS) was that it wouldn't address the need detailed in bullet point #2, since it would still imply a dependency on the DSS node.
The redundant EventServer is the solution we ended up using, and it has covered our current needs well (we wanted to upgrade the VM, which is now done, and next month we want to upgrade to DSS 10).
Our Dataiku consultancy team suggested that a good long-term approach would be a dedicated node for the EventServer, which would fully isolate it from the other nodes so that no upgrade would impact the audit logs.
Thank you again for engaging here! I hope this is overall a valuable thread for the community!
Cheers
Answers
-
Hi Daniel, thanks for this insightful feedback.
Regarding question 5b., as of today we haven't received any feedback about scaling issues with the Event Server, which remains the simplest solution to implement.
You mentioned in 3. the possibility of targeting a message broker like Kafka, which would be an alternative if the load of incoming messages one day exceeds the capacity of the Event Server. Having to manually save the logs from a Kafka topic to your log storage would indeed be cumbersome; however, since DSS 9 you can handle streaming-based items directly in your DSS Flow. For example, you could create a streaming endpoint based on the Kafka topic where your logs are collected, then create a Continuous Sync recipe to save the log messages to a regular Dataset, instead of maintaining a standalone consumer like the one sketched below.
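For reference, the "manual" approach would boil down to something like this minimal kafka-python sketch. The topic name, broker address, and storage path are placeholders for illustration only; a streaming endpoint plus a Continuous Sync recipe gives you the same loop natively inside DSS, without an extra service to maintain:

```python
# Minimal sketch of a standalone consumer that reads audit-log messages
# from a Kafka topic and appends them to long-term storage.
# Topic name, broker address and output path are hypothetical.
import json

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "dss-audit-logs",                      # hypothetical topic name
    bootstrap_servers=["broker-1:9092"],   # hypothetical broker address
    group_id="audit-log-archiver",         # committed offsets allow resuming after a restart
    auto_offset_reset="earliest",          # start from the oldest retained message on first run
    enable_auto_commit=True,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

with open("/data/audit-logs/archive.jsonl", "a") as archive:
    for message in consumer:
        # Replace with a write to the real log storage (S3, ADLS, HDFS, ...)
        archive.write(json.dumps(message.value) + "\n")
        archive.flush()
```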
If you are interested, we published a short blog post to summarise the streaming capabilities of DSS.
Best,
Harizo
-
Tsen-Hung Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 5 ✭✭✭✭
To recap, I believe the question is whether a continuous recipe built on a streaming endpoint can really ensure we can still consume the audit logs while we restart the instance, or, put simply, whether the streaming job is turned off by the instance restart.
-
daniel_adornes Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 30 ✭✭✭✭✭
Hey Tsen-Hung!
My understanding is that the first component ensuring logs are not lost would be the Kafka/EventHub topic itself, which can be configured to retain messages for 24 hours or longer.
This way, if DSS were down for some time, the logs would still be retained by the messaging component (Kafka/EventHub), and once DSS is up again, the streaming project would resume from where it left off.
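On the Kafka side, the retention window could be raised with something like the sketch below (kafka-python, with placeholder topic and broker names; Event Hubs has its own retention setting configured on the Azure side):

```python
# Sketch: raise retention on the audit-log topic so messages survive a DSS outage.
from kafka.admin import ConfigResource, ConfigResourceType, KafkaAdminClient

admin = KafkaAdminClient(bootstrap_servers=["broker-1:9092"])  # hypothetical broker

# Keep audit-log messages for 7 days (retention.ms is in milliseconds).
admin.alter_configs([
    ConfigResource(
        ConfigResourceType.TOPIC,
        "dss-audit-logs",  # hypothetical topic name
        configs={"retention.ms": str(7 * 24 * 60 * 60 * 1000)},
    )
])
```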
@HarizoR may be more experienced and able to complement or correct our understanding here.