Hi Daniel, thanks for this insightful feedback 🙂
Regarding question 5b., as of today we haven't received any feedback about scaling issues with the Event Server, which remains the simplest solution to implement.
You mentioned in 3. the possibility of targeting a message broker like Kafka, which would be an alternative if the volume of incoming messages ever exceeded the capacity of the Event Server. Manually saving the logs from a Kafka topic to your log storage would indeed be cumbersome; however, since DSS 9 you can handle streaming-based items directly in your DSS Flow. For example, you could create a streaming endpoint based on the Kafka topic where your logs are collected, then create a Continuous Sync recipe to save the log messages to a regular dataset.
If you are interested, we published a short blog post summarising the streaming capabilities of DSS.
Best,
Harizo
Hey @HarizoR!
Thank you for the answer! However, this wouldn't be a proper solution to our main need (bullet point #2).
If we considered only the EventHub/Kafka approach, it would certainly be better to create a consumption pipeline in DSS than to create (and maintain) an external service that reads from the topic and saves to the final log storage.
The reason we discarded this approach (a streaming data flow in DSS) was that it wouldn't address the need detailed in bullet point #2, since it would still imply a dependency on the DSS node.
The redundant Event Server is the solution we ended up using, and it has served our current needs well (we wanted to upgrade the VM, which is now done, and next month we plan to upgrade to DSS 10).
Our Dataiku consulting team suggested that a good long-term approach would be a dedicated node for the Event Server, which would fully isolate it from the other nodes so that no upgrade would impact the audit logs.
Thank you again for engaging here! I hope this is overall a valuable thread for the community!
Cheers
To recap, I believe the question is whether a continuous recipe built on a streaming endpoint can guarantee that we can still consume the audit logs while the instance restarts. Put simply: will the streaming job be turned off by the instance restart?
Hey Tsen-Hung!
I understand that the first safeguard against losing logs is the Kafka/EventHub topic itself, which can be configured to retain messages for 24 hours or longer.
That way, if DSS were down for some time, the logs would still be held by the messaging component (Kafka/EventHub), and once DSS is back up, the streaming project would resume from where it stopped.
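As an illustration only (the topic name and the 7-day window below are assumptions, not actual settings), the retention window maps to the `retention.ms` topic-level config in Kafka:

```python
# Sketch: size a Kafka topic's retention window so audit messages survive
# a DSS outage. The topic name and 7-day window are illustrative assumptions.

def retention_ms(days=0, hours=0):
    """Convert a retention window into the milliseconds Kafka expects."""
    return (days * 24 + hours) * 60 * 60 * 1000

# Kafka applies these as topic-level configs, e.g.:
#   kafka-configs.sh --bootstrap-server localhost:9092 --alter \
#       --entity-type topics --entity-name audit-logs \
#       --add-config retention.ms=604800000
topic_config = {
    "retention.ms": str(retention_ms(days=7)),  # hold messages for 7 days
    "cleanup.policy": "delete",                 # drop (not compact) old segments
}

print(topic_config["retention.ms"])  # → 604800000
```

With a window like this, the streaming recipe simply resumes from its committed offset once DSS is back up, as long as the outage is shorter than the retention period.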
@HarizoR may be better placed to complement or correct our understanding here 🙂