Added on July 17, 2023 12:52PM
Likes: 0
Replies: 1
I am reaching out to seek guidance and advice on utilizing Dataiku for data integration and ETL (Extract, Transform, Load) processes. As a member of this vibrant community, I am eager to learn from your experiences and expertise in working with Dataiku.
I have recently started using Dataiku as a data integration and ETL tool for my organization. While I have a basic understanding of the platform, I am looking to expand my knowledge and discover best practices from those who have already delved into its intricacies.
Specific Questions:
My general advice will be: don't do it.
It is a well-established fact that most ML projects require a considerable amount of data engineering. It is so well understood that a general rule, the 80/20 rule, has been described and discussed many times:
https://www.ibm.com/cloud/blog/ibm-data-catalog-data-scientists-productivity
https://towardsdatascience.com/the-80-20-challenge-7b8bfb643947
https://www.datagym.ai/the-80-20-data-science-dilemma/
It is natural, then, that for Dataiku to be an effective ML platform it has to be good at data engineering. But the fact that it is good at data engineering doesn't mean it is meant to replace traditional ETL/DI tools. In fact, I would argue that using it that way leads to solutions that are far from optimal. Dataiku is extremely good at rapidly prototyping data pipelines, and it allows both coders and non-coders to quickly develop complex solutions. But it can also become a burden if used incorrectly. Here are the reasons why Dataiku is not a good standalone DI/ETL solution: