Can we leverage Dataiku for performing ETL operations?

Solved!
Raj
Level 2

Hi,

I work for a global financial institution. Our BAs, operations teams and senior management use multiple reporting/visualization tools to analyze and present their data. The underlying data resides predominantly in Oracle or Hive. We use multiple vendor ETL tools, such as DataStage and Informatica, to process the data before reporting (different applications use different ETL/reporting tools). A Control-M-style scheduler is used to execute the ETL pipelines/batch flows. Most of our applications process huge volumes of data in their nightly batches.

Please confirm whether Dataiku DSS can be considered as a replacement ETL solution for a few of our applications (reporting is not a requirement).

 

5 Replies
jereze
Community Manager

Hi,

Indeed, we have customers who use Dataiku DSS for their ETL work. You can find out more on our website (for example, in the Product > Data Preparation section, or in our white papers).

Jeremy, Product Manager at Dataiku
Raj
Level 2
Author
Thanks for the response. Here is our key use case: please confirm whether we can read all the data from an Oracle table (~10 million rows) and load it into Hive as-is via Dataiku DSS.
jereze
Community Manager

Yes

Jeremy, Product Manager at Dataiku
Raj
Level 2
Author
That answers it, thank you!
VinceDS
Dataiker

Hi Raj, 

Indeed, Dataiku DSS has a very strong value proposition when it comes to building and orchestrating ETL / data preparation workflows, especially in hybrid environments (Hadoop, SQL, cloud & on-prem data...).

  • you have built-in connectors to most enterprise data sources, and the ability to develop additional ones through plugins
  • you can use visual recipes (join, group by, prepare...) as well as code recipes (Python, R, Scala) to transform data, and offload the computation to the underlying architecture or to containers, creating more agility, transparency and scalability
  • you have full orchestration capabilities (scheduling + monitoring) and the ability to integrate with enterprise schedulers (such as Control-M)
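To make the code-recipe point concrete for the Oracle-to-Hive use case: whether you use a visual Sync recipe or a Python recipe, a large table is streamed in chunks rather than loaded into memory all at once. The sketch below illustrates that chunked read/write pattern only; it uses sqlite3 in-memory databases as stand-ins for the Oracle source and Hive target, and the `trades` table and `copy_table_in_chunks` helper are hypothetical names, not part of the DSS API.

```python
import sqlite3

def copy_table_in_chunks(src_conn, dst_conn, table, chunk_size=100_000):
    """Stream rows from a source table into an identically-shaped
    destination table in fixed-size chunks, so a large table never
    has to fit in memory all at once."""
    cur = src_conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    placeholders = ", ".join("?" for _ in cols)
    insert_sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    copied = 0
    while True:
        rows = cur.fetchmany(chunk_size)  # pull one chunk from the source
        if not rows:
            break
        dst_conn.executemany(insert_sql, rows)  # push the chunk to the target
        copied += len(rows)
    dst_conn.commit()
    return copied

# Demo: two in-memory databases stand in for Oracle (source) and Hive (target)
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE trades (id INTEGER, amount REAL)")
src.executemany("INSERT INTO trades VALUES (?, ?)", [(i, i * 1.5) for i in range(1000)])
dst.execute("CREATE TABLE trades (id INTEGER, amount REAL)")

n = copy_table_in_chunks(src, dst, "trades", chunk_size=250)
print(n)  # 1000
```

In a real DSS Python recipe you would not write this loop by hand against database drivers; the platform's dataset abstraction handles the connections and chunking, and a plain Sync recipe covers the as-is copy case without any code.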

Of course, our key differentiators from existing ETL vendors are:

  • making data prep/data wrangling accessible to non-IT users, fostering self-service analytics and reducing the burden on IT teams
  • helping data teams go one step further and easily build ML models and AI applications on top of data processing pipelines

Here's an example of a client story around building ETL pipelines in DSS - https://www.dataiku.com/stories/betclic-putting-data-science-at-the-center-of-online-gambling/

Hope this helps