Can we leverage Dataiku for performing ETL operations?

Raj Registered Posts: 3 ✭✭✭✭

Hi,

I work for a global financial institution. Our BAs, Operations and Senior Management use multiple reporting/visualization tools to analyze and present their data. The underlying data resides predominantly in Oracle or Hive. We use multiple vendor ETL tools, such as DataStage and Informatica, to process this data before reporting (different applications use different ETL/reporting tools). A scheduler such as Control-M is used to execute the ETL pipelines/batch flows. Most of our applications process large volumes of data in their nightly batches.

Please confirm whether Dataiku DSS can be considered as a replacement ETL solution for a few of our applications (reporting is not a requirement).

Best Answers

  • jereze
    jereze Alpha Tester, Dataiker Alumni Posts: 190 ✭✭✭✭✭✭✭✭
    Answer ✓

    Hi,

    Indeed, we have customers that use Dataiku DSS for their ETL work. You should be able to find out more on our website (in the product > data preparation section for example, or in white papers).

  • VinceDS
    VinceDS Dataiker, Alpha Tester, Dataiku DSS Core Designer Posts: 45 Dataiker
    Answer ✓

    Hi Raj,

    Indeed, Dataiku DSS has a very strong value proposition when it comes to building and orchestrating ETL / data preparation workflows, especially in hybrid environments (Hadoop, SQL, cloud and on-prem data...).

    • you have built-in connectors to most enterprise data sources, and the ability to develop additional ones through plugins
    • you can use visual recipes (join, group by, prepare...) as well as code recipes (Python, R, Scala) to transform data, and offload the computation to the underlying architecture or to containers, creating more agility, transparency and scalability
    • you have full orchestration capabilities (scheduling + monitoring) and the ability to integrate with enterprise schedulers (such as Control-M)
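
    To make the "join" and "group by" steps above concrete, here is a minimal plain-Python sketch of that kind of transformation. It is only an illustration, not DSS code: the sample rows, column names and function names are all hypothetical. In DSS itself, a Python recipe would typically read and write datasets through the `dataiku` package rather than in-memory lists; that part is left out so the sketch runs anywhere.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for two input datasets.
ACCOUNTS = [
    {"account_id": "A1", "region": "EMEA"},
    {"account_id": "A2", "region": "APAC"},
]
TRADES = [
    {"account_id": "A1", "amount": 100.0},
    {"account_id": "A1", "amount": 50.0},
    {"account_id": "A2", "amount": 75.0},
    {"account_id": "A3", "amount": 10.0},  # no matching account, dropped by the inner join
]


def join_trades_to_accounts(trades, accounts):
    """Inner join on account_id (the 'join' step)."""
    region_by_account = {a["account_id"]: a["region"] for a in accounts}
    return [
        {**trade, "region": region_by_account[trade["account_id"]]}
        for trade in trades
        if trade["account_id"] in region_by_account
    ]


def total_amount_by_region(rows):
    """Sum the amount column per region (the 'group by' step)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def run_pipeline(trades, accounts):
    """Chain the two steps, as a small flow of recipes would."""
    return total_amount_by_region(join_trades_to_accounts(trades, accounts))


if __name__ == "__main__":
    print(run_pipeline(TRADES, ACCOUNTS))
```

    Keeping each step as its own small function mirrors how a flow is built from individual recipes, so a single step can be checked or swapped without touching the others.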

    Of course, our key differentiators from existing ETL vendors are:

    • making data prep/data wrangling accessible to non-IT users, fostering self-service analytics and reducing the burden on IT teams
    • helping data teams go one step further and easily build ML models and AI applications on top of data processing pipelines

    Here's an example of a client story around building ETL pipelines in DSS - https://www.dataiku.com/stories/betclic-putting-data-science-at-the-center-of-online-gambling/

    Hope this helps
