Loading data from Snowflake (cloud) into an on-prem Oracle database table

jrmathieu63 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 26 ✭✭✭✭✭

We have data analysts who need to extract data from Snowflake and load it into an on-prem Oracle database.

They first extract the data into an S3 bucket, then transfer it into Oracle.
The initial extract is quick enough at about 20 minutes, but the transfer to Oracle takes hours.

It appears the default transfer rate to Oracle is low, and I wonder if there is any way to increase it.

It is not apparent from the online documentation how to do this.

Any direction on this will be greatly appreciated.

Answers

  • Manuel Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 193 ✭✭✭✭✭✭✭

    Hi,

    Which recipes and engines are you using to transfer the data between these three data management platforms?

    In general, a recipe that reads and writes between two different connections requires the data to pass through DSS. However, in some specific cases, DSS implements a fast path that performs the transfer directly between the two platforms. See the engine section of this documentation page.
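
    For intuition, the Snowflake-to-S3 fast path roughly amounts to having Snowflake unload the data itself instead of streaming rows through DSS. A minimal standalone sketch of that idea (the stage, table, and credentials below are hypothetical; this illustrates push-down unloading, not DSS internals):

        import snowflake.connector

        # All names and credentials are placeholders
        conn = snowflake.connector.connect(
            account="my_account", user="my_user", password="my_password"
        )
        cur = conn.cursor()

        # Snowflake writes the files to S3 itself; the rows never pass
        # through an intermediate engine
        cur.execute("""
            COPY INTO @my_s3_stage/extract/
            FROM analytics.big_table
            FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
        """)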

    I suspect that in your flow you are benefitting from the fast path between Snowflake and S3, whilst the second step, from S3 into Oracle, requires the DSS engine to do the transfer. I would suggest the following:

    • Make sure your last step (into Oracle) is done with a Sync recipe;
    • Test the other engines you may have available (one way to inspect them is sketched after this list).
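
    As a starting point for that second bullet, a minimal sketch using the dataikuapi client to see which engine a recipe has selected and which alternatives DSS considers usable. The host, API key, project key, and recipe name are placeholders, and the status methods assume a reasonably recent dataikuapi version:

        import dataikuapi

        # Placeholders: point these at your own instance, project and Sync recipe
        client = dataikuapi.DSSClient("https://dss.example.com:11200", "my-api-key")
        project = client.get_project("MYPROJECT")
        recipe = project.get_recipe("sync_s3_to_oracle")

        # Which engine the recipe currently selects, and which alternative
        # engines DSS considers for it (and whether each is usable here)
        status = recipe.get_status()
        print("Selected engine:", status.get_selected_engine_details())
        for engine in status.get_engines_details():
            print(engine.get("type"), "usable:", engine.get("usable"))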

    I hope this helps; otherwise, engage your CSM to have one of our services staff look more closely at your challenge.

  • jrmathieu63 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 26 ✭✭✭✭✭

    Thanks for the recommendations.

    The Sync recipe is what we are using to perform the transfers.

    Since the transfers go to our on-prem DB environment, they are impacted by the limited bandwidth that is shared by everyone.
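
    On a link like this, per-row round trips tend to dominate, so anything that writes in larger batches should help; it may be worth checking whether the Oracle connection exposes a batch or commit size setting. A rough standalone illustration of the batching effect, with a made-up table, credentials, and batch size (not how DSS writes internally):

        import oracledb

        # Placeholders: connection details and target table are hypothetical
        conn = oracledb.connect(
            user="loader", password="my_password", dsn="onprem-db:1521/ORCLPDB1"
        )
        cur = conn.cursor()

        rows = [(i, f"value-{i}") for i in range(100_000)]  # stand-in for the extract

        # One executemany() call per batch means one round trip per 10,000
        # rows instead of one round trip per row
        BATCH = 10_000
        for start in range(0, len(rows), BATCH):
            cur.executemany(
                "INSERT INTO staging_table (id, payload) VALUES (:1, :2)",
                rows[start:start + BATCH],
            )
        conn.commit()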

    We were trying to see if EKS could help by using Spark, but this only caused more issues, since our cluster's Kubernetes session appears to just stop before the transfer ends.

    I do not see much value currently in implementing EKS, since we do not run any ML jobs on our platform.
    Is EKS truly best suited to API deployment jobs, or is it intended to be useful for other workloads as well?

    It would be helpful to know which workloads EKS is most useful for.
