Python recipe run time

yazidsaissi Registered Posts: 6

Hello everyone,

I want to use a Python recipe on a large dataset, but it takes a long time to run.

Is there a way to run the Python recipe in-database so that it runs faster?

Thank you

Answers

  • shashank
    shashank Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 27 Dataiker

    It depends on the source database where your data resides. In-database execution is only possible with data sources that support Python execution, for example Snowflake.

    It is also recommended to use Python with Spark to take advantage of distributed computing when working on large datasets.

    The options below depend on your underlying database:

    1. Snowflake: Convert your code to use the Snowpark libraries, and you can push the compute to Snowflake.

    2. Other databases (with no native Python support): The best way is to set up an EKS Spark cluster in Dataiku and push your compute to that.

    Any other database that offers native Python support over a JDBC/ODBC connection should be able to run Python recipes in the same way.
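
    To illustrate why pushing compute to the data helps, here is a minimal sketch. It is not Dataiku, Snowpark, or Spark code: it uses Python's built-in sqlite3 module as a stand-in database, and the orders table, its columns, and the function names are all hypothetical. The first function pulls every row into Python before aggregating, which is roughly what a plain Python recipe does; the second lets the database do the aggregation and returns only the small result.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("a", 10.0), ("b", 5.0), ("a", 2.5), ("b", 1.0), ("c", 7.0)],
    )
    conn.commit()
    return conn


def totals_in_python(conn):
    """Pull every row out of the database, then aggregate in Python."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


def totals_in_database(conn):
    """Let the database aggregate; only one row per customer is transferred."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(totals_in_python(conn))
    print(totals_in_database(conn))
```

    Both functions return the same totals, but the second one moves far less data out of the database. Snowpark and Spark apply the same idea at scale, so the actual code for those engines will look different from this sketch.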
