I want to use the Python recipe on a large dataset, but it takes a long time to run.
Is there a way to run the Python recipe in-database so it completes faster?
Thank you 🙂
It depends on the database where your data resides: only data sources that support in-database Python execution (for example, Snowflake) can be used for this.
It is also recommended to pair Python with Spark to get the benefit of distributed computing when working on large datasets.
Below are the options based on your underlying database:
1. Snowflake: Convert your code to use the Snowpark libraries so the compute is pushed down to Snowflake. Learn More
2. Other databases (with no native Python support): The best approach is to set up an EKS Spark cluster in Dataiku and push your compute to that. Learn More
Any other database with native Python support over a JDBC/ODBC connection should also be usable with Python recipes.
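Whichever option you pick, the core idea is the same: push the heavy computation down to the database as SQL instead of pulling every row into Python. Here is a minimal sketch of that difference, using the standard library's sqlite3 as a stand-in for your warehouse (the table and column names are hypothetical; Snowpark and Spark generate this kind of pushed-down query for you from DataFrame-style code):

```python
import sqlite3

# Hypothetical setup: an in-memory database standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0)],
)

# Slow pattern: fetch every row into Python, then aggregate client-side.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
totals_in_python = {}
for region, amount in rows:
    totals_in_python[region] = totals_in_python.get(region, 0.0) + amount

# In-database pattern: push the aggregation down as SQL, so only the
# small result set crosses the wire.
totals_pushed_down = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

print(totals_in_python)    # {'east': 150.0, 'west': 75.0}
print(totals_pushed_down)  # {'east': 150.0, 'west': 75.0}
```

Both patterns give the same answer on three rows; on millions of rows, the pushed-down version avoids transferring the whole table to the Python process.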