
Does a dataset need to follow a recipe immediately?

Solved!
DogaS
Level 3

Hi,

I want to leverage the in-database engine so that compute happens on our Redshift. The catch is that every recipe needs a dataset as its output. So, assume I am reading a table from Redshift. If I want to first filter the data (Prepare recipe) and then aggregate it (Group recipe), I need to write an intermediary table to Redshift after the Prepare step. Is there no way to do both steps (Prepare and Group) in-database without writing an intermediary table to the database?

If the answer to the above question is "No, there is no way; yes, you need to write an intermediary table", then what would be the best practice for automatically removing the intermediary tables in my data processing so that they don't occupy space in the database?

Thank you!

1 Solution
LouisDHulst

Hi @DogaS ,

Dataiku does require an intermediate table if you use Prepare and Group recipes, yes.

You could use an SQL recipe to perform both steps in one go. If your Prepare recipe is convertible to SQL, you can view the query it generates, convert the recipe to an SQL recipe, and then add your GROUP BY to it.
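As a sketch of what that single SQL query would look like (using an in-memory SQLite database in place of Redshift; the table and column names `orders`, `region`, `status`, and `amount` are made up for the example):

```python
import sqlite3

# SQLite stands in for Redshift here, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, status TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('EU', 'shipped', 10.0),
        ('EU', 'cancelled', 99.0),
        ('US', 'shipped', 25.0),
        ('US', 'shipped', 5.0);
""")

# One query does both steps in-database: the WHERE clause plays the role
# of the Prepare (filter) step and GROUP BY plays the role of the Group
# recipe, so no intermediary table is ever written.
rows = conn.execute("""
    SELECT region, SUM(amount) AS total_amount
    FROM orders
    WHERE status = 'shipped'
    GROUP BY region
    ORDER BY region
""").fetchall()

print(rows)  # → [('EU', 10.0), ('US', 30.0)]
```

The same `WHERE … GROUP BY` shape is what you would paste into the SQL recipe, pointed at your actual Redshift table.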

If you want to avoid SQL recipes and are using Scenarios to run your flow, you can add "Clear" steps to drop the data in the tables you want.

You might also want to look into the SQL pipelines feature: https://doc.dataiku.com/dss/latest/sql/pipelines/sql_pipelines.html


2 Replies
LouisDHulst

(Accepted solution, shown above.)
DogaS
Level 3
Author

Hi @LouisDHulst ,

Thank you so much for your answer! The SQL pipelines documentation seems to be the exact thing I was looking for!
