Does a dataset need to follow a recipe immediately?

DogaS Registered Posts: 16


I want to use the in-database engine so that the computation happens on our Redshift cluster. The catch is that every recipe needs an output dataset. So, assume I am reading a table from Redshift. If I want to first filter the data (Prepare recipe) and then aggregate it (Group recipe), I need to write an intermediate table to Redshift after the Prepare step. Is there no way to do both steps (Prepare and Group) in-database without writing an intermediate table to the database?

If the answer to the above question is "No, there is no way. Yes, you need to write an intermediate table", then what is the best practice for automatically removing the intermediate tables created during processing, so that they don't occupy space in the database?

Thank you!

Best Answer

  • LouisDHulst Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Registered, Neuron 2023 Posts: 44
    Answer ✓

    Hi @DogaS

    Dataiku does require an intermediate table if you use Prepare and Group recipes, yes.

    You could use an SQL recipe to perform both in one go. If your Prepare recipe is convertible to SQL, you can view the query it generates, convert the recipe to an SQL recipe, and then add your GROUP BY to that query.
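    To illustrate the idea of doing the filter and the aggregation in a single query, here is a minimal sketch. It is not the SQL that Dataiku generates: the `orders` table, its columns, and the `prepared` CTE name are hypothetical, and an in-memory SQLite database stands in for Redshift so the example can run anywhere. The point is only that the "Prepare" filter becomes a CTE and the "Group" step reads from it, so no intermediate table is written.

```python
import sqlite3

# In-memory SQLite stands in for Redshift; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", "paid", 10.0),
        ("alice", "paid", 5.0),
        ("bob", "cancelled", 7.0),
        ("bob", "paid", 3.0),
    ],
)

# The filter ("Prepare" step) lives in a CTE; the aggregation ("Group" step)
# reads from that CTE, so nothing is materialized between the two steps.
QUERY = """
WITH prepared AS (
    SELECT customer, amount
    FROM orders
    WHERE status = 'paid'
)
SELECT customer, SUM(amount) AS total_amount, COUNT(*) AS n_orders
FROM prepared
GROUP BY customer
ORDER BY customer
"""

rows = conn.execute(QUERY).fetchall()
for customer, total_amount, n_orders in rows:
    print(customer, total_amount, n_orders)
```

    In an SQL recipe you would keep only the query itself, with the filter condition taken from the query your Prepare recipe generates.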

    If you want to avoid SQL recipes and are using Scenarios to run your flow, you can add "Clear" steps to drop the data in the tables you want.

    You might also want to look into the SQL pipelines feature:

