Does Dataiku support schema evolution?

Bader

Hello guys,

We have many cases that require adding new columns to a dataset. However, this dataset is shared across many projects and used downstream. Does Dataiku support schema evolution? Does adding new columns affect visual or code recipes on downstream datasets?

Thanks

Kind regards

Answers

  • Emma (Dataiker)

    Hey @Bader,

    It sounds like you have two questions.

    1. When sharing datasets across projects, does the dataset stay up to date?

    Yes, shared datasets are pointers to the dataset in the original project, not standalone copies.

    If you often need to create new columns in a dataset (in my example screenshot, this is done with a Prepare recipe), you should share the output of that recipe. That way, whenever you use the recipe to make changes, it updates the schema of the output dataset, and every project the dataset is shared with receives the new columns.
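    To illustrate the pointer-versus-copy distinction, here is a toy Python model. It is a hypothetical sketch, not how DSS stores shared datasets: the `Dataset` class, the dataset names, and the `project_b` dict are invented for illustration. Sharing is modeled as a second project holding a reference to the same object, so a column added by the recipe shows up there, while a standalone copy taken earlier does not.

```python
# Toy model of sharing (hypothetical, NOT Dataiku's implementation).
import copy
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    columns: list = field(default_factory=list)


# The source project owns the output of the Prepare recipe.
prepared = Dataset("customers_prepared", ["id", "name"])

# "Sharing": another project stores a reference to the same object.
project_b = {"customers_prepared": prepared}
# For contrast: a standalone copy taken at the same moment.
snapshot = copy.deepcopy(prepared)

# The Prepare recipe gains a new column, so the output schema changes.
prepared.columns.append("signup_year")

print(project_b["customers_prepared"].columns)  # ['id', 'name', 'signup_year']
print(snapshot.columns)                         # ['id', 'name']
```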

    2. Are changes to the original dataset propagated downstream in a project?

    Yes, changes you make to a dataset are propagated downstream when you run the downstream recipes, a.k.a. "rebuilding the Flow". When you click RUN on a recipe or BUILD on a dataset in the Flow, you can choose to build only that dataset or to build recursively (rebuilding some or ALL of the datasets it depends on, up to a certain point). Screenshot attached.
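    The difference between a non-recursive and a recursive build can be sketched as a walk over the Flow's dependency graph. This is a simplified illustration under stated assumptions, not Dataiku's build engine: `FLOW` is a hypothetical dict from each dataset to the datasets its recipe reads, the `stop_at` parameter stands in for "up to a certain point", and source datasets (no inputs) are treated as having nothing to build. Visiting inputs before the dataset itself (a depth-first, post-order traversal) yields a valid build order.

```python
# Hypothetical Flow: each dataset maps to the datasets its recipe reads.
FLOW = {
    "raw_orders": [],
    "raw_customers": [],
    "customers_prepared": ["raw_customers"],
    "orders_joined": ["raw_orders", "customers_prepared"],
    "orders_by_region": ["orders_joined"],
}


def build_plan(target, flow, recursive=True, stop_at=frozenset()):
    """Return the datasets to build, upstream first.

    Non-recursive: only the target. Recursive: walk upstream depth-first,
    not descending into datasets listed in stop_at, and skipping source
    datasets, which have no recipe to run.
    """
    if not recursive:
        return [target]
    plan, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for parent in flow[name]:
            if parent not in stop_at:
                visit(parent)
        if flow[name]:  # only datasets produced by a recipe get built
            plan.append(name)

    visit(target)
    return plan


print(build_plan("orders_by_region", FLOW))
# ['customers_prepared', 'orders_joined', 'orders_by_region']
print(build_plan("orders_by_region", FLOW, stop_at={"customers_prepared"}))
# ['orders_joined', 'orders_by_region']
print(build_plan("orders_by_region", FLOW, recursive=False))
# ['orders_by_region']
```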

    For more information, see the documentation on rebuilding datasets and on how recipes handle schemas.
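    On the code-recipe side of the original question, a common defensive habit is to reference columns by name rather than by position, so that a column added upstream is simply ignored. The sketch below is plain Python with rows as dicts; `REQUIRED`, the column names, and both recipe functions are hypothetical and not part of any Dataiku API. It contrasts a by-name recipe with a positional one that silently produces wrong output once a new column is inserted upstream.

```python
# Hypothetical downstream recipe logic; rows are dicts keyed by column name.
REQUIRED = ("id", "amount")


def by_name_recipe(rows):
    """Select needed columns by name; extra upstream columns are ignored."""
    out = []
    for row in rows:
        missing = [c for c in REQUIRED if c not in row]
        if missing:
            # Removing or renaming a column is still a breaking change.
            raise KeyError(f"upstream schema lost columns: {missing}")
        out.append({"id": row["id"], "amount_x2": row["amount"] * 2})
    return out


def positional_recipe(rows):
    """Fragile: assumes "amount" is always the second column."""
    out = []
    for row in rows:
        values = tuple(row.values())  # dicts keep insertion order
        out.append({"id": values[0], "amount_x2": values[1] * 2})
    return out


before = [{"id": 1, "amount": 10}]
after = [{"id": 1, "region": "EU", "amount": 10}]  # new column added upstream

print(by_name_recipe(before) == by_name_recipe(after))  # True
print(positional_recipe(after))  # [{'id': 1, 'amount_x2': 'EUEU'}]
```

    Adding a column is usually the safe kind of schema change for a recipe written this way; dropping or renaming a column it depends on still needs attention downstream.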

    Finally, if you want to automate the rebuild process, use Scenarios (you can learn about them in our Automation course).

    Hope that helps,

    Emma

  • Bader

    @Emma
    Many thanks for your clarification.

  • trust81

    Hey @Emma,

    Thank you for the helpful links.

  • Katie (Dataiker, Product Ideas Manager)

    Hi @Bader, @trust81, and all future readers of this post,

    Note that we have just released improvements in V12 that make schema propagation and build behavior even more intuitive! You can now run downstream recipes and propagate schema changes from the "Run" button within a recipe, or from the Flow's "Build" button.
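    Conceptually, schema propagation means recomputing each downstream output schema, in Flow order, from its recipe's input schema. The toy model below assumes a hypothetical linear chain of three recipes with invented dataset names; it is not how DSS implements propagation, and real recipes (joins, group-bys) derive their schemas in more involved ways. It shows that a new upstream column flows through recipes that pass columns through, but not through one that selects an explicit column list.

```python
# Hypothetical linear Flow: (output dataset, input schema -> output schema).
CHAIN = [
    # Prepare-style step: keep every input column, add a derived one.
    ("customers_prepared", lambda cols: cols + ["full_name"]),
    # Another pass-through step that adds a column.
    ("customers_enriched", lambda cols: cols + ["segment"]),
    # Explicit selection: only these columns, if present upstream.
    ("customers_report", lambda cols: [c for c in ("id", "segment") if c in cols]),
]


def propagate(source_schema, chain):
    """Recompute every output schema in order, starting from the source."""
    schemas, cols = {}, list(source_schema)
    for name, compute in chain:
        cols = compute(cols)
        schemas[name] = cols
    return schemas


old = propagate(["id", "first", "last"], CHAIN)
new = propagate(["id", "first", "last", "email"], CHAIN)  # column added upstream
print(new["customers_enriched"])  # ['id', 'first', 'last', 'email', 'full_name', 'segment']
print(new["customers_report"] == old["customers_report"])  # True
```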

    See our reference docs and knowledge base article for more detail.
