Virtualize a dataset enable/disable

jvandenberg Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1

I wanted to bring up a feature that would be really helpful. When SQL pipelines are enabled, each dataset has a checkbox to “virtualize” it, found under Settings -> Advanced:

jvandenberg_0-1677179968202.jpeg

I have found this saves a lot of time with large flows, so I use it quite a bit. The downside is that when I need to troubleshoot anything, I have to turn it off so I can see what data is in that dataset.

My request is two-fold. First, it would be great if I could easily see which datasets are “virtualized”. I can see this being another option in the list on the bottom left of the flow:

jvandenberg_1-1677179968205.jpeg

Then I could easily see which datasets need attention instead of having to go into each one of them.

Second, an easier way to toggle that feature on and off would be really handy. Going into each dataset, checking the box, and clicking Save is really cumbersome.

4 votes


Comments

  • Jus Registered Posts: 7 ✭✭✭✭

    @jvandenberg

    I think there actually is a way to virtualize multiple datasets at once without having to toggle the 'Virtualize in build' option one-by-one.

    If you select multiple datasets in your flow (hold the Ctrl key while clicking), the panel on the right-hand side shows the option 'Allow build virtualization (for pipelines)' under the Other actions section. See picture below.

    Capture.PNG

    I agree with your first point, though there might be workarounds. You could manually tag each dataset with your own tags to keep track of virtualized datasets, though this could become cumbersome in large flows. Perhaps it's also possible to write a macro that uses the Python API to select all datasets that have the virtualization option set to True, but I haven't looked into this yet.
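
The Python API idea from the comment above could look roughly like the sketch below. It is untested against a real instance and rests on one assumption: that the checkbox is stored in the raw dataset settings under `flowOptions.virtualizable`. That key is a guess, so inspect `get_raw()` on one of your own datasets before relying on it. The helper names (`is_virtualized`, `list_virtualized`, `set_virtualization`) are made up for this example. The `project` argument is expected to be a project handle from the Dataiku public API, offering `list_datasets()` and `get_dataset(name).get_settings()`.

```python
def is_virtualized(raw_settings):
    """Return True if the raw dataset settings carry the virtualization flag.

    ASSUMPTION: the flag lives at raw_settings["flowOptions"]["virtualizable"].
    """
    flow_options = raw_settings.get("flowOptions") or {}
    return bool(flow_options.get("virtualizable", False))


def list_virtualized(project):
    """Return the sorted names of datasets in `project` that are flagged."""
    names = []
    for item in project.list_datasets():
        name = item["name"]
        raw = project.get_dataset(name).get_settings().get_raw()
        if is_virtualized(raw):
            names.append(name)
    return sorted(names)


def set_virtualization(project, dataset_names, enabled):
    """Switch the (assumed) flag on or off for each named dataset and save."""
    for name in dataset_names:
        settings = project.get_dataset(name).get_settings()
        raw = settings.get_raw()
        raw.setdefault("flowOptions", {})["virtualizable"] = bool(enabled)
        settings.save()
```

Inside a DSS notebook or macro, a project handle would typically come from `dataiku.api_client().get_default_project()`. Calling `set_virtualization(project, list_virtualized(project), False)` would then turn virtualization off for every flagged dataset before troubleshooting; keep the returned list of names around so the same call with `True` can restore the flag afterwards.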
