Force Visual Recipes to output View instead of Table

somepunter
somepunter Registered Posts: 20 ✭✭✭

I've checked "Virtualizable in build" in my output dataset of a simple stack recipe but it still outputs a table instead of a view. is there a way to force this into a view without having to manually override the SQL?


Operating system used: Linux

Tagged:

Answers

  • JordanB
    JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 296 Dataiker

    Hi @somepunter
    ,

    DSS is based on materializing intermediate datasets, so they can't be views. You can use SQL pipelines (https://doc.dataiku.com/dss/latest/sql/pipelines/sql_pipelines.html) in order to avoid materializing intermediate datasets.
    Thanks,
    Jordan
  • somepunter
    somepunter Registered Posts: 20 ✭✭✭

    I've tried to create and chain further datasets and turned virtualization and sql pipelines on. however, even the intermediate datasets don't appear as views. are there "rules" to when DSS will and won't construct them as views?

  • JordanB
    JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 296 Dataiker

    Hi @somepunter
    ,

    Can you please clarify what you mean by "even the intermediate datasets don't appear as views"? DSS does not support the creation of views. Virtualizing intermediate datasets is useful because it prevents DSS from writing the data of the intermediate datasets.

    When you have a SQL pipeline with virtualization enabled, you can tell that an intermediate dataset has been virtualized (see image below). The dataset icon with the dotted outline is virtualized; it has not been written or built.

    Screen Shot 2022-10-12 at 4.05.15 PM.png

    Keep in mind that within our documentation on SQL pipelines, there are limitations noted. The following note is highlighted. "In some cases, you can configure a dataset to be virtualizable, but DSS would still write the dataset during the execution of the SQL pipeline. This happens when there are some technical constraints on the dataset that prevent the dataset from being virtualized."

    Thanks!

    Jordan

  • somepunter
    somepunter Registered Posts: 20 ✭✭✭

    apologies if I'm mistaken, I vaguely recall seeing the use of views when looking at the generated SQL in SQL pipelines.

    if views are not used,

    1. are the sql pipelines translated into CTE and nested queries instead?

    2. if so, how am I then able to pick an arbitrary intermediate dataset and view it? does it rerun the entire nested CTE query up to that point in order to show it to me in the UI?
    hence why I'd have thought a sql view per virtualized dataset would have made more sense.

Setup Info
    Tags
      Help me…