Force Visual Recipes to output View instead of Table
I've checked "Virtualizable in build" in my output dataset of a simple stack recipe but it still outputs a table instead of a view. is there a way to force this into a view without having to manually override the SQL?
Operating system used: Linux
Answers
-
JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 296 Dataiker
Hi @somepunter
,DSS is based on materializing intermediate datasets, so they can't be views. You can use SQL pipelines (https://doc.dataiku.com/dss/latest/sql/pipelines/sql_pipelines.html) in order to avoid materializing intermediate datasets.Thanks,Jordan -
I've tried to create and chain further datasets and turned virtualization and sql pipelines on. however, even the intermediate datasets don't appear as views. are there "rules" to when DSS will and won't construct them as views?
-
JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 296 Dataiker
Hi @somepunter
,Can you please clarify what you mean by "even the intermediate datasets don't appear as views"? DSS does not support the creation of views. Virtualizing intermediate datasets is useful because it prevents DSS from writing the data of the intermediate datasets.
When you have a SQL pipeline with virtualization enabled, you can tell that an intermediate dataset has been virtualized (see image below). The dataset icon with the dotted outline is virtualized; it has not been written or built.
Keep in mind that within our documentation on SQL pipelines, there are limitations noted. The following note is highlighted. "In some cases, you can configure a dataset to be virtualizable, but DSS would still write the dataset during the execution of the SQL pipeline. This happens when there are some technical constraints on the dataset that prevent the dataset from being virtualized."
Thanks!
Jordan
-
apologies if I'm mistaken, I vaguely recall seeing the use of views when looking at the generated SQL in SQL pipelines.
if views are not used,
1. are the sql pipelines translated into CTE and nested queries instead?
2. if so, how am I then able to pick an arbitrary intermediate dataset and view it? does it rerun the entire nested CTE query up to that point in order to show it to me in the UI?
hence why I'd have thought a sql view per virtualized dataset would have made more sense.