-
Setting up Stages in Snowflake to work with Dataiku
In Dataiku DSS, when working with Snowflake, there is an option to use a stage. This reportedly improves performance by letting more kinds of processing run inside Snowflake itself, without shipping data back to the DSS server. Are folks using this feature? What has your experience…
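For context, the stage referred to here is a Snowflake object that DSS's fast load/unload path can use. Below is a minimal sketch of creating an internal stage, reusing the SQLExecutor2 pattern that appears later in this list; the connection, stage, and role names are placeholders, and whether DDL is permitted depends on the rights granted to the role behind the DSS connection:

```python
from dataiku import SQLExecutor2

# Placeholder: the name of the Snowflake connection configured in DSS
executor = SQLExecutor2(connection="my_snowflake_connection")

# CREATE STAGE is standard Snowflake DDL; an internal stage needs no external storage
executor.query_to_df("CREATE STAGE IF NOT EXISTS DSS_STAGE")

# The DSS connection's role needs read/write on the stage to load/unload through it
executor.query_to_df("GRANT READ, WRITE ON STAGE DSS_STAGE TO ROLE DSS_ROLE")
```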
-
refresh partitions in dss via API
Hi, we have added a new dataset to the project via the Python API and pointed it at an existing HDFS location where partition folders are stored. (This location is managed by another DSS instance.) This kind of "import" of a read-only dataset works, but I have not found a way to "refresh" the list of partitions, i.e.…
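A minimal sketch of one way to re-list partitions from inside DSS, assuming list_partitions() recomputes the listing from the HDFS folders rather than returning a cached value; the dataset name is a placeholder:

```python
import dataiku

# Placeholder: the read-only dataset pointing at the shared HDFS location
ds = dataiku.Dataset("imported_hdfs_dataset")

# List the partition folders currently visible in storage
partitions = ds.list_partitions(raise_if_empty=False)
print(partitions)
```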
-
How to programmatically refresh input dataset partitions with Snowflake?
Hi, I’m working with a Snowflake-partitioned dataset that serves as an input in my project flow. I’d like to automate the refresh of the partition listing, which is normally done manually using the "REFRESH PARTITIONS" button in the Metrics tab. We previously managed to do this with S3 using the…
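A sketch of the same idea through the public API (dataikuapi), assuming its list_partitions() re-reads the listing the way the "REFRESH PARTITIONS" button does; the host, API key, project key, and dataset name are all placeholders:

```python
import dataikuapi

# Placeholder host and API key
client = dataikuapi.DSSClient("https://dss.example.com:11200", "my_api_key")

dataset = client.get_project("MY_PROJECT").get_dataset("snowflake_input")
print(dataset.list_partitions())
```

From code running inside DSS itself, `dataiku.api_client()` returns the same kind of client without hard-coding the host or credentials.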
-
The recipe execution is taking long time due to handling a large volume of data in dataiku
We are experiencing long execution times for a recipe in Dataiku due to handling large datasets. Although we have implemented partitioning using a filter on a specific column, it still takes 1.5-2 hours to partition 30M records. Is there a more efficient way to handle and process this data quickly and effectively because…
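One common lever is to push the work into the database instead of streaming rows through the DSS backend. Below is a hedged sketch using exec_recipe_fragment in a Python recipe, which runs the query in-database and writes straight into the output dataset; the dataset, table, and column names are placeholders:

```python
import dataiku
from dataiku import SQLExecutor2

output_ds = dataiku.Dataset("filtered_output")  # placeholder output dataset

# Placeholder table and column: the filter executes in the database, so only
# the matching rows are materialized into the output
query = """
SELECT *
FROM big_input_table
WHERE partition_col = '2024-01'
"""

SQLExecutor2.exec_recipe_fragment(output_ds, query)
```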
-
How to execute a recipe after an empty dataset ?
Is there any way of checking the readiness of a dataset? I have a dataset that might be empty after a Hive query. That shouldn't be a problem, but since it is (I cannot use an empty dataset in a left join...), I decided to build another dataset that contains either the result, if it exists, or a dummy line if it does not. All this…
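A minimal sketch of the workaround described, as a Python recipe that writes either the real rows or a single dummy line so that the downstream left join always has something to consume; the dataset names are placeholders:

```python
import dataiku
import pandas as pd

input_ds = dataiku.Dataset("maybe_empty")    # placeholder: result of the Hive query
output_ds = dataiku.Dataset("never_empty")   # placeholder: safe input for the left join

df = input_ds.get_dataframe()
if df.empty:
    # One dummy row with the same columns, so the schema stays stable
    df = pd.DataFrame([{col["name"]: None for col in input_ds.read_schema()}])

output_ds.write_with_schema(df)
```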
-
How to execute a MS-SQL stored procedure in Dataiku
Not a question but an answer, as I couldn't find any relevant posts. I solved this problem using a SQLExecutor2 in a Python recipe: from dataiku import SQLExecutor2 executor = SQLExecutor2(connection="connection name") sql_str = """Execute sp_name 'param1','param2', 'param3'""" output_df = executor.query_to_df(sql_str,…
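The complete pattern from that answer, lightly expanded and commented; the connection name, procedure name, and parameters are placeholders:

```python
from dataiku import SQLExecutor2

# Placeholder: the name of the MS-SQL connection configured in DSS
executor = SQLExecutor2(connection="connection name")

# EXECUTE runs the stored procedure on SQL Server; query_to_df captures
# the result set the procedure returns as a pandas DataFrame
sql_str = """EXECUTE sp_name 'param1', 'param2', 'param3'"""
output_df = executor.query_to_df(sql_str)
print(output_df.head())
```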
-
Shift + Scroll for horizontal scrolling
Please add the ability to press Shift + Scrollwheel for horizontal scrolling on the data exploration screen. When exploring larger datasets, it is so much easier to scroll through the columns this way than to move your cursor to the horizontal scrollbar and drag it manually. Also, most other…
-
Select Columns Outside of Join Recipe
I would like to be able to select columns of data outside of a join recipe. A couple of examples: 1 - Usage of "unmatched rows": the column selection that happens after the join does not apply to rows that aren't joined. In this case I am using both sets of data, so I need the option to select columns from both sets. 2 - Removal…
-
How to append dataframe in existing output dataset
Hello experts, in Dataiku v12.3.0, I was trying to append a dataframe to an existing dataset (with the same schema) using write_dataframe(). But it always overwrites with the last dataframe, even though the dataset spec is configured like: dataset.spec_item["appendMode"] = True. The dataset is classified as an output, so it doesn't let me…
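A hedged sketch of the approach the post describes; note that for a managed recipe output, the recipe's own "Append instead of overwrite" setting (on the recipe's Inputs/Outputs tab) usually governs the write mode, so setting spec_item alone may be ignored. The dataset name and rows are placeholders:

```python
import dataiku
import pandas as pd

df = pd.DataFrame({"col": [1, 2]})  # placeholder rows matching the existing schema

output_ds = dataiku.Dataset("existing_output")  # placeholder name
output_ds.spec_item["appendMode"] = True        # request append rather than overwrite
output_ds.write_dataframe(df)
```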
-
Vertical Scrolling for Datasets
It would boost my productivity significantly if I could use "Shift" + "Scrollwheel" to scroll vertically, instead of having to find the small scrollbar at the bottom of the dataset each time.