-
" Parse next line as column headers" option not working for csv files
When uploading a csv file, and ticking the " Parse next line as column headers" option, the created dataset doesn't have the column names contained in the first row of the file.
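In case it helps while this is investigated, here is a minimal sketch of forcing header parsing on an already-uploaded dataset through the public API. The project key and dataset name are placeholders, and the "parseHeaderRow" flag is an assumption based on how the CSV format params appear in the dataset settings JSON on instances I have seen — verify it on yours:

```python
import dataiku

# Get an API handle on the uploaded dataset (names are placeholders)
client = dataiku.api_client()
project = client.get_project("MYPROJECT")
dataset = project.get_dataset("my_uploaded_csv")

# Flip the header-parsing flag in the raw format params and save.
# "parseHeaderRow" is how the flag appears in the dataset settings
# JSON on instances I have seen; verify before relying on it.
settings = dataset.get_settings()
settings.get_raw()["formatParams"]["parseHeaderRow"] = True
settings.save()
```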
-
Meet "Connection xxx not found" when exporting a project
Thanks for your time at the beginning. I am currently exporting a project, only check the 4 default options. However, it failed with a warning "An invalid argument has been encountered : Connection 'SF_VAW_PROD_ATP_MED' does not exist"Then I tried to use Python API to find this connection, but failed again: import dataiku…
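A minimal sketch for checking which connections the instance actually knows about before exporting, assuming you have admin rights (list_connections requires them); the host and API key are placeholders:

```python
import dataikuapi

# Placeholder host and API key; adapt to your instance
client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")

# list_connections() requires admin privileges and returns a dict
# keyed by connection name, so a missing key confirms the export error
connections = client.list_connections()
print("SF_VAW_PROD_ATP_MED" in connections)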
-
Renaming a dataset using the Python API
Dear Community, I am trying to rename a dataset in a project using the Python API, via the rename method of the dataikuapi.dss.dataset.DSSDataset class (https://developer.dataiku.com/latest/api-reference/python/datasets.html#dataikuapi.dss.dataset.DSSDataset.rename), but I get an AttributeError: 'DSSDataset' object has…
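For reference, a minimal sketch of how the documented rename call is meant to be used (project key and dataset names are placeholders). An AttributeError like this often means the handle is a dataiku.Dataset rather than a dataikuapi.dss.dataset.DSSDataset, or that the DSS version predates the method:

```python
import dataiku

# api_client() returns a dataikuapi.DSSClient when run inside DSS
client = dataiku.api_client()
project = client.get_project("MYPROJECT")

# get_dataset() returns a dataikuapi.dss.dataset.DSSDataset,
# which is the class that carries the documented rename() method
dataset = project.get_dataset("old_dataset_name")
dataset.rename("new_dataset_name")
```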
-
Exception: Unable to fetch schema for PROJECT.dataset: b'Ticket not given or unrecognized'
Hi there, I've suddenly become unable to load datasets into a Jupyter Notebook. Changing the environment/kernel doesn't help. A system reboot doesn't help. Force-reloading doesn't help either. Nothing was changed in the code. The Flow still runs, so it works as a recipe but not when trying to work in the…
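If the notebook's internal ticket is what's broken, one hedged workaround is to authenticate explicitly with a personal API key instead of the ambient ticket; dataiku.set_remote_dss exists for exactly this kind of explicit connection. Whether it bypasses the ticket issue in this case is an assumption to test; the URL and key below are placeholders:

```python
import dataiku

# Connect explicitly instead of relying on the notebook's ticket;
# URL and API key are placeholders for your instance and personal key
dataiku.set_remote_dss("https://dss.example.com:11200", "YOUR_API_KEY")

df = dataiku.Dataset("PROJECT.dataset").get_dataframe()
```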
-
Refresh partitions in DSS via API
Hi, we have added a new dataset to the project via the Python API, pointing it to an existing location in HDFS where partition folders are stored. (This location is managed by another DSS instance.) This kind of "import" of a read-only dataset works, but I did not find a way to "refresh" the list of partitions, i.e.…
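A minimal sketch of the closest public-API call I know of, assuming list_partitions() re-detects the listing from storage for file-based datasets (worth verifying, since the caching behaviour isn't spelled out); host, key, and names are placeholders:

```python
import dataikuapi

# Placeholder host, key, project, and dataset names
client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
dataset = client.get_project("MYPROJECT").get_dataset("hdfs_readonly_dataset")

# For file-based datasets the partition listing is derived from the
# folder structure, so calling this may be enough to pick up new folders
partitions = dataset.list_partitions()
print(partitions)
```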
-
How to programmatically refresh input dataset partitions with Snowflake?
Hi, I’m working with a Snowflake-partitioned dataset that serves as an input in my project flow. I’d like to automate the refresh of the partition listing, which is normally done manually using the "REFRESH PARTITIONS" button in the Metrics tab. We previously managed to do this with S3 using the…
-
Recipe execution is taking a long time due to handling a large volume of data in Dataiku
We are experiencing long execution times for a recipe in Dataiku due to handling large datasets. While we have implemented partitioning using a filter on a specific column, it still takes 1.5-2 hours to partition 30M records. Is there a more efficient way to handle and process this data quickly and effectively because…
-
How to execute a recipe after an empty dataset?
Is there any possible way of checking the readiness of a dataset? I have a dataset that might be empty after a Hive query. That shouldn't be a problem, but since it is (I cannot use it in a left join...), I decided to build another dataset that would contain either the result if it exists or a dummy line if it does not. All this…
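One hedged way to check emptiness without building a helper dataset is to peek at the first row from a Python recipe or scenario step; iter_rows() streams rows lazily, so this stays cheap even on large datasets (the dataset name is a placeholder):

```python
import dataiku

# Placeholder dataset name
ds = dataiku.Dataset("maybe_empty_dataset")

# iter_rows() streams rows lazily; fetching just the first one
# is a cheap emptiness test even on large datasets
first_row = next(ds.iter_rows(), None)
if first_row is None:
    print("Dataset is empty; skip the join or substitute a dummy line")
else:
    print("Dataset has data; safe to use in the left join")
```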
-
How to execute an MS-SQL stored procedure in Dataiku
Not a question but an answer, as I couldn't find any relevant posts. I solved this problem using a SQLExecutor2 in a Python recipe: from dataiku import SQLExecutor2 executor = SQLExecutor2(connection="connection name") sql_str = """Execute sp_name 'param1','param2', 'param3'""" output_df = executor.query_to_df(sql_str,…
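Reassembled from the preview above into a self-contained sketch (the connection name, procedure name, parameters, and output dataset are the poster's placeholders); note that query_to_df expects the statement to return a result set:

```python
import dataiku
from dataiku import SQLExecutor2

# Connection name as configured in DSS (placeholder)
executor = SQLExecutor2(connection="connection name")

# EXECUTE runs the stored procedure; query_to_df expects it to
# return a result set, which lands in a pandas DataFrame
sql_str = """EXECUTE sp_name 'param1', 'param2', 'param3'"""
output_df = executor.query_to_df(sql_str)

# Write the result to the recipe's output dataset (placeholder name)
dataiku.Dataset("output_dataset").write_with_schema(output_df)
```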
-
How to append a dataframe to an existing output dataset
Hello experts, in Dataiku v12.3.0 I was trying to append a dataframe using write_dataframe() to an existing dataset (with the same schema), but it always overwrites with the last dataframe even though the dataset spec is configured like: dataset.spec_item["appendMode"] = True The dataset is classified as an output, so it doesn't let me…
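For what it's worth, a sketch of the pattern as I understand it, assuming the output connection actually supports appending (uploaded/filesystem outputs may not) and that the flag must be set before any writer is opened; the dataset name and data are placeholders:

```python
import dataiku
import pandas as pd

# Toy batch to append (placeholder data)
df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

out = dataiku.Dataset("my_output")  # placeholder name

# Set the append flag before any writer is created; write_dataframe()
# opens its own writer, so setting spec_item afterwards has no effect.
# Append also only works on connections that support it (assumption
# to verify for your connection type).
out.spec_item["appendMode"] = True

with out.get_writer() as writer:
    writer.write_dataframe(df)
```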