-
Using Group Recipe Without Aggregations
Hello everyone, I believe Python allows people to use the group_by() method without any aggregations; however, in Dataiku we must aggregate when we use the Group recipe. In other words, I would like to group by a specific column and keep all other columns without aggregating. Is that possible in any way? Note that I…
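Outside Dataiku, a common plain-pandas workaround (a sketch with made-up column names, not the Group recipe itself) is `groupby(...).transform(...)`, which computes a per-group statistic while keeping every original row and column:

```python
import pandas as pd

# Hypothetical sample data: "group" is the column we would group by;
# "value" is a column we want to keep without collapsing rows.
df = pd.DataFrame({
    "group": ["a", "a", "b"],
    "value": [1, 2, 3],
})

# transform() returns a result aligned to the original rows, so all
# columns survive; only a new per-group column is added.
df["group_total"] = df.groupby("group")["value"].transform("sum")
```

In Dataiku itself, the equivalent pattern is usually a Group recipe followed by a Join back to the original dataset on the grouping key.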
-
min between 2 dates
Hello, I'm trying to find the earliest of two date fields. I was thinking of using a min formula, but there may be missing values in these fields and the formula doesn't seem to work in that case. Is there another solution besides an "if then" formula?
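As an illustration in plain pandas (column names are invented), row-wise `min(axis=1)` skips missing values by default, so a row with one missing date still returns the other date rather than failing:

```python
import pandas as pd

# Hypothetical date columns; None becomes pd.NaT (missing date).
df = pd.DataFrame({
    "date_a": pd.to_datetime(["2024-01-05", None, "2024-03-01"]),
    "date_b": pd.to_datetime(["2024-01-10", "2024-02-01", None]),
})

# skipna=True is the default, so NaT values are ignored per row.
df["first_date"] = df[["date_a", "date_b"]].min(axis=1)
```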
-
remapping connections for API services
Good day! In the API Designer, we can define connections to use with SQL Query Endpoints. How do we remap these connections based on deployments to different API nodes? (i.e. use a different connection for deployments to a production API node vs. deployments to an acceptance API node.) I don't see any option in the Deployer UI…
-
Remove duplicate
Hi, I have gone through a few of the posts on removing duplicates, but none of them gives a clear answer. Can you please show me how to use a column as a condition, so that when a value in that column repeats, the entire row containing the repeated value is dropped from the output? Kind regards, Kalpesh
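A minimal pandas sketch of that behavior (column names are hypothetical): `drop_duplicates(subset=...)` keeps the first row for each key value and drops every later row that repeats it, removing the whole row rather than just the cell:

```python
import pandas as pd

# Hypothetical data: keep only the first row per "customer_id".
df = pd.DataFrame({
    "customer_id": [101, 101, 102, 103, 102],
    "amount": [10, 20, 30, 40, 50],
})

# keep="first" retains the first occurrence; later repeats are dropped.
deduped = df.drop_duplicates(subset="customer_id", keep="first")
```

In Dataiku, the Distinct recipe (or a Group recipe with a "first" aggregation on the other columns) plays a similar role.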
-
Filter recipe: how to avoid stopping processing when there are no matched records
Hi Dataiku users, I want to know how to resolve the situation in the subject line. I use a Filter recipe only to process exception data, which I stack with the main data afterwards, so if there are no filtered records in the output dataset, that's no problem. But in Dataiku, if not all input datasets of the Stack recipe have data, it returns an error and stops…
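For comparison, here is a small pandas sketch (invented column names) showing that stacking with a possibly-empty exception subset is harmless in code, since `pd.concat` accepts an empty frame; the question is how to get the Dataiku Stack recipe to tolerate the same situation:

```python
import pandas as pd

main = pd.DataFrame({"id": [1, 2], "status": ["ok", "ok"]})

# The "exception" subset may be empty if the filter matched nothing.
exceptions = main[main["status"] == "error"]

# Concatenating an empty frame is a no-op rather than an error.
stacked = pd.concat([main, exceptions], ignore_index=True)
```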
-
How to use streaming python
Hi all! I'm trying to use streaming Python with the example given in the documentation: https://doc.dataiku.com/dss/latest/streaming/cpython.html#writing-to-datasets If I try to follow it, it doesn't work exactly: 1) .get_continuous_writer() expects a source-id as one of the arguments; 2) if I give something like…
-
Kafka - Restart Failed Process
I get random errors in my Kafka pipeline due to GCS bucket failures and BigQuery size limits. I'm working with my teams to resolve them, but I want to know if there is an easy way to restart a continuous process in the event of a failure. I thought about setting up a scenario to start the process every 30 minutes or so, but I'm sure…
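As a generic sketch (not a Dataiku API — the task and error are stand-ins), a restart-on-failure wrapper with a bounded retry count and a pause between attempts looks like this; a Dataiku scenario trigger could invoke something similar:

```python
import time

def run_with_restart(task, max_restarts=3, delay_seconds=1.0):
    """Run a task, restarting it after failures.

    Retries up to max_restarts times, sleeping between attempts,
    and re-raises the last error if every attempt fails.
    """
    for attempt in range(max_restarts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_restarts:
                raise
            time.sleep(delay_seconds)

# Demo task that fails twice (simulated transient error), then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient GCS/BigQuery error")
    return "running"

result = run_with_restart(flaky, delay_seconds=0.01)
```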
-
How do I create backend APIs for the various transformations and visualizations in a flow?
I am trying to create an ML application that can display the various transformations that happen to a dataset, like the count of certain rows, their min, max, etc. This app will take data from the flow and display whatever information is needed for a particular run. I need a way to pass the information to this…
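One lightweight pattern (a sketch with invented column names, independent of any particular web framework) is to compute the per-run summary statistics into a plain dict and serialize it as JSON, which any backend endpoint can then return to the app:

```python
import json
import pandas as pd

# Hypothetical dataset pulled from the flow for one run.
df = pd.DataFrame({"price": [10, 20, 30], "qty": [1, 2, 3]})

# The kind of summary the app would display for this run.
payload = {
    "row_count": int(len(df)),
    "price_min": float(df["price"].min()),
    "price_max": float(df["price"].max()),
}
body = json.dumps(payload)
```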
-
Extract underlying code of any recipe on dataiku
I have a similar question to the one posted a few years ago: https://community.dataiku.com/t5/General-Discussion/Python-script-to-export-any-kind-of-recipes-into-SQL/m-p/21298 I have a flow with tons of recipes. I want to convert that into "a" code: Python, SQL, PySpark... I do not care which. The solution in the link works only…
-
Manage Permissions of Dataiku folder
I want to read a .docx file in my Dataiku folder through a Python recipe, but it returns permission denied. How can I change my folder's access permissions? PermissionError: [Errno 13] Permission denied: '/data/dataiku/dss_data/managed_folders/TUMING/DYV5ukXU/GDMS005_001.docx'
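The real fix in Dataiku is usually to correct the filesystem ownership of the managed folder (or read the file through the managed-folder API instead of its raw path). Purely as an illustration of diagnosing and repairing a Unix permission error with the standard library (the file here is a throwaway temp file, not the actual managed folder):

```python
import os
import stat
import tempfile

# Create a throwaway file and strip all permissions to mimic Errno 13.
path = os.path.join(tempfile.mkdtemp(), "report.docx")
with open(path, "w") as f:
    f.write("demo")
os.chmod(path, 0)

# Inspect the current mode bits, then restore owner read/write access.
mode = stat.S_IMODE(os.stat(path).st_mode)
os.chmod(path, mode | stat.S_IRUSR | stat.S_IWUSR)
readable = os.access(path, os.R_OK)
```

Note that changing permissions this way only works if the recipe's user owns the file; otherwise an administrator must adjust ownership on the DSS host.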