-
Fuzzy Join: When to use Relative to the Left vs Right Tables.
I'm starting to work with Fuzzy Joins and having good luck. However, I'm trying to figure out when I might want to use a Relative Threshold relative to the Right or Left Table when doing an overall Left Join to find duplicate records. I understand that the proportions of items that need to match will be different based…
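Why the reference side matters can be illustrated outside Dataiku with a minimal Python sketch. It assumes, for illustration only, that a relative threshold means "edit distance divided by the length of the chosen side's string". The exact normalization the Fuzzy join recipe applies may differ, so treat `relative_distance` as a hypothetical helper, not the recipe's formula.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insertions, deletions, substitutions)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def relative_distance(left: str, right: str, relative_to: str = "left") -> float:
    # Hypothetical normalization: divide the edit distance by the length of the reference side
    reference = left if relative_to == "left" else right
    return levenshtein(left, right) / max(len(reference), 1)


left, right = "ACME Corp", "ACME Corporation Ltd"
for side in ("left", "right"):
    print(side, round(relative_distance(left, right, side), 3))
```

Under this assumption the same pair (distance 11) scores about 1.222 relative to the short left value but 0.55 relative to the longer right value, so a threshold of, say, 0.6 would accept it only when measured against the right table. Normalizing by the side whose values tend to be longer is the more permissive choice.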
-
Dynamic rename and addition/removal of columns
I am connecting to QuickBase using a URL path and looking for a way to dynamically add or remove columns from the initial schema as well as rename the columns as "Simplify (and lower case)". I do not own the source data, and cannot influence the naming convention or columns at the source. I am pushing this data to GCP/Big…
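A schema-independent rename can be sketched in plain pandas by applying a function to whatever columns arrive, rather than listing names explicitly. This sketch assumes "simplify" means lowercasing and collapsing runs of non-alphanumeric characters into underscores; it approximates, and may not exactly reproduce, Dataiku's own "Simplify (and lower case)" option. The `drop` list and sample column names are hypothetical.

```python
import re

import pandas as pd


def simplify(name: str) -> str:
    # Lowercase, collapse any run of non-alphanumeric characters to "_", trim stray underscores
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


def normalize_columns(df: pd.DataFrame, drop=()) -> pd.DataFrame:
    # Drop unwanted source columns (ignoring any that no longer exist), then rename the rest
    df = df.drop(columns=[c for c in drop if c in df.columns])
    return df.rename(columns=simplify)


source = pd.DataFrame(columns=["Record ID#", "Date Created", "Owner (Email)", "Internal Notes"])
result = normalize_columns(source, drop=["Internal Notes"])
print(list(result.columns))
```

Because the rename is a function rather than a fixed mapping, new source columns are simplified automatically. One caveat: two distinct source names can simplify to the same result, so a duplicate check before writing may be worthwhile.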
-
Split and output single dataset to multiple datasets based on dynamic values
Hi there, I have a single file with multiple suppliers and want to split it into individual files, one per supplier. However, the suppliers can change from week to week (when the file is refreshed). Is there a way to split the file based on the dynamic values in the supplier column? Operating system used: Windows
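One way to handle a changing set of suppliers is to group on the column at run time and write one file per distinct value, for example from a Python recipe. This is a minimal pandas sketch assuming a local output directory; the `split_by_column` helper and the sample data are hypothetical, and writing into a Dataiku managed folder would need that folder's own API.

```python
import tempfile
from pathlib import Path

import pandas as pd


def split_by_column(df: pd.DataFrame, column: str, out_dir: Path) -> list:
    # One CSV per distinct value found in this run, so new or missing suppliers need no flow changes
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for value, group in df.groupby(column):
        # Replace characters that are unsafe in file names
        safe = "".join(ch if ch.isalnum() or ch in "-_" else "_" for ch in str(value))
        path = out_dir / f"{safe}.csv"
        group.to_csv(path, index=False)
        written.append(path)
    return written


suppliers = pd.DataFrame({"supplier": ["Acme", "Globex", "Acme"], "qty": [5, 3, 7]})
with tempfile.TemporaryDirectory() as tmp:
    paths = split_by_column(suppliers, "supplier", Path(tmp))
    names = sorted(p.name for p in paths)
    acme_rows = len(pd.read_csv(Path(tmp) / "Acme.csv"))
print(names, acme_rows)
```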
-
Building multiple models in Dataiku
I have master data in the flow. I want to subset the data for each market and, in that market, select submarkets and develop models. There are around 60 models that need to be built. I am wondering what would be the best approach to do this in Dataiku in terms of storage space optimization, ease of maintenance, etc. Has…
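One common pattern, sketched here in plain pandas/NumPy rather than as a Dataiku-specific feature, is to keep the master dataset whole and loop over (market, submarket) groups in a single piece of code, storing only each segment's fitted parameters instead of one subset dataset per model. The straight-line fit is a stand-in for whatever model is actually used, and `fit_per_segment` plus the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def fit_per_segment(df: pd.DataFrame, keys: list, x: str, y: str) -> dict:
    # One shared training function applied to every segment instead of ~60 copied flow branches
    models = {}
    for segment, group in df.groupby(keys):
        slope, intercept = np.polyfit(group[x], group[y], deg=1)
        models[segment] = {"slope": slope, "intercept": intercept, "n_rows": len(group)}
    return models


data = pd.DataFrame({
    "market": ["FR", "FR", "FR", "DE", "DE", "DE"],
    "submarket": ["north"] * 3 + ["south"] * 3,
    "price": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "sales": [10.0, 8.0, 6.0, 20.0, 17.0, 14.0],
})
models = fit_per_segment(data, ["market", "submarket"], "price", "sales")
print(models)
```

Maintenance then comes down to one function, and adding a market only adds rows to the master data rather than new recipes.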
-
Read and write to the same file
Hi, I have a scenario where we read input from several files, and one of those input files is also an output file. When we do this, we lose some data, possibly due to parallelism in Dataiku. How can I stop writing to the file until all recipes have finished executing? I saw some posts…
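At the file level, a generic way to keep readers from seeing a half-written file is to read the full input first, write the complete result to a temporary file, and swap it in only at the end. This plain-Python sketch is independent of how Dataiku schedules recipes and does not change that scheduling; `atomic_write_text` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    # Write to a temp file in the same directory, then replace the target in one step
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # On POSIX file systems, readers see either the old file or the new one, never a partial one
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "shared.csv"
    target.write_text("id\n1\n", encoding="utf-8")
    previous = target.read_text(encoding="utf-8")  # read the full input first
    atomic_write_text(target, previous + "2\n")    # then publish the combined result at once
    final = target.read_text(encoding="utf-8")
    leftovers = sorted(p.name for p in Path(tmp).iterdir())
print(repr(final), leftovers)
```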
-
How to selectively build multiple flow zones in a job run?
Hi Dataiku Community, I wonder how to select and build multiple flow zones in a single job run. Say I have 4 flow zones, A, B, C, and D, connected as follows: A -> C -> D and B -> C -> D. How can I build only the top route, i.e., A -> C -> D, in a single job? Thanks! Operating system used:…
-
Multivariate LSTM in Dataiku
I've completed this tutorial on how to implement an LSTM in Dataiku: https://knowledge.dataiku.com/latest/kb/analytics-ml/time-series/ts-forecast/time-series-code/deep-learning-ts.html Basically, I want to understand how to adapt it for multiple variables. I have a data set that has many 'runs' of a chemical process over…
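For a multivariate LSTM, the main change is usually the input tensor: each sample becomes a window of shape (lookback, n_features) instead of (lookback, 1), and with many runs the windows should be built per run so that no window spans two runs. A minimal NumPy sketch, assuming each run is already a (timesteps, features) array and the target is the next value of one column; `make_windows` and the toy runs are hypothetical.

```python
import numpy as np


def make_windows(values: np.ndarray, target_col: int, lookback: int):
    # values: (n_timesteps, n_features) for ONE run; the run must be longer than lookback.
    # Returns inputs of shape (samples, lookback, n_features) and next-step target values.
    X, y = [], []
    for start in range(len(values) - lookback):
        X.append(values[start:start + lookback])
        y.append(values[start + lookback, target_col])
    return np.stack(X), np.array(y)


# Two toy runs with 2 features each: 10 and 6 timesteps long
runs = [np.arange(20, dtype=float).reshape(10, 2), np.arange(12, dtype=float).reshape(6, 2)]
pairs = [make_windows(run, target_col=0, lookback=3) for run in runs]
X = np.concatenate([p[0] for p in pairs])
y = np.concatenate([p[1] for p in pairs])
print(X.shape, y.shape)
```

With these runs the stacked inputs have shape (10, 3, 2): 7 windows from the first run and 3 from the second. The model's first LSTM layer would then expect inputs of shape (lookback, n_features), here (3, 2).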
-
Create a variable using values from a table
Hi everyone, in my Dataiku flow I have a table (T-DATA-QUALITY-40) whose output is a single row with 3 columns. This is the output of the table. How can I create a new variable with the value of the second column (ANNO_CALENDARIO)? Thanks in advance!
-
Get complete jobs history
Hi, I saw on this page: https://doc.dataiku.com/dss/latest/operations/disk-usage.html that job logs are not garbage-collected and are retained in DSS for an arbitrary length of time. But when I use the following code, jobs = project.list_jobs(), I only get information on the 100 most recent jobs. Is it possible to get the complete…
-
Dataiku API
Hello everyone. I'm trying to build an application outside Dataiku that runs a workflow built within Dataiku, so I would like to know whether it's possible to create a web app that anyone can use (without necessarily being logged into Dataiku), so they can upload their own data and retrain the model I'm building,…