-
Cartesian product detection in join recipe
What's your use case? A Cartesian product is a common issue when joining datasets on a bad key. It's not always easy to detect, and users can even forget to check for it because they think they know their data. What's your proposed solution? What I suggest is an option to check whether there will be a Cartesian product on the…
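One way to approximate such a check before running the recipe is to compare key multiplicities on both sides. The Python sketch below uses a hypothetical helper, `check_join_fanout`, which is not a Dataiku feature. It assumes each dataset's join key is available as a plain list of values and that null keys never match, as in SQL. It predicts the row count of a left join and lists the matched keys that are duplicated on the right side, which are the keys that multiply rows.

```python
from collections import Counter


def check_join_fanout(left_keys, right_keys):
    """Hypothetical pre-join check: predict left-join output size and find fan-out keys.

    Assumes keys are hashable values and that None (null) never matches, as in SQL.
    """
    # Count how often each non-null key appears in the right dataset
    right_counts = Counter(k for k in right_keys if k is not None)
    # Each left row yields one output row per matching right row, or one row if unmatched
    left_join_rows = sum(max(1, right_counts.get(k, 0)) for k in left_keys)
    left_set = set(left_keys)
    # Keys that are matched AND duplicated on the right are the ones that multiply rows
    fanout_keys = [k for k, n in right_counts.items() if n > 1 and k in left_set]
    return left_join_rows, fanout_keys


left = ["a", "b", "c"]
right = ["a", "a", "b", "d"]
rows, dup_keys = check_join_fanout(left, right)
print(rows, dup_keys)
```

If the predicted row count is larger than the number of left rows, the join fans out and the listed keys are the ones to inspect.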
-
Out of range error joining 2 datasets
I have 2 datasets that I'm trying to left join using a "contains" condition, but when I try to join them, I get a "string index out of bounds" error. Does this have something to do with the size of my data source? I noticed that when I limit my left data source to 5 rows the join works fine, but when I use the full data, I get…
-
Fuzzy Join: When to use Relative to the Left vs Right Tables.
I'm starting to work with Fuzzy Joins and having good luck. However, I'm trying to figure out when I might want to use a relative threshold based on the right or the left table when doing an overall left join to find duplicate records. I understand that the proportion of items that need to match will be different based…
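To illustrate why the choice of side can matter, here is a sketch that assumes a relative threshold divides the Levenshtein edit distance by the length of the chosen side's string. That definition is an assumption made for illustration, not a description of how Dataiku computes it, and `relative_distance` is a hypothetical helper.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def relative_distance(left: str, right: str, relative_to: str = "left") -> float:
    """Hypothetical: edit distance normalized by the length of the chosen side."""
    if relative_to not in ("left", "right"):
        raise ValueError("relative_to must be 'left' or 'right'")
    d = levenshtein(left, right)
    base = len(left) if relative_to == "left" else len(right)
    if base == 0:
        # An empty reference string only "matches" another empty string
        return 0.0 if d == 0 else float("inf")
    return d / base


print(relative_distance("ACME", "ACME Corp", "left"))
print(relative_distance("ACME", "ACME Corp", "right"))
```

Under this assumed definition, "ACME" vs "ACME Corp" has an edit distance of 5, which is 1.25 relative to the left string but about 0.56 relative to the right one, so the same pair can pass or fail a given threshold depending on which side is the reference.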
-
Advanced Designer Fuzzy Join Error
Hi community, I'm pretty new to Dataiku but not to analytics, data visualization, integration, etc. However, I'm having an issue in the course. I'm doing the Fuzzy Join tutorial and am working on our cloud instance (not on my local PC), and I went through and created the fuzzy match according to the instructions (Went back…
-
APPEND TWO TABLES
Hi all, I have two tables, e.g.: Table 1: RULEBP1BP1_PK123 Table 2: RULEBP2BP2_PK156 I need to append these two tables to get the output as: RULEBP1BP1_PKBP2BP2_PK123 1 45 Any help would be appreciated. Thanks in advance. Operating system used: Windows
-
Join within an array
Hello, is it possible to join datasets using an array or list attribute in one of these datasets? I have a dataset with a list-type attribute, and I want to keep only the rows whose attribute contains certain values. These values are in a second dataset. For example: dataset n°1 Name | ids AA | 12;54 BB | 22;100 CC |…
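Outside the join recipe, the filtering described here can be sketched in plain Python by splitting the list column and intersecting it with the wanted values. The sketch assumes the list attribute is a semicolon-separated string in a column named `ids` (as in the example) and that the second dataset's values have already been collected into a list; `filter_rows_by_ids` is a hypothetical name.

```python
def filter_rows_by_ids(rows, wanted_ids, sep=";"):
    """Hypothetical helper: keep rows whose `ids` list shares at least one value with wanted_ids."""
    # Normalize the lookup values from the second dataset to stripped strings
    wanted = {str(v).strip() for v in wanted_ids}
    kept = []
    for row in rows:
        # Split the list attribute, ignoring empty fragments such as a trailing ";"
        ids = {part.strip() for part in row["ids"].split(sep) if part.strip()}
        if ids & wanted:
            kept.append(row)
    return kept


dataset_1 = [
    {"Name": "AA", "ids": "12;54"},
    {"Name": "BB", "ids": "22;100"},
]
print(filter_rows_by_ids(dataset_1, [54, 7]))
```

Converting both sides to strings avoids silent mismatches when one dataset stores the ids as numbers and the other as text.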
-
Fuzzy Match
I need to do a fuzzy match based on Jaro distance. I have two columns (X, Y), and there are two unique values in the Y column. The fuzzy match needs to take the shortest string from the X column and compare it with the X values belonging to the other Y value, and it needs to do the same for all the X column values. If it satisfied the predefined…
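For reference, the standard Jaro similarity can be sketched as follows (Jaro distance is usually taken as 1 minus this value). The helper `best_jaro_match` and its 0.85 threshold are hypothetical choices for illustration; the grouping by Y values described in the question would still need to be applied around it.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this distance of each other
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(i + window + 1, len2)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def best_jaro_match(value, candidates, threshold=0.85):
    """Hypothetical helper: best-scoring candidate at or above the threshold, else None."""
    scored = [(jaro_similarity(value, c), c) for c in candidates]
    if not scored:
        return None
    score, best = max(scored)
    return best if score >= threshold else None


print(jaro_similarity("MARTHA", "MARHTA"))
print(best_jaro_match("MARTHA", ["MARHTA", "XYZ"]))
```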
-
Weird behavior of (left) join recipe with post-join computed columns losing records
I'm experiencing something unexpected with the join recipe when using a left join with post-join computed columns. I'm joining 2 datasets on a single column and then computing 4 additional columns after the join. I check the number of records before and after the join. With 3 of my computed columns everything's fine and the…
-
How to combine several rows into one row?
Hello, my data looks like this:
Records | values
records_0_Name | Jimmy
records_0_Number | 1
records_0_Status | Student
records_1_Name | Marie
records_1_Number | 2
records_1_Status | Worker
And I want it to look like this:
Name | Number | Status
Jimmy | 1 | Student
Marie | 2 | Worker
Any ideas?
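The reshaping asked for here is a pivot keyed on the record index embedded in each key. A minimal Python sketch, assuming every key follows the pattern `records_<index>_<field>` and the data is available as (key, value) pairs; `unflatten_records` is a hypothetical name, and keys that do not fit the pattern are simply skipped.

```python
import re
from collections import defaultdict

# Assumed key layout: records_<index>_<field>, e.g. "records_0_Name"
KEY_PATTERN = re.compile(r"^records_(\d+)_(\w+)$")


def unflatten_records(pairs):
    """Hypothetical helper: group (key, value) pairs into one dict per record index."""
    grouped = defaultdict(dict)
    for key, value in pairs:
        m = KEY_PATTERN.match(key)
        if m is None:
            continue  # skip keys that do not follow the records_<index>_<field> layout
        index, field = int(m.group(1)), m.group(2)
        grouped[index][field] = value
    # Emit records in index order so record 0 comes before record 1, and so on
    return [grouped[i] for i in sorted(grouped)]


pairs = [
    ("records_0_Name", "Jimmy"), ("records_0_Number", "1"), ("records_0_Status", "Student"),
    ("records_1_Name", "Marie"), ("records_1_Number", "2"), ("records_1_Status", "Worker"),
]
for record in unflatten_records(pairs):
    print(record)
```

Sorting the indices numerically (rather than as strings) keeps `records_10_…` after `records_9_…` when there are more than ten records.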
-
Is there a way to check that a table is not empty before running a join recipe?
Hi all, I'm using the Hive engine. Is there a way to check that the joined table is not empty before running the join recipe? Scenario: Table A left join Table B, where Table A is empty. I'm getting the error below: com.dataiku.dip.exceptions.SourceDatasetNotReadyException: Input dataset <sampleproj.tablename> is not ready. My expectation is to,…