Changing data during a join
Good morning all!
I am a student using Dataiku, and I am stuck on one point.
At this stage I joined two datasets, but the values in one column changed.
The column holds review sources: before the join there are 5,975 Qualitelis, 3,862 Booking and 118 Trip, and after the join there are 8,838 Booking and 1,162 Tripodvisor.
Am I doing something wrong? Thanks in advance :D
Best Answer
-
Hi, sorry for the late response. Did you run the analysis on the whole column rather than just on a sample?
You can select the whole data in the dropdown that currently says "Sample".
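The counts in the original post fit this explanation: 8,838 + 1,162 = 10,000 exactly, which is what you would see if the post-join counts were computed on a sample capped at 10,000 rows (assuming the default "first records" sample of 10,000 rows), while 5,975 + 3,862 + 118 = 9,955 fits entirely inside such a sample. The sketch below is plain Python with invented data and hypothetical helper names (`first_records`, `value_counts`), not Dataiku code. It shows how counting values on only the first N rows can hide a value completely.

```python
from collections import Counter

# Assumption: the explore view samples only the first 10,000 records.
SAMPLE_SIZE = 10_000


def first_records(rows, size=SAMPLE_SIZE):
    """Mimic a 'first records' sample: keep only the first `size` rows."""
    return rows[:size]


def value_counts(rows, column):
    """Count occurrences of each value in `column`."""
    return Counter(row[column] for row in rows)


# Hypothetical joined dataset (invented counts): 15,000 rows in which the
# rows for source "C" happen to be stored after those for "A" and "B".
joined = (
    [{"SourceAvis": "A"}] * 7_000
    + [{"SourceAvis": "B"}] * 3_000
    + [{"SourceAvis": "C"}] * 5_000
)

full = value_counts(joined, "SourceAvis")
sample = value_counts(first_records(joined), "SourceAvis")

print("full:  ", dict(full))    # {'A': 7000, 'B': 3000, 'C': 5000}
print("sample:", dict(sample))  # {'A': 7000, 'B': 3000}: "C" never reaches the sample
```

The sample always totals at most 10,000 rows, so whenever the joined output is larger than that, the sampled counts describe only its first rows, not the whole column.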
Answers
-
Hello,
If I understood the question correctly, you can go to the dataset Settings -> Schema and redetect the schema from the changed data. Then use the schema propagation tool in the Flow to apply the new schema downstream. If needed, you can also change the join recipe settings to join the data differently.
-
Thanks for your feedback @Andrey! In the settings I can't find where to redetect it from the changed data.
-
It's the "Check now" button under the "Schema" tab
-
It tells me: "The schema and the data are consistent."
Unfortunately, that didn't solve my problem.
-
Looks like I misunderstood the question. What did you mean by "but the data of a column has been changed"?
Is it the data in one of the datasets that changed? Did the structure of that data change (e.g. a different schema)?
-
In my first dataset I have a column whose values are: 5,975 rows with "Qualitelis", 3,862 with "Booking" and 118 with "Trip".
At the output of my join, this column contains 8,838 "Booking" and 1,162 "Tripodvisor".
I hope that is clearer.
-
Could you please send screenshots of both datasets' contents and of the join recipe settings (the tab with all the join conditions), so we can see exactly how they're being joined?
-
Here is an overview of my two datasets (some columns are missing because there are so many of them).
The problem I encounter is on the column "SourceAvis".
This is my data before the join.
And here is the output of the join.
I don't know if this gives you enough information about the join.
I had already run into the same problem with a Window recipe.
-
Hello, it's my turn to apologize for the late response. Indeed, when using all the data, we find the initial values!
However, given the number of rows, I think my join is not correct.
Thank you very much :D
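If the joined dataset has more rows than its left input, one common cause worth checking is a join key that is not unique on the other side: each left row is repeated once per matching right row. The plain-Python sketch below uses invented data and hypothetical names (`left_join`, `duplicate_keys`, `review_id`, `score`) to show this fan-out and a quick duplicate-key check you could run on each input before joining. It illustrates the general behavior of an equi-join, not the internals of Dataiku's Join recipe.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return the key values that appear more than once in `rows`."""
    counts = Counter(row[key] for row in rows)
    return {k: n for k, n in counts.items() if n > 1}


def left_join(left, right, key):
    """Naive left join: each left row is emitted once per matching right row."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    out = []
    for row in left:
        # Unmatched left rows are kept once, with no right-side columns.
        for match in index.get(row[key], [None]):
            merged = dict(row)
            if match is not None:
                merged.update({k: v for k, v in match.items() if k != key})
            out.append(merged)
    return out


# Hypothetical data: review "r2" appears twice on the right side.
reviews = [
    {"review_id": "r1", "SourceAvis": "Booking"},
    {"review_id": "r2", "SourceAvis": "Qualitelis"},
    {"review_id": "r3", "SourceAvis": "Trip"},
]
details = [
    {"review_id": "r1", "score": 8},
    {"review_id": "r2", "score": 6},
    {"review_id": "r2", "score": 9},  # duplicate key on the right side
]

joined = left_join(reviews, details, "review_id")
print(len(reviews), "->", len(joined))       # 3 -> 4
print(duplicate_keys(details, "review_id"))  # {'r2': 2}
```

If `duplicate_keys` returns a non-empty result for the right-hand dataset, either deduplicate it first or add join conditions until the key identifies a single row.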