Merge two Spreadsheets using Join With

Solved!
davidhernandez
Level 3
I created my very first project. I uploaded one Excel spreadsheet and named it, then uploaded another Excel spreadsheet and named it. I made sure that at least one of the columns was the same in both spreadsheets, so that Dataiku would have a common column to match on. I selected "Join With" to merge the two datasets, selected the Input Datasets and Output Datasets, then clicked on Create Recipe. I am not sure what else I need to do, because I keep receiving an error when I try to retrieve the output file:

"Oops: an unexpected error occurred.  Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 25...."   etc... 

1 Solution
tgb417

@davidhernandez 

Congratulations on your first project.

I have been able to successfully join several datasets together in Dataiku DSS.  So, it is my belief that you should be able to successfully do joins between two spreadsheets in DSS.

With the information you provided, it would be hard for me to be of a lot of help. As far as I know, I've never seen that particular error message you are getting.  Also, there is a fair amount of context that is not available to me in that one error line.  Finally, I don't have your data or access to your DSS.

So all I can share here is an approach to maybe working through your challenge.

The first question I'd want to answer is: "Is the problem you are experiencing coming from your dataset, or is there a problem with your Dataiku instance?"

As a test to see if there is a systemic problem with your instance of Dataiku DSS, you might want to review this training material on the Join recipe:

https://academy.dataiku.com/visual-recipes-overview-1/500667

Then see whether the practice example in the training materials works for you:

https://academy.dataiku.com/basics-103/500641

If that works we will have a good idea that your instance of Dataiku DSS is working properly and the problem might be that there is something unexpected in the data. 

If it does not work, then you may have some sort of problem with the Dataiku instance, your access rights, or some other kind of technical installation issue. In these cases, if you have a paid-for license to DSS, I would suggest that you submit a support ticket. You can request support by going to the ? (question mark) menu.

Even if you do not have a paid-for license, the support team is often very generous with their time. In my experience, they are still likely to get back to you with great help. However, it is on an as-available basis and it could take some time to hear back.

If all of the above is good and you can join the test dataset, then I'd guess that your challenge is with the MS Excel data you are providing, and possibly with how to use the visual Join recipe. Doing the exercise above should give you the confidence that you can actually use the visual Join recipe. I'd go back and see if that learning helps you make your existing data work. If not, you might need to give the community some more information about the way you are setting up the visual join, and something about the data in the join column(s).
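One way to look at the data in the join column(s) is to compare the key values outside of DSS. The following is only a minimal sketch using pandas, not anything specific to Dataiku. The function name `compare_join_keys`, the column name `customer_id`, and the sample values are all made up for illustration. In practice you would load your own two spreadsheets (for example with `pd.read_excel`) in place of the stand-in tables.

```python
import pandas as pd


def compare_join_keys(left, right, key):
    """Summarize how well the values of `key` line up between two DataFrames."""
    # Normalize keys to stripped strings so that "101 " and 101 compare equal.
    left_keys = set(left[key].dropna().astype(str).str.strip())
    right_keys = set(right[key].dropna().astype(str).str.strip())
    return {
        # Different dtypes (number vs. text) are a common reason a join finds no matches.
        "left_dtype": str(left[key].dtype),
        "right_dtype": str(right[key].dtype),
        "matching": len(left_keys & right_keys),
        "only_in_left": sorted(left_keys - right_keys),
        "only_in_right": sorted(right_keys - left_keys),
    }


# Hypothetical stand-ins for the two uploaded spreadsheets.
customers = pd.DataFrame(
    {"customer_id": [101, 102, 103], "name": ["Ann", "Bob", "Cy"]}
)
orders = pd.DataFrame(
    {"customer_id": ["101 ", "102", "104"], "total": [10.0, 20.0, 5.0]}
)

report = compare_join_keys(customers, orders, "customer_id")
print(report)
```

If the two dtypes differ, or most keys land in `only_in_left` / `only_in_right`, the join column probably needs cleaning (trimming spaces, converting types) before the join will return the rows you expect.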

Good Luck with this.  Let us all know how you are getting on with this. 

--Tom


6 Replies
davidhernandez
Level 3
Author

Just to add, my output folder says:

"Root Path does not Exist. Root Path of the dataset 4121_joined does not exist. This error is typically caused by either: (1) A Dataset configuration issue. You need to modify the affected settings to fix the issue. (2) The Dataset needs to be rebuilt (if it is a target in the Flow)"

 

Jurre
Level 5

Hi @davidhernandez ,

Thank you for the additional detail; it pointed me to a possible scenario and solution:

Is it possible that you are trying to join a dataset with a result dataset instead of your second, original dataset? The postfix "_joined" is typically added to a name to indicate that it is the resulting dataset after the join.

Please check the source datasets in your Join recipe, as I can recreate the mentioned error message when I try to do something similar (the attached picture shows the test datasets with which I reproduced your error).

Jurre   

EDIT: added screenshot of resulting error-message

davidhernandez
Level 3
Author

Hi @Jurre 

Thank you for recreating the error, but I can confirm I am using the correct first and second datasets. I am not using a joined dataset. I think I received the message mentioning the "_joined" dataset because that output folder had not been built, so it wasn't recognized. I'm not too sure. I have escalated the error message to my company, and I am hoping they come back with an answer. I am using the Spark engine; I don't know if that has anything to do with it. I will try again and let you know. 😊

davidhernandez
Level 3
Author

Hi @Jurre 

It turned out that I did not have permission to store my files in the target location. I had to change the target storage area to "filesystem folders", which still allows the exported data to be a CSV or Excel file (which is what I wanted).

Jurre
Level 5

Hi @davidhernandez ,

Thanks for sharing the solution to this challenge! It will help make this forum an even bigger pile of ideas and solutions than it already is. And I personally have something extra to think of when trying to get to the bottom of an issue like this!

All the best, Jurre
