Merge two Spreadsheets using Join With

davidhernandez
davidhernandez Registered Posts: 19 ✭✭✭✭

I created my very first project. I uploaded one Excel spreadsheet and named it, then uploaded a second Excel spreadsheet and named it. I made sure that at least one column is the same in both spreadsheets, so that Dataiku has a common column to join on. I selected "Join With" to merge the two datasets, selected the input datasets and the output dataset, then clicked Create Recipe. I am not sure what else I need to do, because I keep receiving an error when I try to retrieve the output file:

"Oops: an unexpected error occurred. Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 25...." etc...

Best Answer

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
    Answer ✓

    @davidhernandez

    Congratulations on your first project.

    I have successfully joined several datasets in Dataiku DSS, so I believe you should be able to join two spreadsheets in DSS.

    With the information you provided, it is hard for me to be of much help. I have never seen that particular error message, and that one error line leaves out a fair amount of context. I also don't have your data or access to your DSS instance.

    So all I can share here is an approach to working through your challenge.

    The first question I'd want to answer is: "Is the problem coming from your dataset, or is there a problem with your Dataiku instance?"

    To test whether there is a systemic problem with your instance of Dataiku DSS, you might want to review this training material on the Join recipe:

    https://academy.dataiku.com/visual-recipes-overview-1/500667

    Then see if the practice example in the training materials works for you:

    https://academy.dataiku.com/basics-103/500641

    If that works, we will have a good idea that your instance of Dataiku DSS is working properly and that the problem might be something unexpected in the data.

    If it does not work, then you may have a problem with the Dataiku instance, your access rights, or some other technical installation issue. In that case, if you have a paid-for license to DSS, I would suggest that you submit a support ticket. You can do this by:

    [Screenshot: Getting DSS Support]

    Even if you do not have a paid-for license, the support team is often very generous with their time. In my experience, they are still likely to get back to you with great help. However, it is on an as-available basis, and it could take some time to hear back.

    If all of the above is good and you can join the test dataset, then I'd guess that your challenge is with the MS Excel data you are providing and possibly with how you are using the visual Join recipe. Doing the exercise above should give you confidence that you can use the visual Join recipe. I'd go back and see if that learning helps you make your existing data work. If not, you might need to give the community more information about how you are setting up the visual join and about the data in the join column(s).
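
One way to gather that information about the join column(s) is to compare the key values from the two spreadsheets outside of DSS. The following is only a minimal sketch in plain Python: the function names and the sample values are hypothetical, and it assumes that differences in whitespace or letter case are the kind of mismatch worth looking for, which may not be the cause in this case.

```python
def normalize_key(value):
    """Trim whitespace and lowercase so that 'A2 ' and 'a2' compare equal."""
    return str(value).strip().lower()


def compare_join_keys(left_keys, right_keys):
    """Report which normalized key values appear in both columns or in only one."""
    left = {normalize_key(v) for v in left_keys}
    right = {normalize_key(v) for v in right_keys}
    return {
        "matched": sorted(left & right),
        "only_left": sorted(left - right),
        "only_right": sorted(right - left),
    }


# Hypothetical sample values standing in for the join column of each spreadsheet.
report = compare_join_keys(["A1", "a2 ", 3], ["a1", "A2", "4"])
print(report)
```

If "matched" comes back empty or much smaller than expected, the join column values probably differ in format between the two files, and that detail is worth including in a follow-up post.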

    Good luck with this. Let us all know how you are getting on.

Answers

  • davidhernandez
    davidhernandez Registered Posts: 19 ✭✭✭✭

    Just to add, my output folder says:

    "Root Path does not Exist. Root Path of the dataset 4121_joined does not exist. This error is typically caused by either: (1) A Dataset configuration issue. You need to modify the affected settings to fix the issue. (2) The Dataset needs to be rebuilt (if it is a target in the Flow)"

  • Jurre
    Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 115 ✭✭✭✭✭✭✭

    Hi @davidhernandez,

    Thank you for providing the addition; it pointed me to a possible scenario and solution:

    Is it possible that you are trying to join a dataset with a result dataset instead of your second, original dataset? The suffix "_joined" is typically added to a name to indicate that it is the dataset resulting from the join.

    Please check the source datasets in your Join recipe, as I can recreate the error message you mentioned when I try something similar (the attached picture shows the test datasets with which I reproduced your error).

    Jurre

    EDIT: added screenshot of resulting error-message

  • davidhernandez
    davidhernandez Registered Posts: 19 ✭✭✭✭

    Hi @Jurre

    Thank you for recreating the error, but I can confirm I am using the correct first and second datasets. I am not using a joined dataset. I think I received that message with the "_joined" suffix because that output folder had not been built, so it didn't recognize the folder. I'm not too sure. I have escalated the error message to my company, and I am hoping they come back with an answer. I am using the Spark engine; I don't know if that has anything to do with it. I will try it again and let you know.

  • davidhernandez
    davidhernandez Registered Posts: 19 ✭✭✭✭

    Hi @Jurre

    So it ended up being that I did not have access to store my files in the target location. I had to change the target storage area to "filesystem folders", which still allows the exported data to be a CSV or Excel file (which is what I wanted).
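
For anyone hitting a similar storage permission problem, a rough way to confirm whether an account can write to a given location is to try creating a temporary file there. This is only a sketch under assumptions: it applies to a local or mounted filesystem path (the original target storage in this thread may have been something else entirely), the function name `can_write_to` is hypothetical, and the path used below is a stand-in.

```python
import os
import tempfile


def can_write_to(directory):
    """Return True if a temporary file can be created in the directory."""
    if not os.path.isdir(directory):
        return False
    try:
        # The file is removed automatically when the with-block exits.
        with tempfile.TemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        # Covers permission errors as well as other I/O failures.
        return False


# Hypothetical path; replace with the folder the output dataset writes to.
target_dir = tempfile.gettempdir()
print(can_write_to(target_dir))
```

A False result for the output location would point to an access problem rather than a problem with the join itself.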

  • Jurre
    Jurre Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts, Registered, Dataiku DSS Developer, Neuron 2022 Posts: 115 ✭✭✭✭✭✭✭

    Hi @davidhernandez,

    Thanks for sharing the solution to this challenge! It will help make this forum an even bigger pile of ideas and solutions than it already is. And I personally have something extra to think of when trying to get to the bottom of an issue like this!

    All the best, Jurre
