insidesherpa questions

hzhan103
hzhan103 Registered Posts: 1 ✭✭✭

Does anyone know how to fix this error? Screenshot below.

Answers

  • Liev
    Liev Dataiker Alumni Posts: 176 ✭✭✭✭✭✭✭✭
    edited July 17

    Hi @hzhan103

    The information here is quite limited, but if I had to guess, the schema of your output dataset and the one generated by the recipe no longer match, so the insert fails.

    It would appear that the query

    SELECT * FROM ...

    now returns five columns instead of the three it previously did.

    You could solve this either by naming the three needed columns instead of using *, or by adding the extra two columns to your output dataset before the recipe is executed.
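
    To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Dataiku or PostgreSQL. The table and column names (src, out_old, out_new, a to e) are made up for illustration, and the exact error text will differ between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical input table that has grown from three to five columns.
cur.execute("CREATE TABLE src (a INTEGER, b INTEGER, c INTEGER, d INTEGER, e INTEGER)")
cur.execute("INSERT INTO src VALUES (1, 2, 3, 4, 5)")

# Hypothetical output table still defined with the old three-column schema.
cur.execute("CREATE TABLE out_old (a INTEGER, b INTEGER, c INTEGER)")

# SELECT * now yields five values per row, so the insert is rejected.
try:
    cur.execute("INSERT INTO out_old SELECT * FROM src")
    star_insert_failed = False
except sqlite3.Error:
    star_insert_failed = True

# Option 1: name the three needed columns explicitly.
cur.execute("INSERT INTO out_old (a, b, c) SELECT a, b, c FROM src")
rows_named = cur.execute("SELECT * FROM out_old").fetchall()

# Option 2: add the two extra columns to the output first, then SELECT * fits.
cur.execute("CREATE TABLE out_new (a INTEGER, b INTEGER, c INTEGER)")
cur.execute("ALTER TABLE out_new ADD COLUMN d INTEGER")
cur.execute("ALTER TABLE out_new ADD COLUMN e INTEGER")
cur.execute("INSERT INTO out_new SELECT * FROM src")
rows_widened = cur.execute("SELECT * FROM out_new").fetchall()
```

    Option 1 keeps the output stable if the input gains more columns later; option 2 only works as long as the column order and count keep matching.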

  • wsheriff
    wsheriff Registered Posts: 3 ✭✭✭✭

    I'm having the exact same problem, but I'm completely new to Dataiku (although not SQL), so I'm having a bit of trouble understanding what you mean by explicitly naming the three columns/adding the extra two. Maybe these screenshots of the recipe might help? (join.png, selected_columns.png)

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron

    I do not know if this is the problem you are having here.

    Using PostgreSQL, I have run into problems where the combined length of the database, table, and field names appears to get too long.

    Making changes to shorten names has helped me resolve things in the past.
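
    One related limit that can be checked directly: by default PostgreSQL truncates identifiers longer than 63 bytes. Whether that is the limit behind the problems described above is an assumption, but a quick check like this sketch can rule it in or out. The function name too_long and the second example name are made up for illustration.

```python
# PostgreSQL's default identifier limit (NAMEDATALEN - 1); longer names are truncated.
PG_MAX_IDENTIFIER_BYTES = 63


def too_long(identifiers):
    """Return the identifiers whose UTF-8 encoding exceeds the default limit."""
    return [
        name
        for name in identifiers
        if len(name.encode("utf-8")) > PG_MAX_IDENTIFIER_BYTES
    ]


# The second name is a hypothetical generated name, built only to exceed the limit.
names = [
    "av_engine_data_aic",
    "myproject_" + "joined_" * 8 + "output",
]
flagged = too_long(names)
```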

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron

    Did Tom's hint work for you, @wsheriff?

    If it didn't, is this error happening while editing a 'Join' recipe that you created in the past? Or does it happen right away when you try to run the Join recipe for the first time?

    If it is the former (you are editing an old recipe, and/or the input schema changed), you are going to see this kind of error. What I usually do in that case is drop any previous output dataset data and clear the schema, so that the next time you run the recipe the output dataset schema is recreated.
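
    At the SQL level, the idea is roughly the following. This is only an analogy sketched with Python's sqlite3 module, not what DSS does internally, and the table names joined_input and joined_output are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical stale output table built from an older three-column input.
cur.execute("CREATE TABLE joined_output (a INTEGER, b INTEGER, c INTEGER)")

# Hypothetical input whose schema has since changed to five columns.
cur.execute(
    "CREATE TABLE joined_input (a INTEGER, b INTEGER, c INTEGER, d INTEGER, e INTEGER)"
)
cur.execute("INSERT INTO joined_input VALUES (1, 2, 3, 4, 5)")

# Drop the old data and schema, then let the query define the new schema.
cur.execute("DROP TABLE IF EXISTS joined_output")
cur.execute("CREATE TABLE joined_output AS SELECT * FROM joined_input")

# PRAGMA table_info returns one row per column; index 1 is the column name.
output_columns = [row[1] for row in cur.execute("PRAGMA table_info(joined_output)")]
```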

  • wsheriff

    @Ignacio_Toledo @tgb417 It happens when I run the recipe for the first time. For this project, we're given pretty detailed step-by-step directions for accessing the datasets and creating recipes, since it's only supposed to introduce us to Dataiku. Shortening the field names didn't seem to fix it, and I'm not quite sure how to edit table names if they were already imported when I started. Is there any way I could share the project with you or go over it live on Zoom to make it more convenient for you? Posting screenshots to a forum just seems a bit time-consuming, and I'm trying to finish this project somewhat soon...

  • tgb417

    @wsheriff

    Were you given these "pretty detailed step-by-step directions for accessing the datasets" by an IT team or an instructor or someone like that?

    Are other folks actually able to get to the data you are trying to work with?

    If so, I would encourage you to go to the folks who made the instructions available, or to others who have successfully gotten access to this data. They are both more likely to be able to fix a problem if one exists and likely to have more detailed, specific knowledge than we might have here in the Dataiku community.

    Let us know how this goes.

  • Ignacio_Toledo

    @tgb417 has a nice suggestion, I believe! But if you need a live meeting to go through the project, I think I could do that. Just to let you know: I do have some experience with DSS (around 2 years using it already), but I'm not part of Dataiku, just trying to help!

  • tgb417

    I'd be willing to take a look at this as well. However, I do encourage you to reach out to the folks who gave you the instructions first.

  • wsheriff

    @tgb417 @Ignacio_Toledo I actually got the whole project from GE through a data analytics virtual internship program (you can view it / register here: https://www.insidesherpa.com/virtual-internships), and from the emails they've sent me, there were thousands of people who registered for it, so all they have is a box to submit questions (which I did the day before coming to this forum). The page also referred me to this community if I needed help, and I was glad to see the original question was the same error on the same step that I was on. If you'd like to try it yourselves, it should only take about a half hour to get to the point I'm at, but I feel like the issue is pretty simple, so any one-on-one help would be greatly appreciated! Thanks again guys!

  • tgb417

    @wsheriff

    Here is a similar thread from @WisdomUdo, who may have been having exactly the same problem you are having with the GE internship program setup.

    @CoreyS, you seemed to be able to shed some light on that issue. Do you have any insights that will help in this case?

  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    Thanks @tgb417. As you mentioned before, questions like this can best be answered by the program's admin if you do not know the answer, so I'm just reaffirming that.

  • tgb417

    @wsheriff,

    I did a little bit of looking on my own. It does appear that the PostgreSQL version of the table av_engine_data_aic is missing data and the table av_lkp_airport_codes_t does not match the provided data dictionary.
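
    A mismatch like this can be listed explicitly by comparing the table's columns against the data dictionary. The sketch below shows one way to do that in Python. The column names are hypothetical placeholders, not the real columns of av_lkp_airport_codes_t, and schema_diff is a made-up helper.

```python
def schema_diff(table_columns, dictionary_columns):
    """Compare actual table columns with those listed in a data dictionary."""
    table, expected = set(table_columns), set(dictionary_columns)
    return {
        "missing_from_table": sorted(expected - table),
        "not_in_dictionary": sorted(table - expected),
    }


# Hypothetical column names, for illustration only.
table_cols = ["airport_code", "airport_name", "city"]
dictionary_cols = ["airport_code", "airport_name", "country"]

diff = schema_diff(table_cols, dictionary_cols)
```

    Including a list like this in a support question makes the problem much easier for the program's admins to reproduce.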

    On the file system of the server being used in this case, there are more complete versions of the data, which seem to get more of the first project done.

    That said, putting in a well-worded support question explaining the problems you have found and the actions you have taken to try to resolve the issue is likely the best next step for you.

    Having data problems is a somewhat normal part of a data science project, and a well-worded, professional query about this issue is likely to put you in a good place for this internship.

    Good Luck.

    Let me know how things go for you.

    --Tom

  • Ignacio_Toledo

    Hi @wsheriff, I believe following @tgb417's answer is the best you can do. I was able to inspect the insidesherpa material, and there is some kind of problem that only they will be able to fix, as Tom suggests.

  • tgb417

    @Ignacio_Toledo,

    I did the same thing. In my test, only 6 of the 8 needed tables were available to the instance of DSS that GE makes available. They do provide two ways to get to the data: one from a PostgreSQL database, the other from the server file system. I was able to find what may be OK versions of all the tables, 6 from PostgreSQL and 2 from the file system. However, I'm still seeing problems.

    @wsheriff, I sent you a direct message. I agree with Ignacio. Time to reach out to the folks at GE. If they are here in the States, as I expect, you are not likely to hear from them until Tuesday. Monday 9/7 is a holiday here in the States.

    When I tried to connect to the system just now, the DSS instance GE is providing appeared to be down.
