The information here is really quite limited, but if I had to guess, the schema of your output dataset and the one generated by the recipe no longer match, so the insert fails.
It would appear that the SELECT * FROM ... query now returns five columns instead of the three it previously did.
You could solve this in one of two ways: either stop using * and name the three columns you need, or add the two extra columns to your output dataset before the recipe is executed.
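To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Dataiku or PostgreSQL. The table names (input_data, output_data, output_wide) and column names are made up for illustration and are not taken from the project in question. It shows the insert failing with SELECT *, then succeeding with each fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: the input now has five columns,
# while the output dataset still has the original three.
cur.execute(
    "CREATE TABLE input_data "
    "(id INTEGER, name TEXT, value REAL, extra_a TEXT, extra_b TEXT)"
)
cur.execute("CREATE TABLE output_data (id INTEGER, name TEXT, value REAL)")
cur.execute("INSERT INTO input_data VALUES (1, 'a', 1.5, 'x', 'y')")

# SELECT * yields five columns, so inserting into a three-column table fails.
try:
    cur.execute("INSERT INTO output_data SELECT * FROM input_data")
    star_insert_failed = False
except sqlite3.OperationalError:
    star_insert_failed = True

# Option 1: name the three needed columns instead of using *.
cur.execute(
    "INSERT INTO output_data (id, name, value) "
    "SELECT id, name, value FROM input_data"
)
named_rows = cur.execute("SELECT * FROM output_data").fetchall()

# Option 2: add the two extra columns to the output first, then keep SELECT *.
cur.execute("CREATE TABLE output_wide (id INTEGER, name TEXT, value REAL)")
cur.execute("ALTER TABLE output_wide ADD COLUMN extra_a TEXT")
cur.execute("ALTER TABLE output_wide ADD COLUMN extra_b TEXT")
cur.execute("INSERT INTO output_wide SELECT * FROM input_data")
wide_rows = cur.execute("SELECT * FROM output_wide").fetchall()

conn.close()
```

In a Dataiku SQL recipe, the equivalent of option 1 would be editing the query text itself. Option 2 corresponds to changing the output dataset's schema, which is done in the dataset settings rather than in SQL.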
I'm having the exact same problem, but I'm completely new to Dataiku (although not SQL), so I'm having a bit of trouble understanding what you mean by explicitly naming the three columns/adding the extra two. Maybe these screenshots of the recipe might help?
I do not know if this is the problem you are having here.
Using PostgreSQL, I have run into problems where the combined length of the database, table, and field names appears to get too long.
Making changes to shorten names has helped me resolve things in the past.
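One concrete limit that could be behind this: with default build settings, PostgreSQL truncates any single identifier longer than 63 bytes. Whether that is the cause of the problem described above is only a guess. The sketch below flags names that exceed that limit; the helper name and the sample names are hypothetical.

```python
# PostgreSQL's default maximum identifier length (NAMEDATALEN - 1).
PG_MAX_IDENTIFIER_BYTES = 63


def find_too_long(identifiers):
    """Return the identifiers whose UTF-8 encoding exceeds the limit."""
    return [
        name
        for name in identifiers
        if len(name.encode("utf-8")) > PG_MAX_IDENTIFIER_BYTES
    ]


# Hypothetical table and column names, for illustration only.
names = ["short_table", "x" * 70]
flagged = find_too_long(names)
```

Any name that shows up in flagged would be a candidate for shortening.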
Did Tom's hint work for you, @wsheriff?
If it didn't, is this error happening while editing a 'Join' recipe that you created in the past? Or does it happen right away when you try to run the Join recipe for the first time?
If it is the former (you are editing an old recipe and/or the input schema changed), you are going to see this kind of error. What I usually do in that case is drop any previous 'output dataset' data and clear the schema, so that the next time you run the recipe the output dataset schema is recreated.
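In plain SQL terms, the "drop and recreate" idea looks roughly like the sketch below. It uses Python's sqlite3 module and made-up table names, and it is only an analogy: in DSS itself this is done through the dataset's settings and actions, not by running this code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical input whose schema has grown to five columns.
cur.execute(
    "CREATE TABLE input_data "
    "(id INTEGER, name TEXT, value REAL, extra_a TEXT, extra_b TEXT)"
)
cur.execute("INSERT INTO input_data VALUES (1, 'a', 1.5, 'x', 'y')")

# Stale output table left over from an earlier run, with the old schema.
cur.execute("CREATE TABLE output_data (id INTEGER, name TEXT, value REAL)")

# Drop the stale output, then let the query define the new schema.
cur.execute("DROP TABLE IF EXISTS output_data")
cur.execute("CREATE TABLE output_data AS SELECT * FROM input_data")

# Column names of the recreated table (second field of each PRAGMA row).
output_columns = [
    row[1] for row in cur.execute("PRAGMA table_info(output_data)")
]
output_rows = cur.execute("SELECT * FROM output_data").fetchall()

conn.close()
```

After the rebuild, the output table has the same five columns as the query result, so the mismatch goes away.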
@Ignacio_Toledo @tgb417 It happens when I run the recipe for the first time. For this project, we're given pretty detailed step-by-step directions for accessing the datasets and creating recipes, since it's only supposed to introduce us to Dataiku. Shortening the field names didn't seem to fix it, and I'm not quite sure how to edit table names if they were already imported when I started. Is there any way I could share the project with you or go over it live on Zoom to make it more convenient for you? Posting screenshots to a forum just seems a bit time-consuming, and I'm trying to finish this project somewhat soon...
Were you given these "pretty detailed step-by-step directions for accessing the datasets" by an IT team or an instructor or someone like that?
Are other folks actually able to get to the data you are trying to work with?
If so, I would encourage you to go to the folks who made the instructions available, or to others who have successfully gotten access to this data. They are more likely to be able to fix a problem if one exists, and they have more detailed, specific knowledge than we might have here in the Dataiku community.
Let us know how this goes.
@tgb417 has a nice suggestion, I believe! But if you need a live meeting to go through the project, I think I could do that. Just to let you know: I do have some experience with DSS (around 2 years using it already), but I'm not part of Dataiku, just trying to help!
@tgb417 @Ignacio_Toledo I actually got the whole project from GE through a data analytics virtual internship program (you can view it / register here: https://www.insidesherpa.com/virtual-internships), and from the emails they've sent me, there were thousands of people who registered for it, so all they have is a box to submit questions (which I did the day before coming to this forum). The page also referred me to this community if I needed help, and I was glad to see the original question was the same error on the same step that I was on. If you'd like to try it yourselves, it should only take about a half hour to get to the point I'm at, but I feel like the issue is pretty simple, so any one-on-one help would be greatly appreciated! Thanks again guys!
Thanks, @tgb417. As you mentioned before, questions like this can best be answered by the program's admin if you do not know the answer, so I'm just re-affirming that.
I did a little bit of looking on my own. It does appear that the PostgreSQL version of the table av_engine_data_aic is missing data and the table av_lkp_airport_codes_t does not match the provided data dictionary.
The file system of the server used in this case holds more complete versions of the data, which seem to get more of the first project done.
That said, putting in a well-worded support question explaining the problems you have found and the actions you have taken to try to resolve the issue is likely the best next step for you.
Having data problems is a fairly normal part of a data science project, and a well-worded, professional query about this issue is likely to put you in a good place for this internship.
Let me know how things go for you.
I did the same thing. In my test, only 6 of the needed 8 tables were available to the instance of DSS that GE makes available. They do provide two ways to get to the data: one from a PostgreSQL database, the other from the server file system. I was able to find a possibly OK version of every table, 6 from PostgreSQL and 2 from the file system. However, I'm still seeing problems.
@wsheriff, I sent you a direct message. I agree with Ignacio: time to reach out to the folks at GE. If they are here in the States, as I expect, you are not likely to hear from them until Tuesday. Monday 9/7 is a holiday here in the States.
When I tried to connect to the system just now, the DSS instance GE is providing appeared to be down.