Missing columns in Schema when using API Connect

Emiel
edited July 24 in Using Dataiku

Hi all,

I'm using the API Connect plugin to connect to an external dataset. The plugin works like a charm, but some columns don't end up in the schema. Exploring the dataset shows 109 columns, while the schema lists only 101. The missing columns also have no storage type (such as "string") and are unavailable once the dataset is synced or exported to Python. I've seen this issue before and don't understand what's happening. Does anyone here know how to solve this?

Another odd, related thing I noticed: when I set the storage type of one of these columns to string, I get the following error: "An invalid argument has been encountered: Dataset x: column y not in schema".

I'm using Dataiku 13.0 and version 1.2.2 of the API Connect plugin.

Screenshot of the table (note the 109 columns). I refreshed the sample and it still shows 109 columns. Notice the missing storage type above the blue meanings.

Screenshot of the schema with 101 columns. Clicking "CHECK AGAIN" doesn't change anything.

Best Answer

  • Emiel
    Answer ✓

    I contacted Dataiku support, and they were able to resolve it. The trick is to use a Prepare recipe, which, unlike other recipes, carries all columns from the dataset through to the output. The problem was caused by fields that were missing from the first lines of the API response and only appeared later in the JSON.
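
For anyone wondering why fields that only show up later in the JSON can go missing, here is a minimal plain-Python sketch. It is only an illustration and not how Dataiku or the API Connect plugin actually infers a schema: it assumes the schema is built from the first few records only. The function names, the `sample_size` parameter, and the sample records are all made up for this example.

```python
from typing import Any, Dict, List


def infer_columns(records: List[Dict[str, Any]], sample_size: int) -> List[str]:
    """Collect column names from only the first `sample_size` records.

    Assumption: this mimics a schema inferred from the start of a response.
    """
    columns: List[str] = []
    for record in records[:sample_size]:
        for key in record:
            if key not in columns:
                columns.append(key)
    return columns


def all_columns(records: List[Dict[str, Any]]) -> List[str]:
    """Collect the union of column names across every record, in first-seen order."""
    columns: List[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    return columns


def missing_from_schema(records: List[Dict[str, Any]], sample_size: int) -> List[str]:
    """Return columns that exist in the data but not in the sampled schema."""
    seen = set(infer_columns(records, sample_size))
    return [name for name in all_columns(records) if name not in seen]


# Hypothetical API response: "discount" only appears in the third record.
records = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c", "discount": 0.1},
]

print(infer_columns(records, sample_size=2))        # ['id', 'name']
print(missing_from_schema(records, sample_size=2))  # ['discount']
```

Running something like `missing_from_schema` on a raw API response is a quick way to see which fields appear too late to be picked up, before deciding how to handle them downstream.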
