How do I ensure correct schema for this JSON dataset?

sandeep · ‎05-27-2020

Hi there,

I am trying to load data in the format below into Dataiku.

Dataiku automatically detects the following schema:

I am getting an error when I run a prepare recipe to rename one of the columns (see below). I suspect it's because the schema is not being read correctly.

You can access the data from here. I would appreciate any help in addressing this. Thanks!

fchataigner2 · ‎05-27-2020

Your tar.gz archive does not contain only JSON files: there are also a handful of *.sh files in it, which DSS fails to read as JSON (obviously).

You need to pass a tar.gz containing only json files.
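
If repacking by hand is tedious, something like the following could do it outside DSS. This is only a sketch using Python's standard tarfile module; the function name keep_only_json and the file paths are made up for illustration, and it assumes the JSON files in the archive end in .json.

```python
import tarfile

def keep_only_json(src_path, dst_path):
    """Copy only the .json members of a tar.gz archive into a new tar.gz archive.

    Returns the names of the members that were kept.
    """
    kept = []
    with tarfile.open(src_path, "r:gz") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src:
            # Skip directories, *.sh scripts, and anything else that is not a JSON file.
            if member.isfile() and member.name.lower().endswith(".json"):
                dst.addfile(member, src.extractfile(member))
                kept.append(member.name)
    return kept
```

Calling keep_only_json("data.tar.gz", "data_json_only.tar.gz") (both paths hypothetical) would write a new archive that can then be uploaded in place of the original.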

sandeep · ‎06-09-2020

Hi @fchataigner2 ,

Thanks for your reply. I have removed the .sh files and you can access the dataset here.

Unfortunately, I still face the same issue when I upload this dataset onto Dataiku. Any idea why and how to resolve this?

fchataigner2 · ‎06-10-2020

The JSON looks like it follows the Elasticsearch bulk API format ( https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html ), in which each document line is preceded by an index action line. That mix of two kinds of lines would explain the odd schema. So if you don't need the _id field that's present on the index lines, or can re-create the id field, you can simply use a filter recipe on the dataset to remove all rows where index._index is not empty.
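
For reference, the same filtering can be sketched in plain Python, for instance to clean the files before uploading them. This is an illustration rather than anything DSS-specific: the function name drop_bulk_action_lines and the sample field names (message, logs) are invented, and it assumes one JSON object per line with action lines of the form {"index": {"_index": ..., "_id": ...}}.

```python
import json

def drop_bulk_action_lines(lines):
    """Parse newline-delimited JSON and skip Bulk API 'index' action lines.

    Returns the remaining document lines as a list of dicts.
    """
    docs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        # An action line carries its metadata under the "index" key.
        action = record.get("index")
        if isinstance(action, dict) and "_index" in action:
            continue
        docs.append(record)
    return docs

# Hypothetical sample in bulk format: action line, then document line.
sample = [
    '{"index": {"_index": "logs", "_id": "1"}}',
    '{"message": "first"}',
    '{"index": {"_index": "logs", "_id": "2"}}',
    '{"message": "second"}',
]
documents = drop_bulk_action_lines(sample)
```

Note that this drops the _id values along with the action lines, which is the trade-off mentioned above.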
