How do I ensure correct schema for this JSON dataset?

sandeep · May 2020

Hi there,

I am trying to load data in the format below into Dataiku.

[Image: JSON data.PNG]

Dataiku automatically detects the following schema:

I am getting an error when I run a prepare recipe to rename one of the columns (see below). I suspect it's because the schema is not being read correctly.

You can access the data from here. I would appreciate any help in addressing this. Thanks!

fchataigner2 · May 2020

Your tar.gz archive does not contain only JSON files: there are a handful of *.sh files in it too, which DSS fails to read as JSON (obviously).

You need to pass a tar.gz containing only json files.
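
If it helps, here is a minimal sketch of one way to rebuild the archive with only the JSON files, using Python's standard tarfile module outside of DSS. The function name repack_json_only and the file paths are hypothetical, and it assumes the JSON files can be recognised by a .json extension.

```python
import tarfile


def repack_json_only(src_path, dst_path):
    """Copy only the regular *.json members of a .tar.gz into a new .tar.gz.

    Returns the names of the members that were kept.
    Assumption: JSON files are identified purely by their ".json" extension.
    """
    kept = []
    with tarfile.open(src_path, "r:gz") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src.getmembers():
            # Skip directories, links, and anything that is not a .json file
            # (for example the *.sh scripts).
            if member.isfile() and member.name.lower().endswith(".json"):
                dst.addfile(member, src.extractfile(member))
                kept.append(member.name)
    return kept
```

Calling something like repack_json_only("original.tar.gz", "json_only.tar.gz") (both paths are placeholders) should give an archive you can upload in place of the original.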

sandeep · June 2020

Hi @fchataigner2,

Thanks for your reply. I have removed the .sh files and you can access the dataset here.

Unfortunately, I still face the same issue when I upload this dataset onto Dataiku. Any idea why and how to resolve this?

fchataigner2 · June 2020

The JSON looks like it follows the Elasticsearch bulk API format ( https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html ), in which each document line is preceded by an index line that holds only metadata. That is why the detected schema mixes two kinds of rows. If you don't need the _id field that's present on the index lines, or can re-create the id field, you can simply use a filter recipe on the dataset to remove all rows where index._index is not empty.
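
If you do want to keep the _id, one alternative to the filter recipe is to pre-process the files before uploading. The following is only a rough sketch in plain Python, not a DSS feature: the function name bulk_to_documents is hypothetical, and it assumes the file contains only "index" actions, each followed by exactly one document line (the bulk format also allows other actions such as delete, which this does not handle).

```python
import json


def bulk_to_documents(lines, keep_id=True):
    """Yield the document rows from Elasticsearch bulk-format lines.

    Assumption: every action line looks like {"index": {...metadata...}}
    and is followed by exactly one document line.
    """
    pending_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # An action line has "index" as its only top-level key, with a
        # metadata object as the value; remember its _id and drop the line.
        if set(record) == {"index"} and isinstance(record["index"], dict):
            pending_id = record["index"].get("_id")
            continue
        # Otherwise this is a document line: optionally attach the _id
        # taken from the action line just before it.
        if keep_id and pending_id is not None:
            record = {"_id": pending_id, **record}
        pending_id = None
        yield record
```

You could pass an open file object as lines and write each yielded record back out with json.dumps, one per line, so that every row in the resulting file has the same fields.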
