How do I ensure correct schema for this JSON dataset?

sandeep
Level 2
Hi there,

I am trying to load data in the format below into Dataiku.

JSON data.PNG

 

Dataiku automatically detects the following schema:

Schema.PNG

I get an error when I run a prepare recipe to rename one of the columns (see below). I suspect this is because the schema is not being detected correctly.

Error.PNG

You can access the data from here. I would appreciate any help in resolving this. Thanks!

3 Replies
fchataigner2
Dataiker

Your tar.gz archive does not contain only JSON files: there are also a handful of *.sh files in it, which DSS fails to read as JSON.

You need to upload a tar.gz containing only JSON files.
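
If you would rather not repack the archive by hand, something along these lines might work. It is only a minimal sketch using Python's standard tarfile module, and it assumes the JSON files in the archive end in .json. The function name repack_json_only and the paths you pass to it are hypothetical.

```python
import tarfile


def repack_json_only(src_path, dst_path):
    """Copy only the regular *.json members of src_path into a new tar.gz at dst_path.

    Returns the names of the members that were kept.
    Assumption: JSON files are identified by their .json extension only.
    """
    kept = []
    with tarfile.open(src_path, "r:gz") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src.getmembers():
            # Skip directories, links, and anything that is not a .json file (e.g. *.sh).
            if member.isfile() and member.name.lower().endswith(".json"):
                dst.addfile(member, src.extractfile(member))
                kept.append(member.name)
    return kept
```

The returned list lets you check which files ended up in the new archive before uploading it.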

sandeep
Level 2
Author

Hi @fchataigner2,

Thanks for your reply. I have removed the .sh files and you can access the dataset here.

Unfortunately, I still face the same issue when I upload this dataset to Dataiku. Any idea why, and how to resolve it?

Dataiku_screenshot_upload_server-metrics_1.PNG

fchataigner2
Dataiker

The JSON looks like it follows the Elasticsearch bulk API format ( https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html ), in which each document line is preceded by an action line such as index. DSS reads both kinds of line as rows, which would explain the mixed schema. If you don't need the _id field that is present on the index lines, or can re-create the id field, you can simply use a filter recipe on the dataset to remove all rows where index._index is not empty.
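
If you prefer to clean the files before uploading, the same filter can be sketched in plain Python. This is only an illustration, not what the filter recipe does internally. It assumes one JSON object per line and that an action line is an object whose "index" key holds a dict containing "_index". The function name drop_index_lines and the sample field names (host, cpu) are hypothetical and not taken from your dataset.

```python
import json


def drop_index_lines(lines):
    """Return the document rows from bulk-format lines, dropping the index action lines.

    Assumption: an action line looks like {"index": {"_index": ..., "_id": ...}}.
    """
    docs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        action = record.get("index") if isinstance(record, dict) else None
        # Same condition as the filter recipe: drop rows where index._index is present.
        if isinstance(action, dict) and "_index" in action:
            continue
        docs.append(record)
    return docs


# Hypothetical sample in bulk format: action line, then document line.
sample = [
    '{"index": {"_index": "metrics", "_id": "1"}}',
    '{"host": "a", "cpu": 0.5}',
    '{"index": {"_index": "metrics", "_id": "2"}}',
    '{"host": "b", "cpu": 0.7}',
]
rows = drop_index_lines(sample)
```

Note that this drops the _id values along with the action lines, so it only suits the case where you don't need them or can re-create them.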
