How do I ensure correct schema for this JSON dataset?
Hi there,
I am trying to load data in the format below into Dataiku.
Dataiku automatically detects the following schema:
I am getting an error when I run a prepare recipe to rename one of the columns (see below). I suspect it's because the schema is not being read correctly.
You can access the data here. I would appreciate any help in addressing this. Thanks!
Answers
-
Your tar.gz archive does not contain only JSON files: there are a handful of *.sh files in it too, which DSS (obviously) fails to read as JSON.
You need to pass a tar.gz that contains only JSON files.
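If you'd rather not unpack and repack the archive by hand, a small script can do it. This is a minimal sketch using Python's standard `tarfile` module, run outside DSS. It assumes the data files end in `.json`, and the function name `keep_only_json` and the file names in the usage line are hypothetical.

```python
import tarfile


def keep_only_json(src_path: str, dst_path: str) -> list[str]:
    """Copy the .tar.gz at src_path to dst_path, keeping only *.json members.

    Hypothetical helper: returns the names of the members that were kept.
    """
    kept = []
    with tarfile.open(src_path, "r:gz") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src.getmembers():
            # Skip directories, *.sh scripts and anything else that is not a JSON file
            if not member.isfile() or not member.name.lower().endswith(".json"):
                continue
            # Stream the member's bytes straight into the new archive
            dst.addfile(member, src.extractfile(member))
            kept.append(member.name)
    return kept
```

Usage (hypothetical file names): `keep_only_json("data.tar.gz", "data_json_only.tar.gz")`, then upload the new archive to DSS.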
-
Hi @fchataigner2, thanks for your reply. I have removed the .sh files, and you can access the dataset here.
Unfortunately, I still face the same issue when I upload this dataset to Dataiku. Any idea why, and how to resolve this?
-
The JSON looks like it follows the Elasticsearch bulk API format ( https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html ), in which each document line is preceded by an action line such as {"index": {"_index": ..., "_id": ...}}, so the rows do not all share the same structure. If you don't need the _id field that's present on those index lines, or can re-create the id field, you can simply use a Filter recipe on the dataset to remove all rows where index._index is not empty.
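The same filtering can be sketched in plain Python to see what the Filter recipe keeps. This is a minimal sketch under two assumptions: the file is newline-delimited JSON, and only the `index` action is used. It drops every line where `index._index` is non-empty, which is the same condition as the Filter recipe. The names `bulk_documents` and `keep_id` are hypothetical. `keep_id` is one possible way to re-create the id by copying `_id` from the preceding action line.

```python
import json
from typing import Iterable, Iterator


def bulk_documents(lines: Iterable[str], keep_id: bool = False) -> Iterator[dict]:
    """Yield document objects from Elasticsearch bulk-style NDJSON lines.

    Hypothetical helper: drops action lines whose index._index is non-empty.
    If keep_id is True, the _id from the preceding action line is copied
    into the following document.
    """
    pending_id = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # bulk bodies end with a newline; ignore blank lines
        obj = json.loads(raw)
        meta = obj.get("index")
        if isinstance(meta, dict) and meta.get("_index"):
            # Action/metadata line: same condition as "index._index is not empty"
            pending_id = meta.get("_id")
            continue
        if keep_id and pending_id is not None:
            obj = {"_id": pending_id, **obj}
        pending_id = None
        yield obj
```

After this filtering, every remaining row is a document, so schema detection sees a consistent set of columns.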