Hi there,
I am trying to load data in the format below into Dataiku.
Dataiku automatically detects the following schema:
I am getting an error when I run a prepare recipe to rename one of the columns (see below). I suspect it's because the schema is not being read correctly.
You can access the data from here. I would appreciate any help in addressing this. Thanks!
Your tar.gz archive does not contain only JSON files. There are also a handful of *.sh files in it, which DSS cannot read as JSON.
You need to upload a tar.gz that contains only JSON files.
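In case it helps, here is a minimal sketch of one way to repackage the archive outside DSS so that only the JSON files remain. It uses only Python's standard tarfile module. The function name keep_only_json and the file paths are made up for illustration, and it assumes the JSON files all end in .json.

```python
import tarfile

def keep_only_json(src_path, dst_path):
    """Copy only regular *.json members of a tar.gz into a new tar.gz.

    Hypothetical helper: assumes the JSON files are identified by a
    .json extension. Returns the names of the members that were kept.
    """
    kept = []
    with tarfile.open(src_path, "r:gz") as src, tarfile.open(dst_path, "w:gz") as dst:
        for member in src.getmembers():
            # Skip directories, links, and anything that is not *.json (e.g. *.sh)
            if member.isfile() and member.name.lower().endswith(".json"):
                dst.addfile(member, src.extractfile(member))
                kept.append(member.name)
    return kept
```

For example, keep_only_json("data.tar.gz", "data_json_only.tar.gz") would write a new archive to upload, leaving the original untouched.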
Hi @fchataigner2 ,
Thanks for your reply. I have removed the .sh files, and you can access the updated dataset here.
Unfortunately, I still face the same issue when I upload this dataset onto Dataiku. Any idea why and how to resolve this?
The JSON looks like it follows the Elasticsearch bulk API format ( https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html ), in which each document line is preceded by an "index" action line. That is why the detected schema mixes the two kinds of rows. If you don't need the _id field that is present on the index lines, or can re-create it, you can simply apply a filter recipe to the dataset to remove all rows where index._index is not empty.
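If you do need to keep the _id, one option is to pre-process the files before uploading so that each document carries the _id from the action line above it. The following is only a rough sketch under some assumptions: the files contain only "index" actions, every action line is immediately followed by its document line, and the documents do not already have an _id key of their own. The function name read_bulk is made up for illustration.

```python
import json

def read_bulk(lines):
    """Yield one dict per document from Elasticsearch bulk-format lines.

    Hypothetical helper: handles only "index" actions, each followed by
    exactly one document line. The _id from the action line is copied
    onto the document that follows it.
    """
    pending_id = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines
        obj = json.loads(raw)
        # Action line, e.g. {"index": {"_index": "...", "_id": "..."}}
        if isinstance(obj, dict) and len(obj) == 1 and isinstance(obj.get("index"), dict):
            pending_id = obj["index"].get("_id")
            continue
        # Anything else is treated as a document line
        doc = dict(obj)
        if pending_id is not None:
            doc["_id"] = pending_id
        pending_id = None
        yield doc
```

Writing each yielded dict back out with json.dumps, one per line, should give a plain JSON-lines file with a single kind of row, which DSS can then read with a consistent schema.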