This is related to my last question: I'm still not convinced that JSON is fully supported...
To reproduce the problem with a simpler, valid JSON file:
echo -e '{"foo": 123, "bar": 444}\n{"foo": 111, "bar": 321}' > simple_valid_json
hdfs dfs -put simple_valid_json
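The uploaded file is newline-delimited JSON (one object per line) rather than a single JSON document. As a minimal local sanity check before uploading, here is a sketch that uses only Python's standard `json` module. The helper name `parse_json_lines` is hypothetical and not part of DSS. It parses each line on its own and reports the first line that fails:

```python
import json


def parse_json_lines(text):
    """Parse newline-delimited JSON: one JSON object per non-empty line."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno} is not valid JSON: {exc}") from exc
    return records


# Same content that the single-quoted echo command writes to simple_valid_json
sample = '{"foo": 123, "bar": 444}\n{"foo": 111, "bar": 321}\n'
print(parse_json_lines(sample))
# → [{'foo': 123, 'bar': 444}, {'foo': 111, 'bar': 321}]
```

If the whole `echo` argument is wrapped in double quotes, the shell strips the inner quotes and the file ends up containing `{foo: 123, bar: 444}`, which this check rejects. Single quotes around the argument keep the keys quoted.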
Then I'm able to create a dataset named simple_valid_json_dataset via DSS, but when I try to do anything with it:
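import dataiku
import dataiku.spark as dkuspark
# assumes sqlContext is an existing pyspark SQLContext
mydataset = dataiku.Dataset("simple_valid_json_dataset")
df = dkuspark.get_dataframe(sqlContext, mydataset)
df.count()  # raises the exception below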
Py4JJavaError: An error occurred while calling o22.count.
: java.lang.RuntimeException: Unsupported input format : json
at com.dataiku.dip.shaker.mrimpl.formats.UniversalFileInputFormat.lazyInit(UniversalFileInputFormat.java:93)
at com.dataiku.dip.shaker.mrimpl.formats.UniversalFileInputFormat.getSplits(UniversalFileInputFormat.java:10