JSON on Hadoop?

q666

This is related to my last question: I'm still not convinced that JSON is fully supported...

So, to reproduce the problem with a simpler, valid JSON file:

echo -e '{"foo": 123, "bar": 444}\n{"foo": 111, "bar": 321}' > simple_valid_json

hdfs dfs -put simple_valid_json
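Before blaming the reader, it may be worth confirming that the file really is valid JSON Lines (one complete JSON object per line). Below is a minimal sketch in plain Python, assuming only the standard library. It writes the same two records and then reports any line that fails `json.loads`. The helper names `write_json_lines` and `invalid_json_lines` are hypothetical and do not come from DSS.

```python
import json

# The two records from the post, one JSON object per line (JSON Lines layout).
RECORDS = [{"foo": 123, "bar": 444}, {"foo": 111, "bar": 321}]


def write_json_lines(path, records):
    """Hypothetical helper: write each record as one single-line JSON object."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def invalid_json_lines(path):
    """Hypothetical helper: return 1-based numbers of non-empty lines that are not valid JSON."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines, e.g. a trailing newline
            try:
                json.loads(line)
            except json.JSONDecodeError:
                bad.append(lineno)
    return bad


if __name__ == "__main__":
    write_json_lines("simple_valid_json", RECORDS)
    print(invalid_json_lines("simple_valid_json"))  # [] means every line parsed
```

This check matters because quoting is easy to get wrong here. If the `echo` argument is wrapped in double quotes, the shell strips the inner quotes and writes `{foo: 123, bar: 444}`, and this check flags both lines.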

I can then create a dataset called simple_valid_json_dataset from it in DSS, but as soon as I try to do something with it, it fails:

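import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sqlContext = SQLContext(SparkContext.getOrCreate())
mydataset = dataiku.Dataset("simple_valid_json_dataset")
df = dkuspark.get_dataframe(sqlContext, mydataset)
df.count()  # raises the exception below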

Py4JJavaError: An error occurred while calling o22.count.
: java.lang.RuntimeException: Unsupported input format : json
at com.dataiku.dip.shaker.mrimpl.formats.UniversalFileInputFormat.lazyInit(UniversalFileInputFormat.java:93)
at com.dataiku.dip.shaker.mrimpl.formats.UniversalFileInputFormat.getSplits(UniversalFileInputFormat.java:10

