JSON on Hadoop?

q666

This is related to my last question: I'm still not convinced that there is full JSON support...

To reproduce the problem with a simpler, valid JSON file:

echo -e '{"foo": 123, "bar": 444}\n{"foo": 111, "bar": 321}' > simple_valid_json

hdfs dfs -put simple_valid_json
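One detail worth checking when generating the file with `echo`: if the whole string is wrapped in double quotes, the shell treats each inner `"` as closing and reopening the quoted string, so the keys lose their quotes and the lines are no longer valid JSON. Wrapping the string in single quotes keeps the inner quotes intact. The sketch below checks this in Python, assuming that `shlex` (which applies POSIX quoting rules but is not a full shell) is a close enough stand-in for bash here, and approximating `echo -e` by turning the literal `\n` into a newline. `payload_of` and `lines_are_json` are hypothetical helpers.

```python
import json
import shlex

def payload_of(command: str) -> str:
    """Approximate what `echo -e ARG` writes: shlex applies POSIX quoting
    rules to find ARG, then the literal two-character \\n becomes a newline."""
    return shlex.split(command)[-1].replace("\\n", "\n")

def lines_are_json(text: str) -> bool:
    """True if every non-empty line parses as JSON (one record per line)."""
    try:
        for line in text.splitlines():
            if line.strip():
                json.loads(line)
    except json.JSONDecodeError:
        return False
    return True

# Whole string in double quotes: each inner " closes and reopens the quoting.
double_quoted = 'echo -e "{"foo": 123, "bar": 444}\\n{"foo": 111, "bar": 321}"'
# Whole string in single quotes: the inner " characters are kept literally.
single_quoted = """echo -e '{"foo": 123, "bar": 444}\\n{"foo": 111, "bar": 321}'"""

for cmd in (double_quoted, single_quoted):
    text = payload_of(cmd)
    print(lines_are_json(text), text.splitlines()[0])
```

Running it prints `False` for the double-quoted command, whose first line comes out as `{foo: 123, bar: 444}`, and `True` for the single-quoted one.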



I'm then able to create a dataset called simple_valid_json_dataset on top of it in DSS, but as soon as I try to do something with it:



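import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sqlContext = SQLContext(SparkContext.getOrCreate())
mydataset = dataiku.Dataset("simple_valid_json_dataset")
df = dkuspark.get_dataframe(sqlContext, mydataset)
df.count()  # raises the exception below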


Py4JJavaError: An error occurred while calling o22.count.
: java.lang.RuntimeException: Unsupported input format : json
at com.dataiku.dip.shaker.mrimpl.formats.UniversalFileInputFormat.lazyInit(UniversalFileInputFormat.java:93)
at com.dataiku.dip.shaker.mrimpl.formats.UniversalFileInputFormat.getSplits(UniversalFileInputFormat.java:10
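To rule out the file itself, a local check with the Python standard library can confirm that every line parses as JSON and count the records; for this two-line file, a working `df.count()` would be expected to return 2. This is a minimal sketch: `count_json_records` is a hypothetical helper, not part of the Dataiku or Spark APIs, and the sample text mirrors the file created earlier rather than reading it from HDFS.

```python
import io
import json
from typing import Iterable

def count_json_records(lines: Iterable[str]) -> int:
    """Count newline-delimited JSON records; raise ValueError on a malformed line."""
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines, such as a trailing newline
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {lineno} is not valid JSON: {err}") from err
        count += 1
    return count

# Same two records as simple_valid_json (hypothetical in-memory copy).
SAMPLE = '{"foo": 123, "bar": 444}\n{"foo": 111, "bar": 321}\n'
print(count_json_records(io.StringIO(SAMPLE)))  # 2
```

With a local copy of the file, passing an open file handle (`open("simple_valid_json", encoding="utf-8")`) works the same way. A ValueError naming a line number would point at the file rather than at the Spark integration.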
