Read JSONs from a folder on a server
Hello,
On a server (a different machine from mine) I have a folder where JSON files get created for every date.
I created a project on the Dataiku platform to manage those files (publish, filter, apply models).
I created a python notebook to load the files. I took a simple example where I am sure of the path:
my_dataset = dataiku.Dataset('/opt/dataiku/data/analysis-data/VIDEO_RECOM/2017/06/0100/0_994040f2066440ca8182cc687c322fd2_1.json').get_dataframe()
But I get this error:
Unable to fetch schema for /opt/dataiku/data/analysis-data/VIDEO_RECOM/2017/06/0100/0_994040f2066440ca8182cc687c322fd2_1.json: "/opt/dataiku/data/analysis-data/VIDEO_RECOM/2017/06/0100/0_994040f2066440ca8182cc687c322fd2_1" is not a valid file/directory name (forbidden characters or too long)
(print(os.path.isfile('/opt/dataiku/data/analysis-data/VIDEO_RECOM/2017/06/0100/0_994040f2066440ca8182cc687c322fd2_1.json')) returns False.)
On the other hand, when I load the file locally I don't have any problem.
Many thanks in advance.
Cécile
Answers
Hi,
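dataiku.Dataset takes the name of a dataset that already exists in DSS, not the path of an arbitrary file, which is why the path is rejected as an invalid name. If you just want to read a file, you can use pandas directly: pandas.read_json() for JSON files, or pandas.read_csv() for CSV files (https://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.read_csv.html).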
Or you can create a new dataset in DSS targeting this file.
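For the first option, here is a minimal sketch that reads the files directly with the standard library. It is only an illustration under some assumptions: load_json_folder is a hypothetical helper name, each file is assumed to hold either one JSON object or a list of objects, and the folder has to be visible from the machine where the notebook runs (os.path.isfile returning False suggests that it currently is not).

```python
import json
from pathlib import Path


def load_json_folder(folder):
    """Return a list of records read from every *.json file under `folder`.

    Hypothetical helper, not part of the dataiku API.
    """
    root = Path(folder)
    if not root.is_dir():
        # Same symptom as os.path.isfile(...) returning False: the path is
        # not visible from the machine where this code runs.
        raise FileNotFoundError(f"{folder} is not visible from this machine")

    records = []
    # rglob walks the sub-folders too (e.g. 2017/06/0100).
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            content = json.load(handle)
        # Assumption: a file holds either one object or a list of objects.
        if isinstance(content, list):
            records.extend(content)
        else:
            records.append(content)
    return records
```

If the records are flat dictionaries, pandas.DataFrame(load_json_folder('/opt/dataiku/data/analysis-data/VIDEO_RECOM')) should give a dataframe comparable to what get_dataframe() returns for a DSS dataset.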