How can you convert a CSV file to a JSON file?

UserBird
Dataiker
How do I convert a CSV file to a JSON file using R or Python in Dataiku?
6 Replies
Alex_Reutter
Dataiker Alumni
The CSV dataset in Dataiku is exposed to Python as a Pandas dataframe; I would try using the to_json() method from Pandas to convert it to JSON. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html
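As a minimal sketch of that suggestion, assuming a small hand-built dataframe in place of a real Dataiku dataset (the column names and values are made up for illustration), to_json() returns a JSON string and its orient parameter controls the layout:

```python
import json
import pandas as pd

# Hypothetical stand-in for the dataframe Dataiku would hand to the recipe
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Default orient for a DataFrame is "columns": {column -> {index -> value}}
json_by_column = df.to_json()

# orient="records" gives a list of row objects, which is often closer to
# what people expect from a CSV-to-JSON conversion
json_by_row = df.to_json(orient="records")

# Parse both strings back into Python objects to inspect their shape
parsed_columns = json.loads(json_by_column)
parsed_rows = json.loads(json_by_row)
```

Which orient is appropriate depends on what will consume the JSON; the default is kept in the rest of this thread.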
dorisli9044
Level 1
Hi Alex,

I tried it. However, it does not work in a Python recipe, because the recipe has to write a dataset at the end.

Do you have examples with more details about how that works?

Thanks
Alex_Reutter
Dataiker Alumni
I think I've misunderstood what you're trying to do. What's the end goal for the JSON? Do you want it passed as a cell of a downstream Dataiku dataset, written to an external file, or something else?
dorisli9044
Level 1
Hi Alex,

Ideally, I want both. First, I want to write it to a cell in the Dataiku flow. At some point in the future, I might also need to write it to an external file.

Thanks,
Doris
Alex_Reutter
Dataiker Alumni
Cool; so, the following code could be used in a Python recipe to read a Dataiku dataset, convert it to JSON, write it back to a Dataiku dataset, and write it out to a file. Change "input_dataset" to the name of the recipe's input Dataiku dataset, "output_dataset" to the name of its output dataset, and "output_file" to the filesystem path where you want the JSON written.

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd

# Recipe input: read the Dataiku dataset into a Pandas dataframe
input_dataset = dataiku.Dataset("input_dataset")
input_df = input_dataset.get_dataframe()

# Convert the whole dataframe to a single JSON string
input_json = input_df.to_json()

# Wrap the JSON string in a one-row, one-column dataframe
input_json_df = pd.DataFrame(data=[input_json], columns=['json'])

# Recipe output: write the new dataframe back to a Dataiku dataset
output_dataset = dataiku.Dataset("output_dataset")
output_dataset.write_with_schema(input_json_df)

# Write the JSON to an external file; the with block closes the file
with open('output_file', 'w') as f:
    f.write(input_json)
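
To check the wrapping and file steps outside Dataiku, here is a rough pandas-only sketch. It assumes a made-up dataframe in place of the one returned by get_dataframe() and a temporary file in place of "output_file"; it only illustrates that the cell and the file hold the same JSON.

```python
import json
import os
import tempfile
import pandas as pd

# Hypothetical stand-in for input_df; in a recipe it comes from get_dataframe()
input_df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
input_json = input_df.to_json()

# The whole JSON document becomes the single cell of a 1x1 dataframe
input_json_df = pd.DataFrame(data=[input_json], columns=["json"])

# Write the same string to a file (a temporary path, for illustration only)
output_path = os.path.join(tempfile.mkdtemp(), "output.json")
with open(output_path, "w") as f:
    f.write(input_json)

# Parse both copies back to confirm they carry the same data
from_cell = json.loads(input_json_df.loc[0, "json"])
with open(output_path) as f:
    from_file = json.load(f)
```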
dorisli9044
Level 1
Thanks!! It works:)