How can you convert a CSV file to a JSON file?

UserBird
UserBird Dataiker, Alpha Tester Posts: 535 Dataiker
How can I convert a CSV file to a JSON file using R or Python in Dataiku?

Answers

  • Alex_Reutter
    Alex_Reutter Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer Posts: 105 ✭✭✭✭✭✭✭
    The CSV dataset in Dataiku is exposed to Python as a Pandas dataframe; I would try using the to_json() method from Pandas to convert it to JSON. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html
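    As a rough illustration of what to_json() returns, here is a minimal sketch that runs outside Dataiku. The CSV text and the column names are made up for the example; inside a recipe the dataframe would come from the Dataiku dataset instead. It also shows the orient="records" option, which yields one JSON object per row rather than the default column-keyed layout.

```python
import io
import json

import pandas as pd

# Hypothetical CSV content standing in for the real dataset.
csv_text = "id,name\n1,Ada\n2,Grace\n"
df = pd.read_csv(io.StringIO(csv_text))

# Default layout: {column -> {row index -> value}}.
json_by_column = df.to_json()

# Row-oriented layout: a JSON array with one object per row.
json_by_record = df.to_json(orient="records")

# Parse the strings back to check their structure.
columns_obj = json.loads(json_by_column)
records = json.loads(json_by_record)

print(json_by_column)
print(json_by_record)
```

    Which layout is preferable depends on what will consume the JSON; the row-oriented form tends to be easier to read when each CSV line is a separate record.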
  • dorisli9044
    dorisli9044 Registered Posts: 4 ✭✭✭✭
    Hi Alex,

    I tried it. However, it does not work in a Python recipe, because the recipe has to write a dataset at the end.

    Do you have examples with more details about how that works?

    Thanks
  • Alex_Reutter
    Alex_Reutter Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer Posts: 105 ✭✭✭✭✭✭✭
    I think I've misunderstood what you're trying to do. What's the end goal for the JSON? Do you want it stored in a cell of a downstream Dataiku dataset, written to an external file, or something else?
  • dorisli9044
    dorisli9044 Registered Posts: 4 ✭✭✭✭
    Hi Alex,

    Ideally, I want both. First, I want to write it to a cell in the Dataiku flow. At some point in the future, I might also need to write it to an external file.

    Thanks,
    Doris
  • Alex_Reutter
    Alex_Reutter Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer Posts: 105 ✭✭✭✭✭✭✭
    Cool; so, the following code could be used in a Python recipe to read a Dataiku dataset, convert it to JSON, write it back to a Dataiku dataset, and write it out to a file. Change "input_dataset" to the name of the recipe's input Dataiku dataset, "output_dataset" to the name of its output dataset, and "output_file" to the path on the filesystem where you want the JSON written.

    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd

    # Recipe input: read the Dataiku dataset into a Pandas dataframe
    input_dataset = dataiku.Dataset("input_dataset")
    input_df = input_dataset.get_dataframe()

    # Convert the whole dataframe to a single JSON string
    input_json = input_df.to_json()

    # Wrap the JSON string in a one-row, one-column dataframe
    input_json_df = pd.DataFrame(data=[input_json], columns=['json'])

    # Write the new dataframe back to the Dataiku output dataset
    output_dataset = dataiku.Dataset("output_dataset")
    output_dataset.write_with_schema(input_json_df)

    # Write the JSON to an external file; the with block closes it even on error
    with open('output_file', 'w') as f:
        f.write(input_json)
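
    To try the conversion and file-writing steps of the recipe above without a Dataiku instance, something like the following sketch might help. It is only an approximation: the small dataframe stands in for the recipe input, the column names and the temporary output path are invented for the example, and orient="records" is a choice made here, not something the recipe requires.

```python
import json
import os
import tempfile

import pandas as pd

# Stand-in for the dataframe read from the input dataset (hypothetical columns).
input_df = pd.DataFrame({"id": [1, 2], "name": ["Ada", "Grace"]})

# Convert to a JSON string, one object per row.
input_json = input_df.to_json(orient="records")

# One-row, one-column dataframe holding the whole JSON string in a single cell.
input_json_df = pd.DataFrame(data=[input_json], columns=["json"])

# Write the JSON to a file in a temporary directory (hypothetical path).
output_path = os.path.join(tempfile.mkdtemp(), "output_file.json")
with open(output_path, "w") as f:
    f.write(input_json)

# Read the file back to confirm it contains valid JSON.
with open(output_path) as f:
    round_trip = json.load(f)

print(input_json_df.shape)
print(round_trip)
```

    Reading the file back with json.load is a cheap way to confirm that what was written is valid JSON before another tool depends on it.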
  • dorisli9044
    dorisli9044 Registered Posts: 4 ✭✭✭✭
    Thanks!! It works :)