Read all the pages of a JSON file

sebastienH Registered Posts: 7 ✭✭✭✭

Hello,

I would like to read a JSON file, but with the URL I can only access the first page (the file is paginated). I would like to read ALL the pages.

In Talend I would use a loop, but I don't know how to do this with JSON. Can I get help?

Thank you very much

Best Answers

  • ATsao
    ATsao Dataiker Alumni, Registered Posts: 139 ✭✭✭✭✭✭✭✭
    Answer ✓

    Hi Sebastien,

    By default, recipes run in "overwrite" mode in DSS. In your Python recipe, try navigating to the Inputs/Outputs tab and selecting the "Append instead of overwrite" option.

    This should allow the results to be appended instead of overwritten in your output dataset. I hope that this helps!

    Thanks again,

    Andrew

  • sebastienH
    sebastienH Registered Posts: 7 ✭✭✭✭
    Answer ✓

    Hi,

    I did not see that! Thank you! Problem solved.

    Sebastien

Answers

  • ATsao
    ATsao Dataiker Alumni, Registered Posts: 139 ✭✭✭✭✭✭✭✭

    Hi Sebastien,

    Could you provide more details about this JSON file and where it's located? Are you using a download recipe, or how else are you trying to ingest this data into your Flow in DSS?

    Thanks,

    Andrew

  • sebastienH
    sebastienH Registered Posts: 7 ✭✭✭✭

    Hello,

    Thank you for your answer.

    I get the JSON from a URL: https://porta....

    I can only share the end of the URL:

    e5e3&query_id=735&offset=0 for the first page

    e5e3&query_id=735&offset=1 for the second page

    and so on. There are thousands of pages.

    I tried a download recipe, but I can't use "add another source" for each page. There are too many pages.

    Is it possible to use a loop?

    Thank you

  • ATsao
    ATsao Dataiker Alumni, Registered Posts: 139 ✭✭✭✭✭✭✭✭

    Hi Sebastien,

    Your best bet might be to create your own code recipe, such as a Python recipe, where you could iterate through the different pages to read in the necessary data, build your own dataframe, and then write this dataframe as an output to a dataset in DSS.

    Best,

    Andrew

  • sebastienH
    sebastienH Registered Posts: 7 ✭✭✭✭

    Hi Andrew,

    Thank you for your answer. I will use a Python recipe without an input. I thought that an input was necessary, but it's not.

    Thank you,

    Best,

    Sebastien

  • ATsao
    ATsao Dataiker Alumni, Registered Posts: 139 ✭✭✭✭✭✭✭✭

    Hi Sebastien,

    That's correct. For code recipes, only an output is necessary (even if it's a dummy output).

    Thanks,

    Andrew

  • sebastienH
    sebastienH Registered Posts: 7 ✭✭✭✭

    Hi Andrew,

    I forgot to click "accept as solution", but I have one last question.

    Here is my code:

    for i in range(5):
        url = "https://portail.....................35&offset={0}".format(i)
        fileA = pd.read_json(url, orient='records')
        loop_issues1.write_from_dataframe(fileA)

    The problem is that it replaces everything on each iteration of the loop, but I want it to add data.

    I tried passing mode="a", but it is not recognized.

    Thank you for your help. I didn't find anything about write_from_dataframe on pandas.pydata.org.
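
As an alternative to Append mode, the pages can be collected in memory and written once after the loop, so nothing is overwritten between iterations. The following is only a minimal sketch of that idea, not code from this thread: read_all_pages, fetch_page, max_pages and the fake data are hypothetical names introduced for illustration, and it assumes that an offset past the last page returns no records, which may not hold for the real endpoint.

```python
import pandas as pd


def read_all_pages(fetch_page, max_pages=10000):
    """Collect pages until an empty one comes back, then return one DataFrame.

    fetch_page is a hypothetical callable: offset -> DataFrame.
    max_pages is a safety cap so the loop always terminates.
    """
    frames = []
    for offset in range(max_pages):
        page = fetch_page(offset)
        if page.empty:  # assumption: an offset past the end yields no records
            break
        frames.append(page)
    if not frames:
        return pd.DataFrame()
    # One concat at the end is cheaper than growing a DataFrame page by page.
    return pd.concat(frames, ignore_index=True)


# Stand-in for the real endpoint so the sketch runs offline: three fake pages.
_FAKE_PAGES = [
    [{"id": 1, "status": "open"}, {"id": 2, "status": "closed"}],
    [{"id": 3, "status": "open"}],
    [{"id": 4, "status": "open"}],
]


def fake_fetch(offset):
    records = _FAKE_PAGES[offset] if offset < len(_FAKE_PAGES) else []
    return pd.DataFrame.from_records(records)


all_issues = read_all_pages(fake_fetch)
print(len(all_issues))  # 4 rows: 2 + 1 + 1 from the fake pages
```

In the recipe itself, fetch_page could wrap the pd.read_json call from the post above, with the offset substituted into the URL, and the combined dataframe could then be passed to a single write_from_dataframe call after the loop. With thousands of pages, whether everything fits in memory is worth checking first; if it does not, Append mode from the accepted answer is the safer route.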
