Calling external API

aw30 Dataiku DSS & SQL, Registered Posts: 49 ✭✭✭✭✭

I am trying to find an example of invoking an external API, reading in the response and saving the output as a dataset within my DSS flow. The documentation I have found only covers accessing datasets that are already within DSS, so I must be missing where to find this information.

I am currently calling a Python script outside of DSS to pull the data (example shown below). I want to make the same call from within DSS as part of my flow.

I'm pretty new to DSS so thank you for any help you can provide!

import requests
import csv

headers = {"Accept": "application/json",
           "Authorization": "xxxxxx"}
url = 'https://mysite.comtest/Myitem?sel=Name,Number,Size,Team,Status'
members = requests.get(url, headers=headers).json()
with open("C:/Work/Agile Reports/UserStory.csv", "w", newline="") as outputFile:
    writer = csv.DictWriter(outputFile, ["Name", "Number", "Size", "Team", "Status"])
    writer.writeheader()
    for member in members["Assets"]:
        attributes = member["Attributes"]

        writer.writerow({
            "Name": attributes["Name"]["value"],
            "Number": attributes["Number"]["value"],
            "Size": attributes["Size"]["value"],
            "Team": attributes["Team.Name"]["value"],
            "Status": attributes["Status.Name"]["value"],
        })


Best Answers

  • ATsao
    ATsao Dataiker Alumni, Registered Posts: 139 ✭✭✭✭✭✭✭✭
    edited July 17 Answer ✓

    Hi aw30,

    Welcome to the Dataiku community!

    The idea is to create a Python recipe in your Flow. In that recipe, you call the external API with ordinary Python code, load the results into a pandas DataFrame, and then write the DataFrame to the recipe's output dataset with a method such as write_with_schema.

    For example, the code below uses the Freshdesk API (Freshdesk is a cloud-based customer service and help desk system) to pull company information for existing users/customers and write it into a dataset that can then be used in your DSS flow.

    import dataiku
    import pandas as pd
    import requests

    # Freshdesk connection details (replace with your own domain and API key)
    FRESHDESK_ENDPOINT = "https://XXXXX.freshdesk.com"  # check if you have configured https, modify accordingly
    FRESHDESK_KEY = "API_KEY"

    # Pull data page by page from the Freshdesk companies endpoint
    pages = []
    counter = 1
    while True:
        r = requests.get(
            FRESHDESK_ENDPOINT + "/api/v2/companies",
            params={"page": counter, "per_page": 100},
            auth=(FRESHDESK_KEY, "X"),
        )
        r.raise_for_status()
        records = r.json()
        if not records:  # an empty page means everything has been read
            break
        pages.append(pd.json_normalize(records))  # flatten nested JSON into columns
        print(counter)
        counter += 1

    # Combine the pages and write the resulting dataframe into the output dataset
    freshdesk_companies_final = pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()
    freshdesk_companies_list = dataiku.Dataset("Freshdesk_companies_list")
    freshdesk_companies_list.write_with_schema(freshdesk_companies_final)
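    The loop above keeps requesting page numbers until the API returns an empty page. If you call several paginated endpoints, that logic can be factored into a small helper. This is a minimal sketch; fetch_page is a hypothetical callable you would supply, and the example runs against an in-memory stand-in rather than a real API.

    ```python
    from typing import Callable, List


    def fetch_all_pages(fetch_page: Callable[[int], List[dict]], start: int = 1) -> List[dict]:
        """Collect records from a page-numbered API until an empty page is returned.

        fetch_page is a hypothetical callable: given a page number, it returns the
        list of records on that page (an empty list once past the last page).
        """
        records: List[dict] = []
        page = start
        while True:
            batch = fetch_page(page)
            if not batch:  # empty page: nothing left to read
                break
            records.extend(batch)
            page += 1
        return records


    # In-memory stand-in for the API: two pages of data, then empty pages
    fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    all_records = fetch_all_pages(lambda page: fake_pages.get(page, []))
    print(all_records)  # [{'id': 1}, {'id': 2}, {'id': 3}]
    ```

    In a recipe, fetch_page could wrap the requests.get call shown above and return r.json(); the combined list can then go through pd.json_normalize before write_with_schema.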

    Here is a similar example that pulls satisfaction ratings from the Freshdesk API instead and writes them into the output dataset.

    import dataiku
    import pandas as pd
    import requests

    # Freshdesk connection details (replace with your own domain and API key)
    FRESHDESK_ENDPOINT = "https://XXXXX.freshdesk.com"  # check if you have configured https, modify accordingly
    FRESHDESK_KEY = "API_KEY"

    # Pull data page by page from the satisfaction_ratings endpoint
    pages = []
    counter = 1
    while True:
        r = requests.get(
            FRESHDESK_ENDPOINT + "/api/v2/surveys/satisfaction_ratings",
            params={"created_since": "2014-01-01T00:00:00Z", "page": counter, "per_page": 100},
            auth=(FRESHDESK_KEY, "X"),
        )
        r.raise_for_status()
        records = r.json()
        if not records:  # an empty page means everything has been read
            break
        pages.append(pd.json_normalize(records))  # flatten nested JSON into columns
        print(counter)
        counter += 1

    # Combine the pages and write the resulting dataframe into the output dataset in DSS
    freshdesk_satisfaction_final = pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()
    freshdesk_satisfaction_list = dataiku.Dataset("Freshdesk_satisfaction")
    freshdesk_satisfaction_list.write_with_schema(freshdesk_satisfaction_final)
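    Applied to the Assets/Attributes response from your original script, the same DataFrame approach might look like the minimal sketch below. The flattening function runs anywhere; run_recipe only works inside a DSS Python recipe, and the output dataset name you pass to it is a placeholder for your own.

    ```python
    import pandas as pd

    # Output column -> attribute key in the API response (taken from the original script)
    COLUMNS = {
        "Name": "Name",
        "Number": "Number",
        "Size": "Size",
        "Team": "Team.Name",
        "Status": "Status.Name",
    }


    def assets_to_dataframe(members: dict) -> pd.DataFrame:
        """Flatten the {"Assets": [{"Attributes": {...}}]} response into one row per asset."""
        rows = [
            {col: asset["Attributes"][key]["value"] for col, key in COLUMNS.items()}
            for asset in members["Assets"]
        ]
        # Passing columns keeps the column order, even when there are no assets
        return pd.DataFrame(rows, columns=list(COLUMNS))


    def run_recipe(url: str, headers: dict, output_name: str) -> None:
        """Call the API and write the result to a DSS dataset (only works inside DSS)."""
        import dataiku  # available inside a DSS Python recipe
        import requests

        response = requests.get(url, headers=headers)
        response.raise_for_status()
        df = assets_to_dataframe(response.json())
        dataiku.Dataset(output_name).write_with_schema(df)
    ```

    Because write_with_schema derives the dataset schema from the DataFrame, there is no separate schema declaration to keep in sync here.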

    Hopefully this helps and gets you started on the right track!

    Best,

    Andrew

  • dimitri
    dimitri Dataiker, Product Ideas Manager Posts: 33 Dataiker
    edited July 17 Answer ✓

    Hi @aw30,

    To write the output of a code recipe into a DSS flow, use the Dataiku API to write either to a dataset or to a managed folder.
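    For the managed folder route, for example to keep the raw API response for a later recipe, a minimal sketch might look like this. It assumes a Python recipe whose output is a managed folder; folder_name and path are placeholders you would choose, and only the serialization helper runs outside DSS. The DSS part relies on dataiku.Folder and its get_writer method.

    ```python
    import json


    def response_to_bytes(payload: dict) -> bytes:
        """Serialize the parsed API response as UTF-8 JSON bytes for a file in a managed folder."""
        return json.dumps(payload, indent=2, ensure_ascii=False).encode("utf-8")


    def save_raw_response(payload: dict, folder_name: str, path: str) -> None:
        """Store the raw response as a file in a DSS managed folder (only works inside DSS)."""
        import dataiku  # available inside a DSS Python recipe

        folder = dataiku.Folder(folder_name)
        # get_writer replaces the file at path if it already exists
        with folder.get_writer(path) as writer:
            writer.write(response_to_bytes(payload))
    ```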

    When you create a Python recipe, DSS automatically generates a few lines of code that write a DataFrame to the output dataset.

    Based on your example, you can instead use write_schema and write_row_dict to declare the schema and populate your output dataset row by row, as shown in the code sample below.

    import dataiku
    import requests

    headers = {"Accept": "application/json",
               "Authorization": "xxxxxx"}
    url = 'https://mysite.comtest/Myitem?sel=Name,Number,Size,Team,Status'
    members = requests.get(url, headers=headers).json()

    people = dataiku.Dataset("people")  # Assuming the id of your output dataset is "people"
    people.write_schema([
        {"name": "Name", "type": "string"},
        {"name": "Number", "type": "int"},
        {"name": "Size", "type": "string"},
        {"name": "Team", "type": "string"},
        {"name": "Status", "type": "string"},
    ])

    with people.get_writer() as writer:
        for member in members["Assets"]:
            attributes = member["Attributes"]
            writer.write_row_dict({
                "Name": attributes["Name"]["value"],
                "Number": attributes["Number"]["value"],
                "Size": attributes["Size"]["value"],
                "Team": attributes["Team.Name"]["value"],
                "Status": attributes["Status.Name"]["value"],
            })
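    With write_schema and write_row_dict, the column names appear twice, once in the schema and once in each row, so they can drift apart. One option is to derive both from a single mapping. This is a minimal sketch; SPEC, build_schema and build_row are hypothetical names, and the attribute keys and types are copied from the code above.

    ```python
    from typing import Dict, List, Tuple

    # Hypothetical single source of truth: output column -> (attribute key, DSS type)
    SPEC: Dict[str, Tuple[str, str]] = {
        "Name": ("Name", "string"),
        "Number": ("Number", "int"),
        "Size": ("Size", "string"),
        "Team": ("Team.Name", "string"),
        "Status": ("Status.Name", "string"),
    }


    def build_schema(spec: Dict[str, Tuple[str, str]]) -> List[Dict[str, str]]:
        """Schema in the list-of-dicts form passed to write_schema."""
        return [{"name": col, "type": col_type} for col, (_, col_type) in spec.items()]


    def build_row(attributes: dict, spec: Dict[str, Tuple[str, str]]) -> Dict[str, object]:
        """Row dict for write_row_dict, containing exactly the columns declared in the schema."""
        return {col: attributes[key]["value"] for col, (key, _) in spec.items()}
    ```

    Inside the recipe, people.write_schema(build_schema(SPEC)) would replace the literal schema, and the loop body would become writer.write_row_dict(build_row(member["Attributes"], SPEC)).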

    Have a great day!

Answers

  • aw30
    aw30 Dataiku DSS & SQL, Registered Posts: 49 ✭✭✭✭✭

    Hi Andrew,

    This is great, thank you so much! I am waiting for the site to be added to our whitelist, but I will mark this as a solution because I understand what you are doing and how it would work.

    Thanks again!

    Anne

  • aw30
    aw30 Dataiku DSS & SQL, Registered Posts: 49 ✭✭✭✭✭

    Hi, thank you for your solution! I have marked it as accepted as well, since it shows me how to define the schema in the code. Thanks again for your time!

    Anne
