Calling external API
I am trying to find an example of invoking an external API, reading in the stream, and saving the output as a dataset within my DSS flow. The documentation I have found covers accessing datasets that are already within DSS, so I must be missing where to find this information.
I am currently calling a Python script outside of DSS to pull the data (example shown below). What I want to do is make the same call from within DSS as part of my flow.
I'm pretty new to DSS so thank you for any help you can provide!
import requests
import csv

headers = {"Accept": "application/json",
           "Authorization": "xxxxxx"}
url = 'https://mysite.comtest/Myitem?sel=Name,Number,Size,Team,Status'
members = requests.get(url, headers=headers).json()

with open("C:/Work/Agile Reports/UserStory.csv", "w", newline="") as outputFile:
    writer = csv.DictWriter(outputFile, ["Name", "Number", "Size", "Team", "Status"])
    writer.writeheader()
    for member in members["Assets"]:
        attributes = member["Attributes"]
        writer.writerow({
            "Name": attributes["Name"]["value"],
            "Number": attributes["Number"]["value"],
            "Size": attributes["Size"]["value"],
            "Team": attributes["Team.Name"]["value"],
            "Status": attributes["Status.Name"]["value"],
        })
Best Answers
-
Hi aw30,
Welcome to the Dataiku community!
The idea would be to create a Python recipe in your Flow. In that recipe you can call the external API from Python code, load the results into a pandas DataFrame, and then write the resulting DataFrame to your output dataset using something like write_with_schema.
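In its simplest form, that pattern looks something like the sketch below. The URL, the auth header, and the output dataset name "my_output_dataset" are placeholders; substitute whatever your API and your recipe's output dataset actually use.

import dataiku
import pandas as pd
import requests

# Call the external API (placeholder URL and auth header)
headers = {"Accept": "application/json", "Authorization": "xxxxxx"}
records = requests.get("https://example.com/api/items", headers=headers).json()

# Load the JSON records into a pandas DataFrame
df = pd.DataFrame(records)

# Write the DataFrame (and its schema) to the recipe's output dataset
output = dataiku.Dataset("my_output_dataset")
output.write_with_schema(df)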
As a more complete example, the code below uses the Freshdesk API (Freshdesk is a cloud-based customer service and help desk system) to pull company information for existing users/customers and write it into a dataset that can then be used in your DSS flow.
import dataiku
from dataiku import Dataset, Folder
import pandas as pd
import json
import requests
import base64
from pandas.io.json import json_normalize

# Create Pandas dataframe
counter = 1
freshdesk_companies_final = pd.DataFrame()
tmp = True

# Pull data from the Freshdesk API using the companies endpoint
while tmp:
    FRESHDESK_ENDPOINT = "https://XXXXX.freshdesk.com"  # check if you have configured https, modify accordingly
    FRESHDESK_KEY = "API_KEY"
    r = requests.get(FRESHDESK_ENDPOINT + '/api/v2/companies?page=' + str(counter) + '&per_page=100',
                     auth=(FRESHDESK_KEY, "X"))
    if r.content == b'[]':
        break
    freshdesk_companies = pd.DataFrame(json_normalize(json.loads(r.content)))
    freshdesk_companies_final = freshdesk_companies_final.append(freshdesk_companies)
    print(counter)
    counter += 1

# Write resulting dataframe into the output dataset
freshdesk_companies_list = dataiku.Dataset("Freshdesk_companies_list")
freshdesk_companies_list.write_with_schema(freshdesk_companies_final)
Here is another example where we do something similar but pull satisfaction ratings information instead via the Freshdesk APIs and push this into the output dataset.
import dataiku
from dataiku import Dataset, Folder
import pandas as pd
import json
import requests
import base64
from pandas.io.json import json_normalize

# Create Pandas dataframe
counter = 1
freshdesk_satisfaction_final = pd.DataFrame()
tmp = True

# Pull data from the satisfaction_ratings API endpoint into the dataframe
while tmp:
    FRESHDESK_ENDPOINT = "https://XXXXX.freshdesk.com"  # check if you have configured https, modify accordingly
    FRESHDESK_KEY = "API_KEY"
    r = requests.get(FRESHDESK_ENDPOINT + '/api/v2/surveys/satisfaction_ratings?created_since=2014-01-01T00:00:00Z&page=' + str(counter) + '&per_page=100',
                     auth=(FRESHDESK_KEY, "X"))
    if r.content == b'[]':
        break
    freshdesk_satisfaction = pd.DataFrame(json_normalize(json.loads(r.content)))
    freshdesk_satisfaction_final = freshdesk_satisfaction_final.append(freshdesk_satisfaction)
    print(counter)
    counter += 1

# Write resulting dataframe into the output dataset in DSS
freshdesk_satisfaction_list = dataiku.Dataset("Freshdesk_satisfaction")
freshdesk_satisfaction_list.write_with_schema(freshdesk_satisfaction_final)
Hopefully this helps and gets you started on the right track!
Best,
Andrew
-
Hi @aw30,
To write the output of a code recipe into a DSS flow, you should use the Dataiku API to write either to a dataset or to a managed folder.
By default, when you create the Python recipe, DSS automatically generates a few lines of code that let you write a DataFrame to the output dataset.
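For reference, that auto-generated boilerplate looks roughly like the sketch below; the dataset names are placeholders for whatever your recipe's actual input and output are called.

import dataiku
import pandas as pd

# Read recipe input into a DataFrame ("my_input" is a placeholder dataset name)
my_input = dataiku.Dataset("my_input")
my_input_df = my_input.get_dataframe()

# ... transform the DataFrame here ...
output_df = my_input_df

# Write recipe output ("my_output" is a placeholder dataset name)
my_output = dataiku.Dataset("my_output")
my_output.write_with_schema(output_df)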
Based on your example, you can instead use write_schema and write_row_dict to populate your output dataset row by row, as detailed in the code sample below.
import dataiku
import requests

headers = {"Accept": "application/json",
           "Authorization": "xxxxxx"}
url = 'https://mysite.comtest/Myitem?sel=Name,Number,Size,Team,Status'
members = requests.get(url, headers=headers).json()

people = dataiku.Dataset("people")  # Assuming the id of your output dataset is "people"
people.write_schema([
    {
        "name": "Name",
        "type": "string",
    },
    {
        "name": "Number",
        "type": "int",
    },
    {
        "name": "Size",
        "type": "string",
    },
    {
        "name": "Team",
        "type": "string",
    },
    {
        "name": "Status",
        "type": "string",
    }
])

with people.get_writer() as writer:
    for member in members["Assets"]:
        attributes = member["Attributes"]
        writer.write_row_dict({
            "Name": attributes["Name"]["value"],
            "Number": attributes["Number"]["value"],
            "Size": attributes["Size"]["value"],
            "Team": attributes["Team.Name"]["value"],
            "Status": attributes["Status.Name"]["value"]
        })

Have a great day!
Answers
-
Hi Andrew,
This is great - thank you so much! I am waiting for the site to be put onto our whitelist but will mark this as a solution because I understand what you are doing and how it would work.
Thanks again!
Anne
-
Hi - thank you for your solution! I have marked it as accepted as well since it shows me how to identify the schema within it. Thanks again for your time!
Anne