I am trying to find an example of invoking an external API, reading in the stream and saving the output as a dataset within my DSS flow. The documentation that I have found outlines accessing datasets that are already within DSS so I must be missing where to find this information.
I am currently calling a Python script outside of DSS to pull the data (example shown below). I want to make the same call from within DSS as part of my flow.
I'm pretty new to DSS so thank you for any help you can provide!
import requests
import csv

headers = {"Accept": "application/json",
           "Authorization": "xxxxxx"}
url = 'https://mysite.comtest/Myitem?sel=Name,Number,Size,Team,Status'
members = requests.get(url, headers=headers).json()

with open("C:/Work/Agile Reports/UserStory.csv", "w", newline="") as outputFile:
    writer = csv.DictWriter(outputFile, ["Name", "Number", "Size", "Team", "Status"])
    writer.writeheader()
    for member in members["Assets"]:
        attributes = member["Attributes"]
        writer.writerow({
            "Name": attributes["Name"]["value"],
            "Number": attributes["Number"]["value"],
            "Size": attributes["Size"]["value"],
            "Team": attributes["Team.Name"]["value"],
            "Status": attributes["Status.Name"]["value"],
        })
Hi aw30,
Welcome to the Dataiku community!
The idea is to create a Python recipe in your Flow. In that recipe you can build a Pandas dataframe, call the external API from your Python code to fill it, and then write the resulting dataframe to your output dataset with something like write_with_schema.
For example, the code below uses the Freshdesk API (Freshdesk is a cloud-based customer service and help desk system) to pull company information for existing customers and write it into a dataset that can then be used in your DSS flow.
import dataiku
import pandas as pd
import requests

FRESHDESK_ENDPOINT = "https://XXXXX.freshdesk.com"  # check if you have configured https, modify accordingly
FRESHDESK_KEY = "API_KEY"

# Collect one dataframe per page and concatenate them at the end
pages = []
counter = 1

# Pull data using Freshdesk API using companies endpoint
while True:
    r = requests.get(FRESHDESK_ENDPOINT + '/api/v2/companies?page=' + str(counter) + '&per_page=100',
                     auth=(FRESHDESK_KEY, "X"))
    r.raise_for_status()  # stop on an HTTP error instead of looping forever
    companies = r.json()
    if not companies:  # an empty list means there are no more pages
        break
    # pd.json_normalize is available in pandas 1.0 and later
    pages.append(pd.json_normalize(companies))
    print(counter)
    counter += 1

freshdesk_companies_final = pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()

# Write resulting dataframe into the output dataset
freshdesk_companies_list = dataiku.Dataset("Freshdesk_companies_list")
freshdesk_companies_list.write_with_schema(freshdesk_companies_final)
Here is another example where we do something similar but pull satisfaction ratings information instead via the Freshdesk APIs and push this into the output dataset.
import dataiku
import pandas as pd
import requests

FRESHDESK_ENDPOINT = "https://XXXXX.freshdesk.com"  # check if you have configured https, modify accordingly
FRESHDESK_KEY = "API_KEY"

# Collect one dataframe per page and concatenate them at the end
pages = []
counter = 1

# Pull data using the satisfaction_ratings API endpoint into the dataframe
while True:
    r = requests.get(FRESHDESK_ENDPOINT + '/api/v2/surveys/satisfaction_ratings?created_since=2014-01-01T00:00:00Z&page=' + str(counter) + '&per_page=100',
                     auth=(FRESHDESK_KEY, "X"))
    r.raise_for_status()  # stop on an HTTP error instead of looping forever
    ratings = r.json()
    if not ratings:  # an empty list means there are no more pages
        break
    # pd.json_normalize is available in pandas 1.0 and later
    pages.append(pd.json_normalize(ratings))
    print(counter)
    counter += 1

freshdesk_satisfaction_final = pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()

# Write resulting dataframe into output dataset in DSS
freshdesk_satisfaction_list = dataiku.Dataset("Freshdesk_satisfaction")
freshdesk_satisfaction_list.write_with_schema(freshdesk_satisfaction_final)
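Both recipes above share the same page loop, so it could be pulled into a small helper. This is only a sketch: fetch_all_pages and get_page are hypothetical names, not part of the Dataiku or Freshdesk APIs, and the fake_api dictionary stands in for the real endpoint so the loop can be tried without network access.

```python
def fetch_all_pages(get_page, first_page=1):
    """Call get_page(page_number) until it returns an empty list.

    get_page is a hypothetical callable supplied by the caller. In a recipe
    it would wrap requests.get(...).json() for one page of the API.
    """
    records = []
    page = first_page
    while True:
        batch = get_page(page)
        if not batch:  # empty page: nothing left to fetch
            break
        records.extend(batch)
        page += 1
    return records


# Stand-in for the real API (made-up data, for illustration only)
fake_api = {
    1: [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}],
    2: [{"id": 3, "name": "Initech"}],
}
companies = fetch_all_pages(lambda page: fake_api.get(page, []))
print(len(companies), "records fetched")
```

In a recipe, the returned list could then be passed to pd.json_normalize and written out with write_with_schema as in the examples above.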
Hopefully this helps and gets you started on the right track!
Best,
Andrew
Hi @aw30 ,
To write the output of a code recipe into a DSS flow, use the Dataiku API to write to either a dataset or a managed folder.
When you create a Python recipe, DSS automatically generates a few lines of code that write a DataFrame to the output dataset.
Based on your example, you can use write_schema and write_row_dict to populate your output dataset, as shown in the code sample below.
import dataiku
import requests

headers = {"Accept": "application/json",
           "Authorization": "xxxxxx"}
url = 'https://mysite.comtest/Myitem?sel=Name,Number,Size,Team,Status'
members = requests.get(url, headers=headers).json()

people = dataiku.Dataset("people")  # Assuming the id of your output dataset is "people"

# Declare the output columns and their types before writing any rows
people.write_schema([
    {"name": "Name", "type": "string"},
    {"name": "Number", "type": "int"},
    {"name": "Size", "type": "string"},
    {"name": "Team", "type": "string"},
    {"name": "Status", "type": "string"},
])

# Write one row per asset returned by the API
with people.get_writer() as writer:
    for member in members["Assets"]:
        attributes = member["Attributes"]
        writer.write_row_dict({
            "Name": attributes["Name"]["value"],
            "Number": attributes["Number"]["value"],
            "Size": attributes["Size"]["value"],
            "Team": attributes["Team.Name"]["value"],
            "Status": attributes["Status.Name"]["value"],
        })
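If some assets come back without one of the selected attributes, the direct lookups above would raise a KeyError. A minimal sketch of a more tolerant flattening step follows. It assumes the payload has the Assets/Attributes/value shape used in the question; asset_to_row, COLUMNS and the sample data are made up for illustration.

```python
# Maps output column names to attribute keys in the API payload
COLUMNS = {
    "Name": "Name",
    "Number": "Number",
    "Size": "Size",
    "Team": "Team.Name",
    "Status": "Status.Name",
}


def asset_to_row(asset, columns=COLUMNS):
    """Flatten one asset into a {column: value} dict.

    A missing attribute yields None instead of raising KeyError.
    """
    attributes = asset.get("Attributes", {})
    return {col: attributes.get(key, {}).get("value") for col, key in columns.items()}


# Made-up payload in the assumed shape; the second asset has no "Size"
sample = {"Assets": [
    {"Attributes": {"Name": {"value": "Login page"}, "Number": {"value": 101},
                    "Size": {"value": "M"}, "Team.Name": {"value": "Alpha"},
                    "Status.Name": {"value": "Done"}}},
    {"Attributes": {"Name": {"value": "Search"}, "Number": {"value": 102},
                    "Team.Name": {"value": "Beta"},
                    "Status.Name": {"value": "In Progress"}}},
]}

rows = [asset_to_row(asset) for asset in sample["Assets"]]
for row in rows:
    print(row)
```

Each resulting dict could then be passed to writer.write_row_dict inside the get_writer block.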
Have a great day!
Hi Andrew,
This is great - thank you so much! I am waiting for the site to be put onto our whitelist but will mark this as a solution because I understand what you are doing and how it would work.
Thanks again!
Anne
Hi - thank you for your solution! I have marked it as accepted as well since it shows me how to identify the schema within it. Thanks again for your time!
Anne