Calling external API

aw30
Level 4

I am trying to find an example of invoking an external API, reading in the stream and saving the output as a dataset within my DSS flow. The documentation that I have found outlines accessing datasets that are already within DSS so I must be missing where to find this information. 

I am currently running a Python script outside of DSS to pull the data (example shown below). What I want is to make the same call from within DSS, as part of my flow.

I'm pretty new to DSS so thank you for any help you can provide!

 

import requests
import csv

headers = {"Accept": "application/json",
           "Authorization": "xxxxxx"}
url = 'https://mysite.comtest/Myitem?sel=Name,Number,Size,Team,Status'
members = requests.get(url, headers=headers).json()
with open("C:/Work/Agile Reports/UserStory.csv", "w", newline="") as outputFile:
    writer = csv.DictWriter(outputFile, ["Name", "Number", "Size", "Team", "Status"])
    writer.writeheader()
    for member in members["Assets"]:
        attributes = member["Attributes"]
        writer.writerow({
            "Name": attributes["Name"]["value"],
            "Number": attributes["Number"]["value"],
            "Size": attributes["Size"]["value"],
            "Team": attributes["Team.Name"]["value"],
            "Status": attributes["Status.Name"]["value"],
        })

ATsao
Dataiker

Hi aw30,

Welcome to the Dataiku community!

The idea would be to create a Python recipe in your Flow. In that recipe you can call the external API with Python, load the response into a Pandas dataframe, and then write the resulting dataframe to your output dataset using something like write_with_schema.

For example, here is some sample code that uses the Freshdesk API (an external API; Freshdesk is a cloud-based customer service and help desk system). It pulls company information for the existing users/customers in the system and writes it into a dataset that can then be used in your DSS flow.

 

import dataiku
import pandas as pd
import requests

# Freshdesk connection settings
FRESHDESK_ENDPOINT = "https://XXXXX.freshdesk.com" # check if you have configured https, modify accordingly
FRESHDESK_KEY = "API_KEY"

# Pull data page by page using the Freshdesk companies endpoint
pages = []
counter = 1
while True:
    r = requests.get(FRESHDESK_ENDPOINT + '/api/v2/companies',
                     params={"page": counter, "per_page": 100},
                     auth=(FRESHDESK_KEY, "X"))
    r.raise_for_status()  # stop on HTTP errors instead of parsing an error body
    companies = r.json()
    if not companies:  # an empty list means there are no more pages
        break
    # pd.json_normalize needs pandas 1.0+; older versions expose it in pandas.io.json
    pages.append(pd.json_normalize(companies))
    print(counter)
    counter += 1

# Combine all pages into a single Pandas dataframe
freshdesk_companies_final = pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()

# Write resulting dataframe into the output dataset
freshdesk_companies_list = dataiku.Dataset("Freshdesk_companies_list")
freshdesk_companies_list.write_with_schema(freshdesk_companies_final)

 

 

Here is another example that does the same thing but pulls satisfaction ratings through the Freshdesk API and writes them to the output dataset.

 

import dataiku
import pandas as pd
import requests

# Freshdesk connection settings
FRESHDESK_ENDPOINT = "https://XXXXX.freshdesk.com" # check if you have configured https, modify accordingly
FRESHDESK_KEY = "API_KEY"

# Pull data page by page using the satisfaction_ratings API endpoint
pages = []
counter = 1
while True:
    r = requests.get(FRESHDESK_ENDPOINT + '/api/v2/surveys/satisfaction_ratings',
                     params={"created_since": "2014-01-01T00:00:00Z",
                             "page": counter, "per_page": 100},
                     auth=(FRESHDESK_KEY, "X"))
    r.raise_for_status()  # stop on HTTP errors instead of parsing an error body
    ratings = r.json()
    if not ratings:  # an empty list means there are no more pages
        break
    # pd.json_normalize needs pandas 1.0+; older versions expose it in pandas.io.json
    pages.append(pd.json_normalize(ratings))
    print(counter)
    counter += 1

# Combine all pages into a single Pandas dataframe
freshdesk_satisfaction_final = pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()

# Write resulting dataframe into output dataset in DSS
freshdesk_satisfaction_list = dataiku.Dataset("Freshdesk_satisfaction")
freshdesk_satisfaction_list.write_with_schema(freshdesk_satisfaction_final)
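
Both examples follow the same pattern: request page 1, 2, 3 and so on until the API returns an empty list. As a rough sketch, that loop could be factored into a small helper. The names fetch_all_pages and fetch_page are hypothetical (they are not part of the Dataiku or Freshdesk APIs), and max_pages is an assumed safety cap so that a misbehaving API cannot loop forever.

```python
def fetch_all_pages(fetch_page, max_pages=1000):
    """Collect records from a paginated API until an empty page comes back.

    fetch_page is any callable that takes a 1-based page number and returns
    a list of records (hypothetical helper, supplied by the caller).
    """
    records = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # empty list: no more data
            break
        records.extend(batch)
    return records


if __name__ == "__main__":
    # Fake in-memory API used only to demonstrate the helper
    fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    print(fetch_all_pages(lambda page: fake_pages.get(page, [])))
```

In a recipe, fetch_page would wrap the requests.get call and return r.json(); the combined list could then be passed to pd.json_normalize and written with write_with_schema.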

 

 

Hopefully this helps and gets you started on the right track! 

Best,

Andrew


dimitri
Dataiker

Hi @aw30 ,

To write the output of a code recipe into a DSS flow, you should use the Dataiku API to write either to a dataset or to a managed folder.
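
For the managed folder route, a rough sketch might serialize the rows to CSV in memory and upload the bytes. Here rows_to_csv_bytes and save_to_folder are hypothetical helpers, "api_exports" and "export.csv" are assumed names, and the sketch assumes that dataiku.Folder and its upload_stream method behave as described in the Dataiku Python API documentation. The dataiku package is only importable inside DSS, so it is imported lazily.

```python
import csv
import io


def rows_to_csv_bytes(rows, fieldnames):
    """Serialize a list of row dicts to UTF-8 encoded CSV bytes."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def save_to_folder(rows, fieldnames, folder_name="api_exports", path="export.csv"):
    """Upload the CSV to a managed folder (only runs inside DSS)."""
    import dataiku  # imported lazily: the package is only available in DSS

    folder = dataiku.Folder(folder_name)  # assumed folder name
    folder.upload_stream(path, io.BytesIO(rows_to_csv_bytes(rows, fieldnames)))
```

A managed folder is mostly useful when you want to keep the raw file; for tabular data that the rest of the flow consumes, writing to a dataset is usually the simpler option.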

When you create a Python recipe, DSS automatically generates a few lines of code that let you write a DataFrame to the output dataset.

Based on your example, you can use write_schema and write_row_dict to populate your output dataset, as detailed in the code sample below.

import dataiku
import requests

headers = {"Accept": "application/json",
           "Authorization": "xxxxxx"}
url = 'https://mysite.comtest/Myitem?sel=Name,Number,Size,Team,Status'
members = requests.get(url, headers=headers).json()

people = dataiku.Dataset("people") # Assuming the id of your output dataset is "people"
people.write_schema([
    {
        "name": "Name",
        "type": "string",
    },
    {
        "name": "Number",
        "type": "int",
    },
    {
        "name": "Size",
        "type": "string",
    },
    {
        "name": "Team",
        "type": "string",
    },
    {
        "name": "Status",
        "type": "string",
    }
])

with people.get_writer() as writer:
    for member in members["Assets"]:
        attributes = member["Attributes"]
        writer.write_row_dict({
            "Name": attributes["Name"]["value"],
            "Number": attributes["Number"]["value"],
            "Size": attributes["Size"]["value"],
            "Team": attributes["Team.Name"]["value"],
            "Status": attributes["Status.Name"]["value"]
        })
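
If an asset in the response lacks one of the selected attributes, the attributes[...]["value"] lookups raise a KeyError and the recipe stops mid-write. A minimal sketch of a more forgiving row builder follows; asset_to_row and COLUMN_TO_ATTRIBUTE are hypothetical names, and the response shape (an "Attributes" dict whose entries carry a "value" key) is assumed from the original script.

```python
# Output column -> attribute name in the API response (taken from the original script)
COLUMN_TO_ATTRIBUTE = {
    "Name": "Name",
    "Number": "Number",
    "Size": "Size",
    "Team": "Team.Name",
    "Status": "Status.Name",
}


def asset_to_row(asset):
    """Flatten one entry of members["Assets"] into a row dict.

    Missing attributes become None instead of raising KeyError.
    """
    attributes = asset.get("Attributes", {})
    row = {}
    for column, attribute in COLUMN_TO_ATTRIBUTE.items():
        row[column] = attributes.get(attribute, {}).get("value")
    return row
```

Inside the writer loop, writer.write_row_dict(asset_to_row(member)) would replace the inline dictionary. Note that the "int" type for Number is part of the sample schema; if your API returns non-numeric identifiers for that field, "string" is likely the safer choice.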

 Have a great day!


aw30
Level 4
Author

Hi Andrew,

 

This is great - thank you so much! I am waiting for the site to be put onto our whitelist but will mark this as a solution because I understand what you are doing and how it would work.

Thanks again!

Anne

aw30
Level 4
Author

Hi - thank you for your solution! I have marked it as accepted as well since it shows me how to identify the schema within it. Thanks again for your time!

Anne