Call external API based on dataset row
Hello,
We have a scenario where we want to use the rows in a Dataiku dataset as input to an external API call and then have the results of that external API call either added to the existing dataset or create a new dataset that includes the data used in the request and the response.
Can you provide guidance on how this would be done? I believe it would use a Python script.
Appreciate any assistance.
Best Answer
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,166 Neuron
It's worth noting you have a few options to meet this requirement. You could have the external API system call the Dataiku API, get the input dataset, produce the output, and call the Dataiku API again to write the output dataset. Or you could have Dataiku call the external API system, get the results back, and write them to the output dataset. It depends on how you want things to run.
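A minimal sketch of the second option (Dataiku calls the external API), with the per-row API call injected as a function so the core logic is easy to test. All names here, including the dataset names and endpoint URL in the comments, are illustrative and not from this thread:

```python
# Sketch of the "Dataiku calls the external API" option: pair every input
# row with its API response, so the output dataset keeps both the request
# data and the response.
def enrich(records, call_api):
    """records: list of dicts (dataset rows); call_api: row dict -> response."""
    return [dict(row, api_response=call_api(row)) for row in records]

# Inside a DSS Python recipe this would be wired up roughly like so
# (dataset names and the URL are placeholders):
#
#   import dataiku, requests
#   import pandas as pd
#   df = dataiku.Dataset("my_input").get_dataframe()
#   out = enrich(
#       df.to_dict("records"),
#       lambda row: requests.post("https://api.example.com/endpoint",
#                                 json=row, timeout=30).text,
#   )
#   dataiku.Dataset("my_output").write_with_schema(pd.DataFrame(out))
```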
Answers
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron
Welcome to the Dataiku community. We are so glad you can join us.
I do a bunch of work with APIs from within Dataiku.
You may want to take a look at the API Connect plug-in.
https://www.dataiku.com/product/plugins/api-connect/
This may be helpful in your use case. I find it quicker than writing my own Python requests-based connections. This plugin provides two Dataiku objects:
1. A dataset object that is a connection to a REST API and can appear in your flow.
2. A recipe step that can take existing data in a Dataiku dataset and send it row by row to the API. This can be used for further GET calls to the API or, more usefully, for PUT or PATCH type calls.
Good luck with this and we will be interested in hearing more about your projects.
-
Thank you, Tom, for the quick response. Our environment does not have that plugin exposed due to security restrictions. Could this still be done through Python instead?
Thank you!
-
Turribeach
You want to use the get_dataframe() API method to access a dataset:
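For example, a Python recipe along these lines reads the input dataset with get_dataframe(), calls the external API once per row, and writes both the request data and the responses to the output dataset. This is a sketch, not a definitive implementation: the dataset names and the endpoint URL are placeholders, and the body is wrapped in a function because the dataiku package is only importable inside DSS:

```python
def run_recipe():
    # Only runnable inside DSS, where the `dataiku` package is importable.
    import dataiku
    import requests

    # Read the input dataset as a pandas DataFrame ("my_input" is a placeholder).
    df = dataiku.Dataset("my_input").get_dataframe()

    responses = []
    for row in df.to_dict("records"):
        # Placeholder URL; adapt the method/params to the real API.
        r = requests.get("https://api.example.com/lookup",
                         params=row, timeout=30)
        r.raise_for_status()
        responses.append(r.text)

    # Keep the request columns and add the response alongside them.
    df["api_response"] = responses
    dataiku.Dataset("my_output").write_with_schema(df)  # placeholder name
```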
-
tgb417
Generally speaking, if you can do it in Python, you can do it in a Python recipe in Dataiku. In particular, I like the Jupyter Notebook integration in DSS.
That said, you will likely need (or want) to have your admin create a code environment with the needed libraries to make writing such code easier. These might include things like:
jsonpath-ng==1.5.3
requests==2.26.0
You will likely have your own preferences for such libraries. These libraries are likely not part of the "standard" set available to you in the default code environment, particularly with a security-conscious administrator. So you are going to have to have a conversation with your admin to get the libraries you need to get your work done (whether as part of the code environment created for API Connect, or a Python code environment you use from a Python recipe or notebook).
Good luck with that conversation.