Call external API based on dataset row

KAL_User
Level 1

Hello,

We have a scenario where we want to use the rows of a Dataiku dataset as input to an external API call, and then have the results of that call either appended to the existing dataset or written to a new dataset that contains both the request data and the response.

Can you provide guidance on how this would be done? I believe it would use a Python script.

Appreciate any assistance.

1 Solution

It's worth noting you have a few options for this requirement. You could have the external API system call the Dataiku API, fetch the input dataset, produce the output, and call the Dataiku API again to write the output dataset. Or you could have Dataiku call the external API system, get the results back, and write them to the output dataset. It depends on how you want things to run.
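The second option (Dataiku calls the external API and writes the results) can be sketched as a Python recipe. This is a minimal sketch, not a definitive implementation: the dataset names, the endpoint URL, and the shape of the API response are all hypothetical placeholders.

```python
import pandas as pd

def enrich_rows(df, call_api):
    """Call call_api once per row and join the response fields
    onto the original columns, so the output dataset carries both
    the request data and the response."""
    responses = [call_api(record) for record in df.to_dict(orient="records")]
    return df.join(pd.DataFrame(responses, index=df.index))

def call_external_api(payload, url="https://api.example.com/lookup"):
    """Hypothetical external endpoint; the response is assumed to be
    a flat JSON object of result fields."""
    import requests  # imported here so the sketch runs even where the real call is stubbed out
    r = requests.post(url, json=payload, timeout=30)
    r.raise_for_status()
    return r.json()

# Inside a Dataiku Python recipe, the surrounding code would look like:
#   import dataiku
#   df = dataiku.Dataset("input_data").get_dataframe()
#   out = enrich_rows(df, call_external_api)
#   dataiku.Dataset("output_data").write_with_schema(out)
```

Separating `enrich_rows` from the HTTP call also makes the logic easy to test with a stub function before pointing it at the real API.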


5 Replies
tgb417

@KAL_User 

Welcome to the Dataiku community. We are so glad you could join us.

I do a bunch of work with APIs from within Dataiku.

You may want to take a look at the API Connect plug-in.

https://www.dataiku.com/product/plugins/api-connect/ 

This may be helpful in your use case. I find it quicker than writing my own Python `requests`-based connections. The plugin provides two Dataiku objects:

1. A dataset object that is a connection to a REST API and can appear in your flow.

2. A recipe step that can take existing data in a Dataiku dataset and send it row by row to the API. This can be used for further GET calls, or, more usefully, for PUT or PATCH type calls.

Good luck with this and we will be interested in hearing more about your projects. 

--Tom
KAL_User
Level 1
Author

Thank you, Tom, for the quick response. Our environment does not have that plugin exposed due to security restrictions. Could this still be done through Python?

Thank you!


You want to use the get_dataframe() API method to access a dataset:

https://developer.dataiku.com/latest/api-reference/python/datasets.html#dataiku.Dataset.get_datafram...
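Once `get_dataframe()` has returned a DataFrame, one practical pattern is to group rows into batched payloads so a single HTTP call covers many rows. This is a sketch under assumptions: the batch size and the per-row payload shape are invented, and the commented `dataiku` lines only indicate where the DataFrame would come from inside a recipe.

```python
import pandas as pd

def batch_payloads(df, batch_size=100):
    """Yield lists of row dicts, batch_size rows at a time,
    ready to be sent as one API request body per batch."""
    records = df.to_dict(orient="records")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# In a Dataiku Python recipe, df would come from:
#   import dataiku
#   df = dataiku.Dataset("my_input").get_dataframe()
```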




@KAL_User 

Generally speaking, if you can do it in Python, you can do it in a Python recipe in Dataiku. In particular, I like the Jupyter notebook integration in DSS.

That said, you will likely need or want your admin to create a code environment with the libraries that make writing such code easier. These might include things like:

jsonpath-ng==1.5.3
requests==2.26.0

You will likely have your own preferences for such libraries. They are probably not part of the "standard" set available in the default code environment, particularly with a security-conscious administrator. So you will need to have a conversation with your admin to get the libraries you need, whether as part of the code environment created for API Connect or as a Python code environment you use from a Python recipe or notebook.
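Once the environment has `requests` for the HTTP calls (and optionally `jsonpath-ng` for richer JSON queries), a nested API response still has to be turned into flat dataset columns. A plain-Python sketch of that step is below; the response shape is invented for illustration, and `jsonpath-ng` would let you select fields by path expression instead.

```python
def flatten(obj, prefix=""):
    """Flatten a nested JSON response dict into dotted column names,
    e.g. {"user": {"name": "Ada"}} -> {"user.name": "Ada"}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

# Invented example response:
response = {"id": 7, "user": {"name": "Ada", "plan": {"tier": "pro"}}}
columns = flatten(response)
```

Flat dicts like `columns` can then be fed straight into a pandas DataFrame and written back to a Dataiku dataset.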

Good luck with that conversation.

--Tom