
api connect plugin

tgb417
Neuron

Anyone out there using the API Connect Plugin?

What sorts of successes and challenges have you had with this tool?

In particular, I'm interested in how bulk data can be received from various APIs. The key issues are performance and reliability.

--Tom
tgb417
Neuron
Author

Ping...

Just reaching out to see if anyone is working with this.

The existing documentation is not quite enough for me to understand how to use this cleanly.

So far I'm able to get a response from the third-party API I'm working with. However, it comes back as a raw API string in a single column.

--Tom
EliasH
Dataiker

Hi @tgb417 ,

Sorry to see that no one has gotten back to you on this!

I tested the plugin myself and was able to avoid having the data extracted into a single column by unchecking "Raw JSON output" and entering the key of the returned data array ("results").

[Image: Screen Shot 2021-10-27 at 5.11.55 PM.png]

In my example I'm using the following API, which requires no authentication: https://pokeapi.co/api/v2/berry-firmness/. It sounds like you have everything covered up to the data extraction step.

[Image: EliasH_0-1635380495868.png]
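To make concrete what unchecking "Raw JSON output" and setting the data array key to "results" achieves, here is a minimal Python sketch. It is not the plugin's own code: `extract_rows` is a hypothetical helper, and the sample body is an illustrative, truncated version of the berry-firmness response.

```python
import json
from typing import Any, Dict, List


def extract_rows(payload: Dict[str, Any], data_key: str) -> List[Dict[str, Any]]:
    """Return one row (dict) per element of the JSON array stored under data_key.

    Hypothetical helper, not the plugin's code: it roughly mirrors what
    unchecking "Raw JSON output" and entering a data array key achieves.
    """
    records = payload.get(data_key)
    if not isinstance(records, list):
        raise ValueError(f"key {data_key!r} does not hold a JSON array")
    # Wrap scalar elements so that every row is a dict of column -> value.
    return [r if isinstance(r, dict) else {"value": r} for r in records]


# Illustrative body shaped like the berry-firmness response (truncated to two items).
body = json.loads(
    '{"count": 5, "next": null, "previous": null, '
    '"results": [{"name": "very-soft"}, {"name": "soft"}]}'
)
rows = extract_rows(body, "results")
print([row["name"] for row in rows])  # ['very-soft', 'soft']
```

Each key of an array element becomes a column, which is why the single raw column goes away once the key is set.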

Can you try this and let me know how it goes for you?

tgb417
Neuron
Author

@EliasH,

Thanks for the reply. I've also worked out most of this today. The approach has worked with one of my datasets. However, I've run into a few other issues as well.

One of the APIs I'm working with returns a result that is a bit different from what you are showing.

[{"Description":"My Membership","Id":1,"EditIndicator":false},{"Description":"FOREST","Id":2,"EditIndicator":false},{"Description":"Bridge","Id":14,"EditIndicator":false}]

Note that this result lacks the wrapper object that your result has, which is what lets you dig down into "results":

{"count":5,"next":null,"previous":null,"results": ...... }

So, I don't seem to be able to use the approach you are showing. In this case I get multiple rows, but the values for each row are in a single column called api_response, as JSON. For now, I'm creating a second visual recipe to parse these api_response values.
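For comparison, the same parsing step could be done in a Python recipe. This is a minimal sketch under the assumption that each api_response cell holds either one JSON object or a top-level array like the one above; `rows_from_body` and `parse_api_response_column` are hypothetical names, not part of the plugin.

```python
import json
from typing import Any, Dict, List, Optional


def rows_from_body(body: Any, data_key: Optional[str] = None) -> List[Dict[str, Any]]:
    """Turn a decoded JSON body into a list of row dicts.

    A top-level array (as in the response above) is used directly; an object
    is unwrapped through data_key when one is given, else kept as one row.
    """
    if isinstance(body, list):
        records = body
    elif isinstance(body, dict) and data_key is not None:
        records = body.get(data_key, [])
    else:
        records = [body]
    return [r if isinstance(r, dict) else {"value": r} for r in records]


def parse_api_response_column(cells: List[str]) -> List[Dict[str, Any]]:
    """Parse raw JSON strings, one per cell of an api_response column, into rows."""
    rows: List[Dict[str, Any]] = []
    for cell in cells:
        rows.extend(rows_from_body(json.loads(cell)))
    return rows


raw = ('[{"Description":"My Membership","Id":1,"EditIndicator":false},'
       '{"Description":"FOREST","Id":2,"EditIndicator":false}]')
print(rows_from_body(json.loads(raw))[1])
# {'Description': 'FOREST', 'Id': 2, 'EditIndicator': False}
```

Because a top-level array has no key to point at, the sketch treats "no wrapper" as its own case instead of requiring a data array key.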

The second problem I've run into this evening: the API plugin shows true and false (lowercase), but in the dataset the API Connect plugin creates, these values are changed to True and False (capitalized). In visual recipes, I can't unnest or fold the JSON until I change these mixed-case values back to strictly lowercase. I've opened an issue on GitHub.
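One plausible cause, offered as an assumption and not confirmed in this thread, is that decoded JSON gets written back out with Python's `str()` instead of `json.dumps()`. `str()` of a dict produces a Python literal (capitalized `True`/`False`/`None`, single quotes), which is not valid JSON. The sketch shows the difference and one way to recover such strings; `python_literal_to_json` is a hypothetical helper.

```python
import ast
import json

record = {"Description": "My Membership", "Id": 1, "EditIndicator": False}

# str() yields a Python literal, not JSON: capitalized False, single quotes.
python_literal = str(record)
# json.dumps() yields valid JSON: lowercase false, double quotes.
json_text = json.dumps(record)

print(python_literal)  # {'Description': 'My Membership', 'Id': 1, 'EditIndicator': False}
print(json_text)       # {"Description": "My Membership", "Id": 1, "EditIndicator": false}


def python_literal_to_json(text: str) -> str:
    """Convert a Python-literal string (True/False/None, single quotes) back into JSON.

    ast.literal_eval safely evaluates literals only, so it handles the quoting
    and capitalized booleans that a plain text replace could get wrong.
    """
    return json.dumps(ast.literal_eval(text))


print(python_literal_to_json(python_literal) == json_text)  # True
```

A blind text replace of True with true would also alter string values that happen to contain "True", which is why the sketch re-parses the literal instead.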

FYI, I'm using DSS 9.0.5 on Ubuntu 18 (inside WSL2), patched up to date.

--Tom
AlexB
Dataiker

Hi Tom,

What happens if you uncheck the "Raw JSON output" box? That should replace the "api_response" column with "Description", "Id", and "EditIndicator" columns. Do you get something different?

Alex

tgb417
Neuron
Author

Regarding the "Raw JSON output" checkbox: if I remember correctly, it did not seem to make a difference in the results I was seeing. Checked or unchecked, I was getting JSON in a single result column.

When I get a moment, I will check again to confirm that this is what is going on.

--Tom
tgb417
Neuron
Author

@AlexB

Regardless of whether "Raw JSON output" is checked or unchecked, I'm getting raw JSON output in a single column named api_response. Only one dataset that I've pulled from this data source has let me uncheck "Raw JSON output" and have the plugin do the parsing directly.

When I finally get the raw api_response over into a visual recipe, I've noticed that the "Unnest object" step has problems with the JSON from my source. I've already discovered that there are some transformations I have to apply before I can run an "Unnest object" step.

Here are the transforms I have discovered so far:

[Image: JSON Cleanup.png]

I'm wondering whether the API Connect plugin uses the same code base to parse JSON as the visual recipe does. If so, it seems the REST API I'm working with and the underlying JSON parsing library are not working well together.

Thoughts?

If it would be helpful, we can open a support ticket on this.

--Tom
tgb417
Neuron
Author

@EliasH 

Is there a concurrent use limitation with the API Connect plugin?

I have one API Connect connection running right now, and other attempts to use the plugin at the same time are failing with:

Test failed: Failed to read from python dataset : 
<class 'rest_api_client.RestAPIClientError'> : Error:
HTTPSConnectionPool(host='hostname.domain.com', port=443):
Max retries exceeded with url: /service/Data/MO/Summary
(Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7ff145251a90>:
Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

--Tom
AlexB
Dataiker

Hi Tom!

According to the error message, the plugin is trying to reach https://hostname.domain.com, which does not exist. Could you check that the URL template is correct?

Alex

tgb417
Neuron
Author

@AlexB,

This is not the actual URL I’m trying to connect to. On this public forum I do not want to post the name of the server I’m working with. I believe the name I’m actually using is correct.

I will do some more testing. I was just curious whether there is a known limit on how many connections the plugin can maintain simultaneously.

--Tom
AlexB
Dataiker

Oh, I see. In that case, the important part of the message is "Temporary failure in name resolution", which points to an issue with the DNS server.
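A quick way to reproduce this check from Python, rather than with ping, is to ask the system resolver directly. This is a minimal sketch; `resolves` is a hypothetical helper, and "hostname.domain.com" is the thread's placeholder name, not a real host.

```python
import socket


def resolves(hostname: str, port: int = 443) -> bool:
    """Return True if hostname resolves to at least one address.

    socket.getaddrinfo goes through the same system resolver that Python
    HTTP clients use, so a failure here mirrors the plugin's error. On Linux,
    "Temporary failure in name resolution" corresponds to socket.EAI_AGAIN (-3).
    """
    try:
        return len(socket.getaddrinfo(hostname, port)) > 0
    except socket.gaierror as exc:
        print(f"Could not resolve {hostname!r}: [Errno {exc.errno}] {exc.strerror}")
        return False
```

For example, `resolves("hostname.domain.com")` run on the DSS host should return False while the DNS problem persists; an IP literal such as "127.0.0.1" returns True without any DNS lookup.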

To answer your earlier question, the "Max retries exceeded" part of the error message is just there as a reminder that several unsuccessful attempts were made before the Python request gave up. There are no limitations on the plugin side, but the API provider may impose its own limits. In that case, you would typically get an HTTP 429 (Too Many Requests) error code. If that happens, you can use the rate limit option to stay below your authorized request rate and avoid triggering the error.
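The plugin's rate limit option covers this inside DSS. For a custom Python recipe, the same two ideas (back off when the server answers 429, and pace requests below the allowed rate) can be sketched as follows. Both helpers are hypothetical. The actual HTTP call is injected as `fetch` so that the retry logic stays library-agnostic.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# A fetch callable returns (HTTP status, Retry-After header or None, body).
Fetch = Callable[[], Tuple[int, Optional[str], str]]


def call_with_backoff(fetch: Fetch, max_retries: int = 3, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call fetch(), retrying on HTTP 429 (Too Many Requests).

    Waits for Retry-After when it is given in seconds; otherwise (missing, or
    in HTTP-date form) falls back to exponential backoff: 1x, 2x, 4x base_delay.
    """
    for attempt in range(max_retries + 1):
        status, retry_after, body = fetch()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * 2 ** attempt
        sleep(delay)
    raise RuntimeError(f"still rate limited after {max_retries} retries")


def throttled(calls: Sequence[Callable[[], T]], max_per_second: float,
              sleep: Callable[[float], None] = time.sleep) -> List[T]:
    """Run calls in order, pausing 1/max_per_second seconds between them.

    Crude fixed pacing: it ignores how long each call takes, so the real
    rate stays at or below max_per_second.
    """
    interval = 1.0 / max_per_second
    results: List[T] = []
    for i, call in enumerate(calls):
        if i > 0:
            sleep(interval)
        results.append(call())
    return results
```

Injecting `sleep` keeps the logic testable without waiting; in a real recipe the defaults call `time.sleep`.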

tgb417
Neuron
Author

@AlexB,

I'm working that angle.  When I go over to Linux and do:

ping hostname.domain.com 

I'm getting a similar problem. (Note: the hostname listed in this example is not the actual hostname.)

--Tom
tgb417
Neuron
Author

@AlexB,

After working out the DNS host resolution issue, things are working a bit better.

I'm still hampered by the JSON decoding problems described above.

Are there any plans for a v1.0.3 or later?

I'd be glad to help with testing.

--Tom
AlexB
Dataiker

@tgb417 Yes, version 1.0.3 should be ready in a few weeks. A beta version might be available on this page before that.

This version will solve some JSON issues and implement RFC 5988 (Link header) pagination, which might be useful for your API, since it returns an array without the pagination details.
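RFC 5988 (Web Linking, since superseded by RFC 8288) puts pagination links in a `Link` response header instead of the JSON body, e.g. with `rel="next"`. GitHub's REST API is a well-known example. Below is a minimal, simplified parser sketch that is not the plugin's implementation. It assumes parameter values contain no commas or semicolons. The example.com URLs are illustrative.

```python
import re
from typing import Dict, Optional

# One link-value: <URI-Reference> followed by its ;-separated parameters.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# The rel parameter, quoted or not.
_REL_PARAM = re.compile(r';\s*rel\s*=\s*"?([^";]+)"?', re.IGNORECASE)


def parse_link_header(header: str) -> Dict[str, str]:
    """Map each relation type (e.g. "next", "last") to its URL."""
    links: Dict[str, str] = {}
    for match in _LINK_VALUE.finditer(header):
        url, params = match.group(1), match.group(2)
        rel = _REL_PARAM.search(params)
        if rel:
            # A rel parameter may hold several space-separated relation types.
            for rel_type in rel.group(1).split():
                links[rel_type.lower()] = url
    return links


def next_page_url(header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL, or None when there is no further page."""
    return parse_link_header(header).get("next") if header else None


example = ('<https://api.example.com/items?page=2>; rel="next", '
           '<https://api.example.com/items?page=5>; rel="last"')
print(next_page_url(example))  # https://api.example.com/items?page=2
```

A client following this scheme keeps requesting `next_page_url(...)` until it returns None.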

Sav
Level 1

Hi everybody

I'm trying to use this plugin, but I need to add a proxy to connect to my source.

Could you help me?
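The thread does not say whether the plugin exposes a proxy setting. For comparison only, in a plain Python recipe the standard library can route requests through a proxy with `urllib.request.ProxyHandler`; the proxy address below is a placeholder and `build_proxied_opener` is a hypothetical helper. (The `requests` library, by default, also honors the `HTTP_PROXY`/`HTTPS_PROXY` environment variables.)

```python
import urllib.request

# Placeholder proxy address: replace with your organization's proxy.
PROXY_URL = "http://proxy.example.com:3128"


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends both HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxied_opener(PROXY_URL)
# Example call (commented out: it needs network access and a reachable proxy):
# with opener.open("https://pokeapi.co/api/v2/berry-firmness/", timeout=30) as resp:
#     print(resp.status)
```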

ikovacs
Level 2

Hi,
I am trying to get Offset pagination working.

In the plugin options, I added the following parameters:

  • Key limiting elements per page = limit
  • Key for element offset = offset

What else do I need to define to make it work?
Thanks,
Ivan

AlexB
Dataiker

Hi!

It really depends on the specifics of the API you are trying to access. Do you have access to its documentation, or is there an example call/reply you can share with us?

ikovacs
Level 2

Pagination in this API works with limit and offset parameters in the URL:
apiurl.com/givemedata/daily?startdate=2022-06-20&limit=5000&offset=5000

AlexB
Dataiker

Hi,

As a first approach, the whole query string (the key=value pairs after the ?) should go in the Query params section of the plugin, except limit and offset. For those two, set Pagination mechanism to "Offset pagination", then enter "limit" in "Key limiting elements per page" and "offset" in "Key for element offset". For example, the URL you sent should then be configured as in this screenshot:

[Image: Screenshot 2022-06-27 at 09.54.57.png]
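The split described above (fixed query params on one side, limit and offset filled in per page on the other) can be sketched in Python as follows. `page_url` is a hypothetical helper, not the plugin's code, and the `https://` scheme is an assumption because the URL in the thread has none.

```python
from typing import Any, Dict
from urllib.parse import urlencode


def page_url(base_url: str, query_params: Dict[str, Any], limit_key: str,
             offset_key: str, limit: int, page_index: int) -> str:
    """Build the URL of one page for offset pagination (hypothetical helper).

    Fixed parameters come from query_params (the "Query params" section);
    the limit and offset keys are added per page, as the pagination mechanism would.
    """
    params = dict(query_params)
    params[limit_key] = limit
    params[offset_key] = page_index * limit
    return f"{base_url}?{urlencode(params)}"


print(page_url("https://apiurl.com/givemedata/daily", {"startdate": "2022-06-20"},
               "limit", "offset", limit=5000, page_index=1))
# https://apiurl.com/givemedata/daily?startdate=2022-06-20&limit=5000&offset=5000
```

Page 0 yields offset=0, page 1 yields offset=5000, and so on, which reproduces the example URL from the question.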

Once some data is visible, you can go further by setting the actual key to the data array and unchecking "Raw JSON output".

Hope this helps

Alex

ikovacs
Level 2

Thanks Alex,

startDate and endDate are project variables;
NERCRegion is a template variable from a dataset.

I tried putting them into Query params before, but limit and offset do not seem to be added to the URL (see the attached log).

Could you please have a look at the log?

Thanks,
Ivan

AlexB
Dataiker

Ivan,

Currently, the plugin needs the key/path to the data to count the number of items and handle pagination correctly. This will be improved in future versions, but for now you need to deactivate the "Raw JSON output" option and enter the correct key in the "Key to data array (optional)" box. To help you find this key, you can refer to the "Path to data array" parameter in this documentation. In some instances, you might also want to use the "Maximum number of rows" option to avoid endless looping.
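To see why the data key matters for offset pagination, here is a minimal Python sketch of such a loop. It is not the plugin's implementation: `fetch_all` is hypothetical, and `fetch_page` stands in for whatever performs the HTTP call for a given offset. The loop can only detect the last page by counting the items under the data key, and a row cap plays the role of "Maximum number of rows".

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical fetch: given an offset, return the decoded JSON body of that page.
FetchPage = Callable[[int], Dict[str, Any]]


def fetch_all(fetch_page: FetchPage, data_key: str, limit: int,
              max_rows: Optional[int] = None) -> List[Dict[str, Any]]:
    """Collect rows page by page with offset pagination.

    The item count of each page is read from the array under data_key; without
    that key there is no way to tell when the last page has been reached.
    Stops on a short (or empty) page, or once max_rows rows are collected.
    """
    rows: List[Dict[str, Any]] = []
    offset = 0
    while max_rows is None or len(rows) < max_rows:
        # A missing key yields an empty page, which ends the loop immediately.
        page = fetch_page(offset).get(data_key, [])
        rows.extend(page)
        if len(page) < limit:
            break  # fewer items than requested: this was the last page
        offset += limit
    return rows if max_rows is None else rows[:max_rows]
```

If an API ignores the offset and keeps returning full pages, only `max_rows` stops the loop, which matches the looping concern mentioned above.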

Alex