API Connect plugin

tgb417

Anyone out there using the API Connect Plugin?

What sorts of successes and challenges have you had with this tool?

In particular, I'm interested in how bulk data can be received from various APIs. Key issues are both performance and reliability.


Answers

  • tgb417

    Ping...

    Just reaching out to see if anyone is working with this.

    The existing documentation is not quite enough for me to understand how to use this cleanly.

    So far I'm able to get a response back from the third-party API I'm working with. However, it arrives as a raw JSON string in a single column.

    --Tom

  • EliasH

    Hi @tgb417,

    Sorry to see that no one has gotten back to you on this!

    I've gone ahead and tested the plugin myself and was able to get around the data being extracted in a single column by unchecking "Raw JSON output" and including the key of the data array ("results") that is being returned.

    [Screenshot: Screen Shot 2021-10-27 at 5.11.55 PM.png]

    In my example I'm using the following API, which requires no auth, since it sounds like you have everything covered up until data extraction: https://pokeapi.co/api/v2/berry-firmness/

    [Screenshot: EliasH_0-1635380495868.png]
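
    For reference, unchecking "Raw JSON output" and setting the key to "results" is roughly equivalent to this Python sketch (illustrative only, not the plugin's actual code):

    import requests

    # Fetch one page from the public PokeAPI used above (no auth needed).
    resp = requests.get("https://pokeapi.co/api/v2/berry-firmness/")
    resp.raise_for_status()
    payload = resp.json()

    # With "Raw JSON output" checked, `payload` would land as-is in one column.
    # Pointing the plugin at the "results" key instead yields one row per element:
    for row in payload["results"]:
        print(row["name"], row["url"])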

    Can you try this and let me know how this goes for you?

  • tgb417

    @EliasH,

    Thanks for the reply. I've also worked out most of this today. The approach has worked with one of my datasets. However, I've run into a few other issues as well.

    One of the APIs I'm working with produces a result a bit different from what you are showing.

    [{"Description":"My Membership","Id":1,"EditIndicator":false},{"Description":"FOREST","Id":2,"EditIndicator":false},{"Description":"Bridge","Id":14,"EditIndicator":false}]

    Note that this result is missing the wrapper object that is part of your result, which is what lets you dig down into "results":

    {"count":5,"next":null,"previous":null,"results": ...... }

    So I don't seem to be able to use the approach you are showing. In this case I get multiple rows, but the values for each row are JSON in a column called api_response. For now I've created a second visual recipe to parse these api_response values.
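
    That second recipe is doing, in effect, something like this (a rough Python sketch; the dataset wiring is omitted):

    import json
    import pandas as pd

    # Explode the raw JSON array from "api_response" into one row per
    # record and one column per key.
    raw = '[{"Description":"My Membership","Id":1,"EditIndicator":false},{"Description":"FOREST","Id":2,"EditIndicator":false}]'
    parsed = pd.json_normalize(json.loads(raw))
    # parsed now has columns: Description, Id, EditIndicator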

    The second problem I've run into this evening: the API returns true and false (with lowercase first letters), but when I look over the dataset the API connector is creating, these values are being changed to True and False (with uppercase first letters). In visual recipes, I can't do things like unnest and fold the JSON until I change the mixed-case values back to strictly lowercase values. I've opened an issue on GitHub.
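
    In case anyone hits the same casing problem, here is a workaround sketch (ast accepts the Python-style literals that json.loads rejects):

    import ast
    import json

    # The stored value uses Python literals (True/False) instead of JSON
    # (true/false), so json.loads fails on it. ast.literal_eval accepts
    # them, and json.dumps re-serializes with valid lowercase booleans.
    bad = '{"Description": "Bridge", "Id": 14, "EditIndicator": False}'
    fixed = json.dumps(ast.literal_eval(bad))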

    FYI, I'm using DSS 9.0.5 on Ubuntu 18 (inside WSL2); Ubuntu is patched up to date.

  • tgb417

    @EliasH

    Is there a concurrent use limitation with the API Connect plugin?

    I have one API Connect connection running right now. Other attempts to use the plugin right now are failing with:

    Test failed: Failed to read from python dataset : 
    <class 'rest_api_client.RestAPIClientError'> : Error:
    HTTPSConnectionPool(host='hostname.domain.com', port=443):
    Max retries exceeded with url: /service/Data/MO/Summary
    (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7ff145251a90>:
    Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

  • AlexB

    Hi Tom!

    According to the error message, the plugin is trying to reach https://hostname.domain.com, which does not exist. Could you check that the URL template is correct?

    Alex

  • AlexB

    Hi Tom,

    What happens if the "Raw JSON output" box is unchecked? This should replace the "api_response" column with "Description", "Id", and "EditIndicator". Do you get something different?

    Alex

  • tgb417

    @AlexB,

    This is not the actual URL I'm trying to connect to; on this public forum I don't want to post the name of the server I'm working with. I believe the name I'm actually using is correct.

    I will do some more testing. I was just curious if there was a known limit on how many connections the plugin can maintain simultaneously.

  • tgb417

    Regarding the "Raw JSON output" checkbox, if I remember correctly it did not seem to make a difference in the results I was seeing. Checked or unchecked, I was getting JSON in a single result column.

    When I get a moment, I will check again to confirm that this is what is going on.

  • AlexB

    Oh, I see. In that case, the important part of the message is "Temporary failure in name resolution", which points to an issue with the DNS server.
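
    A quick way to confirm, from the machine running DSS (a generic sketch; substitute your real hostname):

    import socket

    # If this raises gaierror, DNS is broken at the OS level and the
    # plugin has no chance of connecting.
    try:
        print(socket.gethostbyname("hostname.domain.com"))
    except socket.gaierror as err:
        print("DNS resolution failed:", err)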

    To answer your earlier question, the "Max retries exceeded" part of the error message is just there as a reminder that several unsuccessful attempts were made before the Python requests library gave up. There are no limitations on the plugin side, but there can be limits imposed by the API provider. However, you would typically get an HTTP 429 error code in that case. If that happens, you can use the rate limit setting to stay below your authorized rate and avoid triggering the error.

  • tgb417

    @AlexB,

    I'm working that angle. When I go over to Linux and do:

    ping hostname.domain.com 

    I'm getting a similar problem. (Note: the hostname in this example is not the actual hostname.)

    --Tom

  • tgb417

    @AlexB,

    After working out the DNS host resolution issue, things are working a bit better.

    Still hampered by the JSON decode problems described above.

    Are there any plans for a v1.0.3+?

    I'd be glad to help with testing.

    --Tom

  • tgb417

    @AlexB

    Regardless of whether "Raw JSON output" is checked or unchecked, I'm getting raw JSON output in a single column named api_response. There has been only one dataset pulled from this data source for which I could uncheck "Raw JSON output" and have the parsing done directly by the plugin.

    Right now, when I finally get the raw api_response over into a visual recipe, I notice that the Unnest object visual step has problems with the JSON from my source. I've already discovered that there are some transformations I have to apply before I can do an Unnest object step.

    Here are the transforms I have discovered so far:

    [Screenshot: JSON Cleanup.png]

    I'm wondering if the API Connect plugin uses the same code base to parse JSON as the visual recipe does? If so, it seems like the REST API I'm working with and the basic JSON parsing library are not working well together.

    Thoughts?

    If you want we can get a support ticket open on this if that would be helpful.

  • AlexB

    @tgb417, yes, version 1.0.3 should be ready in a few weeks. There might be a beta version available on this page before that.

    This version will solve some JSON issues and implement RFC 5988 for pagination, which might be useful for your API that returns an array without the pagination details.
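
    For reference, RFC 5988 pagination relies on a Link response header, along these lines (illustrative values):

    Link: <https://api.example.com/items?page=2>; rel="next",
          <https://api.example.com/items?page=10>; rel="last"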

  • Sav

    Hi everybody,

    I'm trying to use this plugin but I need to add a proxy to connect to my source.

    Could you help me?

  • ikovacs

    Hi,
    I am trying to get Offset pagination working.

    In the plugin options I added the following parameters:

    • Key limiting elements per page = limit
    • Key for element offset = offset

    What else do I need to define to make it work?
    Thanks,
    Ivan

  • AlexB

    Hi!

    It really depends on the specifics of the API you are trying to access... Do you have access to its documentation, or is there an example call/reply you can share with us?

  • ikovacs

    Pagination in this API works with limit and offset params in the URL:
    apiurl.com/givemedata/daily?startdate=2022-06-20&limit=5000&offset=5000

  • AlexB

    Hi,

    So as a first approach, everything in the query string (the part after the ?, with the shape key=value) should go in the Query params section of the plugin, except limit and offset. For those two, you'd need to set Pagination mechanism to "Offset pagination", put "limit" in Key limiting elements per page and "offset" in Key for element offset. As an example, the URL you sent would then look like this screenshot:

    [Screenshot: Screenshot 2022-06-27 at 09.54.57.png]

    Once you have some data visible, you can go further by setting the actual path to the data array key and unchecking "Raw JSON output".
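
    Under the hood, the mechanism is roughly this (a sketch, not the plugin's actual code; the endpoint is the one from your message):

    import requests

    # Request pages of `limit` items, advancing `offset` each time,
    # until the server returns an empty page.
    url = "https://apiurl.com/givemedata/daily"
    params = {"startdate": "2022-06-20", "limit": 5000, "offset": 0}
    while True:
        page = requests.get(url, params=params).json()
        if not page:
            break
        # ... each element of `page` becomes one dataset row ...
        params["offset"] += params["limit"]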

    Hope this helps

    Alex

  • ikovacs

    Thanks Alex,

    startDate and endDate are project variables; NERCRegion is a template variable from the dataset.

    I tried to put it into Query params before, but it does not seem to add limit and offset to the URL (see the attached log).

    Could you please have a look at the log?

    Thanks,
    Ivan

  • AlexB

    Ivan,

    Currently the plugin requires the key/path to the data in order to count the number of items and handle the pagination correctly. This will be improved in the next versions, but for now you need to have the "Raw JSON output" option deactivated and the correct key in the "Key to data array (optional)" box. To help you find out what this key is, you can refer to the "Path to data array" parameters in this documentation. In some instances, you might also want to use the "Maximum number of rows" option to avoid looping.
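
    For instance, if the server replies with a shape like this (hypothetical field names):

    {"total": 12000, "limit": 5000, "skip": 0, "data": [ ... ]}

    then "data" is what goes in the "Key to data array (optional)" box.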

    Alex

  • ikovacs

    Thanks Alex, it works for me now!

  • maxmeu

    Hey,

    I am working on a different problem with the API Connect plugin. What is the syntax for the request's body using the raw format? I have copied the syntax from an API's documentation for a POST request, but I am unable to figure out how to include the variables from my dataset as key/value pairs. Any ideas? Or do I need to tick a box somewhere else?

    Best

    Max

  • AlexB

    Hi,

    Say you have a dataset containing the data that you want to use as variables:

    [Screenshot: variable_dataset.png]

    You need to start by clicking on that dataset, selecting API Connect in the plugin panel on the right, followed by the API Connect recipe.

    [Screenshot: flow.png]

    In the recipe configuration panel, you then need to select all the columns that you will use as variables in your API calls. This is done in the Columns to use as variables section.

    Then write the body. The picture below shows an example of a standard JSON body. You can see that a variable is called simply by using the {{variable_name}} syntax.
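
    For instance, with a column named first_name (a hypothetical name), a raw JSON body could look like:

    {
        "name": "{{first_name}}",
        "subscribed": true
    }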

    [Screenshot: recipe_settings.png]

    There can be subtleties, such as the need to add brackets before and after the variable in the case of strings. I would recommend testing your settings with the https://httpbin.org/anything API, which just sends your request back to you as a JSON structure. This way you can check in the "data" section that the JSON format is valid and your variable was used as expected.

    [Screenshot: apis_response.png]
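
    A quick way to run that same check outside DSS (a sketch):

    import requests

    # httpbin echoes the request back; the "data" field shows exactly
    # the body the server received, with variables already substituted.
    reply = requests.post("https://httpbin.org/anything",
                          json={"name": "Jane", "subscribed": True})
    print(reply.json()["data"])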

     Hope this helps,

    Alex

  • tgb417

    @AlexB

    I'm trying to figure out how to use Offset pagination. The documentation does not give a clue how the limit and offset are incremented for so-called offset pagination.

    The API I'm working with would need iterative calls like:

    https://hostname.com/apis/thing/stuff?limit=10&skip=0

    https://hostname.com/apis/thing/stuff?limit=10&skip=10

    https://hostname.com/apis/thing/stuff?limit=10&skip=20

    To pull the "first" up to 30 records.

    Has anyone out there gotten this working? I need to paginate through tens of thousands of records at times.

  • AlexB

    Hi Tom!

    For the Offset pagination mode, the offset key is incremented by the value of the limit key sent by the server. If you need to speed up the transfer, you first need to check that the server accepts a changed limit value. Try https://hostname.com/apis/thing/stuff?limit=100 and see if it returns 100 elements. If that's the case, you can set a larger limit with the Query params key/value pair interface.

    [Screenshot: Screenshot 2022-08-10 at 09.43.42.png]

    Alex 

  • tgb417

    @AlexB,

    Unfortunately, still no joy here. I seem to have control of the total limit with the Query params limit = 5, but the skip value is not being updated, I think. Also, is there a way to track the queries being sent by the plugin, to see what is actually happening?

    [Screenshot: API Connect plugin dataset parameters, with Query params limit set to 5, Key limiting elements per page set to limit, and Key for element offset set to skip]

  • tgb417

    @AlexB

    Maybe I'm doing the wrong type of pagination.

    I note the following values in the returned header.

    [Screenshot: returned API response header with total, results, limit, and skip listed]

  • AlexB

    Is this API public by any chance? Or its documentation?

    If not, could you send us the JSON returned in the browser by the equivalent of https://hostname.com/apis/thing/stuff?limit=2 ? (Feel free to anonymize or remove any confidential data from it; I would just need to look at the overall structure of the JSON...)

  • tgb417

    @AlexB

    I sent you a private chat here on the community with some further details.

  • alec_peterson

    Hi Alex,

    I am having an issue with the specific API I'm working with.

    1) Before I can make API resource requests, I have to first make a POST request for an authorization token, and then use that token in the headers for subsequent API calls.

    That part is successful, and the result is saved as a dataset with a column named "api_response".

    2) I am trying to use that token string in subsequent API calls, i.e. using that dataset as an input to an API Connect recipe.

    I've experimented with setting "Columns to use as variables" to "api_response", then putting the header Key as "Authorization" (per my API's documentation) and the Value as "Bearer {{api_response}}", but it gives an error that the token is invalid. I've verified that the token string from the input dataset is correct.

    I've tried reformatting the raw text per your example, as well as different variations of adding brackets and curly braces around it (per your comment about handling strings), but it still gives the error.

    Do you have any advice on how to refer to the token string stored in the "api_response" column of my input dataset?
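
    For reference, in plain Python the flow I'm trying to reproduce looks like this (URLs and field names are anonymized/hypothetical):

    import requests

    # Step 1: POST for a bearer token.
    auth = requests.post("https://api.example.com/auth/token",
                         json={"client_id": "...", "client_secret": "..."})
    token = auth.json()["access_token"]

    # Step 2: use the token in the Authorization header of later calls;
    # this is what I'm attempting with Value = "Bearer {{api_response}}".
    data = requests.get("https://api.example.com/resource",
                        headers={"Authorization": "Bearer " + token})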
