Store Data From API Call and Flow running time

Jaspal Registered Posts: 9 ✭✭✭✭

Hi -

1. How can I store the data from an API call to the model, and also the model output?

2. How can I determine (as a ballpark figure) how long it will take to create the output, in order to establish whether it should be a synchronous or asynchronous call?

Operating system used: Linux



  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron


    I was looking at your post and I had some questions.

    When you say "store the data from an api call", can you provide a bit more context about your use case? I'm not clear whether you are:

    • Trying to call a REST or SOAP API from within a Dataiku Flow, using something like the Python requests library in a Python recipe you have written
    • Trying to call a REST API using something like the REST API Connect Plugin
    • Trying to call/consume an API you created and published from within Dataiku DSS, where the call is actually made from an outside server
    • Or talking about the performance of the Python API
    • Or something else that I have yet to guess

    Knowing which of the scenarios above applies would help me or other community members think through your question 2 about performance or throughput.
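
For the ballpark figure in question 2, one rough approach is to time a handful of representative calls and compare the slowest one against whatever wait you consider acceptable for a synchronous call. The sketch below is only an illustration: `estimate_latency`, `suggest_mode`, and the 5-second budget are made-up names and values, and `fake_call` stands in for your real request.

```python
import statistics
import time
from typing import Callable, Dict


def estimate_latency(call: Callable[[], object], samples: int = 5) -> Dict[str, float]:
    """Run `call` a few times and summarise the observed latency in seconds."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(durations),
        "max_s": max(durations),
    }


def suggest_mode(stats: Dict[str, float], sync_budget_s: float = 5.0) -> str:
    """Suggest 'synchronous' if the slowest sample fits the budget (an assumed threshold)."""
    return "synchronous" if stats["max_s"] <= sync_budget_s else "asynchronous"


if __name__ == "__main__":
    # Hypothetical stand-in for the real call to the model endpoint.
    def fake_call():
        time.sleep(0.05)

    stats = estimate_latency(fake_call, samples=3)
    print(stats, suggest_mode(stats))
```

A few samples only give a rough figure. Timings taken on small test inputs may not hold for production-sized payloads, so it is worth sampling with realistic data.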

    When it comes to consuming REST APIs from within Dataiku using, say, the API Connect plugin, there are lots of things that will impact throughput. What we are looking for is the weakest link.

    • Is the REST API well designed for bulk data consumption, allowing you to consume hundreds, thousands, or even tens of thousands of records per call?
    • Can you selectively pull just the data you need if you are working with large datasets? Or do you have to pull lots of data and then filter it once you get it into Dataiku DSS?
    • Network latency and bandwidth could play a part.
    • Will the API provide the data in a compressed format?
    • Can you re-use a connection with the API, using a single Session for many records? Or are you calling data one record at a time and incurring delays for opening and closing network connections on each of these calls?
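
If you are in the "Python requests in a Python recipe" scenario, the bulk, compression, and connection re-use points above might look something like the sketch below. It assumes a hypothetical API that accepts `page` and `page_size` query parameters and returns a JSON list that is empty once the data runs out. Your API's paging scheme will almost certainly differ, and `build_session` and `fetch_all` are names invented for this example.

```python
import requests


def build_session() -> requests.Session:
    """Create one Session so the connection is re-used across calls."""
    session = requests.Session()
    # requests already asks for compressed responses by default;
    # the header is set explicitly here only to make that visible.
    session.headers.update({"Accept-Encoding": "gzip"})
    return session


def fetch_all(session, url, page_size=1000, max_pages=100):
    """Collect records page by page instead of one record per call.

    Assumes the hypothetical paging scheme described above.
    """
    records = []
    for page in range(1, max_pages + 1):
        response = session.get(
            url,
            params={"page": page, "page_size": page_size},
            timeout=30,
        )
        response.raise_for_status()
        batch = response.json()
        if not batch:
            # An empty page is treated as the end of the data.
            break
        records.extend(batch)
    return records
```

You would call it as `fetch_all(build_session(), url)` with your own endpoint URL. The resulting list of records can then be turned into a pandas DataFrame and written to an output dataset from the recipe.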

    There are a number of other performance considerations for the other scenarios, so please feel free to post another comment and we will see if someone can provide you with a bit more detail.
