Create a Dataiku data frame from an API call

UserKp
UserKp Registered Posts: 20

Hi all,

I am using the API Designer. I created an API that reads some datasets from the flow, takes the API response fields, performs some calculations, and stores the result in a pandas data frame. This data frame is created every time the API is called with a new set of responses, and I want to store it in the flow as a Dataiku data frame.

So the Dataiku data frame in the flow should be updated whenever a new API call is made.

How can I achieve this? Should I write the df in append mode into the Dataiku flow? Will that work? I don't want to experiment much, so precise solutions or suggestions are appreciated.

Thanks

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,877 Neuron

    In general, you don't want an API service writing back to the Designer node; this is an anti-pattern. Your API service should run completely separately from your Designer node. Think of your Designer node as a development machine: would you want your production server to depend on a development server?

    If you want to feed back data scored by an API service you have a couple of options. Either you write some code to persist the data scored from your API service directly into your database layer or you rely on the built-in API node logs and load all the logs back into your flow for feedback analysis.
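A minimal sketch of the first option (persisting each result directly into a database layer), under stated assumptions: the in-memory `sqlite3` database only stands in for whatever SQL database the API service can reach, and the table name `workload_results`, its columns, and the helpers `init_db` and `persist_result` are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def init_db(conn):
    """Create the (hypothetical) results table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workload_results ("
        "ticket_id TEXT NOT NULL, workload REAL NOT NULL, created_at TEXT NOT NULL)"
    )


def persist_result(conn, ticket_id, workload):
    """Insert one computed result as a single row in its own transaction."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO workload_results (ticket_id, workload, created_at) "
            "VALUES (?, ?, ?)",
            (ticket_id, workload, datetime.now(timezone.utc).isoformat()),
        )


# Stand-in database; a real service would connect to its own database layer.
conn = sqlite3.connect(":memory:")
init_db(conn)
persist_result(conn, "T-1001", 3.5)
persist_result(conn, "T-1002", 1.25)

rows = conn.execute(
    "SELECT ticket_id, workload FROM workload_results ORDER BY ticket_id"
).fetchall()
```

Appending one row per call keeps each request independent of the others, and the table can later be read into the flow like any other SQL dataset.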

  • UserKp

    So you mean the API should not write data?

    Well, I created an Excel file with some columns and uploaded it to my test flow. I read this dataset, performed some calculations that added extra columns, and wrote the result back to this dataset. It's working fine.

    Now, in my API I have a test project and its credentials, and I want to do the same from that API. Before that, I just want to create a blank dataset with some columns, define the schema, and hit the API to update this dataset.

    I have tried the option to create a managed dataset, but I cannot define its columns, so it is just an empty table with no columns. Using it throws an error saying the schema doesn't exist, which is understandable because it is just a blank table.

  • Turribeach

    I don't really understand what you are trying to achieve. Can you take a step back and explain your requirement? Why do you need to use an API? When you say API do you mean a Dataiku API Node with a Dataiku API service you coded and deployed or a call to the Dataiku Python API (aka SDK)? Can you post the code and the error you get in a code block please?

  • UserKp

    My idea was to create an empty data frame from the UI, but while exploring I learned that it's not possible to do that in the UI.

    So I used a Python code recipe to create a data frame, define the schema, and then write this data frame back to the dataset.

    Yes, I am using the API Designer here. The data has to be calculated based on the API response; that's the requirement. Otherwise I would have used a recipe to do the reading and writing, which makes sense.

    Now the data is being written from the API into the data frame I created.
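The "create a data frame, define the schema" step described in this post can be sketched in plain pandas as follows. This is an illustration only, not the poster's actual code: the column names and types in `SCHEMA` and the helper `empty_frame` are hypothetical.

```python
import pandas as pd

# Hypothetical column names and types, for illustration only.
SCHEMA = {
    "ticket_id": "string",
    "workload": "float64",
    "created_at": "datetime64[ns]",
}


def empty_frame(schema):
    """Build a zero-row DataFrame whose columns carry explicit dtypes."""
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in schema.items()}
    )


df = empty_frame(SCHEMA)

# Inside a DSS Python recipe, the frame could then be written with
# dataiku.Dataset("<output dataset>").write_with_schema(df)
# so that the dataset takes its columns from the DataFrame.
# That call is left as a comment because it only runs inside DSS.
```

Giving each column an explicit dtype means the schema is defined even though the frame has no rows yet.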

  • Turribeach

    I still really struggle to understand your requirement. This is because you haven't really shared your requirement, only the process you think should be followed to achieve that goal. Re-reading the whole thread, all I get is that you are using the API node and that for some unknown reason you want some data frame updated in the Designer node every time the API gets called.

    As I said in my initial reply, this is a bad idea and an anti-pattern. That doesn't mean you can't actually do it; it just means that you will not be following good practice if you do. The other thing to consider is that REST API endpoints should generally be stateless. Writing data back to the Designer node will not only slow down your API response time but also might not be multi-thread safe.

    I can't really say anything else unless I understand your real requirement and the reasons for the unorthodox design.

  • UserKp

    OK, so this is the requirement:

    1. I calculate a workload whenever a new ticket is registered in my application; you can assume it is a ticket system.
    2. Each time a new ticket comes in, the API is called and a list is returned.
    3. I want to calculate the workload each time a ticket is created and the API is called for that ticket, and save that calculation to a dataset because I have to use this information.
    4. I can then filter this dataset for particular dates and generate a report.

    To summarize: for each ticket, calculate the workload and save the data, then generate the report at the end of each day.

    I just want to make sure the workload is real-time, and that's why I went with this approach, which might be wrong anyway.

    Do you have any ideas or suggestions?

  • Turribeach

    Return what you need to calculate in your API response. Use the built-in API logging capability to record all your responses. Then fetch and process your API logs as needed. You can customise the logging to log to different files per day, per hour, whatever suits your needs.
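A minimal sketch of the "fetch and process your API logs" step, under stated assumptions: the log is taken to be one JSON object per line with hypothetical fields `timestamp`, `ticket_id` and `workload`. The real API node log format is not shown in this thread and may differ, so `SAMPLE_LOG`, `load_log` and `daily_report` are illustrative only.

```python
import io
import json

import pandas as pd

# Hypothetical log content: one JSON object per line.
SAMPLE_LOG = io.StringIO(
    '{"timestamp": "2024-01-15T09:12:00", "ticket_id": "T-1001", "workload": 3.5}\n'
    '{"timestamp": "2024-01-15T17:40:00", "ticket_id": "T-1002", "workload": 1.5}\n'
    '{"timestamp": "2024-01-16T08:05:00", "ticket_id": "T-1003", "workload": 2.0}\n'
)


def load_log(lines):
    """Parse JSON-lines log records into a DataFrame with real timestamps."""
    records = [json.loads(line) for line in lines if line.strip()]
    frame = pd.DataFrame.from_records(records)
    frame["timestamp"] = pd.to_datetime(frame["timestamp"])
    return frame


def daily_report(frame):
    """Total workload and ticket count per calendar day."""
    return (
        frame.groupby(frame["timestamp"].dt.date)
        .agg(total_workload=("workload", "sum"), tickets=("ticket_id", "count"))
        .rename_axis("day")
    )


report = daily_report(load_log(SAMPLE_LOG))
```

With this shape the API endpoint stays stateless: it only returns the calculated values, and the end-of-day report is built from the logged responses.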
