Create a Dataiku dataframe from an API call

UserKp

Hi all,

I am using the API Designer. I created an API that takes some datasets from the flow along with the fields from the API response, performs some calculations, and stores the result in a pandas data frame. This data frame is recreated every time the API is called with a new set of responses, and I want to store it in the flow as a Dataiku dataset.

So the Dataiku dataset in the flow should be updated whenever a new API call is made.

How can I achieve this? Should I write the data frame to the dataset in append mode, and will that work? I don't want to experiment much, so precise solutions or suggestions are appreciated.
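For reference, a minimal sketch of what such an append-mode write could look like from Python, assuming the dataiku package; the dataset name workload_results, the columns, and the spec_item appendMode flag are assumptions to verify against your DSS version's documentation:

```python
import dataiku
import pandas as pd

# Hypothetical rows computed from the latest API responses.
df = pd.DataFrame({"ticket_id": ["T-1001"], "workload": [3.5]})

# "workload_results" is a placeholder dataset name.
out = dataiku.Dataset("workload_results")

# Assumption: setting appendMode on the dataset spec makes the write
# append instead of overwrite; verify against your DSS version's docs.
out.spec_item["appendMode"] = True

with out.get_writer() as writer:
    writer.write_dataframe(df)
```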

Thanks

Turribeach

In general you don't want an API service writing back to the Designer node; this is an anti-pattern. Your API service should run completely separately from your Designer node. Think of your Designer node as a development machine: would you want your production server to depend on a development server?

If you want to feed back data scored by an API service, you have a couple of options. Either you write some code to persist the scored data from your API service directly into your database layer, or you rely on the built-in API node logs and load those logs back into your flow for feedback analysis.
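For illustration, persisting scored rows straight into the database layer from the API service could look roughly like this; sqlite3 stands in for whatever database you actually use, and the table and column names are made up:

```python
import sqlite3
import pandas as pd

def persist_scored(df: pd.DataFrame, db_path: str = "scores.db") -> None:
    """Append scored rows to a table in the database layer.

    sqlite3 stands in for the actual database (Postgres, Snowflake, ...)
    reachable from the API node; table and column names are made up.
    """
    with sqlite3.connect(db_path) as conn:
        df.to_sql("api_scores", conn, if_exists="append", index=False)
```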

UserKp (Author)

So you mean the API should not write data?

Well, I created an Excel file with some columns and uploaded it to my test flow. I read this dataset, performed some calculations that add extra columns, and wrote the result back to the same dataset; it works fine.

Now, in my API I have a test project and its credentials, and I want to do the same there. Before that, I just want to create a blank dataset with some columns, define the schema, and then hit the API to update this dataset.

I have tried the option to create a managed dataset, but I cannot define its columns, so it is just an empty table with no columns. When I use it, it throws an error that the schema doesn't exist, which is understandable because it is just a blank table.
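For what it's worth, one way to put a schema on an otherwise empty dataset is to write it explicitly from code. A minimal sketch, assuming the dataiku package; the dataset name, column names, and types are placeholders:

```python
import dataiku

# Placeholder dataset and column names; types use DSS storage types.
# Assumption: write_schema sets the schema without writing any rows --
# check the dataiku package docs for your DSS version.
ds = dataiku.Dataset("workload_results")
ds.write_schema([
    {"name": "ticket_id", "type": "string"},
    {"name": "workload", "type": "double"},
    {"name": "created_at", "type": "date"},
])
```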

Turribeach

I don't really understand what you are trying to achieve. Can you take a step back and explain your requirement? Why do you need to use an API? When you say API, do you mean a Dataiku API node with a Dataiku API service you coded and deployed, or a call to the Dataiku Python API (aka the SDK)? Can you post the code and the error you get in a code block, please?

UserKp (Author)

My idea was to create an empty dataset from the UI, but while exploring I learned that it's not possible to do in the UI.

So I used a Python code recipe to create a data frame, define the schema, and write that data frame back to the dataset.

Yes, I am using the API Designer here. The data has to be calculated based on the API response; that's the requirement. Otherwise I would have used a recipe to do the reading and writing, which makes sense.

Now the data from the API is being written into the dataset I created.

Turribeach

I still really struggle to understand your requirement, because you haven't actually shared the requirement, only the process you think should be followed to achieve it. Re-reading the whole thread, all I get is that you are using the API node and that, for some unknown reason, you want a data frame in the Designer node updated every time the API gets called.

As I said in my initial reply, this is a bad idea and an anti-pattern. That doesn't mean you can't actually do it; it just means you will not be following good practice if you do. The other thing to consider is that REST API endpoints should generally be stateless: writing data back to the Designer node will not only slow down your API response time but may also not be thread-safe.

I can't really say anything else unless I understand your real requirement and the reasons for the unorthodox design.

UserKp (Author)

OK, so this is the requirement:

  1. I calculate workload whenever a new ticket is registered in my application; you can assume it is a ticketing system.
  2. Each time a new ticket comes in, the API is called and a list is returned.
  3. I want to calculate some workload each time a ticket is created and the API is called for that ticket, and save that calculation to a dataset, because I have to use this information later.
  4. I can then filter that dataset for particular dates and generate a report.

To summarize: for each ticket, calculate the workload and save the data, then generate the report at the end of each day.

I just want to make sure the workload is calculated in real time, which is why I went with this approach; it might be wrong anyway.

Do you have any ideas or suggestions?
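For concreteness, a rough sketch of the kind of Python-function endpoint this describes; the function name, parameters, and scoring rule are all placeholders, and the endpoint settings in the API Designer would point at whichever function you define:

```python
# Sketch of a Python-function endpoint body for the API Designer.
def compute_workload(ticket_id, assignee, priority):
    # Hypothetical rule: weight tickets by priority.
    weights = {"low": 1.0, "medium": 2.0, "high": 4.0}
    workload = weights.get(priority, 1.0)

    # Return everything the daily report will need in the response
    # itself, so it can be recovered later without writing back
    # to the Designer node.
    return {
        "ticket_id": ticket_id,
        "assignee": assignee,
        "workload": workload,
    }
```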

Turribeach

Return what you need to calculate in your API response. Use the built-in API logging capability to record all your responses. Then fetch and process your API logs as needed. You can customise the logging to log to different files per day, per hour, whatever suits your needs. 
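A sketch of that feedback side, assuming the query logs end up as JSON-lines files with the endpoint response recoverable per record; the path, file pattern, and "response" key are assumptions to verify against your API node's actual log format and location:

```python
import glob
import json
import pandas as pd

# Placeholder path; point this at wherever your API node writes its
# query logs. One JSON record per line is an assumption here.
records = []
for path in glob.glob("/path/to/api-node-logs/*.log"):
    with open(path) as f:
        for line in f:
            records.append(json.loads(line).get("response", {}))

# Aggregate the logged responses into the end-of-day workload report.
df = pd.DataFrame(records)
daily_report = df.groupby("assignee")["workload"].sum()
print(daily_report)
```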
