how to save files in python API?

dataicool

In a project, I created a Python code API endpoint in the API Designer.

API input: day1, day2

API function:

query a table: select * from table where date > day1 and date < day2

and then save the result as a CSV file in a folder (in this project)
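A minimal sketch of the query step described above, assuming a SQL table with a `date` column stored as ISO-8601 text (so string comparison matches date order). Python's built-in sqlite3 is only a stand-in for the project's real connection, and the table name `events` and the function name `query_between` are hypothetical. Binding day1 and day2 as query parameters, instead of formatting them into the SQL string, keeps the API inputs from being injected into the query.

```python
import sqlite3


def query_between(conn: sqlite3.Connection, day1: str, day2: str) -> list:
    """Return rows strictly between day1 and day2 from the hypothetical `events` table."""
    # Placeholders (?) let the driver bind the API inputs safely.
    sql = "SELECT * FROM events WHERE date > ? AND date < ? ORDER BY date"
    return conn.execute(sql, (day1, day2)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (date TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("2024-01-01", 1.0), ("2024-01-05", 2.0), ("2024-01-10", 3.0)],
    )
    print(query_between(conn, "2024-01-01", "2024-01-10"))  # [('2024-01-05', 2.0)]
```

Both bounds are exclusive, matching the `date > day1 and date < day2` condition in the question.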

 

Here is what I tried in the API code (with import os and import pandas as pd at the top):

#1 design node: can read

path1 = '/data/dataiku/dss_data/managed_folders/PROJECT_ID/FOLDER_ID/test.csv'

pd.read_csv(path1)

 

#2 design node: can read, can save

path1 = '/data/dataiku/dss_data/managed_folders/PROJECT_ID/FOLDER_ID/test.csv'

df1 = pd.read_csv(path1)

path2 = '/data/dataiku/dss_data/managed_folders/PROJECT_ID/FOLDER_ID/test_2.csv'

df1.to_csv(path2)

 

#3 design node, with my folder selected as a working folder in the API settings: can read

path1 = os.path.join(folders[0],"test.csv")

pd.read_csv(path1)

 

#4 design node: can read, no error, but I CAN NOT find test_2.csv in my folder:

path1 = os.path.join(folders[0],"test.csv")

df1 = pd.read_csv(path1)

path2 = os.path.join(folders[0],"test_2.csv")

df1.to_csv(path2)

I printed folders[0]; it is "/data/dataiku/dss_data/tmp/apinode-devserver/services/API_ID/xxxxx.../test_2.csv"
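One plausible explanation for #4, #5 and #7, stated as an assumption rather than confirmed Dataiku behavior: the printed path sits under tmp/apinode-devserver/services/..., which suggests the API service works on its own copy of the working folder rather than on the managed folder itself. If so, to_csv succeeds but writes into that copy, which is why there is no error and nothing appears in the project folder. The sketch below simulates that situation with temporary directories; `managed_folder` and `service_copy` are stand-ins, not real Dataiku paths.

```python
import os
import shutil
import tempfile


def simulate_working_folder_copy() -> tuple:
    """Write through a copied folder and report where the new file ends up."""
    root = tempfile.mkdtemp()
    managed_folder = os.path.join(root, "managed_folder")  # stand-in for the project folder
    os.makedirs(managed_folder)
    with open(os.path.join(managed_folder, "test.csv"), "w") as f:
        f.write("a,b\n1,2\n")

    # Stand-in for the service's working copy (what folders[0] would point to).
    service_copy = os.path.join(root, "service_copy")
    shutil.copytree(managed_folder, service_copy)

    # Reading works because test.csv was copied along with the folder.
    with open(os.path.join(service_copy, "test.csv")) as f:
        content = f.read()

    # Writing "succeeds", but only the copy receives the new file.
    with open(os.path.join(service_copy, "test_2.csv"), "w") as f:
        f.write(content)

    in_copy = os.path.exists(os.path.join(service_copy, "test_2.csv"))
    in_managed = os.path.exists(os.path.join(managed_folder, "test_2.csv"))
    shutil.rmtree(root)
    return in_copy, in_managed


print(simulate_working_folder_copy())  # (True, False)
```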

 

#5 design node, using the printed folders[0] path directly: CAN NOT read

path1 = "/data/dataiku/dss_data/tmp/apinode-devserver/services/API_ID/xxxxx.../test_2.csv"

pd.read_csv(path1)

 

#6 after deploying my API to the Dataiku deploy node: CAN NOT read, CAN NOT save

path1 = '/data/dataiku/dss_data/managed_folders/PROJECT_ID/FOLDER_ID/test.csv'

df1 = pd.read_csv(path1)

path2 = '/data/dataiku/dss_data/managed_folders/PROJECT_ID/FOLDER_ID/test_2.csv'

df1.to_csv(path2)

 

#7 deploy node: can read, no error, but I CAN NOT find test_2.csv in my folder:

path1 = os.path.join(folders[0],"test.csv")

df1 = pd.read_csv(path1)

path2 = os.path.join(folders[0],"test_2.csv")

df1.to_csv(path2)

 

Summary:

1) On the design node, I can use this path to read and save ('/data/dataiku/dss_data/managed_folders/PROJECT_ID/FOLDER_ID/test.csv'),

but I CAN NOT use this path on the deploy node.

2) On both the design node and the deploy node, I can use this path to read:

path1 = os.path.join(folders[0],"test.csv")

df1 = pd.read_csv(path1)

but I CAN NOT use this path to save (no error, but I cannot find the file):

path2 = os.path.join(folders[0],"test_2.csv")

df1.to_csv(path2)

 

Question:

What should I do? I just want my API to save a file when it runs on the deploy node.

Thanks a lot

Turribeach

First of all, please use the code block (</> icon) to post your code so that proper indentation is respected, as Python is very picky about indentation. Secondly, I suspect that a lot of your tests must be using Jupyter notebooks, which are basically a sandbox area to play around in. In general in Dataiku you are not allowed to use direct Python file manipulation methods to read and write files when you are in a proper Python recipe or other non-Jupyter notebook execution environment. For that you will set up a managed folder and use the Dataiku API methods.
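A minimal sketch of the managed-folder approach mentioned above, as used in a Python recipe on the design node: instead of building a filesystem path, you ask the folder handle for a writer. It assumes `folder` behaves like `dataiku.Folder`, whose `get_writer(path)` returns a context manager accepting bytes, and that `df` is a pandas DataFrame. The helper name `save_csv_to_folder` is hypothetical, and whether `dataiku.Folder` is usable inside a deployed API service is not confirmed in this thread.

```python
def save_csv_to_folder(folder, path: str, df) -> None:
    """Write a DataFrame as CSV into a managed folder through its writer.

    `folder` is assumed to expose get_writer(path) as a context manager
    (as dataiku.Folder does in recipes); `df` is assumed to be a pandas DataFrame.
    """
    csv_text = df.to_csv(index=False)
    with folder.get_writer(path) as writer:
        # The writer expects bytes, so encode the CSV text first.
        writer.write(csv_text.encode("utf-8"))


# Usage in a recipe (FOLDER_ID as in the paths from the question):
# import dataiku
# folder = dataiku.Folder("FOLDER_ID")
# save_csv_to_folder(folder, "test_2.csv", df1)
```

Going through the folder handle rather than a hard-coded path also keeps the code working if the folder is later moved to a different storage connection.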

Then in your post you talk about "deploy node". It's not really clear to me if you are talking about the embedded Deployer in the Designer Node, a separate Deployer Node installation or a node you are deploying to. Please clarify. It's also confusing because you talk about having an API in the API Designer. 

I really struggle to understand your requirement. This is because you haven't really shared your requirement, only the process you think should be followed to achieve the goal. In most cases you don't want an API service to be writing back to the Designer node; this is an anti-pattern. That doesn't mean you can't actually do it, it just means that you will not be following good practice if you do it. Your API service should run completely separately from your Designer Node and not depend on it being up and running. Think of your Designer Node as a Development machine: would you want your Production server to depend on a Development server? The other thing to consider is that REST API endpoints should generally be stateless. Writing data back to the Designer Node will not only slow down your API response times but also might not be multi-thread safe.
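To illustrate the multi-thread concern: if several requests append to the same CSV file at the same time, their writes can interleave. A minimal sketch, assuming a single process serving requests from threads, serializes the appends with a lock. It does nothing for multiple worker processes or several API node replicas, which is part of why writing shared files from an endpoint is fragile. The function `append_row` and the file name are hypothetical.

```python
import csv
import os
import tempfile
import threading

_write_lock = threading.Lock()  # one lock per process; not shared across processes


def append_row(path: str, row: list) -> None:
    """Append one CSV row; the lock keeps concurrent threads from interleaving."""
    with _write_lock:
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow(row)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "out.csv")
    threads = [threading.Thread(target=append_row, args=(path, [i, i * i])) for i in range(50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with open(path, newline="") as f:
        print(len(list(csv.reader(f))))  # 50
```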

If you want to feed back data scored by an API service you have a couple of options. Either you write some code to persist the data scored from your API service directly into your database/storage layer or you rely on the built-in API node logs and load all the logs back into your flow for feedback analysis.
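A minimal sketch of the first option, persisting results from the endpoint directly into a database table instead of a file. sqlite3 is only a stand-in for whatever database the API node can actually reach; the table `api_results` and the function `persist_results` are hypothetical. Each call opens its own connection and commits in one transaction, so the endpoint keeps no state between requests.

```python
import os
import sqlite3
import tempfile


def persist_results(db_path: str, rows: list) -> int:
    """Insert (request_id, value) rows and return how many were written."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS api_results (request_id TEXT, value REAL)"
            )
            conn.executemany("INSERT INTO api_results VALUES (?, ?)", rows)
        return len(rows)
    finally:
        conn.close()


if __name__ == "__main__":
    db = os.path.join(tempfile.mkdtemp(), "results.db")
    persist_results(db, [("req-1", 0.5), ("req-2", 0.7)])
    check = sqlite3.connect(db)
    print(check.execute("SELECT COUNT(*) FROM api_results").fetchone()[0])  # 2
    check.close()
```

The data then lives in a table that a Dataiku flow or any other tool can read, without the API service depending on the Designer Node.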

dataicool

Hello, 

I wrote a Python script and deployed it as an API inside Dataiku.

My requirements:

This API should read a CSV file from a folder and save a CSV file into the same folder.

Right now my API reads the data successfully, but I don't know how to save data,

even when I use the same path.

---

It is the same question as this one:

https://community.dataiku.com/t5/Using-Dataiku/How-to-save-a-dataframe-as-a-csv-in-a-managed-folder/...

 

Thank you,


A REST API is not meant to read and write files; it's meant to be called and return data. What is your actual requirement? Why is reading and writing to a file needed?

dataicool

Hello, 

An API is called and returns data: for example, it takes input x and returns y.

In my case, y = f(x, z), where z is some CSV data stored in a folder,

so I need to load z into my API (reading data).

At the same time, I want to save some data:

for example w = g(x, z), and I need to write w into a CSV file.
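A minimal sketch of this structure, with hypothetical `f` and `g`, sample data for z, and the endpoint function name `api_py_function` all assumed for illustration. z is loaded once at module level, assuming the endpoint's module code runs once per worker rather than on every call. Instead of writing w to disk on the API node, the endpoint returns w alongside y, so the caller decides where to store it, in line with the point that a REST API should be called and return data.

```python
import csv
import io

# Stand-in for z: in the real service this would be read once from the
# working folder when the endpoint loads, not on every request.
_Z_CSV = "key,weight\na,1.5\nb,2.0\n"
Z = {row["key"]: float(row["weight"]) for row in csv.DictReader(io.StringIO(_Z_CSV))}


def f(x: float, z: dict) -> float:
    """Hypothetical scoring function y = f(x, z)."""
    return x * sum(z.values())


def g(x: float, z: dict) -> list:
    """Hypothetical tabular output w = g(x, z): one row per key in z."""
    return [{"key": k, "value": x * weight} for k, weight in sorted(z.items())]


def api_py_function(x: float) -> dict:
    # Return both results; the caller can turn "w" into a CSV file.
    return {"y": f(x, Z), "w": g(x, Z)}


print(api_py_function(2.0))
# {'y': 7.0, 'w': [{'key': 'a', 'value': 3.0}, {'key': 'b', 'value': 4.0}]}
```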


Again, you are not explaining Why you need to save a file, only What you are doing. A requirement explains Why you are doing it, i.e. what the primary objective is. Why can't you have the API log z and then use the API logs to retrieve the data you need?

dataicool

Why do you need to save a file?

- I want to save the data as CSV files and send the CSV files to other people by email.
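If the end goal is emailing CSV files, whatever calls the API (a script or a scenario, for example) can build the file from rows the API returns and attach it, so nothing has to be written on the API node. A minimal standard-library sketch that builds the message but does not send it; the addresses and the `build_csv_email` name are placeholders, and actually sending it would need an SMTP server you have access to (for example through smtplib).

```python
import csv
import io
from email.message import EmailMessage


def build_csv_email(rows: list, sender: str, recipient: str,
                    filename: str = "results.csv") -> EmailMessage:
    """Turn a non-empty list of dicts into a CSV attachment on a new email message."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "API results"
    msg.set_content("Results attached as CSV.")
    # Attach the CSV bytes as text/csv; the attachment is base64-encoded.
    msg.add_attachment(buf.getvalue().encode("utf-8"),
                       maintype="text", subtype="csv", filename=filename)
    return msg
```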

 

Why can't you have the API log z and then use the API logs to retrieve the data you need?

- The data I want to save is not log data; I want to save tabular data.
