Read a file, outside the API folder, from a DSS API
Hello,
We would like to implement a DSS API with a Python function that reads data files stored on a distant server (not the DSS API Node).
The data file name will be an input parameter of the API.
The server where the data file is stored is known.
The data files are maintained by business users, which is why we can't use a managed folder deployed with the API.
Could you please advise on the best way to read this data file from the DSS API?
Annie
Best Answer
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi,
Thanks for clarifying. That makes sense now. You are correct: by default, managed folders used in API endpoints are copied over when the endpoint is deployed.
However, you should be able to use the Dataiku Public API from the API endpoint.
Here is an example for a dataset: https://community.dataiku.com/t5/Using-Dataiku/DSS-API-Designer-Read-dataset-from-DSS-flow-in-R-Python-API/m-p/7543
Here is an example that reads a file from a managed folder, creates a pandas DataFrame, and returns it as JSON:
```python
import io

import dataikuapi
import pandas as pd


def api_py_function(project_key, folder_id):
    # Connect to the DSS node (not the API node) with the public API
    client = dataikuapi.DSSClient("http(s)://my_hostname:port", "apiKey")
    folder = dataikuapi.dss.managedfolder.DSSManagedFolder(client, project_key, folder_id)
    # List the files currently in the managed folder
    contents = folder.list_contents()
    for item in contents["items"]:
        # Download each file and parse it as CSV
        file = folder.get_file(item["path"])
        file_data = file.content
        rawData = pd.read_csv(io.StringIO(file_data.decode('utf-8')))
        json_result = rawData.to_json(orient="table")
    # Returns the JSON for the last file listed
    return json_result
```
Let me know if this helps!
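Since the original question has the file name as an input parameter, the example above could be narrowed to a single file. The following is only a sketch under assumptions: the `file_name` parameter and the helper names `csv_bytes_to_json` and `read_named_file` are hypothetical, the paths returned by `list_contents()` are assumed to start with a leading `/`, and the standard `csv` module is used here in place of pandas so the parsing part has no third-party dependency.

```python
import csv
import io
import json


def csv_bytes_to_json(raw_bytes, encoding="utf-8"):
    """Parse CSV bytes into a JSON array with one object per row."""
    reader = csv.DictReader(io.StringIO(raw_bytes.decode(encoding)))
    return json.dumps(list(reader))


def read_named_file(folder, file_name):
    """Fetch one file from a managed-folder handle and return it as JSON.

    `folder` is expected to behave like dataikuapi's DSSManagedFolder:
    list_contents() returns {"items": [{"path": ...}, ...]} and
    get_file(path) returns a response object with a bytes `.content`.
    """
    # Assumption: folder paths are reported with a leading slash
    path = "/" + file_name.lstrip("/")
    known_paths = {item["path"] for item in folder.list_contents()["items"]}
    if path not in known_paths:
        raise FileNotFoundError("No such file in managed folder: " + path)
    return csv_bytes_to_json(folder.get_file(path).content)


def api_py_function(project_key, folder_id, file_name):
    # Imported lazily so the helpers above can be exercised without dataikuapi
    import dataikuapi

    # Placeholder host and key, as in the example above
    client = dataikuapi.DSSClient("http(s)://my_hostname:port", "apiKey")
    folder = client.get_project(project_key).get_managed_folder(folder_id)
    return read_named_file(folder, file_name)
```

Because the folder is listed and read on every call, files added or replaced by business users after deployment should be picked up, at the cost of a round trip to the DSS node per request.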
Answers
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi Annie,
Could you please elaborate a bit on what you mean by "distant server"? How will this remote server serve these data files? Will they be accessible over HTTP/S or an API? Or do you need to use SCP/SFTP?
You can use python-requests for HTTP or a REST API.
For SFTP you can use something like paramiko.
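As a rough illustration of both options, here is a minimal sketch. Everything specific in it is an assumption: the base URL, host name, user, key path, and remote directory are placeholders, and `build_file_url`, `read_over_http`, and `read_over_sftp` are hypothetical helper names. Because the file name comes from the API caller, the sketch rejects anything that is not a bare file name before building the URL or remote path.

```python
import posixpath
from urllib.parse import quote

# Hypothetical locations; replace with the real server details
BASE_URL = "https://files.example.com/data/"
SFTP_HOST = "files.example.com"
SFTP_DIR = "/data"


def check_file_name(file_name):
    """Accept only a bare file name (no directories, no '.' or '..')."""
    if (not file_name or file_name in (".", "..")
            or posixpath.basename(file_name) != file_name):
        raise ValueError("Invalid file name: %r" % (file_name,))
    return file_name


def build_file_url(base_url, file_name):
    """Join the base URL and a URL-encoded file name."""
    return base_url.rstrip("/") + "/" + quote(check_file_name(file_name))


def read_over_http(file_name):
    import requests  # lazy import: only needed for the HTTP option

    response = requests.get(build_file_url(BASE_URL, file_name), timeout=10)
    response.raise_for_status()
    return response.text


def read_over_sftp(file_name, username="api_user", key_filename="/path/to/key"):
    import paramiko  # lazy import: only needed for the SFTP option

    remote_path = posixpath.join(SFTP_DIR, check_file_name(file_name))
    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.connect(SFTP_HOST, username=username, key_filename=key_filename)
    try:
        sftp = client.open_sftp()
        with sftp.open(remote_path, "rb") as remote_file:
            return remote_file.read().decode("utf-8")
    finally:
        client.close()
```

Either function could be called from the endpoint's Python function with the file name received as a parameter; the API node then needs network access to the file server.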
-
Hello Alex,
I was thinking of an approach like the following:
- Create a DSS connection to this server/folder location to access the files stored there
- Question: can we use such a DSS connection in an API Python function executed on the API Node?
- Then access the file content from a DSS API using this DSS connection: is that possible?
It's just an idea... I would like to understand the best way to address this need.
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi Rona,
In your case, a managed folder should work.
https://doc.dataiku.com/dss/latest/apinode/endpoint-python-function.html#using-managed-folders
You mentioned you can't use a managed folder. Could you elaborate a bit on why you can't use managed folders in your case, if you would only be using it as input and not writing back to it?
-
Hi Alex,
The files in the folder are managed by the business users. They can delete, update or add files in this folder at any moment.
With the API Node, my understanding is that managed folders are declared when we define the endpoint that uses them. The managed folder is then copied to the API Node so that its files are available there when we deploy the API endpoint. Is that correct?
If so, the content of the folder is limited to what it contained at the time we deployed the API endpoint to the API Node, so we can't dynamically take the business users' updates to this folder into account.
Please, let me know if something is wrong with my understanding.
Thanks
-
```python
client = dataikuapi.DSSClient("http(s)://my_hostname:port", "apiKey")
folder = dataikuapi.dss.managedfolder.DSSManagedFolder(client, project_key, folder_id)
contents = folder.list_contents()
```
@AlexT
In the code above, I am getting an error when executing the line "contents = folder.list_contents()": there is no such attribute "list_contents" for the DSS managed folder.
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi @vaishnavi,
What DSS version are you on? list_contents was added starting with DSS 8: https://doc.dataiku.com/dss/8.0/python-api/managed_folders.html
Thanks,
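If the endpoint may run against an older client, one way to make this failure easier to diagnose is to check for the method before calling it. This is only a sketch: `list_folder_paths` is a hypothetical helper, and `folder` is assumed to be the managed-folder handle from the example earlier in the thread.

```python
def list_folder_paths(folder):
    """Return the file paths in a managed folder, with a clearer error
    when the handle has no list_contents method."""
    list_contents = getattr(folder, "list_contents", None)
    if list_contents is None:
        raise RuntimeError(
            "This managed folder handle has no list_contents(); "
            "check the DSS / dataikuapi version in use."
        )
    return [item["path"] for item in list_contents()["items"]]
```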