Hello,
We would like to implement a DSS API with a python function which reads some data files stored on a distant server (not the DSS API node).
The data file name will be an input parameter of the API.
The server where the data file is stored is known.
The data files are maintained by business users, which is why we can't use a managed folder deployed with the API.
Please, could you advise what would be the best way to read this data file from the DSS API?
Annie
Hi Annie,
Could you please elaborate a bit on what you mean by "distant server"? How will this remote server serve these data files? Will they be accessible over HTTP/S or an API? Or do you need to use SCP/SFTP?
For HTTP or REST APIs you can use the python-requests library.
For SFTP you can use something like paramiko.
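If the files turn out to be served over HTTP/S, a python-function endpoint built on python-requests could look roughly like the sketch below. The base URL, the file-naming scheme, and the helper names are all hypothetical; this assumes the remote files are CSV, matching the pandas example elsewhere in this thread.

```python
import io

import pandas as pd
import requests


def fetch_csv_over_http(url, timeout=10):
    """Download a file from a remote HTTP(S) server and return its raw bytes."""
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()  # surface 4xx/5xx errors instead of parsing an error page
    return resp.content


def csv_bytes_to_json(raw_bytes):
    """Parse CSV bytes into a pandas dataframe and serialize it as JSON."""
    df = pd.read_csv(io.StringIO(raw_bytes.decode("utf-8")))
    return df.to_json(orient="table")


def api_py_function(file_name):
    # Hypothetical file server address -- replace with the real one.
    base_url = "https://files.example.com/"
    return csv_bytes_to_json(fetch_csv_over_http(base_url + file_name))
```

Because the endpoint fetches the file at request time, any updates the business users make on the remote server are picked up on the next call, with no redeployment of the endpoint.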
Hello Alex,
I was thinking of an approach like the following:
- Create a DSS connection to this server/folder location to access the files stored there.
- Question: can we use such a DSS connection in an API python function executed on the API node?
- Then access the file content with a DSS API using this DSS connection: is that possible?
It's just an idea ... I would like to understand the best way to address this need.
Hi Rona,
In your case, a managed folder should work.
https://doc.dataiku.com/dss/latest/apinode/endpoint-python-function.html#using-managed-folders
You mentioned you can't use a managed folder. Could you elaborate a bit on why not, if you would only be using it as input and not writing back to it?
Hi Alex,
The files in the folder are managed by the business users. They can delete, update or add files in this folder at any moment.
With the API node, my understanding is that managed folders are defined when we define the endpoint that uses them. The managed folder is then copied to the API node so that the files are available there when we deploy the API endpoint. Is that correct?
If so, it means the content of the folder is frozen at the time we deploy the API endpoint to the API node, so we can't dynamically pick up the business users' updates to this folder.
Please, let me know if something is wrong with my understanding.
Thanks
Hi,
Thanks for clarifying. That makes sense now. Indeed, you are correct: by default, managed folders used in API endpoints are copied over when the endpoint is deployed.
However, you should be able to use the Dataiku Public API from the API endpoint.
Here is an example for a dataset: https://community.dataiku.com/t5/Using-Dataiku/DSS-API-Designer-Read-dataset-from-DSS-flow-in-R-Pyth...
Here is an example that reads a file from a managed folder, creates a pandas dataframe, and returns JSON:
import pandas as pd
import dataikuapi
import io

def api_py_function(project_key, folder_id):
    # Connect to the DSS node (replace the host, port and API key placeholders)
    client = dataikuapi.DSSClient("http(s)://my_hostname:port", "apiKey")
    folder = dataikuapi.dss.managedfolder.DSSManagedFolder(client, project_key, folder_id)
    contents = folder.list_contents()
    for item in contents["items"]:
        file = folder.get_file(item["path"])
        file_data = file.content
        # Parse the CSV bytes into a pandas dataframe
        rawData = pd.read_csv(io.StringIO(file_data.decode('utf-8')))
    # Serialize the last file read as JSON
    json_result = rawData.to_json(orient="table")
    return json_result
Let me know if this helps!
client = dataikuapi.DSSClient("http(s)://my_hostname:port", "apiKey")
folder = dataikuapi.dss.managedfolder.DSSManagedFolder(client, project_key, folder_id)
contents = folder.list_contents()
@AlexT In the above code I am getting an error when executing the line "contents = folder.list_contents()": there is no attribute "list_contents" on the DSS managed folder.
Hi @vaishnavi ,
What DSS version are you on? list_contents was added starting with DSS 8.
https://doc.dataiku.com/dss/8.0/python-api/managed_folders.html
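Since list_contents only exists from DSS 8 onward, one defensive option is to compare the backend's version string before calling it. This is a hypothetical sketch: the helper name is made up, and where the version string comes from (e.g. instance info reported by the client) is left as an assumption.

```python
def supports_list_contents(dss_version):
    """Return True if a DSS version string (e.g. "8.0.3") is at least 8.0,
    the first release where DSSManagedFolder.list_contents() is available."""
    major = int(dss_version.split(".")[0])
    return major >= 8
```

On older versions, upgrading DSS (or pinning the dataikuapi client documentation to your exact version, as in the link above) is the practical fix.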
Thanks,