
Read a file, outside the API folder, from a DSS API

Solved!
rona
Level 3

Hello,

We would like to implement a DSS API endpoint with a Python function that reads data files stored on a remote server (not the DSS API node).

The data file name will be an input parameter of the API.
The server where the data files are stored is known.
The data files are maintained by business users, which is why we can't use a managed folder deployed with the API.

Could you please advise on the best way to read this data file from the DSS API?

Annie

1 Solution
AlexT
Dataiker

Hi,

Thanks for clarifying. That makes sense now. Indeed, you are correct: by default, when managed folders are used in API endpoints, they are copied over when the endpoint is deployed.

However, you should be able to use the Dataiku Public API from the API endpoint.

Here is an example for a dataset: https://community.dataiku.com/t5/Using-Dataiku/DSS-API-Designer-Read-dataset-from-DSS-flow-in-R-Pyth... 

Here is an example that reads a file from a managed folder, creates a pandas DataFrame, and returns it as JSON:

import io

import dataikuapi
import pandas as pd


def api_py_function(project_key, folder_id):
    # Connect to the DSS instance through the public API.
    # Replace the placeholder URL and API key with your own values.
    client = dataikuapi.DSSClient("http(s)://my_hostname:port", "apiKey")
    folder = dataikuapi.dss.managedfolder.DSSManagedFolder(client, project_key, folder_id)
    contents = folder.list_contents()

    # Note: the return inside the loop means only the first listed file is read.
    for item in contents["items"]:
        file = folder.get_file(item["path"])  # HTTP response for the file
        file_data = file.content  # raw bytes
        rawData = pd.read_csv(io.StringIO(file_data.decode('utf-8')))
        json_result = rawData.to_json(orient="table")
        return json_result


Let me know if this helps!
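
Since the original question asks for the file name to be an input parameter, the example above could be adapted along these lines. This is only a sketch: `read_csv_from_folder` is a hypothetical helper name, it uses the standard library's `csv` and `json` instead of pandas, and it assumes `folder` behaves like the `DSSManagedFolder` used above (`list_contents()` returning items with a `path` that starts with "/", and `get_file()` returning a response with a `.content` attribute).

```python
import csv
import io
import json


def read_csv_from_folder(folder, file_name):
    """Read one CSV file from a managed-folder-like object and return it as JSON.

    `folder` is assumed to behave like dataikuapi's DSSManagedFolder:
    list_contents() returns {"items": [{"path": ...}, ...]} and
    get_file(path) returns a response whose .content holds the raw bytes.
    """
    # Assumption: listed paths start with "/", so normalise the input the same way.
    wanted = "/" + file_name.lstrip("/")

    # Only serve files that the folder actually lists; this also rejects
    # names such as "../other.csv" that point outside the folder.
    available = [item["path"] for item in folder.list_contents()["items"]]
    if wanted not in available:
        raise ValueError("file not found in folder: %r" % file_name)

    raw = folder.get_file(wanted).content
    rows = list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
    return json.dumps(rows)
```

Inside the endpoint, `folder` would be the object built from `DSSClient` as in the example above, and `file_name` would come from the endpoint parameters. Because the folder is read through the public API on each call, files added or changed by the business users after deployment should be picked up.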

5 Replies
AlexT
Dataiker

Hi Annie,

Could you please elaborate a bit on what you mean by "distant server"? How will this remote server serve these data files? Will they be accessible over HTTP/S or an API? Or do you need to use SCP/SFTP?

You can use python-requests for HTTP or a REST API.

For SFTP, you can use something like paramiko.
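
For the HTTP/S case, a minimal sketch could look like the following. It uses the standard library's `urllib` rather than python-requests so that it has no dependencies; the same flow applies with `requests.get`. `BASE_URL`, `build_file_url` and `fetch_remote_file` are hypothetical names, and the file server is assumed to expose the files directly under one base URL without authentication. Because the file name comes from the API caller, the sketch rejects anything that is not a bare file name.

```python
import posixpath
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical base URL of the remote file server; replace with the real one.
BASE_URL = "https://files.example.com/business-data"


def build_file_url(file_name, base_url=BASE_URL):
    """Return the URL for file_name, rejecting names that could escape the base folder."""
    # Accept only a bare file name: no directories, no "..", nothing empty.
    if (
        not file_name
        or file_name in (".", "..")
        or "\\" in file_name
        or posixpath.basename(file_name) != file_name
    ):
        raise ValueError("invalid file name: %r" % file_name)
    return base_url.rstrip("/") + "/" + quote(file_name)


def fetch_remote_file(file_name, timeout=10):
    """Download the file over HTTP(S) and return its raw bytes."""
    with urlopen(build_file_url(file_name), timeout=timeout) as response:
        return response.read()


def api_py_function(file_name):
    """Endpoint entry point: return the file content decoded as UTF-8 text."""
    return fetch_remote_file(file_name).decode("utf-8")
```

The API node must be able to reach the file server over the network for this to work; any authentication the server requires would need to be added to the request.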

rona
Level 3
Author

Hello Alex,

I was thinking of an approach like the following:

- Create a DSS connection to this server/folder location to access the files stored there

     - question: Can we use such a DSS connection in an API Python function executed on the API node?

- Then access the file content from a DSS API using this DSS connection: is it possible?

It's just an idea... I would like to understand the best way to address this need.

 

 

AlexT
Dataiker

Hi Rona,

In your case, a managed folder should work.

 https://doc.dataiku.com/dss/latest/apinode/endpoint-python-function.html#using-managed-folders

You mentioned you can't use a managed folder. Could you elaborate a bit on why you can't use managed folders in your case, if you would only be using the folder as input and not writing back to it?

 

 

rona
Level 3
Author

Hi Alex,

The files in the folder are managed by the business users. They can delete, update or add files in this folder at any moment.

With the API node, my understanding is that managed folders are declared when we define the endpoint that uses them. The managed folder is then copied to the API node so that the files are available there when we deploy the API endpoint. Is that correct?

If so, the content of the folder is limited to what it contained at the time we deployed the API endpoint to the API node, so we can't dynamically take the business users' updates to this folder into account.

Please let me know if something is wrong with my understanding.

Thanks
