Announcing the winners & finalists of the Dataiku Frontrunner Awards 2021! Read their inspiring stories

How to write an excel file in a folder with python where folder use sharepoint plugin

Solved!
PhilippeDulac
Level 1
How to write an excel file in a folder with python where folder use sharepoint plugin

Hi, 

Python script is not working when I want to write an excel file to a folder associated to sharepoint plugin.

Python fails to retrieve path needed to excelwriter.
path seems to not be part of folder_info structure of folder when using sharepoint plugin while it is on server's filesystem folder.

Here the code used : 

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import os

print ("FOLDER_INFO")
print (folder_info)
f_path = os.path.join(folder_info['path'] , file_name)
writer = pd.ExcelWriter(f_path,engine='xlsxwriter')

... 

 

error is the following :

Job failed: Error in Python process: At line 24: <class 'KeyError'>: path

LOGS (using sharepoint plugin): 

[2021/09/27-10:50:53.849] [null-out-64] [INFO] [dku.utils]  - FOLDER_INFO
[2021/09/27-10:50:53.850] [null-out-64] [INFO] [dku.utils]  - {'projectKey': 'LPC_PROD_OUT_REF_PREV', 'directoryBasedPartitioning': False, 'name': 'OUTPUT_FOLDER', 'id': 'yrztMCMF', 'accessInfo': {}, 'type': 'fsprovider_sharepoint-server_sharepoint-server_shared-documents'}

 

LOGS when it's working with server's filesystem folder: 

[2021/09/23-15:29:38.739] [null-out-67] [INFO] [dku.utils]  - FOLDER_INFO
[2021/09/23-15:29:38.740] [null-out-67] [INFO] [dku.utils]  - {'path': '/data/dataiku/dss_design/managed_folders/LPC_PROD_OUT_REF_PREV/ivaCaknz', 'projectKey': 'LPC_PROD_OUT_REF_PREV', 'directoryBasedPartitioning': False, 'name': 'OUT', 'id': 'ivaCaknz', 'accessInfo': {'root': '/data/dataiku/dss_design/managed_folders/LPC_PROD_OUT_REF_PREV/ivaCaknz'}, 'type': 'Filesystem'}

 

Tell me if there's another way to export an excel file on a sharepoint folder ?

 

Thanks in advance for your help

 

Philippe

0 Kudos
1 Solution
AlexB
Dataiker
Dataiker

Hi Philippe,

Assuming a simple flow with a python recipe taking a dataset as an input and a folder as output

Screenshot 2021-09-27 at 17.02.15.png

you can use the following code: 

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd
from io import BytesIO
import openpyxl

# Read recipe inputs
source = dataiku.Dataset("Source")
source_df = source.get_dataframe()

# First, save the data as excel format into a bytes string
stream = BytesIO()
excel_writer = pd.ExcelWriter(stream, engine='openpyxl')
source_df.to_excel(excel_writer, sheet_name='Sheet1')
excel_writer.save()

# Rewind to the begining of the bytes string
stream.seek(0)

# Write recipe outputs, with the Dataiku API in case the folder is remote
# (note: written this way, the recipe will also work if the target folder is local)
target_folder = dataiku.Folder("<<put your folder reference here>>")
with target_folder.get_writer("excelfile.xlsx") as writer:
    writer.write(stream.read())

Best regards,

Alex

View solution in original post

0 Kudos
4 Replies
AlexB
Dataiker
Dataiker

Hi Philippe,

When accessing the filesystem folder, you are using a local filesystem. This is why you can obtain the files path and interact directly with it.

However, in the case of the SharePoint folder, you are interacting with a remote folder. This is done via a DSS plugin, and because of that you need to use the DSS API to do so. You can find more information here.

Note that the DSS API allows you to interact with both type of filesystems.

Alex

0 Kudos
PhilippeDulac
Level 1
Author

Thanks Alex for this answer.

Documentation links are confirming your message and it is clear to me we can not proceed remote folder the same way we proceed with local folder.

Then it doesn't seems easy to export an excel to a remote folder. It would be very usefull to have a sample ...

 

Best regards,

 

Philippe

0 Kudos
AlexB
Dataiker
Dataiker

Hi Philippe,

Assuming a simple flow with a python recipe taking a dataset as an input and a folder as output

Screenshot 2021-09-27 at 17.02.15.png

you can use the following code: 

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd
from io import BytesIO
import openpyxl

# Read recipe inputs
source = dataiku.Dataset("Source")
source_df = source.get_dataframe()

# First, save the data as excel format into a bytes string
stream = BytesIO()
excel_writer = pd.ExcelWriter(stream, engine='openpyxl')
source_df.to_excel(excel_writer, sheet_name='Sheet1')
excel_writer.save()

# Rewind to the begining of the bytes string
stream.seek(0)

# Write recipe outputs, with the Dataiku API in case the folder is remote
# (note: written this way, the recipe will also work if the target folder is local)
target_folder = dataiku.Folder("<<put your folder reference here>>")
with target_folder.get_writer("excelfile.xlsx") as writer:
    writer.write(stream.read())

Best regards,

Alex

View solution in original post

0 Kudos
PhilippeDulac
Level 1
Author

Hi Alex,

Sorry for the delay of my feedback : I was not able to adapt my script before.
Script needed many adaptations but your sample was very usefull.

At last it is working as expected.

Thanks a lot for your help,

Philippe

0 Kudos
A banner prompting to get Dataiku DSS