Manipulate Excel File contained in Sharepoint Managed Folder in a Python Recipe using openpyxl
Hi Everyone !
I actually try to exctract a Excel file from a Managed Folder and to manupulate it with openpyxl to modify it and then upload it again. The special thing with this Managed Folder is that it's shared with SharePoint. So when i try to call the Excel file in my notebook i can't call any path.
in fact I'd like to use the .load_Workbook() method from openpyxl to manipulate it, but it needs a path ..
I tried several things with the download_stream() method of the Dataiku.Folder() Class but i can't transform the stream back in Excel.
Does anyone have an idea how i could perform this ?
Thanks !
Answers
-
Hi @Thai_Duong
,The load_workbook() function also accepts a file object, but it requires the file object to support the seek method. Managed folders don't directly support seeking, but we can work around it by loading the file into memory first.
The following example opens a spreadsheet from a managed folder, then saves it back to the folder:
import io import shutil import dataiku import openpyxl folder = dataiku.Folder("YOUR_FOLDER") # Open an Excel file from the managed folder with folder.get_download_stream("SPREADSHEET.xlsx") as file: with io.BytesIO() as seekable: shutil.copyfileobj(file, seekable) wb = openpyxl.load_workbook(seekable) # Save the workbook back to the managed folder with io.BytesIO() as seekable: wb.save(seekable) seekable.seek(0) folder.upload_stream("NEW_SPREADSHEET.xlsx", seekable)
Thanks,
Zach
-
Thai_Duong Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 3 ✭
Hey Zac,
Thanks for your reply !
Just tried your code and i have the Following error--------------------------------------------------------------------------- BadZipFile Traceback (most recent call last) <ipython-input-16-48f6c83b7a08> in <module> 10 with io.BytesIO() as seekable: 11 shutil.copyfileobj(file, seekable) ---> 12 writer = pd.ExcelWriter(seekable, engine='openpyxl', mode='a') 13 writer.book = openpyxl.load_workbook(seekable) /applis/19499-datah/DSS_DATA/code-envs/python/JeanPatarot_env/lib/python3.6/site-packages/pandas/io/excel/_openpyxl.py in __init__(self, path, engine, mode, **engine_kwargs) 23 from openpyxl import load_workbook 24 ---> 25 book = load_workbook(self.path) 26 self.book = book 27 else: /applis/19499-datah/DSS_DATA/code-envs/python/JeanPatarot_env/lib/python3.6/site-packages/openpyxl/reader/excel.py in load_workbook(filename, read_only, keep_vba, data_only, keep_links) 314 """ 315 reader = ExcelReader(filename, read_only, keep_vba, --> 316 data_only, keep_links) 317 reader.read() 318 return reader.wb /applis/19499-datah/DSS_DATA/code-envs/python/JeanPatarot_env/lib/python3.6/site-packages/openpyxl/reader/excel.py in __init__(self, fn, read_only, keep_vba, data_only, keep_links) 122 def __init__(self, fn, read_only=False, keep_vba=KEEP_VBA, 123 data_only=False, keep_links=True): --> 124 self.archive = _validate_archive(fn) 125 self.valid_files = self.archive.namelist() 126 self.read_only = read_only /applis/19499-datah/DSS_DATA/code-envs/python/JeanPatarot_env/lib/python3.6/site-packages/openpyxl/reader/excel.py in _validate_archive(filename) 94 raise InvalidFileException(msg) 95 ---> 96 archive = ZipFile(filename, 'r') 97 return archive 98 /usr/lib64/python3.6/zipfile.py in __init__(self, file, mode, compression, allowZip64) 1129 try: 1130 if mode == 'r': -> 1131 self._RealGetContents() 1132 elif mode in ('w', 'x'): 1133 # set the modified flag so central directory gets written /usr/lib64/python3.6/zipfile.py in _RealGetContents(self) 1196 raise BadZipFile("File is not a zip file") 1197 if not endrec: -> 1198 raise BadZipFile("File is not a zip file") 1199 if self.debug > 1: 1200 print(endrec) BadZipFile: File is not a zip file
Here i tried with the pandas Excel Writer, but even just with the load_workbook method, i had the same error.
-
Hi @Thai_Duong
,openpyxl raises this error if it doesn't recognize the file format. Are you sure that your Excel file is a XLSX/XLSM/XLTX/XLTM file?
Note that openpyxl doesn't support the older XLS format.
-
Thai_Duong Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 3 ✭
Hi @ZachM
!
Yes i'm sure, I really don't know where the error come from, i'll ask my Dataiku admins. Maybe it's related to the SharePoint folder.