We are having problems accessing a managed folder that points to a SharePoint location. I have re-authenticated the user accounts for SharePoint.
Error message -
Oops: an unexpected error occurred
Listing files failed, caused by: Exception: Error 429 (get_files)
Please see our options for getting help
HTTP code: 500, type: java.io.IOException
Error 429 occurs when too many requests have been made to SharePoint Online in a short amount of time. You can reduce the occurrence of this problem by using SSO as the authentication method (option 2 in the plugin documentation).
I am already using SSO as the auth method, using Presets. When I try to write files to a SharePoint location in a loop, I always get this error (the point at which it is thrown varies from run to run).
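Independently of the authentication method, pacing requests on the client side is one generic way to lower the request rate. The sketch below only illustrates that idea: `Throttle` and `min_interval` are hypothetical names, not part of the SharePoint plugin or the Dataiku API, and a suitable interval would have to be found by experiment, since the throttling limits are not stated in this thread.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait().

    Hypothetical helper for illustration; not part of any plugin API.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between two calls
        self._clock = clock               # injectable for testing
        self._sleep = sleep               # injectable for testing
        self._last = None                 # time of the previous call

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Called too soon: pause for the rest of the interval
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` right before each write in a loop would space the writes at least `min_interval` seconds apart.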
Exception: None: b"Failed to write data : <class 'sharepoint_client.SharePointClientError'> : Error 429 (create_folder)"
Could you give us more details about your setup, so that we can reproduce the problem on our side? In particular, how do you loop the write? Is it done from Python code? What are the size and number of the files, and the delay between two writes?
out_folder = dataiku.Folder("odbID")
for currentCountry in country1:
    mydataset_df_Country = mydataset_df[mydataset_df['COUNTRY'] == currentCountry]
    compCodeFiltered = mydataset_df_Country['COMPANY CODE'].unique()
    # Looping through the company codes in each country
    for compCode in compCodeFiltered:
        mydataset_df_CompCode = mydataset_df_Country[mydataset_df_Country['COMPANY CODE'] == compCode]
        fiscalYrFiltered = mydataset_df_CompCode['COMMENTARY YEAR'].unique()
        # Looping through each year present for the company
        for fiscalYrCurrent in fiscalYrFiltered:
            mydataset_df_FiscalYr = mydataset_df_CompCode[mydataset_df_CompCode['COMMENTARY YEAR'] == fiscalYrCurrent]
            sourceQuarterFiltered = mydataset_df_FiscalYr['SOURCE_QUARTER'].unique()
            # Looping through each quarter within the year to create separate output files
            for quarters in sourceQuarterFiltered:
                df = mydataset_df_FiscalYr[mydataset_df_FiscalYr['SOURCE_QUARTER'] == quarters].copy()
                stream = BytesIO()
                excel_writer = pd.ExcelWriter(stream, engine='xlsxwriter')
                # Rewind to the beginning of the bytes string
                with out_folder.get_writer("/file/%s/%s/%d/static folder/%s_Test_%d_%d_Output.xlsx" % (currentCountry, compCode, commentaryYear, compCode, sourceQuarter, commentaryYear)) as writer:
                    print("Time is ", time.time())
Thank you for sharing this code. Could you also give me a rough idea of the average Excel file size once stored on SharePoint? With this I should have enough information to reproduce the issue on our side...
Yes, we managed to recreate the problem on our side and a fix is currently under review. The new version of the plugin should soon be available on the store.
Thanks for the version update to the plugin. It has mostly solved the issue, but we are regularly encountering a different error. Stack trace below:
Activity failed
com.dataiku.common.server.APIError$SerializedErrorException: Error in Python process: At line 268: <class 'dataikuapi.utils.DataikuException'>: com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to write data : <class 'json.decoder.JSONDecodeError'> : Expecting value: line 1 column 1 (char 0)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.throwFromErrorFileIfPossible(JobExecutionResultHandler.java:106)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.throwFromErrorFileOrLogs(JobExecutionResultHandler.java:39)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.throwFromErrorFileOrLogs(JobExecutionResultHandler.java:34)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:26)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:75)
    at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeScript(AbstractPythonRecipeRunner.java:57)
    at com.dataiku.dip.recipes.code.python.PythonRecipeRunner.run(PythonRecipeRunner.java:72)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
Any ideas will help.
- Can I assume this is the same setup (SSO, writing a large number of files to a SharePoint folder using a Python script)?
- How frequently do you have this problem, and what is the overall usage of the script generating the problem (number of files transferred, how often, and average size of the files)?
On rare occasions, the SharePoint Online API has temporary problems and sends back an error page in HTML format instead of the expected JSON. We will get the plugin to handle this in its next version. In the meantime, you can probably add exception handling, or some retry, in your script. So where you currently have:
writer.write(stream.read())
you can have instead:
successful = False
attempts = 0
while not successful and attempts < 3:
    attempts = attempts + 1
    try:
        # Rewind first: a failed attempt may already have consumed the stream
        stream.seek(0)
        writer.write(stream.read())
        successful = True
    except Exception:
        # Wait a few seconds before the next attempt
        time.sleep(5)
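If several calls need the same treatment, the retry loop above can be factored into a small reusable function. The following is only a sketch under assumptions: `retry_with_backoff`, `max_attempts` and `base_delay` are hypothetical names, not part of the plugin or the Dataiku API, and doubling the delay after each failure is a swapped-in variation on the fixed 5-second wait.

```python
import time


def retry_with_backoff(operation, max_attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Hypothetical helper for illustration. The delay doubles after each
    failure: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: let the last error propagate
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

In the script, the write could then be wrapped in a small function that rewinds `stream` before calling `writer.write(stream.read())`, and that function passed as `operation`.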
Hope this helps,