Compressing CSV to GZIP and writing to SharePoint

Hi,
I'm getting errors while running this code, which tries to compress a CSV file with gzip and then upload it to SharePoint from a Python code recipe. Could someone guide me on how to accomplish this?
The code is below:
# -*- coding: utf-8 -*-
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd, numpy as np
import dataikuapi
from datetime import datetime
import gzip
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
from datetime import datetime, timedelta
# Date for files
days_to_subtract = 2
today = datetime.today() - timedelta(days=days_to_subtract)
today = today.strftime("%Y_%m_%d")
# Date for folders_1
today1 = datetime.today() - timedelta(days=days_to_subtract)
today1 = today1.strftime("%d%b%Y")
# Date for folders_2
today2 = datetime.today() - timedelta(days=days_to_subtract)
today2 = today2.strftime("%b%Y")
today2 = today2[:-4]+today2[-2:]
#print(today)
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Read recipe inputs
ds = dataiku.Dataset("textiles")
df = ds.get_dataframe()
projectKey = dataiku.get_custom_variables()["projectKey"]
c = dataiku.api_client()
p = c.get_project(projectKey)
v = p.get_variables()
v["standard"]["current_month_year"] = today
p.set_variables(v)
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
#Specific code for a managed folder created in dataiku
managed_folder = "textiles_output"
df.to_csv(index=False).encode("utf-8")
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Write recipe outputs
content = df.to_csv(index=False).encode("utf-8")
with gzip.open("textiles_%s.csv.gz" % dataiku.get_custom_variables()["current_month_year"], 'wt') as f:
    f.write(str(content))
output_folder = dataiku.Folder(managed_folder)
output_folder.upload_stream("/" + today2 + "/" + today1 + "/" + str(f), df.to_csv(index=False))
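For readers puzzling over the date handling at the top of the recipe, here is a quick illustration (with a hypothetical fixed date, not taken from the thread) of the three strings it builds:

```python
from datetime import datetime, timedelta

# Hypothetical fixed date for illustration; the recipe uses datetime.today()
d = datetime(2021, 9, 15) - timedelta(days=2)

file_date = d.strftime("%Y_%m_%d")        # "2021_09_13"  (file name suffix)
folder_1 = d.strftime("%d%b%Y")           # "13Sep2021"   (inner folder)
folder_2 = d.strftime("%b%Y")             # "Sep2021"     (outer folder)
folder_2 = folder_2[:-4] + folder_2[-2:]  # "Sep21"       (drop the century)

print(file_date, folder_1, folder_2)
```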
Error:
Job failed: Error in Python process: At line 56: <class 'Exception'>: None: b"Failed to write data : <class 'sharepoint_client.SharePointClientError'> : A potentially dangerous Request.Path value was detected from the client (<)."
Operating system used: Windows 10
Best Answer
-
Hi !
According to the error message, it seems you have a "<" somewhere in the path or name of your file, which is not allowed.
I noticed that you are declaring a file handle (f, on the line starting with "with gzip.open"), and later on you build a path with it ( "/" + today2 + "/" + today1 + "/" + str(f) ). str(f) puts the object's repr into the path, and that repr does contain "<"...
Did you mean "/" + today2 + "/" + today1 + "/" + str(f.name) ?
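A small sketch of the difference (the file name here is hypothetical): str() on a file handle returns its repr, which contains "<" and ">", while f.name is just the path that was opened:

```python
import gzip
import os
import tempfile

# Hypothetical file, just to show the two strings side by side
path = os.path.join(tempfile.mkdtemp(), "example.csv.gz")
with gzip.open(path, "wt") as f:
    print(str(f))       # repr, e.g. <_io.TextIOWrapper name='...example.csv.gz' ...>
    print(str(f.name))  # just the path that was opened

# SharePoint rejects "<" in a request path, hence the error in the thread
assert "<" in str(f)
```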
Regards,
Alex
Answers
-
Hi AlexB,
That does solve the problem with the error message. Thanks!
-
Hi everyone,
I have a similar use case: I am trying to compress a CSV and upload it to SharePoint. However, I find that the CSV file is not actually compressed; also, when I look at the preview of the SharePoint folder in Dataiku, it shows the following error.
I used code similar to what has been shared in this post and its solution. Any suggestions are welcome. I use the enterprise version, Dataiku 9.0.3. Is there a limit on the size of file that can be uploaded to SharePoint through Dataiku?
-
Hi !
Can you share with us the code you used for the file compression and upload? (And a screenshot of the overall flow, just for context.)
-
Hi Alex,
Thank you for your response. My flow looks like the following image:
The code I used is
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import io
import dataikuapi
from datetime import datetime
import gzip
# Read recipe inputs
input_data = dataiku.Dataset("input_data")
input_data_df = input_data.get_dataframe()
content = input_data_df.to_csv(index=False).encode("utf-8")
# compress the file using gzip
with gzip.open("result_%s.csv.gz" % dataiku.get_custom_variables()["run_date"], 'wt') as f:
    f.write(str(content))
output_folder = dataiku.Folder("dataiku_generated_folder_name")
output_folder.upload_stream(str(f.name), content)
-
This modified version seems to work on my side:
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import io
import dataikuapi
from datetime import datetime
import gzip

input_data = dataiku.Dataset("input_data")
input_data_df = input_data.get_dataframe()
content = input_data_df.to_csv(index=False).encode("utf-8")
compressed_content = gzip.compress(content)
output_folder = dataiku.Folder("output_data")
output_folder.upload_stream("filename.csv.gz", compressed_content)
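As a quick sanity check (with hypothetical CSV bytes standing in for input_data_df.to_csv(...)), you can confirm that gzip.compress really produces a compressed stream by round-tripping it in memory:

```python
import gzip

# Hypothetical CSV content standing in for the recipe's dataframe export
content = "item,qty\ncotton,10\nsilk,5\n".encode("utf-8")

compressed = gzip.compress(content)

# gzip streams start with the magic bytes 0x1f 0x8b,
# and decompressing must give back the original bytes
assert compressed[:2] == b"\x1f\x8b"
assert gzip.decompress(compressed) == content
print("round trip ok")
```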
Let me know if it helps
Alex
-
Thank you, Alex, this worked for me.