Hi,
I'm getting errors while running the code below, which tries to compress a CSV file with gzip and then upload it to SharePoint from a Python code recipe. Could someone guide me on how to accomplish this?
The code is as follows:
# -*- coding: utf-8 -*-
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd, numpy as np
import dataikuapi
from datetime import datetime
import gzip
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
from datetime import datetime, timedelta
# Date for files
days_to_subtract = 2
today = datetime.today() - timedelta(days=days_to_subtract)
today = today.strftime("%Y_%m_%d")
# Date for folders_1
today1 = datetime.today() - timedelta(days=days_to_subtract)
today1 = today1.strftime("%d%b%Y")
# Date for folders_2
today2 = datetime.today() - timedelta(days=days_to_subtract)
today2 = today2.strftime("%b%Y")
today2 = today2[:-4]+today2[-2:]
#print(today)
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Read recipe inputs
ds = dataiku.Dataset("textiles")
df = ds.get_dataframe()
projectKey = dataiku.get_custom_variables()["projectKey"]
c = dataiku.api_client()
p = c.get_project(projectKey)
v = p.get_variables()
v["standard"]["current_month_year"] = today
p.set_variables(v)
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
#Specific code for a managed folder created in dataiku
managed_folder = "textiles_output"
df.to_csv(index=False).encode("utf-8")
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Write recipe outputs
content = df.to_csv(index=False).encode("utf-8")
with gzip.open("textiles_%s.csv.gz" % dataiku.get_custom_variables()["current_month_year"], 'wt') as f:
    f.write(str(content))
output_folder = dataiku.Folder(managed_folder)
output_folder.upload_stream("/" + today2 + "/" + today1 + "/" + str(f), df.to_csv(index=False))
Error:
Job failed: Error in Python process: At line 56: <class 'Exception'>: None: b"Failed to write data : <class 'sharepoint_client.SharePointClientError'> : A potentially dangerous Request.Path value was detected from the client (<)."
Operating system used: Windows 10
Hi !
According to the error message, it seems you have a "<" somewhere in the path or name of your file, which is not allowed.
I noticed that you are declaring a file handle (f, at the line starting with "with gzip.open"), and later on you are building a path with it ("/" + today2 + "/" + today1 + "/" + str(f)). This puts the object's repr in the path, which does contain "<"...
Did you mean "/" + today2 + "/" + today1 + "/" + str(f.name)?
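To illustrate (a minimal sketch with a throwaway file name): str() on an open file handle gives you its repr, which contains angle brackets, whereas f.name gives the path that was passed to gzip.open:

import gzip

# Write a small gzip file, then compare str(f) with f.name.
with gzip.open("example.csv.gz", "wt") as f:
    f.write("a,b\n1,2\n")
    print(str(f))   # something like "<_io.TextIOWrapper name='example.csv.gz' ...>"
    print(f.name)   # "example.csv.gz"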
Regards,
Alex
Hi AlexB,
That does solve the problem with the error message. Thanks!
Hi everyone,
I have a similar use case. I am trying to compress a CSV and upload it to SharePoint. However, I find that the CSV file is not actually compressed; also, when I look at the preview of the SharePoint folder in Dataiku, it shows the following error.
I used code similar to what has been shared in this post and its solution. Any suggestions are welcome.
I use the enterprise version, Dataiku 9.0.3. Is there a limit on the size of files that can be uploaded to SharePoint through Dataiku?
Hi !
Can you share with us the code you used to do the file compression and upload? (And a screenshot of the overall flow, just for context.)
Hi Alex,
Thank you for your response. My flow looks like the following image:
The code I used is:
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import io
import dataikuapi
from datetime import datetime
import gzip
# Read recipe inputs
input_data = dataiku.Dataset("input_data")
input_data_df = input_data.get_dataframe()
content = input_data_df.to_csv(index=False).encode("utf-8")
# compress the file using gzip
with gzip.open("result_%s.csv.gz" % dataiku.get_custom_variables()["run_date"], 'wt') as f:
    f.write(str(content))
output_folder = dataiku.Folder("dataiku_generated_folder_name")
output_folder.upload_stream(str(f.name), content)
This modified version seems to work on my side. Note that in your snippet the gzip file is only written locally (and f.write(str(content)) writes the repr of the bytes, not the CSV itself), while upload_stream receives the raw, uncompressed content; compressing the bytes with gzip.compress before uploading fixes both issues:
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import io
import dataikuapi
from datetime import datetime
import gzip
input_data = dataiku.Dataset("input_data")
input_data_df = input_data.get_dataframe()
content = input_data_df.to_csv(index=False).encode("utf-8")
compressed_content = gzip.compress(content)
output_folder = dataiku.Folder("output_data")
output_folder.upload_stream("filename.csv.gz", compressed_content)
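As a quick local sanity check (a sketch independent of Dataiku), you can verify that the bytes produced by gzip.compress really are gzip data and round-trip back to the original CSV:

import gzip

# Compress CSV text and verify it round-trips.
content = "a,b\n1,2\n3,4\n".encode("utf-8")
compressed = gzip.compress(content)

assert compressed[:2] == b"\x1f\x8b"            # gzip magic number
assert gzip.decompress(compressed) == content   # original bytes restored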
Let me know if that helps.
Alex
Thank you, Alex - this worked for me.