Compressing CSV to GZIP and writing to SharePoint

Solved!
Rose19
Level 1

Hi,

I'm getting errors when I try to compress a CSV file to gzip and upload it to SharePoint from a Python code recipe. Could someone guide me on how to accomplish this?

The code is below:

# -*- coding: utf-8 -*-
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd, numpy as np
import dataikuapi
from datetime import datetime
import gzip

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
from datetime import datetime, timedelta

# Date for files
days_to_subtract = 2
today = datetime.today() - timedelta(days=days_to_subtract)
today = today.strftime("%Y_%m_%d")

# Date for folders_1
today1 = datetime.today() - timedelta(days=days_to_subtract)
today1 = today1.strftime("%d%b%Y")

# Date for folders_2
today2 = datetime.today() - timedelta(days=days_to_subtract)
today2 = today2.strftime("%b%Y")
today2 = today2[:-4]+today2[-2:]

#print(today)

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE

# Read recipe inputs
ds = dataiku.Dataset("textiles")
df = ds.get_dataframe()
projectKey = dataiku.get_custom_variables()["projectKey"]
c = dataiku.api_client()
p = c.get_project(projectKey)
v = p.get_variables()
v["standard"]["current_month_year"] = today
p.set_variables(v)

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE

#Specific code for a managed folder created in dataiku

managed_folder = "textiles_output"
df.to_csv(index=False).encode("utf-8")

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE

# Write recipe outputs

content = df.to_csv(index=False).encode("utf-8")
with gzip.open( "textiles_%s.csv.gz" % dataiku.get_custom_variables()["current_month_year"], 'wt') as f:
    f.write(str(content))

output_folder = dataiku.Folder(managed_folder)
output_folder.upload_stream("/" + today2 + "/" + today1 + "/" + str(f),  df.to_csv(index=False))

Error: 

Job failed: Error in Python process: At line 56: <class 'Exception'>: None: b"Failed to write data : <class 'sharepoint_client.SharePointClientError'> : A potentially dangerous Request.Path value was detected from the client (&lt;)."


Operating system used: Windows 10

AlexB
Dataiker

Hi !

According to the error message, you have a "<" somewhere in the path or name of your file, which is not allowed.

I noticed that you open a file handle (f, on the line starting with "with gzip.open") and later build a path with it ( "/" + today2 + "/" + today1 + "/" + str(f) ). This puts the string representation of the object in the path, and that representation does contain "<"...

Did you mean "/" + today2 + "/" + today1 + "/" + str(f.name) ?

Regards,

Alex
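
To illustrate the point above, here is a minimal sketch using only the standard library. The file name is a made-up example and the temporary directory is an assumption for the demo; nothing here touches Dataiku or SharePoint. It shows that str(f) gives the object's representation (which starts with "<"), while f.name gives the name the file was opened with.

```python
import gzip
import os
import tempfile

# Hypothetical file name in a temporary directory, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "textiles_2022_11_27.csv.gz")

with gzip.open(path, "wt") as f:
    f.write("a,b\n1,2\n")
    handle_repr = str(f)    # representation of the file object, not a file name
    handle_name = f.name    # the name the file was opened with

# Only the base name is safe to use when building an upload path.
file_name = os.path.basename(str(handle_name))
upload_path = "/" + file_name

print(handle_repr)
print(upload_path)
```

Building the file name as a plain string first and reusing it for both the local file and the upload path avoids depending on the handle at all.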

Rose19
Level 1
Author

Hi AlexB, 

That does solve the problem with the error message. Thanks!

Danny78
Level 2

Hi everyone,

I have a similar use case: I am trying to compress a CSV and upload it to SharePoint. However, the CSV file is not actually compressed. Also, the preview of the SharePoint folder in Dataiku shows an error (screenshot: Screen Shot 2022-11-29 at 9.54.38 AM.png).

I used code similar to what was shared in the post and solution. Any suggestions are welcome.

I use the enterprise version, Dataiku 9.0.3. Is there a limit on the size of a file that can be uploaded to SharePoint through Dataiku?

AlexB
Dataiker

Hi !

Can you share the code you used for the file compression and upload? A screenshot of the overall flow would also help for context.

Danny78
Level 2

Hi Alex,

Thank you for your response. My flow looks like the following image:

Screen Shot 2022-11-29 at 11.53.26 AM.png

The code I used is:

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

import io
import dataikuapi
from datetime import datetime
import gzip

# Read recipe inputs
input_data = dataiku.Dataset("input_data")
input_data_df = input_data.get_dataframe()

content = input_data_df.to_csv(index=False).encode("utf-8")

# compress the file using gzip
with gzip.open("result_%s.csv.gz" % dataiku.get_custom_variables()["run_date"], 'wt') as f:
    f.write(str(content))

output_folder = dataiku.Folder("dataiku_generated_folder_name")
output_folder.upload_stream(str(f.name), content)

AlexB
Dataiker

This modified version seems to work on my side:

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

import io
import dataikuapi
from datetime import datetime
import gzip

input_data = dataiku.Dataset("input_data")
input_data_df = input_data.get_dataframe()

content = input_data_df.to_csv(index=False).encode("utf-8")

compressed_content = gzip.compress(content)

output_folder = dataiku.Folder("output_data")
output_folder.upload_stream("filename.csv.gz", compressed_content)

Let me know if it helps

Alex
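
For readers comparing the two versions: in the earlier code, the gzip file was written to local disk, but the variable passed to upload_stream was still the uncompressed content, and str(content) on a bytes object produces the literal text "b'...'" rather than the CSV. The sketch below reproduces both observations with the standard library only. The sample CSV text is made up and stands in for the output of to_csv(index=False); no Dataiku call is made.

```python
import gzip

# Stand-in for input_data_df.to_csv(index=False); sample data is hypothetical.
csv_text = "id,fabric\n1,cotton\n2,wool\n"
content = csv_text.encode("utf-8")

# str() on bytes yields the Python literal form, not the decoded CSV text.
mangled = str(content)

# gzip.compress returns the whole gzip stream as bytes, entirely in memory.
compressed_content = gzip.compress(content)

# A gzip stream starts with the two magic bytes 0x1f 0x8b.
is_gzip = compressed_content[:2] == b"\x1f\x8b"

# Decompressing gives back exactly the original CSV text.
round_trip = gzip.decompress(compressed_content).decode("utf-8")

print(mangled[:2], is_gzip, round_trip == csv_text)
```

Because the compressed bytes never go through a local file, the same object that was compressed is the one handed to the upload call, which is what the modified version above does.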

Danny78
Level 2

thank you, Alex - this worked for me.
