Compressing CSV to GZIP and writing to SharePoint

Rose19 Dataiku DSS Core Designer, Registered Posts: 2 ✭✭✭

Hi,

I'm getting an error when running the code below in a Python code recipe. It tries to compress a CSV file with gzip and then upload it to SharePoint. Could someone guide me on how to accomplish this?

The code is below:

# -*- coding: utf-8 -*-
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd, numpy as np
import dataikuapi
from datetime import datetime
import gzip

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
from datetime import datetime, timedelta

# Date for files
days_to_subtract = 2
today = datetime.today() - timedelta(days=days_to_subtract)
today = today.strftime("%Y_%m_%d")

# Date for folders_1
today1 = datetime.today() - timedelta(days=days_to_subtract)
today1 = today1.strftime("%d%b%Y")

# Date for folders_2
today2 = datetime.today() - timedelta(days=days_to_subtract)
today2 = today2.strftime("%b%Y")
today2 = today2[:-4]+today2[-2:]

#print(today)

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE

# Read recipe inputs
ds = dataiku.Dataset("textiles")
df = ds.get_dataframe()
projectKey = dataiku.get_custom_variables()["projectKey"]
c = dataiku.api_client()
p = c.get_project(projectKey)
v = p.get_variables()
v["standard"]["current_month_year"] = today
p.set_variables(v)

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE

#Specific code for a managed folder created in dataiku

managed_folder = "textiles_output"
df.to_csv(index=False).encode("utf-8")

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE

# Write recipe outputs

content = df.to_csv(index=False).encode("utf-8")
with gzip.open("textiles_%s.csv.gz" % dataiku.get_custom_variables()["current_month_year"], 'wt') as f:
    f.write(str(content))

output_folder = dataiku.Folder(managed_folder)
output_folder.upload_stream("/" + today2 + "/" + today1 + "/" + str(f), df.to_csv(index=False))

Error:

Job failed: Error in Python process: At line 56: <class 'Exception'>: None: b"Failed to write data : <class 'sharepoint_client.SharePointClientError'> : A potentially dangerous Request.Path value was detected from the client (&lt;)."


Operating system used: Windows 10

Best Answer

  • AlexB
    AlexB Dataiker Posts: 67 Dataiker
    Answer ✓

    Hi!

    According to the error message, it seems you have a "<" somewhere in the path or name of your file, which is not allowed.

    I noticed that you declare a file handle (f, on the line starting with "with gzip.open") and later build a path with it ( "/" + today2 + "/" + today1 + "/" + str(f) ). str(f) puts the object's representation into the path, and that representation does contain "<".

    Did you mean "/" + today2 + "/" + today1 + "/" + str(f.name)?

    Regards,

    Alex
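
For anyone hitting the same error, here is a minimal sketch of the difference between str(f) and f.name. It runs outside DSS, and the file name and CSV rows are hypothetical, chosen only for illustration.

```python
import gzip
import os
import tempfile

# Hypothetical local file, created in a temp directory just for this demo.
path = os.path.join(tempfile.mkdtemp(), "textiles_2022_11_27.csv.gz")

with gzip.open(path, "wt") as f:
    f.write("id,fabric\n1,cotton\n")

# str(f) is the object's repr, which starts with "<" and is not a file name.
handle_repr = str(f)

# f.name is the path that was passed to gzip.open.
handle_name = f.name

# Only the base name should be appended to the remote folder path.
remote_name = os.path.basename(handle_name)

print(handle_repr)
print(remote_name)
```

The "<" in the repr is what SharePoint rejects as a dangerous Request.Path value, while the name contains only the characters you chose.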

Answers

  • Rose19
    Rose19 Dataiku DSS Core Designer, Registered Posts: 2 ✭✭✭

    Hi AlexB,

    That does solve the problem with the error message. Thanks!

  • Danny78
    Danny78 Registered Posts: 8

    Hi everyone,

    I have a similar use case: I am trying to compress a CSV and upload it to SharePoint. However, I find that the CSV file is not actually compressed. Also, when I look at the preview of the SharePoint folder in Dataiku, it shows the following error:

    [Screenshot: Screen Shot 2022-11-29 at 9.54.38 AM.png]

    I used code similar to what has been shared in the post and solution. Any suggestions are welcome.

    I use the enterprise version, Dataiku 9.0.3. Is there a limit on the size of a file that can be uploaded to SharePoint through Dataiku?

  • AlexB
    AlexB Dataiker Posts: 67 Dataiker

    Hi!

    Can you share the code you used for the file compression and upload, along with a screenshot of the overall flow for context?

  • Danny78
    Danny78 Registered Posts: 8
    edited July 17

    Hi Alex,

    Thank you for your response. My flow looks like the following image:

    [Screenshot: Screen Shot 2022-11-29 at 11.53.26 AM.png]

    The code I used is:

    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu
    
    import io
    import dataikuapi
    from datetime import datetime
    import gzip
    
    # Read recipe inputs
    input_data = dataiku.Dataset("input_data")
    input_data_df = input_data.get_dataframe()
    
    content = input_data_df.to_csv(index=False).encode("utf-8")
    
    # compress the file using gzip
    with gzip.open("result_%s.csv.gz" % dataiku.get_custom_variables()["run_date"], 'wt') as f:
        f.write(str(content))
    
    output_folder = dataiku.Folder("dataiku_generated_folder_name")
    output_folder.upload_stream(str(f.name), content)
  • AlexB
    AlexB Dataiker Posts: 67 Dataiker
    edited July 17

    This modified version seems to work on my side:

    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu
    
    import io
    import dataikuapi
    from datetime import datetime
    import gzip
    
    input_data = dataiku.Dataset("input_data")
    input_data_df = input_data.get_dataframe()
    
    content = input_data_df.to_csv(index=False).encode("utf-8")
    
    compressed_content = gzip.compress(content)
    
    output_folder = dataiku.Folder("output_data")
    output_folder.upload_stream("filename.csv.gz", compressed_content)

    Let me know if it helps

    Alex
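
A minimal sketch of why the earlier version uploaded uncompressed data and how to check the fixed one. It runs outside DSS, so the upload itself is left out, and the CSV rows are a made-up stand-in for input_data_df.to_csv(index=False).encode("utf-8"). In the earlier code, the gzip file was written locally but the uncompressed content variable was what got passed to upload_stream.

```python
import gzip

# Stand-in for the encoded CSV; the rows are hypothetical.
rows = "\n".join("%d,cotton" % i for i in range(1000))
content = ("id,fabric\n" + rows + "\n").encode("utf-8")

# str() on bytes gives the literal "b'...'" repr, not the CSV text.
# That repr is what f.write(str(content)) stored in the local gzip file.
as_text = str(content)

# gzip.compress returns a complete gzip file as bytes.
compressed_content = gzip.compress(content)

# Gzip data starts with the magic bytes 0x1f 0x8b.
is_gzip = compressed_content[:2] == b"\x1f\x8b"

# Decompressing should give back the original CSV bytes.
round_trip = gzip.decompress(compressed_content)

print(len(content), len(compressed_content), is_gzip)
```

If the uploaded file does not start with those two magic bytes, it was probably not compressed before upload.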

  • Danny78
    Danny78 Registered Posts: 8

    Thank you, Alex. This worked for me.
