Compressing a CSV and uploading to SharePoint

Solved!
Danny78

Hi everyone,

I am trying to compress a CSV file and upload it to SharePoint. However, the CSV file does not actually get compressed. Also, when I look at the preview of the SharePoint folder in Dataiku, it shows the following error.

[Screenshot: Screen Shot 2022-11-29 at 9.54.38 AM.png]

I used code similar to what was shared in the post and its solution. Any suggestions are welcome.

I am on the enterprise version, Dataiku 9.0.3. Is there a limit on the size of a file that can be uploaded to SharePoint through Dataiku?

1 Solution
AlexB
Dataiker

Double posting the answer here for future reference:

import gzip

import dataiku

# Read the input dataset into a pandas DataFrame
input_data = dataiku.Dataset("input_data")
input_data_df = input_data.get_dataframe()

# Serialize the DataFrame to CSV text and encode it as UTF-8 bytes
content = input_data_df.to_csv(index=False).encode("utf-8")

# Compress the CSV bytes in memory with gzip
compressed_content = gzip.compress(content)

# Write the compressed bytes to the output managed folder
output_folder = dataiku.Folder("output_data")
output_folder.upload_stream("filename.csv.gz", compressed_content)
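
If you want to check that the bytes really are gzip-compressed before uploading, a quick local test along these lines may help. This is only a sketch outside of Dataiku: `gzip_csv_bytes` and `sample_csv` are made-up names for illustration, and the sample text stands in for the output of `input_data_df.to_csv(index=False)`.

```python
import gzip


def gzip_csv_bytes(csv_text: str) -> bytes:
    """Encode CSV text as UTF-8 and gzip it in memory (hypothetical helper)."""
    return gzip.compress(csv_text.encode("utf-8"))


# Hypothetical sample data standing in for input_data_df.to_csv(index=False)
sample_csv = "id,value\n" + "".join(f"{i},{i % 7}\n" for i in range(1000))

compressed = gzip_csv_bytes(sample_csv)

# A gzip stream always starts with the magic bytes 0x1f 0x8b
is_gzip = compressed[:2] == b"\x1f\x8b"

# Decompressing should give back exactly the original CSV text
round_trip = gzip.decompress(compressed).decode("utf-8")

print("looks like gzip:", is_gzip)
print("raw bytes:", len(sample_csv.encode("utf-8")), "compressed bytes:", len(compressed))
print("round trip matches:", round_trip == sample_csv)
```

If the magic-byte check fails on the file that ends up in the folder, the content was probably written uncompressed under a `.gz` name, which could also explain a preview error.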
