Added on August 20, 2019 9:28PM
I have created a series of folders to store uploaded files (.csv). Each folder is connected to a Python recipe that uses a read_csv loop to read in each file and append it to a dataframe; the dataframe is then written out to a dataset.

This all works fine, but the folder is sporadically deleting all of the csv files, usually within 1-2 days of uploading them. Functionally, this prevents me from running the Flow or scenarios on the Flow (or parts thereof), because the first step of importing the data fails: there is nothing left in the folders once the files are deleted. Has anyone else experienced this file-deletion behavior?
Here is the read_csv recipe, in case it is relevant:
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd

# Read recipe inputs: the managed folder holding the uploaded .csv files
raw_data_handle = dataiku.Folder("2XlQv9z4")
paths = raw_data_handle.list_paths_in_partition()

# Read each csv in the folder and append it to a single dataframe
raw_data = pd.DataFrame()
for path in paths:
    with raw_data_handle.get_download_stream(path) as f:
        new_data = pd.read_csv(f, header=0)
    raw_data = pd.concat([raw_data, new_data], ignore_index=True)

# Write recipe outputs
int_20190501 = dataiku.Dataset("int_20190501")
int_20190501.write_with_schema(raw_data)
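
As a stopgap I am considering a guard along these lines at the top of the recipe, so it fails with a clear message when the files have vanished rather than dying on a downstream step (just a sketch, reusing the same folder handle):

import dataiku

raw_data_handle = dataiku.Folder("2XlQv9z4")
paths = raw_data_handle.list_paths_in_partition()

# Fail fast if the folder has been emptied again, so the scenario log
# points at the missing files rather than a downstream read error.
if not paths:
    raise ValueError("No files found in folder 2XlQv9z4; the uploaded "
                     ".csv files appear to have been deleted again.")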