Append CSV
Hello all,
I am new to Dataiku and would appreciate any support I can get.
I have 3 .csv files that I loaded into a Dataiku folder (+Dataset > Folder). The files have the same schema.
I want to append them in Dataiku into a single consolidated output. The file names share the same prefix (first 3 characters). When using a Python recipe, I don't know how to make it loop over each .csv in the folder and append it to the final output.
This is what I tried; it throws an error:
Thank you!
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import glob

# Read recipe inputs
test_Mih = dataiku.Folder("V82hHFhe")
test_Mih_info = test_Mih.get_info()

# Compute recipe outputs
# TODO: Write here your actual code that computes the outputs
# all_files = glob.glob(test_Mih + "/*.csv")
#all_files = sorted(glob('test_Mih/30P*.csv'))
all_files = test_Mih.list_paths_in_partition()
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)

# NB: DSS supports several kinds of APIs for reading and writing data. Please see doc.
final_test_Mih_df = frame

# Write recipe outputs
final_test_Mih = dataiku.Dataset("Final_test_Mih")
final_test_Mih.write_with_schema(final_test_Mih_df)
Best Answer
-
If it's CSV, you can add the column names in the dataset's Schema. DSS reads CSV by position, so as long as the field order is consistent across the files, it should be fine. You can then export the dataset to CSV to get all the data in one chunk (with headers if you want).
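For anyone who prefers to do the same thing in a Python recipe rather than in the dataset's Schema, here is a rough pandas sketch of the "read by position, name the columns yourself" idea. It is not the DSS feature itself, only an analogue. The column names in `COLUMNS` and the helper `read_headerless_csv` are made up for illustration; replace them with your real field names, in file order.

```python
import io

import pandas as pd

# Hypothetical column names, listed in the order the fields appear in the files.
COLUMNS = ["id", "name", "amount"]


def read_headerless_csv(source, columns):
    """Read a CSV that has no header row and assign column names by position."""
    # header=None tells pandas that the first line is data, not column names.
    return pd.read_csv(source, header=None, names=columns)


# Small in-memory sample standing in for one of the headerless files.
sample = io.StringIO("1,alpha,10.5\n2,beta,20.0\n")
df = read_headerless_csv(sample, COLUMNS)

# Writing the frame back out as CSV text adds the header row.
csv_text = df.to_csv(index=False)
```

Because the names are assigned purely by position, this only gives sensible results if every file has its fields in the same order.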
Answers
-
Hi,
If the 3 files have the same schema, you can create a FilesInFolder dataset on your folder (from the folder's actions tab), then click Show Advanced options in the dataset's Settings > Connection and set a filter that selects only the 3 files. You can then read the dataset as a single dataframe with the usual dataiku.Dataset(...).get_dataframe().
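If you would rather stay in the Python recipe from the question, here is a minimal sketch of that route. The likely cause of the error is that list_paths_in_partition() returns paths relative to the managed folder, so pd.read_csv cannot open them directly; reading through get_download_stream() avoids that. The helper name `append_csvs` and its `prefix` argument are my own, not part of the Dataiku API, and the function only assumes the folder object exposes those two methods.

```python
import pandas as pd


def append_csvs(folder, prefix="", **read_csv_kwargs):
    """Concatenate every CSV in a managed folder whose file name starts with prefix.

    `folder` is expected to behave like dataiku.Folder: it must provide
    list_paths_in_partition() and get_download_stream(path).
    """
    frames = []
    for path in sorted(folder.list_paths_in_partition()):
        name = path.rsplit("/", 1)[-1]  # paths look like "/30P_file.csv"
        if not (name.startswith(prefix) and name.lower().endswith(".csv")):
            continue
        # Stream the file content instead of opening a local filesystem path.
        with folder.get_download_stream(path) as stream:
            frames.append(pd.read_csv(stream, **read_csv_kwargs))
    # pd.concat raises ValueError if no file matched.
    return pd.concat(frames, axis=0, ignore_index=True)


# Inside a DSS Python recipe (needs the dataiku package), usage would be roughly:
#   import dataiku
#   folder = dataiku.Folder("V82hHFhe")
#   df = append_csvs(folder, prefix="30P")
#   dataiku.Dataset("Final_test_Mih").write_with_schema(df)
```

Extra keyword arguments are passed straight to pd.read_csv, so files without a header row can be handled with header=None and names=[...].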
-
By schema I mean the same structure, but in fact the files come without headers. That's something I would set up on the output. :(
-
Thanks a lot, it worked !