I have a CSV file that contains two datasets arranged vertically, one below the other.
After parsing these two datasets with a Prepare recipe, I need to join them together.
However, there is no common key between these 2 datasets.
One way is to enrich both datasets with the CSV filename during the Prepare recipe step and then join them using this filename as the key.
However, I am unable to find any option in DSS that can identify or extract the uploaded file's name.
In a Prepare recipe you should be able to use Misc > Enrich record with context information. This lets you add the filename as a column and then join based on it.
Please note there could be some limitations for file types other than txt or csv.
Let me know if this would work for you.
If you are unable to upgrade, one possible suggestion is to upload all your files to a managed folder, then use a Python recipe to add the file name and write the results to another managed folder, from which you can create your datasets.
import dataiku
import pandas as pd

input_folder = dataiku.Folder("PAcVjikK")
output_folder = dataiku.Folder("MLpqB40C")
paths = input_folder.list_paths_in_partition()

# Iterate through the files, add the file name as a column,
# and write each result to the output managed folder.
for path in paths:
    with input_folder.get_download_stream(path) as f:
        data = pd.read_csv(f)
    filename = path[1:]  # drop the leading "/" from the folder path
    print(filename)
    data['filename_column'] = filename
    output_folder.upload_stream(filename, data.to_csv(index=False).encode("utf-8"))
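Once both datasets carry a filename column, the join from the original question can be sketched outside DSS as below. This is only an illustration using the Python standard library, not DSS functionality. It assumes the two stacked tables are separated by a blank line, and the sample data, the file name orders_2024.csv, and the helper names split_stacked_csv, tag_with_filename and join_on_filename are all hypothetical.

```python
import csv
import io


def split_stacked_csv(text):
    """Split CSV text holding stacked tables into lists of row dicts.

    Assumption: the tables are separated by a blank line.
    """
    blocks = [b for b in text.replace("\r\n", "\n").split("\n\n") if b.strip()]
    return [list(csv.DictReader(io.StringIO(b.strip()))) for b in blocks]


def tag_with_filename(rows, filename):
    """Return copies of the rows with an extra 'filename' column."""
    return [dict(row, filename=filename) for row in rows]


def join_on_filename(left, right):
    """Join two row lists on 'filename'.

    With no other key, every left row pairs with every right row
    that came from the same file.
    """
    joined = []
    for left_row in left:
        for right_row in right:
            if left_row["filename"] == right_row["filename"]:
                merged = dict(left_row)
                merged.update(right_row)
                joined.append(merged)
    return joined


# Hypothetical file content: a one-row table stacked above a two-row table.
sample = "order_id,customer\n1001,Acme\n\nitem,qty\nwidget,2\ngadget,5\n"
filename = "orders_2024.csv"  # hypothetical file name

first_rows, second_rows = split_stacked_csv(sample)
joined = join_on_filename(
    tag_with_filename(first_rows, filename),
    tag_with_filename(second_rows, filename),
)
for row in joined:
    print(row)
```

Because the filename is the only shared key, the join behaves like a cross join within each file. That works well when one of the two stacked tables has a single row per file (for example, header-style metadata), and can multiply rows otherwise.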