Added on May 10, 2017 7:19PM
I have a folder with CSVs in it (by "folder" I mean the thing you get when you do +Dataset -> Folder from the Flow). They are named "dataset_01", "dataset_02" and so on.
I'm trying to read one of them in a Python recipe. What's the code?
I tried something like this, but it wants me to add "path_of_csv" to inputs, so it's not what I'm looking for.
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import os
# Recipe inputs
folder_path = dataiku.Folder("FuShmlsH").get_path()
path_of_csv = os.path.join(folder_path, "dataset_01.csv")
my_dataset = dataiku.Dataset(path_of_csv).get_dataframe()
# Recipe outputs
test = dataiku.Dataset("test")
test.write_with_schema(my_dataset)
Thanks.
Hello,
You can only read a recipe input with "dataiku.Dataset("xx").get_dataframe()" when that input is a DSS dataset, referenced by its name in the Flow (not by a file path).
In your case, the input is not a dataset, it's a folder! You already correctly used "dataiku.Folder("xx")", so you're almost done.
Now you can just read some files from it!
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import os
# Recipe inputs
folder_path = dataiku.Folder("FuShmlsH").get_path()
path_of_csv = os.path.join(folder_path, "dataset_01.csv")
my_dataset = pd.read_csv(path_of_csv)
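Since the files follow the dataset_01.csv, dataset_02.csv naming pattern, you could also pick them up from the folder path instead of hard-coding each name. This is a minimal sketch, assuming the managed folder is stored on the local file system (so get_path() works) and that every file is a comma-separated CSV with a header row. The helper names list_dataset_csvs and read_all_datasets are hypothetical:

```python
import os
import re

import pandas as pd

# Hypothetical pattern matching files named dataset_01.csv, dataset_02.csv, ...
DATASET_FILE = re.compile(r"^dataset_\d+\.csv$")


def list_dataset_csvs(folder_path):
    """Return the sorted file names in folder_path that match dataset_NN.csv."""
    return sorted(name for name in os.listdir(folder_path) if DATASET_FILE.match(name))


def read_all_datasets(folder_path):
    """Read every dataset_NN.csv in folder_path and stack them into one DataFrame.

    A 'source_file' column records which file each row came from.
    """
    frames = []
    for name in list_dataset_csvs(folder_path):
        df = pd.read_csv(os.path.join(folder_path, name))
        df["source_file"] = name
        frames.append(df)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

In the recipe you would pass it the result of dataiku.Folder("FuShmlsH").get_path(). Sorting by name works here because the numbers are zero-padded (dataset_02 sorts before dataset_10).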
Hi, I am trying to use a CSV file from a folder as the input of a Python recipe:
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import os
#Recipe inputs
folder_path = dataiku.Folder("xx/x/x/x").get_path()
path_of_csv = os.path.join(folder_path, "xxxx.csv")
my_dataset = pd.read_csv(path_of_csv)
#Recipe outputs
df_Import = dataiku.Dataset("df_Import")
df_Import.write_with_schema(my_dataset)
my_dataset
This gives me the following error in the Python process: "Managed folder xx/x/x/x cannot be used: declare it as input or output of your recipe."
Welcome to the Dataiku Community.
This confused me for a while with Dataiku. A managed folder in Dataiku is not exactly like a folder on disk. It is a sort of handle designed to work with a variety of storage connections, such as SFTP or S3, as well as the local file system if you choose.
You have to create the managed folder first from the UI; then you can use it from your Python recipe. Refer to the managed folder by the name you gave it when you created it in DSS, something like My_Folder, not by its path on the local disk.
Then, when you create your Python recipe, you need to connect the managed folder to it as an input (or output). That is what the error message "declare it as input or output of your recipe" is asking for.
For example, in your code segment you would use
folder_path = dataiku.Folder("xx/x/x/x").get_path()
replacing "xx/x/x/x" with the name of the managed folder. If the folder happens to be stored on the local file system, get_path() returns the actual path to it on disk.
This level of indirection is designed (I think) to help abstract away some of the issues you will run into when moving a project from one node to the next.
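Because of that abstraction, get_path() only works when the managed folder is stored on the local file system. For folders on S3, SFTP and similar connections, the managed folder API documentation describes reading files as streams instead, via Folder.list_paths_in_partition() and Folder.get_download_stream(path). Below is a minimal sketch, assuming those two methods behave as documented (paths relative to the folder root such as "/dataset_01.csv", and a stream usable in a with block). The helper read_csv_from_folder is hypothetical and takes the folder object as a parameter:

```python
import posixpath

import pandas as pd


def read_csv_from_folder(folder, file_name):
    """Read one CSV from a managed folder through a download stream.

    `folder` is expected to be a dataiku.Folder (or any object exposing
    list_paths_in_partition() and get_download_stream(path)). No local
    path is needed, so this does not depend on where the folder is stored.
    """
    # Keep the folder-relative paths whose last component is the requested file name
    matches = [p for p in folder.list_paths_in_partition()
               if posixpath.basename(p) == file_name]
    if not matches:
        raise FileNotFoundError(f"{file_name} not found in managed folder")
    # The download stream is a file-like object that pandas can read directly
    with folder.get_download_stream(matches[0]) as stream:
        return pd.read_csv(stream)
```

In a recipe where the folder is declared as an input, this would be called as read_csv_from_folder(dataiku.Folder("My_Folder"), "dataset_01.csv"). Passing the folder in, rather than creating it inside the function, also lets you test the helper with a stand-in object outside DSS.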
Here is the managed folder Python API documentation.
https://doc.dataiku.com/dss/latest/python-api/managed_folders.html
However, you might find a tutorial on the subject a bit more helpful.
https://knowledge.dataiku.com/latest/courses/folders/managed-folders-hands-on.html
Here is a community thread as well.
Let us know how you are getting on with this.