While running a Python code recipe, I'm getting a class exception error.
How can I fix it?
(Topic title edited by moderator to be more descriptive. Original title "Using Dataiku")
Sharing the code snippet:
dump_time_sync_export_rop = dataiku.Dataset("dump_sync")
dump_time_sync_export_rop.read_partitions = [run]
dump_df = dump_time_sync_export_rop.get_dataframe()  # the error is raised at this line
Exception: Failed to read dataset stream data: b"Path does not exist in the dataset:
It looks like you have a partitioned dataset and are running this in a recipe, correct?
This code will not work in a recipe because read_partitions is filled in automatically by the recipe, based on the partitions you select when running it.
Try removing the read_partitions = [run] line and re-running the recipe.
If you do need to use read_partitions in the actual recipe then please have a look at:
You would need to add ignore_flow:
dump_time_sync_export_rop = dataiku.Dataset("dump_sync", ignore_flow=True)
I will try this. However, I want the dataset to be partitioned by RunID. So, before running this recipe (with the read_partitions = [run] line removed), do I need to create a scenario for partitioning the dataset?
Can you please revisit this section of the code:
I'm getting a key error:
surface_df5 = pd.merge_asof(surface_df, surface_df5.reset_index(drop=True)[['HDTH', 'ROP_5']], left_on='HDTH_monotonic', right_on='HDTH', direction='forward')#.dropna(subset=['HDTH_y'])
if surface_df['ROP5'].dropna().empty:  # the KeyError is raised at this line
    surface_df['ROP5'] = surface_df5['ROP5'].values
surface_df.loc[surface_df['ROP'].isna(), 'ROP5'] = np.nan
The KeyError indicates that the column name does not exist in your dataframe.
It looks like the column names may be mismatched: the merge selects ROP_5, but the following lines read ROP5. I suggest printing your dataframe's columns in a notebook just before the line that fails, so you can see exactly which column names you have.
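To illustrate the check suggested above, here is a minimal sketch using small made-up frames (the values are invented; only the column names mirror the snippet). It shows which columns come out of a merge_asof call of the same shape, so a name such as ROP5 can be tested for before it is indexed.

```python
import pandas as pd

# Toy stand-ins for surface_df and surface_df5; the values are invented.
left = pd.DataFrame({
    "HDTH": [100.0, 101.0, 102.0],
    "HDTH_monotonic": [100.0, 101.0, 102.0],
})
right = pd.DataFrame({
    "HDTH": [100.5, 102.0],
    "ROP_5": [11.0, 13.0],
})

# Same call shape as in the question: both inputs must be sorted on their keys.
merged = pd.merge_asof(
    left,
    right,
    left_on="HDTH_monotonic",
    right_on="HDTH",
    direction="forward",
)

# Inspect the real column names before indexing into the frame.
print(merged.columns.tolist())

# A membership test avoids the KeyError entirely.
has_rop5 = "ROP5" in merged.columns
has_rop_5 = "ROP_5" in merged.columns
print("ROP5 present:", has_rop5, "| ROP_5 present:", has_rop_5)
```

Because HDTH exists in both inputs, pandas suffixes the two copies, which is why the original snippet refers to HDTH_y.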
I figured out that ROP5 is not in the dataset. In that case, can I set ROP5 = ROP, since the ROP channel is in my dataset?
Or should I add a condition that skips it if it is not there?
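Both options can be expressed with a simple column check. The following is only a sketch: ensure_column is a hypothetical helper (not part of pandas or Dataiku), and whether copying ROP into ROP5 is valid for your data is something only you can judge.

```python
import numpy as np
import pandas as pd


def ensure_column(df, target, fallback=None):
    """Return a copy of df that is guaranteed to contain `target`.

    Hypothetical helper: if `target` is missing, it is copied from
    `fallback` when that column exists, otherwise filled with NaN.
    """
    out = df.copy()
    if target in out.columns:
        return out  # nothing to do
    if fallback is not None and fallback in out.columns:
        out[target] = out[fallback]  # option 1: reuse the existing channel
    else:
        out[target] = np.nan  # option 2: keep the column but leave it empty
    return out


# Toy data with a ROP channel but no ROP5 channel (values are invented).
surface_df = pd.DataFrame({"HDTH": [100.0, 101.0], "ROP": [12.5, 14.0]})
surface_df = ensure_column(surface_df, "ROP5", fallback="ROP")
print(surface_df.columns.tolist())
```

If you would rather skip the ROP5 logic completely, wrapping that block in `if 'ROP5' in surface_df.columns:` achieves the same guard without adding a column.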
The error indicates that you are using more memory than is available in your cgroup configuration, or that the kernel is killing the process because it is using too much memory. You can try to reduce the memory usage of your script by using chunked reading: https://doc.dataiku.com/dss/latest/python-api/datasets-data.html
Alternatively, increase the memory available to the DSS instance or in the cgroups configuration.
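As a rough sketch of the chunked approach: the idea is to process one chunk at a time and keep only small running totals in memory. The example below uses a local stand-in generator, iter_chunks, so that it runs anywhere; inside DSS you would instead pass the iterator returned by the dataset's iter_dataframes(chunksize=...) method described on the page linked above. The function names and the choice of computing a mean are assumptions made for illustration.

```python
import pandas as pd


def iter_chunks(df, chunksize):
    """Stand-in chunk source: yields consecutive slices of df.

    In a DSS recipe this would be replaced by the dataset's own
    chunked iterator (an assumption about your setup).
    """
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]


def mean_by_chunks(chunks, column):
    """Compute the mean of `column` without holding all rows at once."""
    total = 0.0
    count = 0
    for chunk in chunks:
        values = chunk[column].dropna()  # ignore missing readings
        total += float(values.sum())
        count += len(values)
    return total / count if count else float("nan")


# Toy data (invented values); one reading is missing.
demo = pd.DataFrame({"ROP": [10.0, 20.0, None, 30.0, 40.0]})
result = mean_by_chunks(iter_chunks(demo, chunksize=2), "ROP")
print(result)
```

The same pattern works for filtering or writing out results chunk by chunk, as long as each step only needs the current chunk plus a small amount of state.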