DSS Managed Folder Issues

Kay
Kay Registered Posts: 3
edited July 16 in Using Dataiku

I get the following error when I run the Python script below in DSS:

Error: Job failed: Error in Python process: At line 14: <class 'AttributeError'>: 'DSSManagedFolder' object has no attribute 'list_objects'

See actual Python script below

import dataiku
import pandas as pd

client = dataiku.api_client()
project = client.get_project('PREDICTIVE_ANALYTICS_OF_RAW_MATERIALS_1')

source_folder_name = "qQNv9CBS"
target_folder_name = "HMAn88LX"

source_folder = project.get_managed_folder(source_folder_name)
target_folder = project.get_managed_folder(target_folder_name)

# List all paths (files and folders) in the source folder
source_paths = source_folder.list_objects()

# Part numbers to filter
part_numbers = ['100061', '6000303', '6004910', '6000662', '6002238', '6002963', '6002965', '6004488']

# Iterate through the paths in the source folder
for source_path in source_paths:
    # Check if the path is a file
    if source_path['type'] == 'File':
        # Use the dataiku.Dataset class to read the content of the file
        source_dataset = dataiku.Dataset(source_folder.get_path() + '/' + source_path['name'])
        df = source_dataset.get_dataframe()

        # Process the data as needed
        df['Material'] = df['Material'].astype(str)

        # Check if 'Material' column exists in the DataFrame
        if 'Material' in df.columns:
            # Filter rows with specific part numbers in 'Material' column
            filtered_df = df[df['Material'].isin(part_numbers)]

            # Check if any rows match the criteria
            if not filtered_df.empty:
                # Define the target path
                target_path = target_folder.get_path() + '/' + source_path['name']

                # Write the filtered DataFrame to the target folder
                target_dataset = dataiku.Dataset(target_path)
                target_dataset.write_with_schema(filtered_df, dropAndCreate=True)

# Copy the entire folder to the output folder
future = source_folder.copy_to(target_folder)
future.wait_for_result()


Operating system used: Windows 10

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron

    Let me guess, ChatGPT wrote that code snippet. You get "object has no attribute 'list_objects'" because that object has no method called list_objects(). Your GenAI is having a hallucination. So what exactly are you trying to do?

  • Kay
    Kay Registered Posts: 3

    ChatGPT wrote part of it.
    I'm trying to read the files in the source folder.

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron

    Please do not paste GenAI code without explicitly warning people that your code was generated by a GenAI bot. It's one thing to try to understand and fix someone else's code, and a completely different thing to do the same with GenAI code. And if you are posting GenAI code, please include the prompt you used to generate it, so people looking at the code can try to understand what it was asked to do.

    In this example you can see real Dataiku Python API code in which files are read from one folder and copied to another one:

    https://community.dataiku.com/t5/Using-Dataiku/Listing-and-Reading-all-the-files-in-a-Managed-Folder/m-p/8140

    If you need further help, please explain your requirement clearly. "read the files in the source folder" doesn't say much, and your GenAI code seems to be doing much more than that.

  • CH007
    CH007 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 5

    Hi there, I think you’re stuck trying to connect to a managed folder, list the file contents of the managed folder, and then load a particular file into your Notebook instance and perform some custom filters, etc. Below I've included some sample code that you may have to tweak for your actual project, but it should point you in the right direction to resolve the error you have with connecting to your managed folder.

    <class 'AttributeError'>: object has no attribute - You see “object has no attribute” whenever you try to access an attribute or method that doesn’t exist on an object. When working with managed folders in Dataiku, you have to retrieve the folder's handle first, and then you can manipulate the object.
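
As a small illustration of the point above, here is a sketch that reproduces the same kind of error and then lists the methods an object really has. `FakeFolderHandle` and `public_methods` are made-up names for this example only; they are not part of the Dataiku API.

```python
class FakeFolderHandle:
    """Hypothetical stand-in for a folder handle; NOT a Dataiku class."""

    def list_contents(self):
        return {"items": [{"path": "/a.csv"}]}


def public_methods(obj):
    """Return the names of the callable, non-underscore attributes of obj."""
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    )


handle = FakeFolderHandle()

# Calling a method that was never defined raises AttributeError.
try:
    handle.list_objects()
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Inspect what the object actually offers before guessing at method names.
available = public_methods(handle)
print(error_message)
print(available)
```

In a DSS notebook, passing the real `source_folder` handle to `public_methods` should show which methods it actually offers, which is a quick way to check a generated snippet against the installed API.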

    Your code is missing the handle to manipulate the file; instead, you are just using the source folder name. You also need to specify the partition of the folder to list its contents.

    code_2.PNG
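
The screenshot is not reproduced in text form, so what follows is only a rough sketch of the approach described, not the code from the image. It assumes the files are UTF-8 CSVs with a header row and a 'Material' column, and that the folder handles behave like `dataiku.Folder` as I understand it (`list_paths_in_partition`, `get_download_stream`, `upload_data`). It uses the standard `csv` module instead of pandas to stay dependency-free. The helper names `filter_csv_bytes`, `copy_filtered_files` and `run_in_dss` are made up for this example.

```python
import csv
import io

# Part numbers taken from the original question.
PART_NUMBERS = {"100061", "6000303", "6004910", "6000662",
                "6002238", "6002963", "6002965", "6004488"}


def filter_csv_bytes(raw, part_numbers, column="Material"):
    """Return the matching rows as CSV bytes, or None if nothing matches.

    Assumption: the file is UTF-8 CSV with a header row.
    """
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    if not reader.fieldnames or column not in reader.fieldnames:
        return None
    rows = [row for row in reader if row[column] in part_numbers]
    if not rows:
        return None
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue().encode("utf-8")


def copy_filtered_files(source, target, part_numbers):
    """Filter every CSV in `source` and write non-empty results to `target`.

    Assumption: both arguments offer list_paths_in_partition(),
    get_download_stream(path) and upload_data(path, data).
    """
    written = []
    for path in source.list_paths_in_partition():
        if not path.lower().endswith(".csv"):
            continue
        with source.get_download_stream(path) as stream:
            raw = stream.read()
        filtered = filter_csv_bytes(raw, part_numbers)
        if filtered is not None:
            target.upload_data(path, filtered)
            written.append(path)
    return written


def run_in_dss():
    """Only works inside DSS, where the dataiku package is importable."""
    import dataiku
    source = dataiku.Folder("qQNv9CBS")  # folder IDs from the question
    target = dataiku.Folder("HMAn88LX")
    return copy_filtered_files(source, target, PART_NUMBERS)
```

Calling `run_in_dss()` from a recipe or notebook would be the entry point. Keeping the filtering in `filter_csv_bytes` means it can be tried on a few sample bytes outside DSS before touching the real folders.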

    Additional documentation can be found below for handling managed folders

    https://doc.dataiku.com/dss/latest/connecting/managed_folders.html
