File existence check

MSL
MSL Registered Posts: 7

Hi folks, I am trying to check whether a file exists at a particular path using Python code in Dataiku.

I can access the file manually, but when I check for its existence in code I do not get the expected result. The code should return -1 when the file exists and 0 when it does not.

Thanks in advance for the inputs.

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,740 Neuron

    Hi, there might be a better way to achieve what you want. What exactly is your goal? Do you want to refresh a flow when a file arrives or changes in a Dataiku Managed Folder? Also, please post your code snippet using a code block (the </> icon in the toolbar).

  • MSL
    MSL Registered Posts: 7

    Hi, this is the code I am using. It checks whether each file exists and sets the value accordingly:

    import dataiku
    import pandas as pd
    import os

    # Load the dataset containing file paths
    df = dataiku.Dataset("formula_25").get_dataframe()

    # List of column names containing file paths
    file_columns = ['file1', 'file2', 'file3', 'file4']

    # Function to check file existence: -1 if the path exists, 0 otherwise
    def check_file_existence(file_path):
        return -1 if os.path.exists(file_path) else 0

    # Iterate over each file column
    for col in file_columns:
        # Create a new column to store existence status
        existence_col = col + "_exists"
        # Check file existence for each row in the column
        df[existence_col] = df[col].apply(check_file_existence)

    # Save the updated DataFrame back to the result dataset
    dataiku.Dataset("file_exist").write_with_schema(df)

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,740 Neuron

    Please post your code snippet using a code block (the </> icon in the toolbar). If you don't, the indentation is lost and the code can't be run after copying and pasting it, because Python is strict about indentation.

    With regard to your issue: you can't access the file system directly. You need to use a Dataiku Managed Folder:

    https://knowledge.dataiku.com/latest/code/managed-folders/concept-managed-folders.html
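
For anyone landing here later, this is a rough sketch of how the check might look with a managed folder instead of os.path.exists. It is not from the posts above and rests on several assumptions: the folder name "my_folder" is hypothetical, the file columns are assumed to hold paths relative to the managed folder root, and the helper names normalize, existence_flag and run_in_dataiku are made up for illustration. It relies on Folder.list_paths_in_folder(), which returns the paths of the files in the folder with a leading slash.

```python
def normalize(path):
    """Return the path with exactly one leading slash, the form in which
    list_paths_in_folder() reports file paths."""
    return "/" + str(path).strip().lstrip("/")


def existence_flag(path, known_paths):
    """Return -1 if the path is among known_paths, else 0
    (the convention asked for in the question)."""
    if not isinstance(path, str) or not path.strip():
        return 0  # missing or empty cell, e.g. NaN from pandas
    return -1 if normalize(path) in known_paths else 0


def run_in_dataiku(folder_name="my_folder"):  # hypothetical folder name
    # Imported here because the dataiku package only exists inside DSS.
    import dataiku

    # Collect every file path in the managed folder once, as a set for fast lookup.
    folder = dataiku.Folder(folder_name)
    known_paths = set(folder.list_paths_in_folder())

    # Same datasets and columns as in the question; the columns are ASSUMED
    # to contain paths relative to the managed folder root.
    df = dataiku.Dataset("formula_25").get_dataframe()
    for col in ["file1", "file2", "file3", "file4"]:
        df[col + "_exists"] = df[col].apply(
            lambda p: existence_flag(p, known_paths)
        )
    dataiku.Dataset("file_exist").write_with_schema(df)
```

Inside a Python recipe you would call run_in_dataiku() with the name or id of your own folder. Listing the folder once and looking paths up in a set avoids one remote call per row; if the columns hold absolute server paths rather than folder-relative ones, the normalize step would need adapting.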
