How may I select a column based on position?

Dataiku DSS Core Designer, Registered Posts: 15 ✭✭✭✭

How may I select a column based on its position within a recipe?

Is there a way to select a column using formula language [e.g. something like Column1 rather than val("column_name")]?

Is there a way to select a column using another type of step within a prepare recipe?

Thanks

Best Answer

  • Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 153 Dataiker
    edited July 2024 Answer ✓

    Hi @Rickh008,

    You can accomplish this by using a Python recipe.

    The following example will add a new column to the dataset that contains the value of the 5th column (index 4):

    import dataiku
    
    
    COLUMN_NAME = "nth_column"
    """Name of the column that will be created"""
    COLUMN_INDEX = 4
    """Index (position) of the column whose value you want to copy
    
    The index starts from 0, so the 2nd column has an index of 1
    """
    
    # Read recipe inputs
    input_dataset = dataiku.Dataset("INPUT_DATASET")
    dataframe = input_dataset.get_dataframe()
    
    # Create a new column where the value is the value of the nth column
    dataframe[COLUMN_NAME] = dataframe.iloc[:, COLUMN_INDEX]
    
    # Write recipe outputs
    output_dataset = dataiku.Dataset("OUTPUT_DATASET")
    output_dataset.write_with_schema(dataframe)

    You can change the position of the column that is selected by changing the COLUMN_INDEX variable.
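
If COLUMN_INDEX is set past the last column, iloc raises an IndexError. Here is a minimal sketch of a guard that reports the problem more clearly. It uses plain pandas so it can be tried outside DSS. The helper name copy_column_by_position and the sample data are hypothetical and not part of the Dataiku API:

```python
import pandas as pd


def copy_column_by_position(df: pd.DataFrame, index: int, new_name: str) -> pd.DataFrame:
    """Return a copy of df with a new column holding the values of the
    column at the given 0-based position."""
    n_columns = df.shape[1]
    if not 0 <= index < n_columns:
        raise IndexError(
            f"Column index {index} is out of range for a dataframe "
            f"with {n_columns} columns"
        )
    result = df.copy()
    # iloc selects by position: all rows, the column at `index`
    result[new_name] = df.iloc[:, index]
    return result


# Hypothetical sample data standing in for the recipe's input dataset
sample = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
copied = copy_column_by_position(sample, 1, "nth_column")
```

In the recipe, the same function could be applied to the dataframe returned by get_dataframe() before writing the output.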

    Once the new column is created, you can then use it in any downstream recipes.
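
If a downstream step needs the original column's name rather than a copy (for example, to refer to it with val("column_name") in a formula), the name at a given position can be read from the dataframe's columns. A small sketch, again in plain pandas with made-up sample data:

```python
import pandas as pd

# Hypothetical sample data standing in for the recipe's input dataset
sample = pd.DataFrame(
    {"id": [10, 20], "city": ["Paris", "Lyon"], "score": [0.5, 0.9]}
)

COLUMN_INDEX = 2  # 0-based, so this is the 3rd column

# Look up the name of the column at that position
column_name = sample.columns[COLUMN_INDEX]

# Selecting by that name gives the same values as selecting by position
by_name = sample[column_name]
by_position = sample.iloc[:, COLUMN_INDEX]
```

Note that this relies on the column order of the input dataset, so it will pick a different column if the schema is reordered upstream.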

    Thanks,

    Zach
