How can I select a column based on its position?

Rickh008

How can I select a column based on its position within a recipe?

Is there a way to select a column using formula language [e.g. something like Column1 rather than val("column_name")]?

Is there a way to select a column using another type of step within a prepare recipe?

Thanks


Best Answer

  • Zach
    edited July 17 Answer ✓

    Hi @Rickh008,

    You can accomplish this by using a Python recipe.

    The following example will add a new column to the dataset that contains the value of the 5th column (index 4):

    import dataiku
    
    
    COLUMN_NAME = "nth_column"
    """Name of the column that will be created"""
    COLUMN_INDEX = 4
    """Index (position) of the column whose value you want to copy
    
    The index starts from 0, so the 2nd column has an index of 1
    """
    
    # Read recipe inputs
    input_dataset = dataiku.Dataset("INPUT_DATASET")
    dataframe = input_dataset.get_dataframe()
    
    # Create a new column where the value is the value of the nth column
    dataframe[COLUMN_NAME] = dataframe.iloc[:, COLUMN_INDEX]
    
    # Write recipe outputs
    output_dataset = dataiku.Dataset("OUTPUT_DATASET")
    output_dataset.write_with_schema(dataframe)

    You can change the position of the column that is selected by changing the COLUMN_INDEX variable.
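
    To try the positional lookup outside DSS, here is a minimal pandas-only sketch of the same idea. The helper name copy_column_by_position and the sample data are hypothetical, made up for illustration; the bounds check is an addition that is not part of the recipe above.

```python
import pandas as pd


def copy_column_by_position(df, index, new_name):
    """Return a copy of df with a new column holding the values of the
    column at the given 0-based position (hypothetical helper)."""
    if not 0 <= index < len(df.columns):
        raise IndexError(
            f"Column index {index} is out of range for {len(df.columns)} columns"
        )
    result = df.copy()
    # iloc selects by position, so the column name does not matter here
    result[new_name] = df.iloc[:, index]
    return result


# Made-up sample data: position 1 is the 2nd column ("b")
sample = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
copied = copy_column_by_position(sample, 1, "nth_column")
```

    Checking the index up front gives a clearer error message if the input dataset ends up with fewer columns than expected.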

    Once the new column is created, you can then use it in any downstream recipes.
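
    If downstream steps need to refer to several columns by position, one possible variation is to rename every column to a positional alias inside the Python recipe. This is only a sketch under assumptions: the helper add_positional_names and the "Column" prefix are hypothetical, the numbering starts at 1 to match the Column1 example in the question, and the original column names are lost in the renamed copy.

```python
import pandas as pd


def add_positional_names(df, prefix="Column"):
    """Return a copy of df whose columns are renamed to prefix1, prefix2, ...
    based on their position (hypothetical helper, 1-based numbering)."""
    renamed = df.copy()
    renamed.columns = [f"{prefix}{i}" for i in range(1, len(df.columns) + 1)]
    return renamed


# Made-up sample data with arbitrary column names
sample = pd.DataFrame({"id": [1, 2], "city": ["Paris", "Lyon"], "score": [0.5, 0.9]})
renamed = add_positional_names(sample)
```

    After writing such a dataframe to the output dataset, a later step could refer to a column by an ordinary name such as Column2.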

    Thanks,

    Zach
