How may I select a column based off position?

Rickh008

How may I select a column based on its position within a recipe?

Is there a way to select a column by position using the formula language (e.g. something like Column1 rather than val("column_name"))?

Is there a way to select a column by position using another type of step within a Prepare recipe?

Thanks

ZachM
Dataiker

Hi @Rickh008 ,

You can accomplish this by using a Python recipe.

The following example adds a new column to the dataset that contains the values of the 5th column (index 4):

import dataiku

# Name of the column that will be created
COLUMN_NAME = "nth_column"
# Index (position) of the column whose values you want to copy.
# The index starts from 0, so the 2nd column has an index of 1.
COLUMN_INDEX = 4

# Read recipe inputs
input_dataset = dataiku.Dataset("INPUT_DATASET")
dataframe = input_dataset.get_dataframe()

# Create a new column whose values are copied from the nth column (selected by position, not by name)
dataframe[COLUMN_NAME] = dataframe.iloc[:, COLUMN_INDEX]

# Write recipe outputs
output_dataset = dataiku.Dataset("OUTPUT_DATASET")
output_dataset.write_with_schema(dataframe)

You can change which column is selected by changing the COLUMN_INDEX variable.
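
Because COLUMN_INDEX is zero-based and passed straight to iloc, an index past the last column makes pandas raise an IndexError. Below is a minimal sketch, assuming plain pandas outside of a DSS recipe, that wraps the same iloc lookup with an explicit bounds check and a clearer error message. The helper name copy_nth_column and the sample data are hypothetical, not part of the Dataiku API.

```python
import pandas as pd


def copy_nth_column(df: pd.DataFrame, index: int, new_name: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a new column holding
    the values of the column at position `index` (0-based; negative
    values count from the end, as in Python lists)."""
    n_cols = len(df.columns)
    if not -n_cols <= index < n_cols:
        raise IndexError(f"Column index {index} is out of range for {n_cols} columns")
    result = df.copy()  # leave the caller's DataFrame untouched
    # iloc selects by position, so the source column's name does not matter
    result[new_name] = df.iloc[:, index]
    return result


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
out = copy_nth_column(df, 1, "nth_column")  # copies column "b"
print(out)
```

Since the new column is appended at the end, the positions of the existing columns do not shift.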

Once the new column is created, you can then use it in any downstream recipes.
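
If you would rather keep referring to the column by name (for example with val("column_name") in a Prepare recipe formula), pandas can also tell you which name sits at a given position: dataframe.columns[COLUMN_INDEX]. A minimal sketch in plain pandas; the sample DataFrame is made up for illustration. Note that the name found this way reflects the column order at the time the code runs, so it changes if the upstream column order changes.

```python
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({"id": [1, 2], "city": ["Paris", "Lyon"], "price": [10.0, 12.5]})

COLUMN_INDEX = 1  # 0-based, so this is the 2nd column

# Look up the name of the column at that position
column_name = df.columns[COLUMN_INDEX]
print(column_name)

# Selecting by name and selecting by position return the same values
by_name = df[column_name]
by_position = df.iloc[:, COLUMN_INDEX]
print(by_name.equals(by_position))
```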


Thanks,

Zach
