Hi,
I have a column of strings where each cell contains several categories separated by commas, in no particular order.
I would like to split them and give each category its own column.
For example, these cells:
1. "A, B, C"
2. "C, B"
would be converted to 3 different columns named A, B and C, where row 1 gets the values 1, 1, 1 and row 2 gets 0, 1, 1.
Is it possible?
Thanks!
Hi @ortrsa,
You can use a Python recipe to separate the strings from the column and then create new columns. The code below shows how to achieve this:
import dataiku
import pandas as pd

# Read recipe inputs
input_dataset = dataiku.Dataset("input")
df = input_dataset.get_dataframe()

for i in df.index:  # iterate over all rows
    value = df.loc[i, "input_column"]
    if pd.isna(value):  # skip empty cells
        continue
    for col in value.split(","):  # split the value on commas to get the categories
        col = col.strip()  # drop the space after each comma, so " B" becomes "B"
        if not col:  # ignore empty pieces such as a trailing comma
            continue
        if col not in df.columns:  # create the new column and set 0 for all rows
            df[col] = 0
        df.loc[i, col] = 1  # set 1 for this category on the current row

# Write recipe outputs
output = dataiku.Dataset("output")
output.write_with_schema(df)
After running this code the output dataset contains the new columns (A, B, C) with the corresponding values.
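As a possible alternative to the row-by-row loop, pandas can build the indicator columns in one step with Series.str.get_dummies. The sketch below is only an illustration on a small hand-made DataFrame: the helper name one_hot_categories is made up for this example, "input_column" is the placeholder column name used above, and it assumes the separator is always a comma, optionally surrounded by spaces. In a recipe, df would come from get_dataframe() as in the code above.

```python
import pandas as pd


def one_hot_categories(df, column):
    """Return df with one 0/1 column per comma-separated category in `column`.

    Hypothetical helper for illustration; assumes categories are separated
    by commas, possibly with spaces around them.
    """
    # Normalize "A, B, C" to "A,B,C" so that " B" and "B" are the same category
    cleaned = df[column].str.replace(r"\s*,\s*", ",", regex=True).str.strip()
    # Build one indicator column per distinct category
    dummies = cleaned.str.get_dummies(sep=",")
    # Attach the indicator columns to the original data
    return df.join(dummies)


# Sample data taken from the question
example = pd.DataFrame({"input_column": ["A, B, C", "C, B"]})
result = one_hot_categories(example, "input_column")
print(result)
```

On large datasets this should generally be faster than looping over rows, since the splitting is done by pandas rather than in a Python loop. Note that the join would fail if a category had the same name as an existing column.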