Survey banner
The Dataiku Community is moving to a new home! We are temporary in read only mode: LEARN MORE

Split columns to rows with new line delimiter

abalo006
Level 3
Split columns to rows with new line delimiter

I have data structured as such

categorycategory details
1

categoryname

categorytype

categorylength

 

my category details cell has three values just separated by a new line, is there a way to split that field by a new line delimiter so that I get 2 new rows for those extra fields?

 

me desired output is 

categorycategory details

1

categoryname
1categorytype
1categorylength

 

is that possible using dataiku?

 

 


Operating system used: windows

0 Kudos
1 Reply
louisbarjon
Dataiker

Hello

A python code recipe will work with the following code. Make sure to change accordingly the dataset names, here "your_dataset_name" and "cells_split_dataset".
Please also change the column name (your_column_to_split here)

 

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
your_dataset = dataiku.Dataset("your_dataset_name")
your_dataset_df = your_dataset.get_dataframe()


# Function to split the text into multiple rows
def split_text_to_rows(df, column):
    # Create an empty DataFrame to hold the new rows
    new_rows = []
    for index, row in df.iterrows():
        text_lines = row[column].split('\n')  # Split the text by \n
        for line in text_lines:
            new_row = row.copy()
            new_row[column] = line
            new_rows.append(new_row)
    
    # Create a new DataFrame from the new rows
    new_df = pd.DataFrame(new_rows)
    return new_df



cells_split_df = split_text_to_rows(your_dataset_df, "your_column_to_split")


# Write recipe outputs
cells_split = dataiku.Dataset("cells_split_dataset")
cells_split.write_with_schema(cells_split_df)

 

 

 

 



Louis

 

 

 

0 Kudos

Labels

?
Labels (1)
A banner prompting to get Dataiku