Split columns to rows with new line delimiter

abalo006
abalo006 Registered Posts: 29

I have data structured as such

categorycategory details
1

categoryname

categorytype

categorylength

my category details cell has three values just separated by a new line, is there a way to split that field by a new line delimiter so that I get 2 new rows for those extra fields?

me desired output is

categorycategory details

1

categoryname
1categorytype
1categorylength

is that possible using dataiku?


Operating system used: windows

Tagged:

Answers

  • louisbarjon
    louisbarjon Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 9 Dataiker
    edited July 17

    Hello

    A python code recipe will work with the following code. Make sure to change accordingly the dataset names, here "your_dataset_name" and "cells_split_dataset".
    Please also change the column name (your_column_to_split here)

    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu
    
    # Read recipe inputs
    your_dataset = dataiku.Dataset("your_dataset_name")
    your_dataset_df = your_dataset.get_dataframe()
    
    
    # Function to split the text into multiple rows
    def split_text_to_rows(df, column):
        # Create an empty DataFrame to hold the new rows
        new_rows = []
        for index, row in df.iterrows():
            text_lines = row[column].split('\n')  # Split the text by \n
            for line in text_lines:
                new_row = row.copy()
                new_row[column] = line
                new_rows.append(new_row)
        
        # Create a new DataFrame from the new rows
        new_df = pd.DataFrame(new_rows)
        return new_df
    
    
    
    cells_split_df = split_text_to_rows(your_dataset_df, "your_column_to_split")
    
    
    # Write recipe outputs
    cells_split = dataiku.Dataset("cells_split_dataset")
    cells_split.write_with_schema(cells_split_df)
    



    Louis

Setup Info
    Tags
      Help me…