Split columns to rows with new line delimiter
abalo006
Registered Posts: 29 ✭
I have data structured as such
category | category details |
1 | categoryname categorytype categorylength |
my category details cell has three values just separated by a new line, is there a way to split that field by a new line delimiter so that I get 2 new rows for those extra fields?
me desired output is
category | category details |
1 | categoryname |
1 | categorytype |
1 | categorylength |
is that possible using dataiku?
Operating system used: windows
Tagged:
Answers
-
louisbarjon Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 9 Dataiker
Hello
A python code recipe will work with the following code. Make sure to change accordingly the dataset names, here "your_dataset_name" and "cells_split_dataset".
Please also change the column name (your_column_to_split here)# -*- coding: utf-8 -*- import dataiku import pandas as pd, numpy as np from dataiku import pandasutils as pdu # Read recipe inputs your_dataset = dataiku.Dataset("your_dataset_name") your_dataset_df = your_dataset.get_dataframe() # Function to split the text into multiple rows def split_text_to_rows(df, column): # Create an empty DataFrame to hold the new rows new_rows = [] for index, row in df.iterrows(): text_lines = row[column].split('\n') # Split the text by \n for line in text_lines: new_row = row.copy() new_row[column] = line new_rows.append(new_row) # Create a new DataFrame from the new rows new_df = pd.DataFrame(new_rows) return new_df cells_split_df = split_text_to_rows(your_dataset_df, "your_column_to_split") # Write recipe outputs cells_split = dataiku.Dataset("cells_split_dataset") cells_split.write_with_schema(cells_split_df)
Louis