Data transformation

shahas71 · September 2023

Hi All,

This is a mock data of my input :

Input
Rule	Account	Entity	Audit
R1	a1,a2	e1
R2		e2	-c2

This has to be split and transformed to this form :

Rule	Dimension	Value
R1	Account	a1
R1	Account	a2
R1	Entity	e1
R2	Entity	e2
R2	Audit	-c2

Then have to search a databse, say "db1" and find the levels :

Rule	Dimension	Value	Level
R1	Account	a1	L2
R1	Account	a2	L3
R1	Entity	e1	L1
R2	Entity	e2	L2
R2	Audit	-c2	L3

Then for the final output, data should look like :

Table_Dim_Account	Rule	L1	L2	L3	Exclude
	R1		a1		N
	R1			a2	N
	R2				N

Table_Dim_Entity	Rule	L1	L2	L3	Exclude
	R1	e1			N
	R2		e2		N

Table_Dim_Audit	Rule	L1	L2	L3	Exclude
	R1			c2	Y
	R1				N

Exclude column is just a yes or no..if there is any negative value, exclude is set to Y or else N

how can we achieve the solution? open to both python recipe solution as well as visual recipe solutions

Thanks in advance

Operating system used: windows

info-rchitect · September 2023

@shahas71
How many unique column values will you have? Second question is, are they fixed and known in advance or are they dynamic?

shahas71 · September 2023

Hi @info-rchitect
,

Not quite sure if i get your question right. But the column, for eg, account might have 2 or 3 values eperated by a ',' delimiter. that has to be split into seperate rows. these values will be given to us & just have to find the level from seperate databse & transform it into the final form

info-rchitect · September 2023

Hi @shahas71

I would assume you don't know when CSV data will show up vs a single value in each of the three columns. This is valid Snowflake SQL that will accomplish the task:

create or replace temporary table mytable(rule varchar(50), account varchar(50), entity varchar(50), audit varchar(50));

insert into mytable values
    ('R1', 'a1,a2', 'e1', null),
    ('R2', null, 'e2', '-c2');
    
with account_data as (
    select distinct rule,
        split(account, ',') as account_array
    from mytable
),
account_data_flattened as (
    select rule,
        'Account' as dimension,
        to_char(a1.value) as value
    from account_data, table(flatten(account_data.account_array)) a1
),
entity_data as (
    select distinct rule, split(entity, ',') as entity_array
    from mytable 
),
entity_data_flattened as (
    select rule,
        'Entity' as dimension,
        to_char(a1.value) as value
    from entity_data, table(flatten(entity_data.entity_array)) a1
),
audit_data as (
    select distinct rule, split(audit, ',') as audit_array
    from mytable 
),
audit_data_flattened as (
    select rule,
        'Audit' as dimension,
        to_char(a1.value) as value
    from audit_data, table(flatten(audit_data.audit_array)) a1
)
select * from account_data_flattened
union all
select * from entity_data_flattened
union all
select * from audit_data_flattened;

info-rchitect · September 2023

I take that back, this is much better:

with unpivoted as (
    select * from mytable
    unpivot(value for dimension in (Account, Entity, Audit))
),
formatted as (
    select rule,
        dimension,
        split(value, ',') as value_array
    from unpivoted
)
select rule,
    dimension,
    to_char(a1.value) as value
from formatted, table(flatten(formatted.value_array)) a1;

info-rchitect · September 2023

and even better:

with unpivoted as (
    select * from mytable
    unpivot(value for dimension in (Account, Entity, Audit))
)
select rule, dimension, records.value::varchar as value
from unpivoted, lateral flatten (input => split(value, ','), outer => true) records;

shahas71 · September 2023

@info-rchitect
Thanks for the reply.

let me make it more clear

Input
Rule	Account	Entity	Audit
R1	a1,a2	e1
R2		e2	-c2

from this table, we have to transform the table into this form first:

Rule	Account	Entity	Audit
R1	a1	e1
R1	a2	e1
R2		e2	-c2

after that, the table needs to expanded. all the columns, except Rule (this is just a mock data, in reality many columns will be present) will become under a new columns called Dimension, ie, dimension will contain account, entity, audit etc. The values a1,a2,e1 etc will come under new column named as value. so the output has to be like this :

Rule	Dimension	Value
R1	Account	a1
R1	Account	a2
R1	Entity	e1
R2	Entity	e2
R2	Audit	-c2

Hope its clear

info-rchitect · September 2023

@shahas71
So the middle table will actually be used and people/processes will select data from it? Also, do you only want these manipulations done in Python and not on the database itself? We can use Python to manipulate the database if needed (what I typically prefer).

One more question: Why dos the middle table have to exist before the final table? It is much easier to use the SQL I posted above to make the final table first and then make the middle table from the final table.

Data transformation

Answers

Categories

Setup Info

Tags