How to calculate Percentile on multiple columns

AZS1976
Level 2
How to calculate Percentile on multiple columns

Hi Everyone,

I am in a process of moving my workflow from Alteryx to Dataiku. I am looking to resolve the percentile calculation for multiple columns. for example, As can be seen from the attached file, that I want to calculate the 10th percentile (as in columns H, I, J, and K) for four metrics (in columns D, E, F, and G).  

Thank you 


Operating system used: Windows

0 Kudos
4 Replies
CatalinaS
Dataiker

Hi @AZS1976,

 

You could use a Python recipe with Pandas library that contains this function to calculate any quantile https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.quantile.html#pandas-dataframe-quanti...

Below is an example Python code on how to calculate 10th percentile on more columns and write the results on an output dataset:

import dataiku
import pandas as pd

# Read recipe inputs
input = dataiku.Dataset("input_data")
input_df =input.get_dataframe()

new_row=input_df.quantile(0.1)

output_df=pd.DataFrame()
output_df=output_df.append(new_row, ignore_index=True)

# Write recipe outputs
output = dataiku.Dataset("output")
output.write_with_schema(output_df)

 

0 Kudos
AZS1976
Level 2
Author

Hi @CatalinaS

Thank you for a quick and apt solution. I will try to run this python recipe.

Best regards

 

0 Kudos
JordanB
Dataiker

Hi @AZS1976,

You can also use the quantile processor in a Prepare recipe to get the 10th percentile.

Screen Shot 2022-12-08 at 3.02.38 PM.png

 

Thanks!

Jordan

โ€ƒ

0 Kudos
AZS1976
Level 2
Author

Hi @CatalinaS 

The Python recipe yields a single row with 10th percentile for every column. Can you please also help explain how to get to the 10th percentile for Hierarchy 2 level. e.g. if

hierarchy 1 is 'Continent' 

Hierarchy 2 is 'Country'

Hierarchy 3 is 'City'

and I have to calculate and populate the data with Heirachy 2, level 10th percentile.  

 

Thanks

 

0 Kudos