I am in the process of moving my workflow from Alteryx to Dataiku, and I am trying to reproduce a percentile calculation across multiple columns. For example, as the attached file shows, I want to calculate the 10th percentile (columns H, I, J, and K) for four metrics (columns D, E, F, and G).
Operating system used: Windows
You could use a Python recipe with the pandas library, whose DataFrame.quantile function can calculate any quantile: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.quantile.html#pandas-dataframe-quanti...
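For reference, here is a minimal pandas-only sketch of DataFrame.quantile applied to several columns at once. The data and the column names m1 to m4 are made up for illustration. The default interpolation is "linear". If another tool uses a different percentile convention, its results may not match exactly, and the interpolation parameter is the setting to compare.

```python
import pandas as pd

# Hypothetical sample data with four numeric metric columns
df = pd.DataFrame({
    "m1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "m2": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "m3": [5.0, 1.0, 4.0, 2.0, 3.0, 9.0, 8.0, 7.0, 6.0, 10.0],
    "m4": [0, 0, 0, 0, 0, 0, 0, 0, 0, 100],
})

# 10th percentile of every column (a Series indexed by column name);
# the default interpolation is "linear"
p10 = df.quantile(0.1)
print(p10)

# Several quantiles at once: one row per quantile, one column per metric
print(df.quantile([0.1, 0.5, 0.9]))

# "lower" always returns an actual data point instead of interpolating
p10_lower = df.quantile(0.1, interpolation="lower")
print(p10_lower)
```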
Below is example Python code that calculates the 10th percentile for multiple columns and writes the results to an output dataset:
import dataiku
import pandas as pd

# Read recipe inputs
input_ds = dataiku.Dataset("input_data")
input_df = input_ds.get_dataframe()

# 10th percentile of every numeric column (a Series indexed by column name)
new_row = input_df.quantile(0.1, numeric_only=True)

# Turn the Series into a one-row DataFrame (DataFrame.append was removed in pandas 2.0)
output_df = new_row.to_frame().T

# Write recipe outputs
output = dataiku.Dataset("output")
output.write_with_schema(output_df)
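The original question wants the percentile on every row next to the metrics (columns H to K), not as a single summary row. In that case, the one-row result can be broadcast back with assign. This is a sketch with hypothetical column names (id, m1 to m4) and inline sample data. In a recipe, input_df would come from get_dataframe() and output_df would be written with write_with_schema, as in the code above.

```python
import pandas as pd

# Hypothetical input: an ID column plus four metric columns (D to G in the question)
input_df = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e"],
    "m1": [1, 2, 3, 4, 5],
    "m2": [10, 20, 30, 40, 50],
    "m3": [2.0, 4.0, 6.0, 8.0, 10.0],
    "m4": [5, 5, 5, 5, 5],
})
metrics = ["m1", "m2", "m3", "m4"]  # hypothetical metric column names

# One 10th-percentile value per metric column
p10 = input_df[metrics].quantile(0.1)

# Repeat each scalar on every row as a new column (like columns H to K)
output_df = input_df.assign(**{f"{c}_p10": p10[c] for c in metrics})
print(output_df)
```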
The Python recipe yields a single row containing the 10th percentile of every column. Could you also explain how to get the 10th percentile at the Hierarchy 2 level? For example, if:
Hierarchy 1 is 'Continent'
Hierarchy 2 is 'Country'
Hierarchy 3 is 'City'
then I need to calculate the 10th percentile at the Hierarchy 2 level and populate the data with it.
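A per-level percentile is a group-by on the hierarchy column. Below is a minimal pandas sketch, assuming hypothetical metric columns m1 and m2 and made-up sample rows. It shows two variants. groupby(...).quantile gives one row per Country. groupby(...).transform keeps every original row and repeats the country-level value on each one, which matches "populate the data". To work at another level, change the grouping key. Grouping by ["Continent", "Country"] would keep same-named countries in different continents apart.

```python
import pandas as pd

# Hypothetical data with the three hierarchy levels and two metrics
df = pd.DataFrame({
    "Continent": ["Europe"] * 6 + ["Asia"] * 4,
    "Country": ["France"] * 3 + ["Germany"] * 3 + ["Japan"] * 4,
    "City": ["Paris", "Lyon", "Nice", "Berlin", "Munich", "Hamburg",
             "Tokyo", "Osaka", "Kyoto", "Nagoya"],
    "m1": [1, 2, 3, 10, 20, 30, 100, 200, 300, 400],
    "m2": [5.0, 6.0, 7.0, 1.0, 1.0, 1.0, 0.0, 10.0, 20.0, 30.0],
})
metrics = ["m1", "m2"]  # hypothetical metric column names

# Variant 1: aggregate to one row per country (index = Country)
per_country = df.groupby("Country")[metrics].quantile(0.1)
print(per_country)

# Variant 2: keep every row and add the country-level 10th percentile
# as new columns; transform returns a frame aligned with df's index
p10_cols = df.groupby("Country")[metrics].transform(lambda s: s.quantile(0.1))
out = df.join(p10_cols.add_suffix("_p10"))
print(out)
```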