Mentioning the column value in the if condition output along with the string in the prep recipe

Tsurapaneni
Level 3
Mentioning the column value in the if condition output along with the string in the prep recipe

Hi Team,

I have a use case where within the condition statement i have to mention the column value in between the string value. Giving an example:

if(and(B >1), 'B column is (column value from the below dataset (2)) times the A column', B)

The output of the dataset should look this way which is the "D" column value should be a mix of both string and the value from column B.

NOTE: Engine is "InSQL database" not the local stream. kindly consider this for the replies.

A.  B.   C.  D

1.   2.   str. B column is 2 times the A column

 

Thanks !

0 Kudos
6 Replies
Ignacio_Toledo

Hi @Tsurapaneni

Not sure if I understand your use case completely, but maybe this is what you are looking for?:

if(B > 1, concat('B column is ', B, ' times the A Column'), B)

Hope it helps!

0 Kudos
Tsurapaneni
Level 3
Author

Hi @Ignacio_Toledo ,

Thank you for this reply, it worked for me ! I have one more question on the count of the dss dataset. Is there any way I can get the count of the total number of rows in the dataset through a formula in the prep recipe ?

For now I am using the group by recipe to get the count but I would like to create a column storing all the count of the rows in the dataset. Please let me know if we can achieve it through the formula processor in the prep recipe or by any other processors.

NOTE: I am using the "Insql database" engine not the local one. 

 

Regards !

 

 

Happy to help @Tsurapaneni!

Sadly I'm not aware of a method to add a column with the number of rows in the dataset using the insql database engine with the prepare recipe. Maybe someone else have a hint?

Tsurapaneni
Level 3
Author

Hi @Ignacio_Toledo ,

May be any way where we can specify the row value based on other column should also work for my use case for now. if at all the prep recipe wouldn't help me to add a column which has the count of the records in it. 

eg for my ask: consider there are 3 columns and should create a new column. 

A          B    C  
red       1    0
blue     3     1

I would like to retrieve the value from the column A whose column Value B is highest 

Steps like:

1. Getting the highest value for the column 'B' (which is 3)

2.  Retrieving the value from the 'A' column whose Column B value is the highest (which is blue)

Highest of B column is '3' and the concerned 'A' column is blue so the column D has the value 'blue' throughout it's column (as the max of B column value is 3 and to that row A column has the value 'blue').

Please let me know if we can make this happen through any recipes in dataiku. 

Thanks !

 

0 Kudos
AgatheG
Dataiker

Hi Tsurapaneni,

If I understand your need correctly, an easy way to go would be via a Python recipe.

For instance, you could append the column D like this:

 

dataset = dataiku.Dataset(YOUR_DATASET_NAME)
df = dataset.get_dataframe()

i = df.B.idxmax()
df["D"] = df.A[i]

 

 

Please note that idxmax returns the index of the first row in your dataset where the column B has its highest value. So if the highest value in column B is present on multiple rows, the first value at column A where B is at its highest value will be taken to create column D.

Meaning if you have:

A       B  C
red    1   0
blue   3  1
green 3  0

Column D would be filled with 'blue' (not 'green'). Depending on your dataset, this may (or may not) matter.

 

Hope this helps!

 

Agathe

0 Kudos
Tsurapaneni
Level 3
Author

Hi,

 

Thank you for the response! I am looking for the solution using the prep recipe but not in the python code recipes. 

0 Kudos