Do you know the History of Data Science? READ MORE

Is there a way to achieve this excel formula using any of the Dataiku Recipes?

Solved!
RanjithJose
Level 1
Is there a way to achieve this excel formula using any of the Dataiku Recipes?

Hi,

Here is a scenario where in i need to have variable which can store count of number of 1's in column A.

Example: Lets say Column A has below values and i need to have variable which can store count of number of 1's from Column A.

Output: Variable C = 3

1

1

1

0

0

in Excel we use =COUNTIF(OFFSET(AR:AR,3,0,ROWS(AR:AR)-3),1)

 

Thanks!

0 Kudos
1 Solution
VitaliyD
Dataiker
Dataiker

Hi @RanjithJose,

If I understand your use case correctly, you don't need a recipe for this. You get what you need by using Metrics in a dataset. Open dataset (screenshot 1) > select "Status" tab (screenshot 2) and create a new python probe metric (screenshot 2) > enable metric you created (screenshot 3). Please refer to screenshots and sample code below:

1.Screenshot 2021-06-17 at 08.44.30.png

2.Screenshot 2021-06-17 at 08.40.53.png

3.Screenshot 2021-06-17 at 08.42.08.png

# Define here a function that returns the metric.
offset = 1 # index start from 0
import pandas as pd
def process(dataset, partition_id):
    # dataset is a dataiku.Dataset object
    df = dataset.get_dataframe()
    df2 = df[(df.new_column == 1) & (df.new_column.index > offset)]
    countif = len(df2)
    return {'countif' : countif}

If this not what you were looking for, please elaborate on your use case.

Best Regards,

Vitaliy

View solution in original post

0 Kudos
4 Replies
CoreyS
Community Manager
Community Manager

Hi @RanjithJose while you wait for a more detailed response, here are some resources that I think you may find when working from Excel to Dataiku DSS:

I hope this helps!

Looking for more resources to help you use DSS effectively and upskill your knowledge? Check out these great resources: Dataiku Academy | Documentation | Knowledge Base

A reply answered your question? Mark as ‘Accepted Solution’ to help others like you!
0 Kudos
pvannies
Level 2
Level 2

Hi @RanjithJose,

you could use a metric on your dataset that calculates the number of 1s, or in your specific example, just calculates the sum. You can find this in the Status tab of your dataset, see image below. Once you have setup the metric, add it to the computed metrics in the Metrics tab.

If you want to use this variable throughout your project, you can set this as a project variable using a simple python script. This python script can be part of a scenario where you first compute the metric and then perform the script. Learning about scenarios is very useful, as they also provide other ways to set the project variables: Scenario steps — Dataiku DSS 9.0 documentation

Here is an example of the python script:

 

 

 

import dataiku
client = dataiku.api_client()
project = client.get_default_project()
# get the existing project variables
variables = project.get_variables()

# replace your_dataset_name 
dataset = dataiku.Dataset("your_dataset_name")
metrics = dataset.get_last_metric_values()

# set the new project variable equal to the computed metric and save it
variables['local']['Counts1ColumnA'] = metrics.get_metric_by_id('col_stats:SUM:A')['lastValues'][0].get('value')
project.set_variables(variables)

 

 

 

 

Good luck!

0 Kudos
VitaliyD
Dataiker
Dataiker

Hi @RanjithJose,

If I understand your use case correctly, you don't need a recipe for this. You get what you need by using Metrics in a dataset. Open dataset (screenshot 1) > select "Status" tab (screenshot 2) and create a new python probe metric (screenshot 2) > enable metric you created (screenshot 3). Please refer to screenshots and sample code below:

1.Screenshot 2021-06-17 at 08.44.30.png

2.Screenshot 2021-06-17 at 08.40.53.png

3.Screenshot 2021-06-17 at 08.42.08.png

# Define here a function that returns the metric.
offset = 1 # index start from 0
import pandas as pd
def process(dataset, partition_id):
    # dataset is a dataiku.Dataset object
    df = dataset.get_dataframe()
    df2 = df[(df.new_column == 1) & (df.new_column.index > offset)]
    countif = len(df2)
    return {'countif' : countif}

If this not what you were looking for, please elaborate on your use case.

Best Regards,

Vitaliy

View solution in original post

0 Kudos
RanjithJose
Level 1
Author

Thanks Vitaliy D!

0 Kudos
A banner prompting to get Dataiku DSS