Count rows with a condition
Hi,
I want to count the number of rows that match a condition, and then store the result as a project variable.
Here are my dataset and my code:
The dataset name is "Affaires_technique_groupe"
Agence | DIR | State |
LYON | AURA | Fait |
GRENOBLE | AURA | A faire |
TOULON | PACAC | A faire |
AIX | PACAC | A faire |
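import dataiku
from dataiku.scenario import Scenario

df_data1 = dataiku.Dataset("Affaires_technique_groupe").get_dataframe()
count = len(df_data1)  # total number of rows, not yet filtered on State
Scenario().set_project_variables(Nb_affaire=count)  # run from a scenario Python step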
I would like to count the number of "A faire" rows and store it in a project variable, but I don't know how to apply the filter.
Do you have an idea?
Thanks for the help
Operating system used: Windows 10
Best Answer
-
Manuel (Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer)
Hi,
You can also compute that ratio with an SQL probe, which produces it as a dataset metric.
You can then retrieve the value with our Python API: https://doc.dataiku.com/dss/latest/python-api/metrics.html
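As an illustration of the Python API route, here is a minimal sketch that reads the last computed values of two metrics and divides them. It assumes `dataset` behaves like `dataiku.Dataset`, i.e. it has `get_last_metric_values()` returning an object with `get_global_value(metric_id)`. The function name `metric_ratio` is hypothetical, and the metric IDs are placeholders you would copy from the dataset's metrics screen.

```python
# Sketch: combine the last computed values of two dataset metrics into a ratio.
# ASSUMPTION: `dataset` behaves like dataiku.Dataset (get_last_metric_values()
# returning an object with get_global_value(metric_id)).
# The metric IDs passed in are placeholders: copy the real IDs from DSS.


def metric_ratio(dataset, count_metric_id, total_metric_id):
    """Return count / total, read from two already-computed metrics."""
    metrics = dataset.get_last_metric_values()
    count = float(metrics.get_global_value(count_metric_id))
    total = float(metrics.get_global_value(total_metric_id))
    if total == 0:
        return 0.0  # empty dataset: avoid a ZeroDivisionError
    return count / total
```

Inside DSS this could be called as `metric_ratio(dataiku.Dataset("Affaires_technique_groupe"), "<ID of the SQL probe metric>", "records:COUNT_RECORDS")`, where "records:COUNT_RECORDS" is assumed to be the ID of the built-in record count metric; check it in your own instance. If the SQL probe already returns the ratio itself, a single `get_global_value` call is enough.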
I hope this helps.
Answers
-
Manuel
Hi,
There are at least two ways to do the count without code:
- In the dataset metrics, add an SQL probe with your condition. This produces a dataset metric containing your count (see attached images);
- Add a Group recipe with no grouping keys and a pre-filter on your condition. This produces a dataset containing your count.
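Since the question also asked for a Python way, here is a rough pandas counterpart of the Group recipe with a pre-filter: build a boolean mask for the condition and sum it. It assumes the data is already in a pandas DataFrame (for example via `dataiku.Dataset(...).get_dataframe()`) and uses the sample rows from the question; the helper name `count_rows_where` is hypothetical.

```python
import pandas as pd


def count_rows_where(df, column, value):
    """Count rows where df[column] == value (like a Group recipe with no keys
    and a pre-filter on that condition)."""
    mask = df[column] == value  # boolean Series: True where the condition holds
    return int(mask.sum())      # True counts as 1, False as 0


# Sample rows from the question
df = pd.DataFrame({
    "Agence": ["LYON", "GRENOBLE", "TOULON", "AIX"],
    "DIR": ["AURA", "AURA", "PACAC", "PACAC"],
    "State": ["Fait", "A faire", "A faire", "A faire"],
})

print(count_rows_where(df, "State", "A faire"))  # 3

# Several conditions: combine masks with & (and) or | (or), each in parentheses
both = (df["State"] == "A faire") & (df["DIR"] == "AURA")
print(int(both.sum()))  # 1
```

The parentheses around each comparison matter: `&` binds more tightly than `==` in Python, so leaving them out raises an error or compares the wrong things.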
Why do you need the value as a variable?
If it is only to display it somewhere, you can easily retrieve the values produced by either of the two suggestions above, for example to show them in a dashboard.
I hope this helps.
-
Hi,
I need it as a variable because I want to calculate a ratio (nb "Affaire" / nb total) and put it in an email.
That's why I would like to use Python to do the count.
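A minimal sketch of that Python route: compute the filtered count, the total and the ratio with pandas, then merge them into the project variables through the public API client. It assumes the column is named `State` and that `project` is a `DSSProject` obtained with `dataiku.api_client().get_default_project()`, whose `get_variables()` returns a dict with "standard" and "local" entries. The helper names and the variable names `Nb_total` and `Ratio_affaire` are hypothetical.

```python
import pandas as pd


def count_and_ratio(df, column="State", value="A faire"):
    """Return (matching rows, total rows, matching / total)."""
    nb_total = len(df)
    nb_match = int((df[column] == value).sum())
    ratio = nb_match / nb_total if nb_total else 0.0  # guard against an empty dataset
    return nb_match, nb_total, ratio


def store_as_project_variables(project, **values):
    """Merge `values` into the project's standard variables.

    ASSUMPTION: `project` is a dataikuapi DSSProject, e.g.
    dataiku.api_client().get_default_project().
    """
    variables = project.get_variables()
    variables["standard"].update(values)  # keep existing variables, add/overwrite ours
    project.set_variables(variables)


sample = pd.DataFrame({"State": ["Fait", "A faire", "A faire", "A faire"]})
print(count_and_ratio(sample))  # (3, 4, 0.75)
```

In a Python recipe or scenario step, this would look like `nb, total, ratio = count_and_ratio(dataiku.Dataset("Affaires_technique_groupe").get_dataframe())` followed by `store_as_project_variables(dataiku.api_client().get_default_project(), Nb_affaire=nb, Nb_total=total, Ratio_affaire=ratio)`. Reading the whole dataset into memory is fine for small datasets; for large ones the metric-based approach avoids it.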
-
Hi,
In the metrics, I have this code. It gives me the number of rows, but I still don't know how to add a filter to it. Is that possible?
import dataiku

def process(dataset):
    # `dataset` is the dataset the metric is computed on; load it with pandas
    df = dataset.get_dataframe()
    return {'num_rows': df.shape[0]}