Count rows with a condition

Richard_CDC

Hi,

I want to count the number of rows that match a condition, and then store the result in a project variable.

Here are my dataset and my code:

The dataset is named "Affaires_technique_groupe".

Agence   | DIR  | State
LYON     | AURA | Fait
GRENOBLE | AURA | A faire
TOULON   | PACA | A faire
AIX      | PACA | A faire

import dataiku
from dataiku.scenario import Scenario

df_data1 = dataiku.Dataset("Affaires_technique_groupe").get_dataframe()
count = df_data1.count()  # Series of non-null counts, one per column
Scenario().set_project_variables(Nb_affaire=list(count)[0])
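For reference, pandas `DataFrame.count()` returns the number of non-null values in each column. So `list(count)[0]` is the non-null count of the first column (here `Agence`), and it equals the total row count only when that column has no missing values. The minimal sketch below shows this on the sample rows, which are typed in by hand instead of being read through `dataiku`. The names `per_column`, `first_column_count`, and `df_missing` are illustrative only.

```python
import pandas as pd

# Sample rows from "Affaires_technique_groupe", typed in by hand for illustration
df = pd.DataFrame({
    "Agence": ["LYON", "GRENOBLE", "TOULON", "AIX"],
    "DIR": ["AURA", "AURA", "PACA", "PACA"],
    "State": ["Fait", "A faire", "A faire", "A faire"],
})

per_column = df.count()                       # Series: non-null count for each column
first_column_count = int(per_column.iloc[0])  # the value list(count)[0] picks out
total_rows = len(df)                          # number of rows, missing values or not
print(first_column_count, total_rows)         # 4 4

# Append a row whose first column is missing: the two numbers now differ
df_missing = df.copy()
df_missing.loc[len(df_missing)] = [None, "PACA", "Fait"]
print(int(df_missing.count().iloc[0]), len(df_missing))  # 4 5
```

If the goal is the total number of rows, `len(df)` or `df.shape[0]` is the more direct choice.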

I would like to count the number of "A faire" rows and store that count in a project variable, but I don't know how to apply the filter.

Do you have any idea?

Thanks for the help


Operating system used: Windows 10

Answers

  • Manuel
    Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer

    Hi,

    There are at least two ways to do the count without code:

    • In the dataset metrics, add an SQL probe that contains your condition. This produces a dataset metric holding your count (see attached images);
    • Add a Group recipe with no grouping keys and a pre-filter on your condition. This produces a dataset containing your count.

    Why do you need the value as a variable?

    If you only need to display it somewhere, for example in a dashboard, you can easily retrieve the value from either of the two options above.

    I hope this helps.

  • Richard_CDC

    Hi,

    I need it as a variable because I want to compute a ratio (nb "A faire" / nb total) and include it in an email.

    That's why I would like to use Python to do the count.
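A minimal sketch of the filter and the ratio in pandas follows. It assumes the column is named `State` and that the target value is exactly "A faire", as in the sample data. The helper `store_a_faire_ratio` and the variable names `Nb_a_faire`, `Nb_total` and `Ratio_a_faire` are hypothetical. The function that stores the variables is passed in as an argument, so the counting logic can be checked outside DSS.

```python
import pandas as pd


def store_a_faire_ratio(df, set_variables):
    """Hypothetical helper: count the "A faire" rows, compute their share of
    all rows, and pass the values to set_variables as keyword arguments."""
    is_a_faire = df["State"] == "A faire"   # boolean mask, True on matching rows
    nb_a_faire = int(is_a_faire.sum())      # True counts as 1, so the sum is the count
    nb_total = len(df)                      # all rows in the dataset
    ratio = nb_a_faire / nb_total if nb_total else 0.0  # avoid dividing by zero
    # Plain Python int/float: numpy scalars may not serialize cleanly as variables
    set_variables(Nb_a_faire=nb_a_faire, Nb_total=nb_total, Ratio_a_faire=ratio)
    return nb_a_faire, nb_total, ratio


# Local check with a fake setter that just records the variables
df = pd.DataFrame({"State": ["Fait", "A faire", "A faire", "A faire"]})
captured = {}
print(store_a_faire_ratio(df, lambda **kw: captured.update(kw)))  # (3, 4, 0.75)
```

In a Python step of a scenario, the call would look something like `store_a_faire_ratio(dataiku.Dataset("Affaires_technique_groupe").get_dataframe(), Scenario().set_project_variables)`, with `import dataiku` and `from dataiku.scenario import Scenario`. This is the same `set_project_variables` call used in the question.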

  • Richard_CDC

    Hi,

    In the metrics, I have the code below. It gives me the number of rows, but I still don't know how to add a filter to it. Is that possible?

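    import dataiku

    # Python probe: returns a dict mapping metric names to values
    def process(dataset):
        # dataset is the DSS dataset being probed
        df = dataset.get_dataframe()
        return {'num_rows': df.shape[0]}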
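A minimal sketch of the same probe with a filter follows, keeping the `process(dataset)` signature from the snippet above. It assumes, as that probe suggests, that each key of the returned dict is recorded as a separate metric. It also assumes the column is named `State` as in the sample data. The `StubDataset` class is a hypothetical stand-in used only to run the function locally.

```python
import pandas as pd


def process(dataset):
    # Same entry point as the probe above; only get_dataframe() is used
    df = dataset.get_dataframe()
    # Boolean indexing keeps only the rows whose State is "A faire"
    a_faire = df[df["State"] == "A faire"]
    # One entry per metric, each a plain int
    return {
        "num_rows": int(df.shape[0]),
        "num_a_faire": int(a_faire.shape[0]),
    }


class StubDataset:
    """Hypothetical stand-in for the DSS dataset object, for local testing only."""

    def __init__(self, df):
        self._df = df

    def get_dataframe(self):
        return self._df


stub = StubDataset(pd.DataFrame({"State": ["Fait", "A faire", "A faire", "A faire"]}))
print(process(stub))  # {'num_rows': 4, 'num_a_faire': 3}
```

Returning both counts from one probe means the ratio can be derived from two metrics computed on the same read of the dataset.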
