How to fill empty cells selectively

Options
Tong
Tong Registered Posts: 13 ✭✭✭✭

Hi,

I have a data set like this (see below). I want to fill the cells in column (C%). If the ID is the same, the C% value is the same. I can not use ' fill cells with previous value' because some IDs do not have values. Can I create somethis like this: if the ID is the same, the C% is the same for 1 and 2 (sub ID)? Thanks in advane if anyone can help me with this.

IDsubIDC%
581100
582
591120
592
601
602
611125
612

Best Answer

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Answer ✓
    Options

    @Tong


    A quick thought you might be able to do this with two visual recipes.

    1. Create a dataset from your data that is grouped by ID and contains the MAX or Min of C% for each of your ids.

    now you have two data sets your original dataset, and a new dataset that has your ids and your C%. Hopefully this second dataset will have no missing values for C% for any of the IDs.

    2. Then Use a Join recipe to connect the two datasets on ID. Use a left join with your complete dataset that is missing the values as your first data set, and the second dataset will be this new dataset that you have just created.

    hope that helps.

Answers

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 411 Neuron
    edited July 17
    Options

    Just as a code alternative to @tgb417
    solution, using a python recipe (and provided the name of your dataset is 'mydataset';

    import dataiku
    import pandas as pd

    dataset = dataiku.Dataset("mydataset")
    df =dataset.get_dataframe()

    val_map = df.drop_duplicates(subset='ID').set_index('ID')
    df['C%'] = df.apply(
    lambda x: val_map.loc[x['ID'], 'C%'] if pd.isnull(x['C%'])
    else x['C%'], axis=1)

    ## Add dataiku code to write into a Dataset, e.g.:
    dataset_result = dataiku.Dataset("myresult")
    dataset_resiult.write_with_schema(df)

    With your given example, the C% will remain empty for ID 60, as you don't have any reference value in that case.

    Cheers!

  • Tong
    Tong Registered Posts: 13 ✭✭✭✭
    Options

    Thanks. It works!

  • Tong
    Tong Registered Posts: 13 ✭✭✭✭
    Options

    Thanks for writing me the code. But unfortunately I am not familiar with Python. I just started using DSS since one week. I need to learn in time. But anyway thanks a lot, I might use it for other case in the future.

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options
  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
    Options

    Hey @Tong
    welcome to Dataiku and thanks for using DSS. This is only if you haven't started using it already, but since you are brand new to DSS I would also recommend the Dataiku Academy and the Learning Paths

    That way in no time you'll be an expert and even be answering other users questions on here

  • Tong
    Tong Registered Posts: 13 ✭✭✭✭
    Options

    thanks for encouraging me!

Setup Info
    Tags
      Help me…