'Impute missing values' does not work

arno
arno Partner, Registered Posts: 1 Partner
edited July 16 in Using Dataiku

I am trying to do a simple median or mode imputation on a column of integer values with an 'Impute missing values' step in a visual prepare recipe. The script output shows the column with correctly imputed values. However, after I run the recipe, this transformation step seems to have no effect, although the other steps in the same prepare recipe are applied correctly. The recipe runs on the Spark engine, as this transformation step does not work with the normal DSS engine. The job does not show any errors. In the logs, I can also see that this transformation step is at least part of the job definition:

[2021/05/26-17:09:56.148] [null-err-55] [INFO] [dku.utils]  -     {
[2021/05/26-17:09:56.148] [null-err-55] [INFO] [dku.utils]  -       "type": "FillEmptyWithComputedValue",
[2021/05/26-17:09:56.149] [null-err-55] [INFO] [dku.utils]  -       "params": {
[2021/05/26-17:09:56.149] [null-err-55] [INFO] [dku.utils]  -         "mode": "MEDIAN",
[2021/05/26-17:09:56.149] [null-err-55] [INFO] [dku.utils]  -         "appliesTo": "COLUMNS",
[2021/05/26-17:09:56.150] [null-err-55] [INFO] [dku.utils]  -         "columns": [
[2021/05/26-17:09:56.150] [null-err-55] [INFO] [dku.utils]  -           "XXX",
[2021/05/26-17:09:56.150] [null-err-55] [INFO] [dku.utils]  -           "YYY"
[2021/05/26-17:09:56.150] [null-err-55] [INFO] [dku.utils]  -         ]
[2021/05/26-17:09:56.151] [null-err-55] [INFO] [dku.utils]  -       },
[2021/05/26-17:09:56.151] [null-err-55] [INFO] [dku.utils]  -       "metaType": "PROCESSOR",
[2021/05/26-17:09:56.151] [null-err-55] [INFO] [dku.utils]  -       "preview": false,
[2021/05/26-17:09:56.152] [null-err-55] [INFO] [dku.utils]  -       "disabled": false,
[2021/05/26-17:09:56.152] [null-err-55] [INFO] [dku.utils]  -       "comment": "Impute XXX and YYY",
[2021/05/26-17:09:56.153] [null-err-55] [INFO] [dku.utils]  -       "alwaysShowComment": true
[2021/05/26-17:09:56.153] [null-err-55] [INFO] [dku.utils]  -     }

I'm working with DSS 8.0.2.

Any idea why this is not working?

Answers

  • arnaudde
    arnaudde Dataiker Posts: 52 Dataiker

    Hello,
    Thanks for the detailed analysis. We could reproduce your issue and we are investigating it.

    In the meantime, you can achieve a similar behavior by:
    1. Right-clicking on the column that you want to impute
    2. Choosing "Fill empty rows with..."
    3. Choosing the option you want (mode, mean, or constant value)

    However, the mode/mean value will be calculated on the current sample and will then be fixed: it is not recomputed on possibly new input data when the recipe runs. So the behavior is slightly different from the "Impute missing values" processor you are using.
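To make that difference concrete, here is a minimal sketch in plain Python, not DSS code. The helper names `compute_fill_value` and `fill_empty` and the data values are hypothetical; only the column name "XXX" and the mode names come from the thread. It contrasts a fill value computed once on a sample with one recomputed on the full input.

```python
from statistics import mean, median, mode

# Hypothetical mapping from a mode name to a standard-library aggregator.
AGGREGATORS = {"MEDIAN": median, "MODE": mode, "MEAN": mean}


def compute_fill_value(rows, column, how="MEDIAN"):
    """Compute the fill value from the non-missing cells of one column."""
    present = [row[column] for row in rows if row[column] is not None]
    return AGGREGATORS[how](present)


def fill_empty(rows, column, value):
    """Return new rows with missing cells in `column` replaced by `value`."""
    return [
        {**row, column: value if row[column] is None else row[column]}
        for row in rows
    ]


# Made-up data: None marks a missing cell.
full_input = [{"XXX": v} for v in (1, 2, None, 4, 100, None, 7)]
sample = full_input[:4]  # stands in for the sample shown in the recipe editor

# "Fill empty rows with...": the value is computed once on the sample and frozen.
fixed_value = compute_fill_value(sample, "XXX")
fixed_result = fill_empty(full_input, "XXX", fixed_value)

# "Impute missing values" as intended: the value is recomputed on the full input.
recomputed_value = compute_fill_value(full_input, "XXX")
recomputed_result = fill_empty(full_input, "XXX", recomputed_value)

print(fixed_value, recomputed_value)
```

With this made-up data the two values differ, because the sample does not contain the later rows. If the sample is representative of the full dataset, the two approaches should give similar results.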

    Hope it helps,

  • cris90
    cris90 Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 3 Partner

    Exactly the same thing happens to me: in the recipe I can see that the result is correct, but after I run it the changes do not appear in the output.
