'Impute missing values' does not work

I am trying to do a simple median or mode imputation on a column of integer values, using an 'Impute missing values' step in a visual prepare recipe. The script preview shows the column with correctly imputed values. However, after I run the recipe, this step seems to have no effect, although the other steps in the same prepare recipe are applied correctly. The recipe runs on the Spark engine, as this transformation step does not work with the normal DSS engine. The job does not show any errors. In the logs, I can also see that this transformation step is at least part of the job definition:

[2021/05/26-17:09:56.148] [null-err-55] [INFO] [dku.utils]  -     {
[2021/05/26-17:09:56.148] [null-err-55] [INFO] [dku.utils]  -       "type": "FillEmptyWithComputedValue",
[2021/05/26-17:09:56.149] [null-err-55] [INFO] [dku.utils]  -       "params": {
[2021/05/26-17:09:56.149] [null-err-55] [INFO] [dku.utils]  -         "mode": "MEDIAN",
[2021/05/26-17:09:56.149] [null-err-55] [INFO] [dku.utils]  -         "appliesTo": "COLUMNS",
[2021/05/26-17:09:56.150] [null-err-55] [INFO] [dku.utils]  -         "columns": [
[2021/05/26-17:09:56.150] [null-err-55] [INFO] [dku.utils]  -           "XXX",
[2021/05/26-17:09:56.150] [null-err-55] [INFO] [dku.utils]  -           "YYY"
[2021/05/26-17:09:56.150] [null-err-55] [INFO] [dku.utils]  -         ]
[2021/05/26-17:09:56.151] [null-err-55] [INFO] [dku.utils]  -       },
[2021/05/26-17:09:56.151] [null-err-55] [INFO] [dku.utils]  -       "metaType": "PROCESSOR",
[2021/05/26-17:09:56.151] [null-err-55] [INFO] [dku.utils]  -       "preview": false,
[2021/05/26-17:09:56.152] [null-err-55] [INFO] [dku.utils]  -       "disabled": false,
[2021/05/26-17:09:56.152] [null-err-55] [INFO] [dku.utils]  -       "comment": "Impute XXX and YYY",
[2021/05/26-17:09:56.153] [null-err-55] [INFO] [dku.utils]  -       "alwaysShowComment": true
[2021/05/26-17:09:56.153] [null-err-55] [INFO] [dku.utils]  -     }

I'm working with DSS 8.0.2.

Any idea why this is not working?

2 Replies

Thanks for the detailed analysis. We could reproduce your issue and we are investigating it.

In the meantime, you can achieve similar behavior by:
1. Right-clicking the column that you want to impute
2. Choosing "Fill empty rows with..."
3. Choosing the option you want (mode, mean, or a constant value)

However, the mode/mean value is calculated on the current sample and is then fixed: it is not recomputed on new input data when the recipe runs. The behavior is therefore slightly different from the "Impute missing values" processor you are using.
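
To illustrate that difference, here is a minimal plain-Python sketch. It is not DSS code: the function names and the data are made up for illustration, and it only mimics the two behaviors as described above (a fill value frozen from the sample versus one recomputed on the data the recipe actually runs on).

```python
from statistics import median


def fill_with_constant(values, constant):
    """Replace empty (None) entries with a fixed, precomputed value."""
    return [constant if v is None else v for v in values]


def fill_with_computed_median(values):
    """Replace empty (None) entries with the median of this same data."""
    present = [v for v in values if v is not None]
    return fill_with_constant(values, median(present))


# Hypothetical data: the design-time sample is only the first rows of the input.
full_input = [1, 2, None, 3, 100, 200, None, 300]
sample = full_input[:4]

# Value frozen at design time, computed from the sample only.
frozen_value = median([v for v in sample if v is not None])

filled_frozen = fill_with_constant(full_input, frozen_value)
filled_recomputed = fill_with_computed_median(full_input)

print(filled_frozen)      # empty cells filled with the sample median
print(filled_recomputed)  # empty cells filled with the full-data median
```

With this data the two fill values differ noticeably, because the sample is not representative of the full input. If your sample is representative, the workaround should give results close to what the processor would produce.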

Hope it helps,

Exactly the same thing happens to me: the preview in the recipe looks correct, but after I run it, the changes do not appear in the output.