This website uses cookies. By clicking OK, you consent to the use of cookies. Read our cookie policy.

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results forÂ

- Community
- Â»
- Discussions
- Â»
- Using Dataiku DSS
- Â»

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Solved!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Conditional Average (average if)

1 Solution

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

It treats them as blank, meaning that they won't be taken into account in your average computation.

Solutions shown first - Read whole discussion chronologically

10 Replies

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

You don't say if you want to calculate this average over a bunch of rows, or across columns in a row. So, I can't be very specific here.

Second, you didn't say anything about your backend server.

If your backend is based on a SQL besed server. You might be able to do something with a custom agrigation using a SQL case statement to count the non-zero values and a SQK Sum, and do the division for an average.

If you are working with "tall" or tidy style data the window recipe might give you an option to do what you want. It already knows how to do averages across groups of rows.

Hope that helps a bit. If not and you are willing to share a bit more about your data and use case. We might be able to help you a little bit better.

--Tom

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

I have multiple columns that I want to compute averages for, grouped on timestep. Its economic scenario data so its like

Trial | Timestep | Asset A | Asset B | Asset C

I want to create columns that take the average at each timestep across all trials for Asset A where A is greater than 0, then again for B where B >0 and again for C where C > 0. I can take an average in the group recipe easily but its not "conditional". I have the option of using spark or DSS. I tried 'averageif' in spark but that did not work

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Thanks for sharing that you are using spark.

I don't have access to a Spark set up at the moment.

However, I'm wondering if you have set your visual recipe to use the SPARK Engine or if the recipe is running in the DSS engine. See below for how you can check.

If you have an averageif SPARK function. Then this might work.

As I've said I don't have access to a SPARK server. However, based on a quick Google Search I don't know if SPARK server actually provides an averageif function. In doing a quick search I'm only finding references to MS Excel when it comes to the function averageif.

Let us know how you get on with this. I'd like to invite anyone else here in the community with access to SPARK to pick up this thread and provide some additional help.

--Tom

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi,

If you want to apply a specific filter on your input data before performing the aggregation, you can use the "pre-filter" option on the left and select the "filter" category.

Best,

Harizo

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

but would this reject rows where conditions are not met in other columns?

In the below example I want to compute the average across all Trials at a given timestep. I want to reject the negative values BUT since the second row has a negative entry for B would it through away that row for all the other columns?

Trial | Time | A | B |

1 | 0 | .02 | -.02 |

1 | 1 | .02 | .01 |

1 | 2 | -.01 | .03 |

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Yes, the pre-filter operation removes the entire rows.

If you want to retain the rest of the values that are valid in a given row, you should consider running an upstream Prepare recipe on your Dataset and clear the cells that match your filtering criterion.

Best,

Harizo

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

how could I do that? does it clear them (treat as blank) or make them 0 (Which would lower average)

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

It treats them as blank, meaning that they won't be taken into account in your average computation.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

I found the function to do this. Thank you!