## Sign up to take part

Registered users can ask their own questions, contribute to discussions, and be part of the Community!

This website uses cookies. By clicking OK, you consent to the use of cookies. Read our cookie policy.

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results forย

Survey banner

Share your feedback on the Dataiku documentation with this 5 min survey. Thanks! TAKE THE SURVEY

Registered users can ask their own questions, contribute to discussions, and be part of the Community!

- Community
- ยป
- Discussions
- ยป
- Using Dataiku
- ยป

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Non idempotent problem with variable expansions in recipe formula language

Hello,

I am trying to use variable expansion to rename my clusters_label column and some date matching operations with formula.

1. A same formula code works with the "group" recipe, but not formula in "prepare" recipe.

2. The third way of accessing variables in formula of recipe are not idempotent.

`${variable_name}, variables.variable_name, variable_name`

have different results.

3. Spark engine outputs empty column while local stream has the correct output.

Do you have any suggestion/solution on this? Thank you very much.

PS: I have to use Spark(due to the data volume) and formula in recipe.

Solutions shown first - Read whole discussion

11 Replies

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi,

1) Can you share the formula, and where/how it's used in the various recipes

2) what are the different results? Note that using ${...} means you replace the value directly in the formula text, so it happens before evaluation (as opposed to the other 2)

3) is it with the formula of 1) ? or an unrelated recipe?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello, Thank you for you quick response.

Example 1:

Hereby the variable

```
{
"cluster_model_1_naming_mapping": [
{"cluster": "cluster_outliers", "new_name": "HCA-BF-HP-BN"},
{"cluster": "cluster_0", "new_name": "BCA-BF-BP-HN"},
{"cluster": "cluster_1", "new_name": "BCA-BF-HP-BN"},
{"cluster": "cluster_2", "new_name": "BCA-BF-BP-BN"},
{"cluster": "cluster_3", "new_name": "BCA-HF-BP-BN"},
{"cluster": "cluster_4", "new_name": "HCA-HF-BP-HN"},
{"cluster": "cluster_5", "new_name": "MCA-BF-BP-BN"}
]
}
```

The formula used is

`filter(variables.cluster_model_1_naming_mapping, item, item["cluster"] == cluster_labels)[0]["new_name"]`

Where cluster_labels is the output column of a KMeans model, whose values are "cluster_1", "cluster_2" .. and so on.

With this formula, the normal engine works but Spark gives nothing.

Example 2:

For formula in group recipe

```
(arrayContains(${precedent_years}, val('date_creation_order_year')))
&& (arrayContains(${trimesters_to_analyse}, trimester))
```

with variables

```
{
"precedent_year": [2018, 2019],
"trimesters_to_analyse": [2, 3]
}
```

This works well in group recipe pre-filter formula but not for the prepare recipe formula.

Simply I want to filter the lines with the correct year in the range and the correct range of trimester.

I am thinking that this may be because of the non-idempotent problem of retrieving values of variables expansion.

Thank you

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

for the Spark issue, indeed in Spark variables are not available via the `variables` object. You need to use `parseJson(${cluster_model_1_naming_mapping})` instead

The second issue is more puzzling. Can you show the step of the Prepare recipe where you use the formula?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

For Spark issue, the editor of formula gives me this error:

`Formula is invalid : Incorrect formula: 'filter(parseJson([{"cluster":"cluster_outliers","new_name":"HCA-BF-HP-BN"},{"cluster":"cluster_0","new_name":"BCA-BF-BP-HN"},{"cluster":"cluster_1","new_name":"BCA-BF-HP-BN"},{"cluster":"cluster_2","new_name":"BCA-BF-BP-BN"},{"cluster":"cluster_3","new_name":"BCA-HF-BP-BN"},{"cluster":"cluster_4","new_name":"HCA-HF-BP-HN"},{"cluster":"cluster_5","new_name":"MCA-BF-BP-BN"}]), item, item["cluster"] == cluster_labels)[0]["new_name"]' : Missing number, string, identifier, regex, or parenthesized expression(Parsing error at offset 18)`

(Sorry I don't have time at this moment for the second one, please allow me to do this in later post.)

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Thank you, it worked this way.

Finally I believe that the second example is the same problem of the quote.

Have a nice day!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello.

There is a problem with formula again. I used what you suggested as formula and it worked in the "prepare" dataset recipe. This time, I use the same formula in create computed colunms in a "joined recipe" and the parser failed to parse the filter function.

This is the formula

filter(parseJson('${cluster_model_1_naming_mapping}'), item, item["cluster"] == before_cluster_labels)[0]["new_name"]

The error is showed as in the picture

Thank you in advance for your help.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi,

this is indeed a parse-time error, and the recipe will pretend to be incorrectly setup, but the expression seems actually correct so the recipe should be working fine if you run it

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi I ran the formula but the same error appears.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

considering the operation you're doing (enriching a dataset with a fixed set of values), you should try putting the mapping in an Editable dataset and doing a Join recipe to get the mapped value.

If you absolutely need to use a Grouping recipe, can you check the version of DSS you are using?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Thank you. I am using the DSS 7.0.

It's a good idea with editable dataset. In fact, the formula worked in the prepare recipe, I could use the formula in prepare recipe too. The reason why I try to use this, it's to reduce the the shape of the flow. If I use the Editable dataset, once I need to use the variables in several places, it will ruin the shape of the flow and reduce the maintenanablity.

Thank you very much. I guess that I will have to use another solution.

I am looking forward to your future improvement on this function.

Didn't Find What You Needed?