Non-idempotent problem with variable expansion in the recipe formula language
Hello,
I am trying to use variable expansion in formulas to rename my clusters_label column and to do some date matching operations.
1. The same formula works in a "group" recipe, but not in a "prepare" recipe.
2. The three ways of accessing variables in a recipe formula are not idempotent:
${variable_name}, variables.variable_name, variable_name
give different results.
3. The Spark engine outputs an empty column, while the local stream engine gives the correct output.
Do you have any suggestion/solution on this? Thank you very much.
PS: I have to use Spark (due to the data volume) and formulas in recipes.
Answers
-
Hi,
1) Can you share the formula, and where/how it's used in the various recipes?
2) What are the different results? Note that using ${...} means you replace the value directly in the formula text, so it happens before evaluation (as opposed to the other two).
3) Is this with the formula from 1), or with an unrelated recipe?
-
Hello, thank you for your quick response.
Example 1:
Here is the variable:
{ "cluster_model_1_naming_mapping": [ {"cluster": "cluster_outliers", "new_name": "HCA-BF-HP-BN"}, {"cluster": "cluster_0", "new_name": "BCA-BF-BP-HN"}, {"cluster": "cluster_1", "new_name": "BCA-BF-HP-BN"}, {"cluster": "cluster_2", "new_name": "BCA-BF-BP-BN"}, {"cluster": "cluster_3", "new_name": "BCA-HF-BP-BN"}, {"cluster": "cluster_4", "new_name": "HCA-HF-BP-HN"}, {"cluster": "cluster_5", "new_name": "MCA-BF-BP-BN"} ] }
The formula used is
filter(variables.cluster_model_1_naming_mapping, item, item["cluster"] == cluster_labels)[0]["new_name"]
Where cluster_labels is the output column of a KMeans model, whose values are "cluster_1", "cluster_2" .. and so on.
With this formula, the normal engine works but Spark gives nothing.
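For readers less familiar with the formula functions, the lookup in Example 1 can be sketched in Python. This is only an illustration of the intended logic, not how DSS evaluates formulas. The helper name `new_name_for` is hypothetical, the mapping is a shortened copy of the variable above, and returning `None` for an unknown label is an assumption made for the sketch.

```python
# Shortened copy of the cluster_model_1_naming_mapping variable from Example 1.
cluster_model_1_naming_mapping = [
    {"cluster": "cluster_outliers", "new_name": "HCA-BF-HP-BN"},
    {"cluster": "cluster_0", "new_name": "BCA-BF-BP-HN"},
    {"cluster": "cluster_1", "new_name": "BCA-BF-HP-BN"},
]


def new_name_for(cluster_label, mapping):
    """Hypothetical helper mirroring filter(...)[0]["new_name"]."""
    # Keep only the entries whose "cluster" matches the row's label.
    matches = [item for item in mapping if item["cluster"] == cluster_label]
    # Take the first match; return None when the label is not in the mapping
    # (an assumption of this sketch, not documented DSS behavior).
    return matches[0]["new_name"] if matches else None


print(new_name_for("cluster_1", cluster_model_1_naming_mapping))
```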
Example 2:
The formula in the group recipe is
(arrayContains(${precedent_years}, val('date_creation_order_year'))) && (arrayContains(${trimesters_to_analyse}, trimester))
with variables
{ "precedent_year": [2018, 2019], "trimesters_to_analyse": [2, 3] }
This works well as the pre-filter formula of a group recipe, but not as a prepare recipe formula.
I simply want to keep the rows whose year and trimester fall within the given ranges.
I suspect this is caused by the non-idempotent behavior of variable expansion.
Thank you
-
For the Spark issue: indeed, in Spark, variables are not available via the `variables` object. You need to use `parseJson(${cluster_model_1_naming_mapping})` instead.
The second issue is more puzzling. Can you show the step of the Prepare recipe where you use the formula?
-
For the Spark issue, the formula editor gives me this error:
Formula is invalid : Incorrect formula: 'filter(parseJson([{"cluster":"cluster_outliers","new_name":"HCA-BF-HP-BN"},{"cluster":"cluster_0","new_name":"BCA-BF-BP-HN"},{"cluster":"cluster_1","new_name":"BCA-BF-HP-BN"},{"cluster":"cluster_2","new_name":"BCA-BF-BP-BN"},{"cluster":"cluster_3","new_name":"BCA-HF-BP-BN"},{"cluster":"cluster_4","new_name":"HCA-HF-BP-HN"},{"cluster":"cluster_5","new_name":"MCA-BF-BP-BN"}]), item, item["cluster"] == cluster_labels)[0]["new_name"]' : Missing number, string, identifier, regex, or parenthesized expression(Parsing error at offset 18)
(Sorry, I don't have time for the second one at the moment; I will follow up in a later post.)
-
Apologies, I lost the quotes when copying. It should be `parseJson('${cluster_model_1_naming_mapping}')`
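Why the quotes matter can be sketched in Python. The sketch assumes, as noted earlier in the thread, that `${...}` is replaced in the formula text before the formula is parsed. The function `expand` is a hypothetical stand-in for that substitution, not a DSS function, and the mapping is shortened to one entry.

```python
import json
import re

# Shortened copy of the project variable, serialized as compact JSON text.
variables = {
    "cluster_model_1_naming_mapping": json.dumps(
        [{"cluster": "cluster_0", "new_name": "BCA-BF-BP-HN"}],
        separators=(",", ":"),
    )
}


def expand(formula_text, variables):
    """Hypothetical stand-in: replace each ${name} with the variable's text."""
    return re.sub(r"\$\{(\w+)\}", lambda m: variables[m.group(1)], formula_text)


unquoted = expand("parseJson(${cluster_model_1_naming_mapping})", variables)
quoted = expand("parseJson('${cluster_model_1_naming_mapping}')", variables)

# Unquoted: the JSON is pasted in as raw formula syntax, which the
# formula parser in the thread rejected.
print(unquoted)
# Quoted: the same JSON arrives as a string literal for parseJson to decode.
print(quoted)

# The text between the single quotes is valid JSON.
inner = quoted[len("parseJson('"):-len("')")]
print(json.loads(inner)[0]["new_name"])
```

With the quotes, the formula parser only has to read a string literal, and decoding the JSON is left to `parseJson` at evaluation time.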
-
Thank you, it worked this way.
Finally, I believe that the second example is the same quoting problem.
Have a nice day!
-
Hello.
There is a problem with the formula again. I used the formula you suggested and it worked in the "prepare" recipe. This time, I used the same formula to create computed columns in a "join" recipe, and the parser failed to parse the filter function.
This is the formula
filter(parseJson('${cluster_model_1_naming_mapping}'), item, item["cluster"] == before_cluster_labels)[0]["new_name"]
The error is shown in the picture.
Thank you in advance for your help.
-
Hi,
This is indeed a parse-time error, and the recipe will appear to be incorrectly set up, but the expression actually seems correct, so the recipe should work fine if you run it.
-
Hi, I ran the formula, but the same error appears.
-
Considering the operation you're doing (enriching a dataset with a fixed set of values), you should try putting the mapping in an Editable dataset and using a Join recipe to get the mapped value.
If you absolutely need to use a Grouping recipe, can you check the version of DSS you are using?
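The join-based alternative amounts to a left join between the data and a small mapping table. A rough Python sketch of that idea follows. The input rows and the helper `left_join_new_name` are made up for illustration; in DSS the Join recipe would do this work.

```python
# Mapping table, as it could be stored in an Editable dataset (shortened).
mapping_rows = [
    {"cluster": "cluster_0", "new_name": "BCA-BF-BP-HN"},
    {"cluster": "cluster_1", "new_name": "BCA-BF-HP-BN"},
]

# Illustrative input rows; only the cluster_labels column comes from the thread.
data_rows = [
    {"id": 1, "cluster_labels": "cluster_1"},
    {"id": 2, "cluster_labels": "cluster_0"},
    {"id": 3, "cluster_labels": "cluster_7"},
]


def left_join_new_name(rows, mapping):
    """Hypothetical helper: left join rows to the mapping on the cluster label."""
    lookup = {m["cluster"]: m["new_name"] for m in mapping}
    # Unmatched labels keep the row and get None, as a left join would.
    return [{**row, "new_name": lookup.get(row["cluster_labels"])} for row in rows]


for row in left_join_new_name(data_rows, mapping_rows):
    print(row)
```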
-
Thank you. I am using DSS 7.0.
The Editable dataset is a good idea. In fact, the formula worked in the prepare recipe, so I could use it there too. The reason I tried to use it here is to keep the flow compact. If I use an Editable dataset and need the variables in several places, it will clutter the flow and reduce maintainability.
Thank you very much. I guess that I will have to use another solution.
I look forward to future improvements to this feature.