Added on November 27, 2020 11:56AM
Hello,
I am trying to use variable expansion in formulas to rename the values of my clusters_label column and to do some date matching.
1. The same formula works in the "group" recipe, but not in the formula step of the "prepare" recipe.
2. The three ways of accessing variables in a recipe formula are not equivalent:
${variable_name}, variables.variable_name, variable_name
give different results.
3. The Spark engine outputs an empty column, while the local stream engine gives the correct output.
Do you have any suggestion or solution for this? Thank you very much.
PS: I have to use Spark (due to the data volume) and a formula in the recipe.
Hi,
1) Can you share the formula, and where/how it's used in the various recipes?
2) What are the different results? Note that using ${...} means the value is substituted directly into the formula text, so it happens before evaluation (as opposed to the other 2, which are resolved when the formula is evaluated).
3) Is it with the formula of 1), or an unrelated recipe?
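To make point 2) concrete, here is a rough Python sketch (not DSS code) of textual expansion happening before the formula is parsed. The `expand` helper and the `project_variables` dict are hypothetical names for illustration, and rendering the value as compact JSON is an assumption about how the substitution behaves.

```python
import json
from string import Template

# Hypothetical stand-in for a project variable from this thread.
project_variables = {"trimesters_to_analyse": [2, 3]}

def expand(formula_text, variables):
    """Textually replace ${name} with the compact JSON form of each variable.

    Assumption: this mimics ${...} expansion, which edits the formula text
    itself before any parsing or evaluation takes place.
    """
    rendered = {
        name: json.dumps(value, separators=(",", ":"))
        for name, value in variables.items()
    }
    return Template(formula_text).substitute(rendered)

# The parser only ever sees the expanded text, never the ${...} placeholder.
expanded = expand("arrayContains(${trimesters_to_analyse}, trimester)", project_variables)
print(expanded)
```

The practical consequence is that whatever text the variable expands to must be valid formula syntax at the place where it lands.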
Hello, thank you for your quick response.
Example 1:
Here is the variable:
{ "cluster_model_1_naming_mapping": [ {"cluster": "cluster_outliers", "new_name": "HCA-BF-HP-BN"}, {"cluster": "cluster_0", "new_name": "BCA-BF-BP-HN"}, {"cluster": "cluster_1", "new_name": "BCA-BF-HP-BN"}, {"cluster": "cluster_2", "new_name": "BCA-BF-BP-BN"}, {"cluster": "cluster_3", "new_name": "BCA-HF-BP-BN"}, {"cluster": "cluster_4", "new_name": "HCA-HF-BP-HN"}, {"cluster": "cluster_5", "new_name": "MCA-BF-BP-BN"} ] }
The formula used is
filter(variables.cluster_model_1_naming_mapping, item, item["cluster"] == cluster_labels)[0]["new_name"]
Here cluster_labels is the output column of a KMeans model, whose values are "cluster_1", "cluster_2", and so on.
With this formula, the local engine works but Spark outputs an empty column.
Example 2:
For the formula in the group recipe:
(arrayContains(${precedent_years}, val('date_creation_order_year'))) && (arrayContains(${trimesters_to_analyse}, trimester))
with variables
{ "precedent_year": [2018, 2019], "trimesters_to_analyse": [2, 3] }
This works well as the pre-filter formula of the group recipe, but not as a formula in the prepare recipe.
I simply want to keep the rows whose year is in the given list and whose trimester is in the given list.
I suspect this is related to the three ways of retrieving variable values not being equivalent.
Thank you
For the Spark issue: indeed, in Spark, variables are not available via the `variables` object. You need to use `parseJson(${cluster_model_1_naming_mapping})` instead.
The second issue is more puzzling. Can you show the step of the Prepare recipe where you use the formula?
For the Spark issue, the formula editor gives me this error:
Formula is invalid : Incorrect formula: 'filter(parseJson([{"cluster":"cluster_outliers","new_name":"HCA-BF-HP-BN"},{"cluster":"cluster_0","new_name":"BCA-BF-BP-HN"},{"cluster":"cluster_1","new_name":"BCA-BF-HP-BN"},{"cluster":"cluster_2","new_name":"BCA-BF-BP-BN"},{"cluster":"cluster_3","new_name":"BCA-HF-BP-BN"},{"cluster":"cluster_4","new_name":"HCA-HF-BP-HN"},{"cluster":"cluster_5","new_name":"MCA-BF-BP-BN"}]), item, item["cluster"] == cluster_labels)[0]["new_name"]' : Missing number, string, identifier, regex, or parenthesized expression(Parsing error at offset 18)
(Sorry, I don't have time for the second one at the moment; please allow me to cover it in a later post.)
Apologies, I lost the quotes when copying: it should be `parseJson('${cluster_model_1_naming_mapping}')`
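A rough Python sketch (not DSS code) of why the quotes matter. It assumes `${...}` expands to the compact JSON text of the variable, as the error message above suggests, and uses only two entries of the mapping for brevity. The `lookup` helper is a hypothetical Python equivalent of the `filter(...)[0]["new_name"]` expression.

```python
import json

# Two entries copied from the mapping posted earlier in the thread.
mapping = [
    {"cluster": "cluster_0", "new_name": "BCA-BF-BP-HN"},
    {"cluster": "cluster_1", "new_name": "BCA-BF-HP-BN"},
]

# Assumption: ${...} is replaced by the compact JSON text of the variable.
expanded = json.dumps(mapping, separators=(",", ":"))

# Without quotes, raw JSON lands in the middle of the formula text.
unquoted = "filter(parseJson(" + expanded + "), ...)"
# With quotes, the same JSON becomes a string literal passed to parseJson.
quoted = "filter(parseJson('" + expanded + "'), ...)"

# The error message reports a parsing error at offset 18; in the unquoted
# text that position holds the opening brace of the first JSON object.
char_at_18 = unquoted[18]

def lookup(json_text, label):
    """Parse the JSON text, keep entries matching the label, take the first."""
    matches = [item for item in json.loads(json_text) if item["cluster"] == label]
    return matches[0]["new_name"]

new_name = lookup(expanded, "cluster_1")
print(char_at_18, new_name)
```

In the quoted form, the parser sees an ordinary string argument, and the JSON is only interpreted when `parseJson` runs.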
Thank you, it worked this way.
I believe the second example is the same quoting problem.
Have a nice day!
Hello.
There is a problem with the formula again. I used the formula you suggested and it worked in the "prepare" recipe. This time, I am using the same formula to create computed columns in a "join" recipe, and the parser fails to parse the filter function.
This is the formula
filter(parseJson('${cluster_model_1_naming_mapping}'), item, item["cluster"] == before_cluster_labels)[0]["new_name"]
The error is shown in the picture.
Thank you in advance for your help.
Hi,
This is indeed a parse-time error, and the recipe will be flagged as incorrectly set up. However, the expression itself looks correct, so the recipe should work fine if you run it.
Hi, I ran the formula but the same error appears.
Considering the operation you're doing (enriching a dataset with a fixed set of values), you should try putting the mapping in an Editable dataset and using a Join recipe to get the mapped value.
If you absolutely need to use a Grouping recipe, can you check which version of DSS you are using?
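For reference, the Editable dataset plus Join suggestion amounts to a keyed lookup. Below is a minimal Python sketch of that idea, not DSS code: `mapping_rows` stands in for the Editable dataset, `rows` for the input dataset, and the unmatched label `cluster_9` is made up to show left-join behaviour.

```python
# Stand-in for the Editable dataset holding the mapping (two entries from the thread).
mapping_rows = [
    {"cluster": "cluster_0", "new_name": "BCA-BF-BP-HN"},
    {"cluster": "cluster_1", "new_name": "BCA-BF-HP-BN"},
]

# Stand-in for the input dataset; "cluster_9" is a made-up label with no match.
rows = [
    {"cluster_labels": "cluster_1"},
    {"cluster_labels": "cluster_9"},
]

# Index the mapping by its key once, then enrich every row (left join semantics:
# rows without a match are kept, with new_name set to None).
name_by_cluster = {m["cluster"]: m["new_name"] for m in mapping_rows}
joined = [
    {**row, "new_name": name_by_cluster.get(row["cluster_labels"])}
    for row in rows
]
print(joined)
```

Compared with filtering the whole mapping inside a formula for every row, the join builds the key index once and involves no formula parsing.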
Thank you. I am using DSS 7.0.
The Editable dataset is a good idea. In fact, the formula works in the prepare recipe, so I could use it there too. The reason I tried it this way is to keep the flow compact. If I use an Editable dataset and need the variables in several places, it will clutter the flow and reduce maintainability.
Thank you very much. I guess that I will have to use another solution.
I am looking forward to future improvements to this feature.