Use Custom UDFs on Visual Recipes

Saul

Hello Dataikers!

Since all visual recipes are based on SparkSQL, some "advanced" aggregations aren't available. In my case, I have three values in three columns (A, B and C), and I just want to compute their median.

The problem is that the median function doesn't exist in my current Spark version, so I need a UDF for it. But since this is a visual recipe, I cannot inject my "advanced" UDF.

Do you know if there's a place where I can define UDFs so that Dataiku picks them up and makes them available in visual recipes? Something similar to Global Code.

Thank you in advance,

-Saul

Happy coding!
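For the three-column case described above, a median of exactly three values doesn't strictly need a median function. Clamping one value between the min and the max of the other two gives the middle value, and Spark SQL already has the built-in `greatest` and `least` functions. Below is a minimal sketch, assuming non-null numeric columns. The helper names `median_of_three` and `median_of_three_sql` are hypothetical, and whether a given visual recipe lets you enter a raw Spark SQL expression depends on the recipe and engine, so treat that part as an assumption to check.

```python
def median_of_three(a, b, c):
    """Middle value of three numbers using only min/max comparisons."""
    low, high = min(a, b), max(a, b)
    # Clamp c into [low, high]; the clamped value is the median.
    return max(low, min(high, c))


def median_of_three_sql(col_a, col_b, col_c):
    """Build the same identity as a Spark SQL expression string.

    Assumption: the columns contain no NULLs. Spark's greatest/least skip
    NULL arguments, so with NULLs the result is no longer a true median.
    """
    low = f"least({col_a}, {col_b})"
    high = f"greatest({col_a}, {col_b})"
    return f"greatest({low}, least({high}, {col_c}))"


print(median_of_three_sql("A", "B", "C"))
# prints: greatest(least(A, B), least(greatest(A, B), C))
```

The Python function is only there to check the identity; the generated string is what you would use wherever a Spark SQL expression is accepted.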


Answers

  • ChrisWalter

    Hey Saul!

    I get your struggle. You can define custom UDFs in Dataiku DSS under the "Code Libraries" section, and then you should be able to use them in your visual recipes. Happy coding indeed!

  • Saul
    edited July 17

    Hello Chris,

    Thanks for your response. Could you please provide an example?

    My goal is to compute the median over an array in a visual recipe. I already have this code in Code Libs:

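    from pyspark.sql.functions import udf
    from pyspark.sql.types import FloatType
    import numpy as np
    
    # Median of an array column. np.median returns a NumPy scalar, which PySpark
    # may fail to convert, so cast it to a plain Python float; null/empty -> null.
    arrayMedianUDF = udf(
        lambda array: float(np.median(array)) if array else None, FloatType()
    )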


    Now, how can I call arrayMedianUDF from a visual recipe?


    [Screenshot: recipe.png]

    Thank you in advance,

    Saul
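One way to keep the median logic reusable, sketched here under assumptions: put the per-row computation in the project library as a plain Python function, so it can be tested without Spark, and only wrap it with `udf` inside a PySpark code recipe. This does not answer whether a visual recipe can call it. The names `array_median`, `array_median_udf` and `my_array_col` are hypothetical, and the sketch also assumes you want `None` entries inside an array to be ignored.

```python
import numpy as np


def array_median(values):
    """Median of a list of numbers, ignoring None entries.

    Returns a plain Python float, or None for a null or empty array.
    """
    if values is None:
        return None
    cleaned = [v for v in values if v is not None]
    if not cleaned:
        return None
    # np.median returns a NumPy scalar; float() turns it into a plain Python
    # float, which is the safest type to hand back from a PySpark UDF.
    return float(np.median(cleaned))


# Hypothetical wiring inside a PySpark code recipe (requires pyspark):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import DoubleType
#   array_median_udf = udf(array_median, DoubleType())
#   df = df.withColumn("median", array_median_udf("my_array_col"))
```

Keeping the function free of Spark imports means the same code can be unit tested locally and then reused by any code recipe that imports it from the library.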


