Logging in Dataiku notebook / recipe ...

Ravan — Registered, Posts: 1

Hello Team,

I am working on PySpark recipes. I use a notebook to build the logic and then convert it back into a recipe.

The Dataiku and Spark operations (e.g. df.count()) emit a lot of log statements to the console, which makes the notebook very difficult to use.

Is there a way for me to suppress logging from the Dataiku and Spark APIs?

Btw, I ran the snippet "sc.setLogLevel('ERROR')".

Operating system used: Linux


Answers

  • JordanB — Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Posts: 297

    Hi @Ravan

    You can use the following code to reduce the verbosity. The first cell containing this code will be verbose, but the rest won't:

    import pyspark
    from pyspark.sql import SQLContext
    import dataiku.spark as dkuspark
    sc = pyspark.SparkContext.getOrCreate()
    sqlContext = SQLContext(sc)
    dkuspark.__dataikuSparkContext(sqlContext._jvm)  # initialize Dataiku's Spark integration
    sc.setLogLevel("WARN")  # only WARN and above from the Spark JVM side
    

    We will be looking into improving this.

    Thanks!
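
    If some noise still comes through on the Python side, note that py4j (PySpark's Python-JVM bridge) logs via Python's standard logging module. A minimal sketch to quiet it, assuming the default py4j logger name:

    import logging

    # py4j logs through Python's standard logging module; raising its
    # threshold hides the INFO/DEBUG chatter it emits in notebooks.
    logging.getLogger("py4j").setLevel(logging.ERROR)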

  • Turribeach — Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Neuron 2023, Posts: 2,160

    I would also argue that you should clean up your notebook before it runs as production code. While developing in a notebook you may run statements like df.count(), df.info(), df.show(), df.head() or print() to check the contents of a data frame or to debug your code. These statements take time to execute, generate output that has to be transmitted, and are useless in a non-interactive execution such as a recipe running in a scenario. They can also defeat PySpark's lazy evaluation by forcing actions early, making your code run slower. One option is to gate them behind a debug flag, as in the sketch below.
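
    A minimal sketch of that idea, with a hypothetical DEBUG flag you would manage yourself:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

    DEBUG = False  # flip to True while developing in the notebook

    if DEBUG:
        df.show(5)         # Spark action: useful interactively, wasted work in a scenario
        print(df.count())  # another action; expensive on large data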

    PS: Don't use df.count() when you don't need the exact number of rows. To check whether a data frame is empty, use len(df.head(1)) == 0 or df.rdd.isEmpty() in PySpark, or the df.empty property in Pandas; these are much faster because they don't scan the whole data frame. A quick comparison follows below.
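
    A minimal sketch of the difference, using a toy DataFrame (the names here are illustrative):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(10**6)  # toy DataFrame with a million rows

    # Slow: computes the full row count just to compare it with 0.
    is_empty_slow = (df.count() == 0)

    # Fast: fetches at most one row and stops early.
    is_empty_fast = (len(df.head(1)) == 0)  # or: df.rdd.isEmpty()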
