Shapley computation through Spark

NunziaCoppola
NunziaCoppola Registered Posts: 3

Hello everyone,

When computing Shapley values with a UDF, we encounter memory issues or very long processing times.

To work around this, we are trying to redefine the process as a PySpark recipe. In custom code we call the method "GBTClassificationModel.load", imported with "from pyspark.ml.classification import GBTClassificationModel".

Unfortunately, the load method takes too long (more than 4 hours) to load the model.

Do you know how to speed up the process or work directly with the model without loading it?

I have attached the code snippet.

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,090 Neuron

    Hi, can you please paste the code in a Code Block (look for the </> icon) so it can be copy/pasted? Thanks

  • NunziaCoppola
    NunziaCoppola Registered Posts: 3
    edited July 17
    import gzip
    import json
    import os
    import sys

    import dataiku
    import dataikuscoring
    import pandas as pd
    import pyspark.sql.functions as F
    from dataikuscoring import load_model
    from pyspark import SparkContext, SparkConf
    from pyspark.ml.classification import GBTClassificationModel
    from pyspark.sql import SparkSession, SQLContext
    from pyspark.sql.types import *

    # Make the Spark driver and workers use the same Python interpreter as this recipe.
    os.environ['PYSPARK_PYTHON'] = sys.executable
    os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable

    Project = 'DKU_TUTORIAL_BASICS_102_1'
    FOLDER_ID = 'EnO4P0lZ'

    # Read the first file of the managed folder fully into memory.
    REPO = dataiku.Folder(FOLDER_ID)
    list_files = REPO.list_paths_in_partition()
    with REPO.get_download_stream(list_files[0]) as stream:
        data = stream.read()

    # Load the input dataset as a pandas DataFrame.
    dataset = dataiku.Dataset("ew_sconfini_scadutiprivati_prepared")
    df = dataset.get_dataframe()

    # This is the call that takes more than 4 hours.
    gbt = GBTClassificationModel.load("/DKU_TUTORIAL_BASICS_102_1/EnO4P0lZ/privati-ima_01-weights/model/dss_pipeline_model")
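
Before tuning the load call, it may help to confirm where the time actually goes. The recipe also reads a folder file and a whole dataset into memory before the model is loaded. The following is only a minimal sketch using the Python standard library: the `timed` helper and the `timings` dictionary are hypothetical names introduced for illustration, and the two blocks inside the `with` statements are stand-ins for the real steps (the folder read and `GBTClassificationModel.load`), not the actual Dataiku or Spark calls.

```python
import time
from contextlib import contextmanager

# Hypothetical helper: step name -> elapsed wall-clock seconds.
timings = {}


@contextmanager
def timed(step):
    """Record how long the enclosed block takes under the key `step`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = time.perf_counter() - start


# Stand-in for reading the file from the managed folder.
with timed("read_folder_file"):
    data = b"x" * 1_000_000

# Stand-in for the model load; in the recipe this block would wrap
# the GBTClassificationModel.load(...) call instead of a sleep.
with timed("load_model"):
    time.sleep(0.05)

# Print the steps from slowest to fastest.
for step, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{step}: {seconds:.3f}s")
```

Wrapping each step of the recipe in its own `with timed(...)` block would show whether the four hours are spent in the load itself or in one of the earlier reads.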