PySpark DataFrame write keeps running without yielding any results
Hi everyone,
I have an issue with a Jupyter notebook using PySpark. The trouble appears when I try to write out a DataFrame with the instruction write_with_schema. The complete code is:
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
# Read recipe inputs
VW_DL_AUTSINIESTROS = dataiku.Dataset("VW_DL_AUTSINIESTROS")
VW_DL_AUTSINIESTROS_df = dkuspark.get_dataframe(sqlContext, VW_DL_AUTSINIESTROS)
P3D = dataiku.Dataset("P3D")
P3D_df = dkuspark.get_dataframe(sqlContext, P3D)
------ cell #2
# Cast the join key to string
P3D_df = P3D_df.withColumn('NUMCOMPLETOCOTIZACION', P3D_df.NUMCOMPLETOCOTIZACION.cast('string'))
VW_DL_AUTSINIESTROS_df = VW_DL_AUTSINIESTROS_df.withColumn('NUMCOMPLETOCOTIZACION', VW_DL_AUTSINIESTROS_df.NUMCOMPLETOCOTIZACION.cast('string'))
------ cell #3
tabla = P3D_df.join(VW_DL_AUTSINIESTROS_df, on=['NUMCOMPLETOCOTIZACION'], how='left')
display(tabla)
------ cell #4
cols = ['FECHA_EMISION', 'NUMCOMPLETOCOTIZACION', 'OCURRIDO_NETO']
------ cell #5
tabla = tabla.select(*cols)
------ cell #6
# Compute recipe outputs from inputs
# TODO: Replace this part with your actual code that computes the output, as a Spark SQL DataFrame
siniestros_elyvv_df = tabla # For this sample code, simply copy input to output
------ cell #7
# Write recipe outputs
siniestros_elyvv = dataiku.Dataset("siniestros_elyvv")
display(siniestros_elyvv)
------ cell #8
dkuspark.write_with_schema(siniestros_elyvv, siniestros_elyvv_df)
# this is the instruction that keeps running indefinitely and never finishes
I would really appreciate any ideas. Thanks in advance.
HP
Operating system used: CentOS 7
Best Answer
Alexandru (Dataiker)
Hi @Cancun_Mx,
Troubleshooting Spark code from a notebook is very difficult.
I would suggest you try the same code in a PySpark recipe instead and then review the job diagnostics, or open a support ticket with the job diagnostics.
Thanks,
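For reference, a minimal sketch of the same pipeline as a PySpark recipe, assuming the same input datasets, output dataset, and column names as in the notebook above (all taken from the original post; adapt as needed). Run as a recipe, this produces a job whose diagnostics you can download and review:

import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read recipe inputs
siniestros_df = dkuspark.get_dataframe(sqlContext, dataiku.Dataset("VW_DL_AUTSINIESTROS"))
p3d_df = dkuspark.get_dataframe(sqlContext, dataiku.Dataset("P3D"))

# Cast the join key to string on both sides
p3d_df = p3d_df.withColumn('NUMCOMPLETOCOTIZACION', p3d_df.NUMCOMPLETOCOTIZACION.cast('string'))
siniestros_df = siniestros_df.withColumn('NUMCOMPLETOCOTIZACION', siniestros_df.NUMCOMPLETOCOTIZACION.cast('string'))

# Left join and keep only the columns needed in the output
tabla = p3d_df.join(siniestros_df, on=['NUMCOMPLETOCOTIZACION'], how='left')
tabla = tabla.select('FECHA_EMISION', 'NUMCOMPLETOCOTIZACION', 'OCURRIDO_NETO')

# Write the recipe output; this is what triggers the actual Spark job
dkuspark.write_with_schema(dataiku.Dataset("siniestros_elyvv"), tabla)

Note that Spark evaluates lazily: nothing above write_with_schema actually executes until the write triggers it, so a "hanging" write usually means an upstream read or the join is what never finishes. If you want to narrow it down in the notebook first, forcing an intermediate action such as tabla.count() before the write will tell you whether the join itself completes.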
Answers
Thanks a lot, mate!