Hi everyone,
I have a challenge with a Jupyter notebook using PySpark. The trouble appears when I try to write a dataframe with the write_with_schema instruction. The complete code is:
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
# Read recipe inputs
VW_DL_AUTSINIESTROS = dataiku.Dataset("VW_DL_AUTSINIESTROS")
VW_DL_AUTSINIESTROS_df = dkuspark.get_dataframe(sqlContext, VW_DL_AUTSINIESTROS)
P3D = dataiku.Dataset("P3D")
P3D_df = dkuspark.get_dataframe(sqlContext, P3D)
------ cell #2
# Cast the join key to string
P3D_df = P3D_df.withColumn('NUMCOMPLETOCOTIZACION', P3D_df.NUMCOMPLETOCOTIZACION.cast('string'))
VW_DL_AUTSINIESTROS_df = VW_DL_AUTSINIESTROS_df.withColumn('NUMCOMPLETOCOTIZACION', VW_DL_AUTSINIESTROS_df.NUMCOMPLETOCOTIZACION.cast('string'))
------- cell #3
tabla = P3D_df.join(VW_DL_AUTSINIESTROS_df, on=['NUMCOMPLETOCOTIZACION'], how='left')
display(tabla)
-------- cell #4
cols = ['FECHA_EMISION', 'NUMCOMPLETOCOTIZACION', 'OCURRIDO_NETO']
-------- cell #5
tabla = tabla.select(*cols)
-------- cell #6
# Compute recipe outputs from inputs
# TODO: Replace this part by your actual code that computes the output, as a SparkSQL dataframe
siniestros_elyvv_df = tabla # For this sample code, simply copy input to output
-------- cell #7
# Write recipe outputs
siniestros_elyvv = dataiku.Dataset("siniestros_elyvv")
display(siniestros_elyvv)
-------- cell #8
dkuspark.write_with_schema(siniestros_elyvv, siniestros_elyvv_df)
# this is the instruction that never finishes; it seems to be stuck in a loop.
I would really appreciate any ideas. Thanks in advance.
HP
Operating system used: CentOS 7
Hi @Cancun_Mx ,
Troubleshooting Spark code from a notebook is very difficult.
I would suggest you run the same code in a PySpark recipe instead, and then review the job diagnostics or open a support ticket with the job diagnostics attached.
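For reference, here is a minimal sketch of what that PySpark recipe could look like, reusing the dataset names and transformations from your notebook (the display() calls are dropped since they only make sense interactively; adjust the dataset names if your recipe inputs/outputs differ):

import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read recipe inputs (same dataset names as in the notebook)
VW_DL_AUTSINIESTROS = dataiku.Dataset("VW_DL_AUTSINIESTROS")
VW_DL_AUTSINIESTROS_df = dkuspark.get_dataframe(sqlContext, VW_DL_AUTSINIESTROS)
P3D = dataiku.Dataset("P3D")
P3D_df = dkuspark.get_dataframe(sqlContext, P3D)

# Cast the join key to string on both sides
P3D_df = P3D_df.withColumn('NUMCOMPLETOCOTIZACION', P3D_df.NUMCOMPLETOCOTIZACION.cast('string'))
VW_DL_AUTSINIESTROS_df = VW_DL_AUTSINIESTROS_df.withColumn('NUMCOMPLETOCOTIZACION', VW_DL_AUTSINIESTROS_df.NUMCOMPLETOCOTIZACION.cast('string'))

# Left join and keep only the needed columns
cols = ['FECHA_EMISION', 'NUMCOMPLETOCOTIZACION', 'OCURRIDO_NETO']
tabla = P3D_df.join(VW_DL_AUTSINIESTROS_df, on=['NUMCOMPLETOCOTIZACION'], how='left').select(*cols)

# Write recipe output; run as a recipe, the Spark execution logs
# end up in the job diagnostics instead of the notebook kernel
siniestros_elyvv = dataiku.Dataset("siniestros_elyvv")
dkuspark.write_with_schema(siniestros_elyvv, tabla)

If the write still hangs there, the job diagnostics should show whether it is waiting on Spark resources or stuck in the actual write.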
Thanks,