Failed to export row error

Bhargavi27 Dataiku DSS Core Designer, Registered Posts: 7

I am using 3 Excel files. Using a Prepare recipe, I formatted all 3 files to make them Dataiku-compatible and then stacked them together. Everything works up to this step. After that, I created an output recipe to send the dataset to Tableau, and I got this error:

Job failed: Failed to export rows : <class 'ValueError'> : Got an invalid value 'nan' for column "Original_Audit_Date", it must be a Timestamp or a datetime instance

Answers

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
    edited July 2024

    Hi @Bhargavi27,

    This does appear to be due to how the underlying Tableau
    tableauhyperapi package handles dates:
    https://community.tableau.com/s/question/0D54T00000F33cOSAR/inserting-null-and-textbased-datetimes-with-inserter-method-in-python-hyper-api

    To handle this, you can convert the column from the timestamp type to string. Alternatively, you can run a Python recipe on your dataset that sets a dummy value for any empty timestamp values and leaves the column as a timestamp type. For example:

    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu
    
    # Read recipe inputs
    input_dataset = dataiku.Dataset("<INPUT_DATASET>")
    df = input_dataset.get_dataframe()
    
    # Replace 'nan' with NaN (if 'nan' is a string)
    df = df.replace('nan', np.nan)
    
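    # Replace NaN and empty/null values with '1970-01-01T00:00:00.000Z'
    # Note: fillna() on the whole dataframe fills missing values in every column,
    # not only in the timestamp columns
    date_to_replace = '1970-01-01T00:00:00.000Z'
    output_df = df.fillna(date_to_replace)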
    
    output_dataset = dataiku.Dataset("<OUTPUT_DATASET>")
    output_dataset.write_with_schema(output_df)
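
    For the first option mentioned above (converting the column from timestamp to string), a minimal pandas sketch could look like the following. The sample dataframe, the helper name timestamp_to_string, and the date format are assumptions made for illustration; only the column name Original_Audit_Date comes from the error message. In a Python recipe, the resulting dataframe would then presumably be written with write_with_schema, as in the example above.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the error message.
df = pd.DataFrame({
    "Original_Audit_Date": pd.to_datetime(["2024-01-15", None, "2024-03-02"]),
    "Site": ["A", "B", "C"],
})


def timestamp_to_string(frame, column, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a copy of frame with `column` formatted as text.

    Missing dates (NaT) become empty strings instead of NaN.
    """
    out = frame.copy()
    formatted = out[column].dt.strftime(fmt)  # NaT entries come back as missing values
    out[column] = formatted.fillna("")
    return out


string_df = timestamp_to_string(df, "Original_Audit_Date")
```

    The trade-off is that the column reaches Tableau as text, so it may need to be converted back to a date there.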
    


    If you can easily identify the columns whose missing values you want to fill, you can fill only those columns instead of the whole dataframe:

    columns = ['value1', 'value2']  # add the remaining column names here
    output_df = df.copy()  # start from the unfilled dataframe
    for column in columns:
        output_df[column] = df[column].fillna(date_to_replace)
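
    If the timestamp columns already have a datetime dtype in pandas, they can also be picked up automatically rather than listed by hand. The sketch below is one possible way to do that, not part of the original recipe: the sample dataframe and the helper name fill_missing_timestamps are hypothetical, and it fills with a pd.Timestamp placeholder so the columns keep their datetime dtype.

```python
import pandas as pd

# Hypothetical sample data; only "Original_Audit_Date" comes from the error message.
df = pd.DataFrame({
    "Original_Audit_Date": pd.to_datetime(["2024-01-15", None, "2024-03-02"]),
    "Score": [1.0, None, 3.0],
})


def fill_missing_timestamps(frame, placeholder):
    """Fill missing values in datetime columns only; other columns are untouched.

    `placeholder` is expected to be a timezone-naive pd.Timestamp.
    """
    out = frame.copy()
    datetime_columns = out.select_dtypes(include=["datetime", "datetimetz"]).columns
    for column in datetime_columns:
        tz = out[column].dt.tz  # None for timezone-naive columns
        value = placeholder if tz is None else placeholder.tz_localize(tz)
        out[column] = out[column].fillna(value)
    return out


filled_df = fill_missing_timestamps(df, pd.Timestamp("1970-01-01"))
```

    Here the missing Score value stays empty, while the missing audit date becomes 1970-01-01.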
    


    I hope that helps!

    Thanks,
    Sarina
