String getting truncated using Python recipe

pipscity

Hi all,

I'm just getting started with Dataiku.

I am:

1) reading an image from a managed folder

2) converting that image to base64 (about 5,000 characters)

3) exporting it to my recipe's output dataframe, where it gets truncated to 1,000 characters (the string length limit)
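For a sense of scale: base64 turns every 3 input bytes into 4 output characters (padding the last group), so an n-byte file encodes to 4·⌈n/3⌉ characters. A minimal sketch of that arithmetic using only the standard library; the function name `base64_length` and the 3,750-byte payload size are illustrative assumptions, not values from the thread:

```python
import base64
import os

def base64_length(n_bytes: int) -> int:
    """Length of the padded base64 text produced for n_bytes of input."""
    # 4 characters per started group of 3 bytes, i.e. 4 * ceil(n / 3)
    return 4 * ((n_bytes + 2) // 3)

# A hypothetical 3,750-byte image encodes to 5,000 characters,
# five times the 1,000-character default string limit.
payload = os.urandom(3750)
encoded = base64.b64encode(payload)
assert len(encoded) == base64_length(len(payload))
print(len(encoded))  # 5000
```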

How do I ensure that my string output doesn't get truncated? I tried changing the string max length manually, but it seems to get reset every time I run my script.

My script is as below:

import dataiku
import pandas as pd, numpy as np
import base64
from dataiku import pandasutils as pdu


#read Image Folder
images_folder = dataiku.Folder("Pictures")
folder_info=images_folder.get_info()
print(folder_info)


#Read Image
with images_folder.get_download_stream("template.JPG") as f:
    data = f.read()

#convert Image to base64
base64_encoded_data = base64.b64encode(data)
print(base64_encoded_data)

#convert to dataframe
base64_df = pd.DataFrame.from_dict({'template':base64_encoded_data}, orient='index') # Compute a Pandas dataframe to write into Base64


# Write recipe outputs
output_base64v2 = dataiku.Dataset("output_base64v2")
output_base64v2.write_with_schema(base64_df)
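It helps to look at what `pd.DataFrame.from_dict(..., orient='index')` builds in the script above: one row labelled `'template'` in a single column whose label defaults to `0` (no column name is given), and, under Python 3, a cell holding `bytes` rather than `str` because `base64.b64encode` returns `bytes`. A small standalone check, assuming a short stand-in byte string instead of the real image from the managed folder:

```python
import base64

import pandas as pd

# Stand-in bytes; in the recipe these come from images_folder.get_download_stream()
data = b"\x89PNG stand-in image bytes"
base64_encoded_data = base64.b64encode(data)  # bytes under Python 3

# Same call as in the question: one row labelled 'template', one unnamed column
base64_df = pd.DataFrame.from_dict({'template': base64_encoded_data}, orient='index')
print(base64_df.columns.tolist())          # the column label defaults to 0
print(type(base64_df.loc['template', 0]))  # the cell still holds bytes

# Decoding to str and naming the column gives a DataFrame whose column
# can be matched by name against an explicitly declared schema
named_df = pd.DataFrame.from_dict(
    {'template': base64_encoded_data.decode('ascii')},
    orient='index',
    columns=['Value'],
)
print(named_df.columns.tolist())
```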

Best Answer

  • Alexandru
    Dataiker, edited July 17

    Hi,

    When you use write_with_schema, it updates the schema every time and resets the default string length limit to 1,000 characters. One way to increase the limit is to define the schema manually, set maxLength to the value you choose, and then use write_dataframe instead. Here is a sample based on your current code:

    import dataiku
    import pandas as pd, numpy as np
    import base64
    from dataiku import pandasutils as pdu
    
    
    #read Image Folder
    images_folder = dataiku.Folder("kHGFYqt4")
    folder_info=images_folder.get_info()
    print(folder_info)
    
    
    #Read Image
    with images_folder.get_download_stream("image.png") as f:
        data = f.read()
    
    #convert Image to base64
    base64_encoded_data = base64.b64encode(data)
    print(base64_encoded_data)
    
    #convert to dataframe: decode the bytes to text and name the column
    #to match the "name" declared in write_schema below
    base64_df = pd.DataFrame.from_dict({'template':base64_encoded_data.decode('utf-8')}, orient='index', columns=['base64_encoded_data'])
    
    
    # Write recipe outputs
    output_base64v2 = dataiku.Dataset("base64")
    output_base64v2.write_schema([
        {
            "name": "base64_encoded_data",
            "type": "string",
            "maxLength": 65000
        }
    ])
    
    output_base64v2.write_dataframe(base64_df)
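Instead of hard-coding 65000, the maxLength can be derived from the data about to be written. A minimal sketch that does this for every column; `build_string_schema` and its `margin` parameter are hypothetical names, every column is assumed to be declared as a string, and the column-dict shape (`name`, `type`, `maxLength`) is copied from the `write_schema` call above rather than from the full Dataiku schema reference:

```python
import pandas as pd

def build_string_schema(df: pd.DataFrame, margin: int = 0) -> list:
    """Build a write_schema-style column list (hypothetical helper).

    Every column is declared as "string", with maxLength set to the longest
    value in that column (after astype(str)) plus an optional safety margin.
    """
    schema = []
    for col in df.columns:
        # Longest value once converted with astype(str); empty frames get 0
        longest = int(df[col].astype(str).str.len().max()) if len(df) else 0
        schema.append({
            "name": str(col),
            "type": "string",
            "maxLength": longest + margin,
        })
    return schema

df = pd.DataFrame({"Value": ["a" * 5000, "b" * 12]}, index=["template", "other"])
print(build_string_schema(df, margin=1))

# In a recipe, the result would then be used as in the answer:
# output_base64v2.write_schema(build_string_schema(df, margin=1))
# output_base64v2.write_dataframe(df)
```

Computing the limit from the data keeps the schema in step with the image size, at the cost of the declared length changing whenever a larger image is written.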

Answers

  • pipscity

    Thanks that really helped.

    There was only one issue left: I had to update the column names in the DataFrame to make it work, but it's all good now! The final script is below.

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd, numpy as np
    import base64
    from dataiku import pandasutils as pdu


    #read Image Folder
    images_folder = dataiku.Folder("Pictures")
    folder_info=images_folder.get_info()
    print(folder_info)

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    #Read Image
    with images_folder.get_download_stream("template.JPG") as f:
        data = f.read()

    #convert Image to base64
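    base64_encoded_data = base64.b64encode(data).decode('utf-8')  # bytes -> str; bytes has no .encode() in Python 3
    strLength=len(base64_encoded_data)+1  # size the schema's maxLength from the actual string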

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    #convert our base64 to a dataframe
    base64_df = pd.DataFrame.from_dict({'template':base64_encoded_data}, orient='index', columns=['Value'])
    print(base64_df)

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    # Write recipe outputs
    output_base64v2 = dataiku.Dataset("output_base64v2")
    output_base64v2.write_schema([
        {
            "name": "Value",
            "type": "string",
            "maxLength": strLength
        }
    ])

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    output_base64v2.write_dataframe(base64_df)

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    print(output_base64v2.read_schema())
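To confirm that nothing was cut off, the stored text can be decoded back and compared with the original file bytes. A minimal sketch; `is_intact` is a hypothetical helper, the payload is a stand-in for the image, and in a recipe the stored value would come from reading the output dataset back (for example with `output_base64v2.get_dataframe()`), which is not done here:

```python
import base64

def is_intact(original: bytes, stored_text: str) -> bool:
    """Return True if stored_text decodes back to exactly the original bytes."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(stored_text, validate=True) == original
    except ValueError:
        # binascii.Error is a subclass of ValueError; a truncated string
        # often has an invalid length or broken padding
        return False

original = bytes(range(256)) * 20  # stand-in for the image bytes (5,120 bytes)
full = base64.b64encode(original).decode('ascii')
print(len(full), is_intact(original, full))
print(is_intact(original, full[:1000]))  # what a 1,000-character limit would keep
```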
