String getting truncated when using a Python recipe
Hi all,
Just starting with Dataiku.
I am:
1) reading an image from a managed folder
2) converting that image to base64 (about 5k characters)
3) writing it to my recipe's output dataset, where it gets truncated to 1k characters (the string limit)
How do I ensure that my string output doesn't get truncated? I tried changing the string max length manually, but it seems to get reset every time I run my script.
My script is as below:
import dataiku
import pandas as pd, numpy as np
import base64
from dataiku import pandasutils as pdu
#read Image Folder
images_folder = dataiku.Folder("Pictures")
folder_info=images_folder.get_info()
print(folder_info)
#Read Image
with images_folder.get_download_stream("template.JPG") as f:
    data = f.read()
#convert Image to base64
base64_encoded_data = base64.b64encode(data)
print(base64_encoded_data)
#convert to dataframe
base64_df = pd.DataFrame.from_dict({'template':base64_encoded_data}, orient='index') # Compute a Pandas dataframe to write into Base64
# Write recipe outputs
output_base64v2 = dataiku.Dataset("output_base64v2")
output_base64v2.write_with_schema(base64_df)
Best Answer
Alexandru (Dataiker)
Hi,
When you use write_with_schema, the schema is rewritten on every run, which resets the string limit to the default of 1000 characters. One way to raise the limit is to define the schema manually with write_schema, set maxLength to the value you choose, and use write_dataframe instead. Here is a sample based on your current code:
import dataiku
import pandas as pd, numpy as np
import base64
from dataiku import pandasutils as pdu

#read Image Folder
images_folder = dataiku.Folder("kHGFYqt4")
folder_info=images_folder.get_info()
print(folder_info)

#Read Image
with images_folder.get_download_stream("image.png") as f:
    data = f.read()

#convert Image to base64
base64_encoded_data = base64.b64encode(data)
print(base64_encoded_data)

#convert to dataframe
base64_df = pd.DataFrame.from_dict({'template':base64_encoded_data}, orient='index') # Compute a Pandas dataframe to write into Base64

# Write recipe outputs
output_base64v2 = dataiku.Dataset("base64")
# "name" must match the column name in the DataFrame being written
output_base64v2.write_schema([
    {
        "name": "base64_encoded_data",
        "type": "string",
        "maxLength": 65000
    }
])
output_base64v2.write_dataframe(base64_df)
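Rather than hard-coding 65000, the value for maxLength could be derived from the data itself. The following is a minimal sketch using only the standard library, so it runs outside DSS. `base64_length` and `string_column_schema` are hypothetical helpers (not part of the dataiku API), and the zero-filled `raw` bytes are a stand-in for the image read from the folder. The dict it builds mirrors the shape passed to write_schema in the sample above.

```python
import base64


def base64_length(num_bytes):
    """Length of the padded base64 encoding of num_bytes raw bytes."""
    return 4 * ((num_bytes + 2) // 3)


def string_column_schema(name, values, margin=0):
    """Build one schema entry sized to the longest value (hypothetical helper)."""
    longest = max((len(v) for v in values), default=0)
    return {"name": name, "type": "string", "maxLength": longest + margin}


# Stand-in for the image bytes; in the recipe this would be f.read()
raw = b"\x00" * 3750

# b64encode returns bytes in Python 3, so decode to get a text string
encoded = base64.b64encode(raw).decode("ascii")

# One schema entry per string column, sized from the actual data
schema = [string_column_schema("Value", [encoded])]
```

Sizing the limit from the data avoids guessing, but it also means the limit changes from run to run. A fixed, generous value may be preferable if downstream tools expect a stable schema.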
Answers
pipscity
Thanks that really helped.
There was only one issue left: I had to update the column names in the DataFrame so that they match the schema, but it's all good now! The final script is below.
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
import base64
from dataiku import pandasutils as pdu
#read Image Folder
images_folder = dataiku.Folder("Pictures")
folder_info=images_folder.get_info()
print(folder_info)
#Read Image
with images_folder.get_download_stream("template.JPG") as f:
    data = f.read()
#convert Image to base64
# b64encode returns bytes in Python 3, so decode (not encode) to get a text string
base64_encoded_data = base64.b64encode(data).decode('utf-8')
strLength=len(base64_encoded_data)+1
#convert the base64 string to a dataframe
base64_df = pd.DataFrame.from_dict({'template':base64_encoded_data}, orient='index', columns=['Value'])
print(base64_df)
# Write recipe outputs
output_base64v2 = dataiku.Dataset("output_base64v2")
output_base64v2.write_schema([
    {
        "name": "Value",
        "type": "string",
        "maxLength": strLength
    }
])
output_base64v2.write_dataframe(base64_df)
print(output_base64v2.read_schema())
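To confirm that the stored string was not truncated, one option is a round-trip check: decode the base64 text and compare it with the original bytes. The sketch below uses only the standard library and simulates truncation by slicing the string. `is_intact_base64` is a hypothetical helper, and the generated `original` bytes stand in for the image. In a recipe, the text to check would be read back from the output dataset.

```python
import base64
import binascii


def is_intact_base64(text, original_bytes):
    """Return True only if text decodes back to exactly original_bytes."""
    try:
        decoded = base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        # e.g. the string was cut at a length that is not a multiple of 4
        return False
    return decoded == original_bytes


# 3840 bytes standing in for the image contents
original = bytes(range(256)) * 15
encoded = base64.b64encode(original).decode("ascii")

full_ok = is_intact_base64(encoded, original)          # untouched string
cut_1000 = is_intact_base64(encoded[:1000], original)  # simulates a 1000-character limit
cut_999 = is_intact_base64(encoded[:999], original)    # cut at an undecodable length
```

Comparing decoded bytes matters because a string cut at a multiple of 4 characters (such as 1000) still decodes without error; it simply yields fewer bytes than the original.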