Schema Change in Dataiku recipe Dataset
Hi Guys,
Is there any way to convert the storage type of a column to the required/expected type? Dataiku detects the schema automatically when it creates the recipe's output dataset, and it is becoming harder day by day to work with these kinds of changes. Please share your suggestions. Before responding, please check what I have already tried below.
- For example, I have an int column that now shows as double (inferred automatically). I tried converting it back to int (the required type) by changing the schema in the Schema tab under Settings, which caused chaos: the whole column was replaced with empty values.
- I also checked the column view option on the sample dataset in a visual recipe. I don't see an option for int, and selecting text does not give the expected values either.
Answers
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi,
Indeed, some visual recipes will infer the type.
For example, in a Prepare recipe, you can change the type directly by clicking on the column itself and selecting the type. Can you try this option and see if you still see the same behavior?
The output dataset will then have int. If you want to propagate this change across the Flow, you can use the schema propagation tool:
https://knowledge.dataiku.com/latest/data-preparation/pipelines/concept-schema-propagation.html
For a code recipe, if you want to maintain the existing schema, you can use infer_with_pandas=False: https://doc.dataiku.com/dss/latest/python-api/datasets-data.html#typing-of-dataframes
-
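For the code-recipe route with infer_with_pandas=False mentioned above, here is a minimal sketch of what a Python recipe could look like. The helper name copy_keeping_dss_types and the dataset names in the comments are hypothetical; get_dataframe(infer_with_pandas=False) and write_with_schema are the calls from the dataiku package that this sketch assumes. Note that an int column containing empty cells may fail to load this way, since a plain pandas integer column cannot hold missing values.

```python
def copy_keeping_dss_types(input_dataset, output_dataset):
    """Copy a dataset using the types declared in the DSS schema.

    Both arguments are expected to behave like dataiku.Dataset objects.
    """
    # infer_with_pandas=False asks DSS to build the dataframe from the
    # dataset schema instead of letting pandas guess the dtypes.
    df = input_dataset.get_dataframe(infer_with_pandas=False)
    # write_with_schema sets the output schema from the dataframe dtypes.
    output_dataset.write_with_schema(df)
    return df


# Inside a DSS Python recipe (dataset names are hypothetical):
# import dataiku
# copy_keeping_dss_types(dataiku.Dataset("orders"),
#                        dataiku.Dataset("orders_typed"))
```

Passing the dataset objects in as arguments keeps the helper independent of any particular project, so the same function can be reused across recipes.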
Hi Alex,
Thanks for the response. I tried the above (clicking on the column), but still no luck. It is the same as what I mentioned in my first note: all the values are converted to empty values.
Please let me know how I should convert these values directly.
-
I found one solution below.
- Should I use the "round to integer" option, by clicking on the column, to convert it to integer in the Prepare recipe? It worked for me.
However, I am really looking for a step in the recipe script where I can convert the type by specifying the input format and output format, so that we get the expected results whenever the recipe step runs (instead of clicking through the Explore/sample column tab options).
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi,
Would using the format formula step help in your case?
https://knowledge.dataiku.com/latest/data-preparation/formulas/index.html
You should be able to convert double to int using format, similar to what you did with the round to integer option.
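To make the double-to-int conversion concrete, here is a small plain-Python sketch of the per-cell logic, in the spirit of the round to integer option. The function name double_to_int and the sample rows are hypothetical, and this is not the DSS formula syntax. One plausible reason the cells went empty after the schema change (an assumption, not confirmed in this thread) is that values stored as 12.0 are not valid integers until they are rounded.

```python
import math


def double_to_int(value):
    """Convert one cell holding a double (or its text form) to an int.

    Empty, missing, or non-numeric cells become None instead of raising.
    """
    if value is None:
        return None
    text = str(value).strip()
    if text == "":
        return None
    try:
        number = float(text)
    except ValueError:
        return None
    if math.isnan(number) or math.isinf(number):
        return None
    # Python's round() sends exact halves to the nearest even integer.
    return int(round(number))


# Hypothetical sample rows; "qty" stands in for the affected column.
rows = [{"qty": "12.0"}, {"qty": 7.6}, {"qty": ""}, {"qty": None}]
converted = [double_to_int(row["qty"]) for row in rows]
print(converted)  # [12, 8, None, None]
```

Returning None for cells that cannot be converted keeps the run going and makes the problem rows easy to filter afterwards.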
As for the ability to "convert the type by inputting/mentioning the input format and output format," I don't see this capability being available. I would suggest you submit this to https://community.dataiku.com/t5/Product-Ideas/idb-p/Product_Ideas
Thanks