I'm also facing the same issue. Any suggestions?
E.g., when copying a string column that contains only digits, how can I make sure the copied column is also of type string and is not cast to integer or float?
For the time being, I've converted the field to string in the Python recipe (omitting the trailing .0), concatenated the required number of values, and loaded the result into the output dataset.
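One reading of the workaround above, as a minimal pandas sketch. It assumes the codes arrived as floats (e.g. 123.0) and that the original fixed width is known; the helper name `restore_zero_padding` and the `width` parameter are hypothetical and not taken from the poster's recipe.

```python
import pandas as pd

def restore_zero_padding(values: pd.Series, width: int) -> pd.Series:
    """Turn float-typed codes such as 123.0 back into zero-padded strings."""
    # Nullable Int64 drops the trailing ".0" and keeps missing values as <NA>.
    as_int = values.astype("Int64")
    # The "string" dtype keeps <NA> as missing; zfill left-pads with zeros.
    return as_int.astype("string").str.zfill(width)

# Hypothetical sample data: codes that lost their padding on read.
codes = pd.Series([123.0, 4567.0, None])
padded = restore_zero_padding(codes, width=5)
print(padded.tolist())
```

This only recovers the padding when every value had the same width to begin with; if widths vary, the original strings have to be preserved on read instead.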
I'm also facing this issue in a Python recipe. Although the column is typed as string (object) and I defined the output schema as string (Text), the column gets converted to numbers after the recipe runs.
The business issue here is that this column contains zero-padded values which we need to keep.
So how can one define and keep the output format as defined?
Welcome @Wido !
I have the same business issue with those zero-padded values, but have not encountered problems keeping them intact. The difference is that I don't (yet) use Python recipes when handling them, so hopefully someone else can help you with that. For a visual-oriented solution, there is an example below.
Example: copying a zero-padded value to a new column. I use a visual Prepare step with the formula option and simply state strval("column_name"); the value is copied 'as is' to the new column, including the leading zeros. When using val("column_name"), numval("column_name") or just column_name, the padding gets stripped.
As my coworkers have an Excel mania, I add a dummy prefix letter to those values before exporting data out of Dataiku, to ensure that Excel sees them as strings and won't start stripping them.
I've not tried this. However, I was thinking about your export-to-Excel question below. Rather than using a dummy letter, have you thought of prepending a single quote (') to the zero-prefixed number? Something like:
MS Excel sees the single quote as a prefix that means that the following is text and should not be seen as a number. I don’t know if this will work from within Dataiku. But it might be worth a try.
Your suggestion is great, and surely preferable to what I do with those prefixes now, @tgb417 Tom. I didn't know Excel could be forced in this way. Sadly I can't use it: this prefix letter is a fixed and longstanding procedural thing to ensure everybody involved has a crystal-clear picture of what a certain value represents. A dirty solution for a "between display and chair" challenge, so to speak.
Thanks for your thoughts and suggestions. I have found a solution that works for me.
The template code for the Python recipe is not very helpful here. The input and output schemas get overwritten when using that code. What you need to do:
input_df = dataset.get_dataframe(infer_with_pandas=False)
By default pandas infers the schema and overwrites what you have defined as input.
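The effect of type inference can be reproduced outside Dataiku with plain pandas. This is only an analogy to `infer_with_pandas=False`, not the Dataiku API itself; the sample data and column names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data: a zero-padded code column stored as text.
csv_text = "code,name\n00123,alpha\n04567,beta\n"

# Default: pandas infers the column type and parses the codes as integers.
inferred = pd.read_csv(io.StringIO(csv_text))

# Inference overridden for this column: the values stay strings.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})

print(inferred["code"].tolist())  # zero padding lost
print(as_text["code"].tolist())   # zero padding kept
```

Once the padding is lost at read time, no output schema setting can bring it back, which is why the fix has to start on the input side.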
For writing the dataframe the template suggests to use
However the following will use the schema as defined:
To be honest, the documentation is very poor on this point: a lot of talk around the topic but no clear API specification. But it solves my issue for now. Hopefully it can help others as well.