Hope you are all doing good during Pandemic !
I have got a problem where I have to convert the date column with in the format " YYYY-MM-DDT00:00:00.00Z" to "YYYY-MM-DD" removing the timestamp using a formula as I cannot use a Date parser or a split column processors as the engine is SQL and has a limitation to use these processors (typically split should work but it is showing an error).
Also, can you please help to replace the column values which begins with "00+" with "XY" and the values which are blank with the values of another column
A B C (actual data)
"Should replace the 00-0 with XY as it begins with "00+" and the blank values in column B should be replaced with the values of Column C
(should be converted to)
A B C.
XY "elb" "elb"
Hi @Tsurapaneni. I've two questions first:
a) what SQL database are you using? I did a test with an Oracle db, and the date parser and the extract date part are working. What error are you getting?
b) Do you really need to use the SQL engine? If the amount of data is not that much, and easily fits in memory, maybe it could be a nice idea to use the local engine to avoid the limitation you mentioned. (But if that is not the case, and database is big, my suggestion is not valid)
About the second part of your question:
if(isBlank(B), C, B)if instead of blank values you have null (empty) values, you should use isNull(B) instead of isBlank(B)
I hope this helps!
Thank you so much for the reply. I am using In-database sql engine as we are dealing with large datasets. Regarding the date parser, it is showing an error saying that "not supported settings for "date parser" processor. I am not able to remove the time in the date which is in the format "YYYY-MM-DDT00:00:00.00Z" to convert into "YYYY-MM-dd". I did test out with the functions trunc, asDate etc.. but in vain, not showing the desired results.
In the second part of the question, sorry to use the word replace but it is begins with, if a value begins with "00+" I should replace the entire value with "XY". Find and replace processor is also in the list of limitations.
No problem @Tsurapaneni.
I understand now why you need to use the SQL engine.
I hope this help a little further!
I have tried the format processor for the date but, I am getting the below error and thus the job failed 😞 sadly. Regarding the find and replace processor that still didn't server my purpose I have to replace the string which begins with "000+" which means convert the number which "BEGINS" with 000- with elb "000+0000000" into "elb", but not just replacing the "000+" the find and replace processor with regular expression helps but when regular expression is selected it is showing that it is not supported in the in-database sql and it is not helping me in this case.
For date I am getting the below error. and for the string with replace processor, it said the findreplace processor is not supported (as I mentioned before). Do you have any helpful inputs for these problems please.
Hi @Ignacio_Toledo ,
The isblank formula when ran was giving the error " ERROR: COALESCE types integer and character varying cannot be matched". Do you have any idea on why this is coming up and what factors to check for these.
@Tsurapaneni, to do the kind of replacement you want you should replace "substring" with regexp.
About the problems with using the SQL engine, I'm not sure I can help you, specially if I still don't know what database are you using (Oracle, mysql, postgresql, etc.?) Maybe the database you are using is not fully supported by the dataiku SQL capabilities, or maybe you have a dataset where (as it might be hinted on the Error message) the timestamps don't have a timezone (and there is a way to deal with that when creating the dataset).
The database is postgre sql @Ignacio_Toledo . Can we write a formula which helps to identify the values which begins with "000+...." so that we can replace them like if startswith "000+" then "elb" else the same column name. would there be any possibility for it this way ?
@Tsurapaneni, I'm confused with the "replacement algorithm" you want to implement. From the post description I understood that
Have this changed? Could you please try to clarify the algorithm? Now you are saying something different apparently. Anyway, the "Find and replace" processor, if your SQL is postgresql, will only work with exact match or substring, but not with regexp.
About the problem with the Date processor, I've just tried with a postgresql and again this works for me.
Could you check that in the Dataset connected to postgresql you have activated the following?:
If you have done that, and still doesn't work, I would either contact your DSS and/or database administrator, or Dataiku support.
Hi @Ignacio_Toledo ,
The substring one didn't change, it's the same but the replace processor replaces the substring only but not the entire string (this is what I have seen after testing) and I want to replace the entire string which starts with "000+". regarding the date and time, I am still working on it.
Hi @Tsurapaneni. Ok, I understand the issue now with the "find and replace" processor. In that case this should work (provided the name of the column is A):
Notice that you need to add explicitly the SQL code needed to make the find and replace, and the full code there is:
CASE WHEN position('000+' IN "A") > 0 THEN 'XY' ELSE "A" END
Last thing about the converting the timestamp to yyyy-MM-dd, there is one extra correction to make it work:
You need to extract the date into a different column, or you can get an error.