
Column name can not contain leading or trailing white spaces with DSS engine, but can't find which

mikempc
Level 2

I am doing a join on several columns. I just renamed one of them, and now I get an error about column names, which "can not contain comma, quotation mark, leading or trailing white spaces with DSS engine (when using H2)."

 

        "code" varchar,
	"question_name" varchar,
	"question_type" varchar,
	"scorable_question" double,
	"subquestion_name" varchar,
	"stage" varchar,
	"product_ID" varchar,
	"respondent_id" varchar,
	"answer_name" varchar,
	"answer" bigint,
	"Test Code" varchar,
	"Country" varchar,
	"Program" varchar,
	"Preferred Formats LBrands" varchar
) AS SELECT "code","question_name","question_type","scorable_question","subquestion_name","stage","product_ID","respondent_id","answer_name","answer","Test Code","Country","Program","Preferred Formats LBrands" FROM CSVREAD('/data/dataiku/datadir/jobs/SURVEYSMERGING/Build_raw_survey_data_with_program_joined__NP__2021-11-28T19-30-33.066/compute_raw_survey_data_with_program_joined_NP/dataset-to-h2/Gtdaflh8qkkHYWZBcS5Y/SURVEYSMERGING.compute_raw_survey_data_with_program_flash_format_preference.csv', 'code,question_name,question_type,scorable_question,subquestion_name,stage,product_ID,respondent_id,answer_name,answer,Test Code,Country,Program,Preferred Formats LBrands', 'charset=UTF-8 escape=\\ fieldSeparator=, fieldDelimiter="')
[2021/11/28-19:31:00.214] [FRT-40-FlowRunnable] [INFO] [dku.h2loader] - Done: 6ms
...
[2021/11/28-19:31:03.311] [FRT-40-FlowRunnable] [INFO] [org.apache.hadoop.fs.s3a.S3AInputStream] - Switching to Random IO seek policy
[2021/11/28-19:31:03.329] [FRT-40-FlowRunnable] [INFO] [com.dataiku.dip.input.formats.parquet.RowTupleConverter] - Input Parquet MessageType: message hive_schema { optional binary code (UTF8); optional binary question_name (UTF8); optional binary question_type (UTF8); optional double scorable_question; optional binary stage (UTF8); optional binary product_ID (UTF8); optional binary respondent_id (UTF8); optional binary Test Code (UTF8); optional binary Country (UTF8); optional binary Program (UTF8); optional int64 More for nightime; optional int64 Equally good for day or night; optional int64 More for daytime; optional int64 More for weekends; optional int64 Equally good for weekdays or weekends; optional int64 More for weekdays; optional int64 Good for everyday or special occasions; optional int64 More for an everyday fragrance; optional int64 More for special occasions; }
[2021/11/28-19:31:03.330] [FRT-40-FlowRunnable] [INFO] [com.dataiku.dip.input.formats.parquet.RowTupleConverter] - Detected Parquet flavor: HIVE
...
[2021/11/28-19:31:03.423] [FRT-40-FlowRunnable] [INFO] [dku.h2loader] - CREATE TABLE "SURVEYSMERGING.raw_survey_data_with_program_flash_occasion" ( "code" varchar, "question_name" varchar, "question_type" varchar, "scorable_question" double, "stage" varchar, "product_ID" varchar, "respondent_id" varchar, "Test Code" varchar, "Country" varchar, "Program" varchar, "More for nightime" bigint, "Equally good for day or night" bigint, "More for daytime" bigint, "More for weekends" bigint, "Equally good for weekdays or weekends" bigint, "More for weekdays" bigint, "Good for everyday or special occasions" bigint, "More for an everyday fragrance" bigint, "More for special occasions" bigint ) AS SELECT "code","question_name","question_type","scorable_question","stage","product_ID","respondent_id","Test Code","Country","Program","More for nightime","Equally good for day or night","More for daytime","More for weekends","Equally good for weekdays or weekends","More for weekdays","Good for everyday or special occasions","More for an everyday fragrance","More for special occasions" FROM CSVREAD('/data/dataiku/datadir/jobs/SURVEYSMERGING/Build_raw_survey_data_with_program_joined__NP__2021-11-28T19-30-33.066/compute_raw_survey_data_with_program_joined_NP/dataset-to-h2/Gtdaflh8qkkHYWZBcS5Y/SURVEYSMERGING.raw_survey_data_with_program_flash_occasion.csv', 'code,question_name,question_type,scorable_question,stage,product_ID,respondent_id,Test Code,Country,Program,More for nightime,Equally good for day or night,More for daytime,More for weekends,Equally good for weekdays or weekends,More for weekdays,Good for everyday or special occasions,More for an everyday fragrance,More for special occasions', 'charset=UTF-8 escape=\\ fieldSeparator=, fieldDelimiter="')
[2021/11/28-19:31:03.491] [FRT-40-FlowRunnable] [INFO] [dku.h2loader] - Done: 68ms
...
[2021/11/28-19:31:06.008] [FRT-40-FlowRunnable] [INFO] [com.dataiku.dip.input.formats.parquet.RowTupleConverter] - Input Parquet MessageType: message hive_schema { optional binary code (UTF8); optional binary question_name (UTF8); optional binary question_type (UTF8); optional double scorable_question; optional binary stage (UTF8); optional binary product_ID (UTF8); optional binary respondent_id (UTF8); optional binary Test Code (UTF8); optional binary Country (UTF8); optional binary Program (UTF8); optional int64 Youthful; optional int64 Dark; optional int64 Light / Airy; optional int64 Woody; optional int64 Soft; optional int64 Juicy; optional int64 Sweet; optional int64 Natural; optional int64 Strong; optional int64 Clean; optional int64 Warm; optional int64 Dry; optional int64 Rich; optional int64 Fresh; optional int64 Fruity; optional int64 Artificial / Chemical; optional int64 Spicy; optional int64 Sharp / Harsh; optional int64 Overwhelming,overpowering; optional int64 Cheap; optional int64 Dewy / Wet; optional int64 Old-fashioned; optional int64 Citrusy; optional int64 Heavy / Cloying; optional int64 Floral / Flowery; optional int64 Masculine; optional int64 Easytowear; optional int64 Hascharacter; optional int64 Suitsmypersonality; optional int64 Sensual / Sexy; optional int64 Comfortable; optional int64 Elegant / Refined; optional int64 Bold; optional int64 Makesmefeelgood; optional int64 Expensive; optional int64 Happy / Joyful; optional int64 Fun / Playful; optional int64 Confident; optional int64 Calm / Relaxing; optional int64 Smellsfamiliar; optional int64 Hopeful / Optimistic; optional int64 Seductive; optional int64 Delicious; optional int64 Makesastatement; optional int64 Feminine; optional int64 Hasclarity; optional int64 Simple; optional int64 Complex; optional int64 Standsout; optional int64 Unexpected; optional int64 Open; optional int64 Unique; optional int64 Exciting; optional int64 Reassuring; optional int64 Authentic; optional int64 Chic; optional int64 Memorable; optional int64 Makesmefeelpowerful; optional int64 Pure; optional int64 Classic; optional int64 New / Original; optional int64 Addictive; optional int64 Beautiful; optional int64 Casual; optional int64 Sporty; optional int64 Overwhelming; optional int64 Romantic; optional int64 Sparkling / Vibrant; optional int64 Creamy; optional int64 Colorful / Bright; optional int64 Powdery; optional int64 Goodquality; optional int64 Radiant; optional int64 Aggressive / Harsh; }
[2021/11/28-19:31:06.008] [FRT-40-FlowRunnable] [INFO] [com.dataiku.dip.input.formats.parquet.RowTupleConverter] - Detected Parquet flavor: HIVE
...
[2021/11/28-19:31:08.377] [FRT-40-FlowRunnable] [INFO] [dku.flow.activity] - Run thread failed for activity compute_raw_survey_data_with_program_joined_NP
java.lang.IllegalArgumentException: in act.compute_raw_survey_data_with_program_joined_NP: Column name can not contain comma, quotation mark, leading or trailing white spaces with DSS engine (when using H2).
	at com.dataiku.dip.utils.ErrorContext.iae(ErrorContext.java:129)
	at com.dataiku.dip.dataflow.exec.h2.DatasetToH2Loader.failOnBadH2Identifier(DatasetToH2Loader.java:186)
	at com.dataiku.dip.dataflow.exec.h2.DatasetToH2Loader.load(DatasetToH2Loader.java:216)
	at com.dataiku.dip.dataflow.exec.h2.H2RecipeRunner.run(H2RecipeRunner.java:155)
	at com.dataiku.dip.dataflow.exec.MultiEngineRecipeRunner.run(MultiEngineRecipeRunner.java:203)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2021/11/28-19:31:08.403] [ActivityExecutor-30] [INFO] [dku.flow.activity] running compute_raw_survey_data_with_program_joined_NP - activity is finished
[2021/11/28-19:31:08.403] [ActivityExecutor-30] [ERROR] [dku.flow.activity] running compute_raw_survey_data_with_program_joined_NP - Activity failed
java.lang.IllegalArgumentException: in act.compute_raw_survey_data_with_program_joined_NP: Column name can not contain comma, quotation mark, leading or trailing white spaces with DSS engine (when using H2).
	at com.dataiku.dip.utils.ErrorContext.iae(ErrorContext.java:129)
	at com.dataiku.dip.dataflow.exec.h2.DatasetToH2Loader.failOnBadH2Identifier(DatasetToH2Loader.java:186)
	at com.dataiku.dip.dataflow.exec.h2.DatasetToH2Loader.load(DatasetToH2Loader.java:216)
	at com.dataiku.dip.dataflow.exec.h2.H2RecipeRunner.run(H2RecipeRunner.java:155)
	at com.dataiku.dip.dataflow.exec.MultiEngineRecipeRunner.run(MultiEngineRecipeRunner.java:203)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2021/11/28-19:31:08.404] [ActivityExecutor-30] [INFO] [dku.flow.activity] running compute_raw_survey_data_with_program_joined_NP - Executing default post-activity lifecycle hook
[2021/11/28-19:31:08.407] [ActivityExecutor-30] [INFO] [dku.flow.activity] running compute_raw_survey_data_with_program_joined_NP - Removing samples for SURVEYSMERGING.raw_survey_data_with_program_joined
[2021/11/28-19:31:08.408] [ActivityExecutor-30] [INFO] [dku.flow.activity] running compute_raw_survey_data_with_program_joined_NP - Done post-activity tasks

 

I can't find any column with such a problem in the logs ...


Operating system used: Windows

1 Reply
sergeyd
Dataiker

Hi @mikempc 

Can you please list the column names from the datasets you join? The issue here is that DSS is complaining about a disallowed character (a comma, a quotation mark, or leading or trailing white space) in one of the column names of the input datasets.
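
In case it helps narrow things down, here is a minimal sketch of a check for the three conditions named in the error message. The function name `find_bad_h2_column_names` is made up for illustration, and reading "quotation mark" as the double-quote character is an assumption. The sample names are copied from the Parquet schema in the job log above. This is not the check DSS itself runs, only an approximation of the rule as the message words it.

```python
def find_bad_h2_column_names(names):
    """Return (name, reasons) pairs for names that break the rule quoted in the error.

    Hypothetical helper: it mirrors the wording of the error message and is
    not taken from the DSS code base.
    """
    problems = []
    for name in names:
        reasons = []
        if "," in name:
            reasons.append("comma")
        if '"' in name:  # assumption: "quotation mark" means the double quote
            reasons.append("quotation mark")
        if name != name.strip():
            reasons.append("leading or trailing white space")
        if reasons:
            problems.append((name, reasons))
    return problems


# A few column names copied from the Parquet schema in the job log.
columns = ["code", "Test Code", "Light / Airy", "Overwhelming,overpowering", "Cheap"]

for name, reasons in find_bad_h2_column_names(columns):
    # repr() makes any leading or trailing white space visible
    print(repr(name), "->", ", ".join(reasons))
```

Replacing `columns` with the full list of column names from each input dataset should show which ones get flagged and why.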
