Hi,
I uploaded a CSV file into a DSS dataset and I am trying to convert it into a Spark DataFrame.
"mahout_dict" is the DSS dataset, with 2 columns: lb_word (string) and id_word (float):
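import dataiku
from dataiku import spark as dkuspark
# sqlContext is assumed to be created earlier in the recipe or notebook
mahout_dict = dataiku.Dataset("mahout_dict")
mahout_dict_df = dkuspark.get_dataframe(sqlContext, mahout_dict)
mahout_dict_df.show()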
The result is:
+-------+-------+
|lb_word|id_word|
+-------+-------+
|    000|   null|
|     06|   null|
|     08|   null|
|     09|   null|
|      1|   null|
|     10|   null|
|     11|   null|
|     14|   null|
|     18|   null|
|      2|   null|
|   2000|   null|
|   2001|   null|
|   2003|   null|
|   2004|   null|
|   2005|   null|
|   2006|   null|
|   2007|   null|
|   2008|   null|
|   2009|   null|
|   2010|   null|
+-------+-------+
only showing top 20 rows
Can you help me?
Hi,
It might be that the id_word column has extra whitespace to the left or right of the float value, which would prevent it from being parsed as a float. Can you share the CSV file (or part of it)?
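In the meantime, one way to test the whitespace hypothesis is to look at the raw values with plain Python before Spark gets involved. The sketch below is only an illustration: the helper name find_suspect_values and the sample data are made up for this example (they are not the poster's actual file), and it assumes a comma-separated file with a header row.

```python
import csv
import io

def find_suspect_values(csv_text, column, delimiter=","):
    """Return (line_number, raw_value) pairs whose value has surrounding
    whitespace or cannot be parsed as a float even after stripping."""
    suspects = []
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        raw = row[column] or ""  # a missing field is treated as empty
        has_padding = raw != raw.strip()
        try:
            float(raw.strip())
            parses = True
        except ValueError:
            parses = False
        if has_padding or not parses:
            suspects.append((line_number, raw))
    return suspects

# Hypothetical sample data, NOT the file from the question.
sample = "lb_word,id_word\n000, 1.0\n06,2.0\n08,3.0 \n09,n/a\n"

for line_number, raw in find_suspect_values(sample, "id_word"):
    # repr() makes leading/trailing spaces and odd characters visible.
    print(line_number, repr(raw))
```

If padded values do turn up, one possible workaround (an assumption, not something confirmed in this thread) is to declare id_word as a string in the dataset schema and then clean it on the Spark side, for example with pyspark.sql.functions.trim followed by .cast("float").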