AttributeError: 'Dataset' object has no attribute '_sc'

jmccartin
Level 3

Hi, this is something for which I would normally make a pull request, but you don't have a public API, so I thought it best to create a bug report here instead.

Problem:

If one accidentally (or programmatically) passes an object that isn't a Spark DataFrame to the `write_with_schema` function in dataiku.spark, the underlying code tries to access the Spark context on that assumed DataFrame and crashes with an internal Dataiku error:

[2019/04/25-08:07:24.881] [null-out-100] [INFO] [dku.utils] - File "/opt/dataiku-dss-5.1.2/python/dataiku/spark/__init__.py", line 139, in write_with_schema
[2019/04/25-08:07:24.881] [null-out-100] [INFO] [dku.utils] -     write_schema_from_dataframe(dataset, dataframe)
[2019/04/25-08:07:24.881] [null-out-100] [INFO] [dku.utils] - File "/opt/dataiku-dss-5.1.2/python/dataiku/spark/__init__.py", line 122, in write_schema_from_dataframe
[2019/04/25-08:07:24.881] [null-out-100] [INFO] [dku.utils] -     dsc = __dataikuSparkContext(dataframe._sc._jvm)
[2019/04/25-08:07:24.881] [null-out-100] [INFO] [dku.utils] - AttributeError: 'Dataset' object has no attribute '_sc'

This can easily happen when a function returns None and that value gets passed to the writer instead of a DataFrame, producing the same kind of AttributeError (on 'NoneType' rather than 'Dataset'). Both mistakes are sketched below.
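For illustration, here is a minimal sketch of both failure modes inside a DSS Spark recipe (the dataset names are hypothetical placeholders):

```python
import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sql_context = SQLContext(sc)

input_ds = dataiku.Dataset("my_input")    # hypothetical dataset names
output_ds = dataiku.Dataset("my_output")
df = dkuspark.get_dataframe(sql_context, input_ds)

# Mistake 1: passing the Dataset itself instead of the DataFrame
# -> AttributeError: 'Dataset' object has no attribute '_sc'
dkuspark.write_with_schema(output_ds, input_ds)

# Mistake 2: a helper that forgets its return statement hands back None
# -> AttributeError: 'NoneType' object has no attribute '_sc'
def transform(frame):
    frame.cache()  # no return statement, so the caller receives None

dkuspark.write_with_schema(output_ds, transform(df))
```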

Solution:

A single line asserting that the `dataframe` argument is a Spark DataFrame could be added just before line 122 of dataiku/spark/__init__.py, where the underlying Spark context is accessed. Raising a TypeError there would be considerably more helpful to the user than the current stack trace.
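A minimal sketch of such a guard, assuming it is placed right before the existing Spark context lookup (the exact error message is just a suggestion):

```python
from pyspark.sql import DataFrame

# Fail fast with an explicit error instead of an opaque AttributeError
# raised deep inside the Spark context lookup.
if not isinstance(dataframe, DataFrame):
    raise TypeError(
        "write_with_schema expects a pyspark.sql.DataFrame for "
        "'dataframe', got %s instead" % type(dataframe).__name__)
```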

1 Reply
cperdigou
Dataiker Alumni
Thank you very much for this report and for investigating a solution. I'll pass this information on to the development team.