Hi,
Your dataset probably contains multi-line records (quoted fields with embedded line breaks), which cannot be processed in Spark.
Spark and Hadoop work by cutting input data files into segments and processing them in parallel. For CSV files, they cut at an arbitrary point in the file, look for the next end-of-line, and start processing from there.
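To illustrate why this breaks multi-line records, here is a minimal Python sketch. It is not Spark or Hadoop code: `lines_for_split` is a hypothetical helper that only mimics the "cut anywhere, skip to the next end-of-line" behaviour described above, and the sample data is made up.

```python
import csv
import io

# Made-up sample: record 1 has a quoted field containing a line break.
data = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

def lines_for_split(text, start):
    """Hypothetical helper: return the lines a segment starting at
    offset `start` would see, skipping ahead to just past the next
    end-of-line (a simplification of what a split-based reader does)."""
    if start == 0:
        return text.splitlines()
    newline = text.find("\n", start)
    if newline == -1:
        return []
    return text[newline + 1:].splitlines()

# Reading the whole file with one quote-aware parser keeps the record intact.
whole_rows = list(csv.reader(io.StringIO(data)))

# A cut at offset 15 lands inside the quoted field of record 1, so the
# segment starts at the line break that sits in the middle of that record.
split_lines = lines_for_split(data, 15)

print(whole_rows[1])   # the full two-line record
print(split_lines[0])  # a fragment of that record, seen as a "new" line
```

The segment that starts inside the quoted field has no way to know that its first line is the tail of an earlier record, which is the failure mode described above.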
Thus, it is not really possible to process multi-line records in Spark (or Hadoop), since a cut might land in the middle of a record, where the next end-of-line does not mark the start of a new record. We strongly recommend that you start by syncing your CSV dataset to a Parquet or ORC dataset (using the local DSS engine instead of Hadoop or Spark). Once you are on a "non-textual" format, you won't have this issue anymore.
Alternatively, this could be caused by an invalid quoting style: see http://answers.dataiku.com/561/unterminated-quoted-field-at-the-end-of-the-file
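If you want a quick way to check for a quoting problem before converting, the following Python sketch may help. It is only a heuristic and assumes that quotes inside fields are escaped by doubling them (`""`); with backslash escaping it will give wrong answers. `has_unbalanced_quotes` is a hypothetical helper, not a DSS or Spark function.

```python
def has_unbalanced_quotes(text, quotechar='"'):
    """Heuristic: with doubled-quote escaping, well-formed CSV text
    contains an even number of quote characters, so an odd count
    suggests a quoted field that is never closed."""
    return text.count(quotechar) % 2 == 1

# Made-up samples for illustration.
good = 'id,comment\n1,"say ""hi"""\n'
bad = 'id,comment\n1,"never closed\n'

print(has_unbalanced_quotes(good))
print(has_unbalanced_quotes(bad))
```

For a real file you would read its content first (for example with `open(path, newline="").read()`), keeping in mind that this loads the whole file into memory.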