error com.dataiku.dip.datasets.fs.FilesystemDatasetTestHandler are in unnamed module of loader 'app'

I get an error when trying to get the dataset
https://www2.census.gov/geo/tiger/TIGER2019/COUNTY/tl_2019_us_county.zip
or
ftp://ftp2.census.gov/geo/tiger/TIGER2019/COUNTY/tl_2019_us_county.zip
from New HTTP dataset or New FTP dataset, respectively:
An error occurred
class com.dataiku.dip.datasets.fs.HTTPDatasetTestHandler cannot be cast to class com.dataiku.dip.datasets.fs.FilesystemDatasetTestHandler (com.dataiku.dip.datasets.fs.HTTPDatasetTestHandler and com.dataiku.dip.datasets.fs.FilesystemDatasetTestHandler are in unnamed module of loader 'app')
Logs may contain additional information
Additional technical details
- Error type:java.lang.ClassCastException
Answers
-
Hi,
Could you clarify what format this data is supposed to be in? If you download the zip manually, you will see it contains a wide-ranging set of files: shx, xml, shp, prj, dbf, and cpg. Depending on how the data is supposed to be read, you could either upload the file directly, or use Python to read it, do some preprocessing, and create the output dataset once it is a dataframe.
Best,
Andrew
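
To check which of those files an archive actually contains before picking a format, a minimal sketch using only the Python standard library could look like this. It assumes the zip has already been downloaded locally. The function name `missing_shapefile_parts` is hypothetical, and the set of required extensions reflects the three mandatory shapefile components (.shp, .shx, .dbf).

```python
import zipfile
from pathlib import PurePosixPath

# Components every shapefile needs; .prj, .cpg and .xml are optional extras.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}


def missing_shapefile_parts(zip_source):
    """Return the required extensions absent from the archive, sorted.

    zip_source may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Collect the lowercased extension of every member of the archive.
        present = {PurePosixPath(name).suffix.lower() for name in archive.namelist()}
    return sorted(REQUIRED_EXTENSIONS - present)
```

Calling `missing_shapefile_parts("tl_2019_us_county.zip")` (path assumed) returns an empty list when all three mandatory parts are present.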
-
Herve
The file is a shapefile (as per the "Working with Shapefiles and US Census Data in DSS" tutorial).
I did select "Shapefile" in the format/schema tab.
-
Any news on this error? I'm having the same problem
-
I got a reply from Dataiku support after doing the mentioned tutorial.
Unfortunately, shapefiles are not supported on Dataiku Cloud.
For now, the workaround is to convert the shapefiles to GeoJSON. You can use any online converter or custom code.
If you use custom code, the exact steps are as follows:
1) Create a code env with geopandas if you don't have one
2) Upload the zip file to a managed folder and uncompress it
3) Then create a Python recipe with the following code. Please don't forget to replace the folder id and filename.

    import dataiku
    import os
    import tempfile
    import geopandas as gpd
    from dataiku import pandasutils as pdu

    support_buur_folder_shape_convert = dataiku.Folder("your_input_folder_id")

    # Define the shapefile name (without extension)
    input_shapefile_name = "filename"

    # Create a temporary directory
    temp_dir = tempfile.mkdtemp()

    # List files in the Dataiku Folder
    file_paths = support_buur_folder_shape_convert.list_paths_in_partition()

    # Copy all the shapefile related files to the temporary directory
    for file_path in file_paths:
        file_name = os.path.basename(file_path)
        temp_file_path = os.path.join(temp_dir, file_name)
        with open(temp_file_path, "wb") as f:
            f.write(support_buur_folder_shape_convert.get_download_stream(file_path).read())

    shp_file_path = os.path.join(temp_dir, f"{input_shapefile_name}.shp")
    gdf = gpd.read_file(shp_file_path)

    output_geojson_path = os.path.join(temp_dir, f"{input_shapefile_name}.geojson")

    # Save the GeoDataFrame to GeoJSON
    gdf.to_file(output_geojson_path, driver="GeoJSON")

    output_folder = dataiku.Folder("your_output_folder_id")
    output_folder.upload_file(f"{input_shapefile_name}.geojson", output_geojson_path)
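
Before building a dataset on the converted file, it may be worth a quick sanity check of the GeoJSON. The following is only a sketch using the standard library. It assumes the converter wrote a FeatureCollection (which is what the GeoJSON driver normally produces) and that the file is available at a local path. The function name `summarize_geojson` is hypothetical.

```python
import json


def summarize_geojson(path):
    """Return (feature_count, sorted geometry types) for a GeoJSON FeatureCollection file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    if data.get("type") != "FeatureCollection":
        raise ValueError(f"expected a FeatureCollection, got {data.get('type')!r}")

    features = data.get("features", [])
    geometry_types = sorted({
        feature["geometry"]["type"]
        for feature in features
        if feature.get("geometry")  # a feature may carry a null geometry
    })
    return len(features), geometry_types
```

For the county file, one would expect the feature count to match the number of records in the original shapefile and the geometry types to be polygonal. A mismatch suggests the conversion dropped or altered records.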