error com.dataiku.dip.datasets.fs.FilesystemDatasetTestHandler are in unnamed module of loader 'app'

Herve
Herve Partner, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 58 Partner

I get an error when trying to fetch the dataset at

https://www2.census.gov/geo/tiger/TIGER2019/COUNTY/tl_2019_us_county.zip

or

ftp://ftp2.census.gov/geo/tiger/TIGER2019/COUNTY/tl_2019_us_county.zip

using New HTTP dataset or New FTP dataset, respectively.

An error occurred

class com.dataiku.dip.datasets.fs.HTTPDatasetTestHandler cannot be cast to class com.dataiku.dip.datasets.fs.FilesystemDatasetTestHandler (com.dataiku.dip.datasets.fs.HTTPDatasetTestHandler and com.dataiku.dip.datasets.fs.FilesystemDatasetTestHandler are in unnamed module of loader 'app')

Logs may contain additional information

Additional technical details

  • Error type: java.lang.ClassCastException

Answers

  • ATsao
    ATsao Dataiker Alumni, Registered Posts: 139 ✭✭✭✭✭✭✭✭

    Hi,

    Could you clarify what format this data is supposed to be in? If you download the zip manually, it contains a wide-ranging set of files: shx, xml, shp, prj, dbf, and cpg. Depending on how the data is supposed to be read, you could either upload the file directly or use Python to read it, do some preprocessing, and then create the output dataset once it is a dataframe.
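
For anyone who wants to check the archive contents without unpacking it by hand, here is a minimal sketch using only the Python standard library. It assumes the zip has already been downloaded to a local path; the function names `count_extensions` and `missing_shapefile_parts` are hypothetical, not part of DSS. The `.shp`, `.shx`, and `.dbf` files are the mandatory components of a shapefile, so the second helper reports which of those are absent.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_extensions(zip_path):
    """Return a Counter of lower-cased file extensions found in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return Counter(
            PurePosixPath(info.filename).suffix.lower()
            for info in archive.infolist()
            if not info.is_dir()
        )


def missing_shapefile_parts(zip_path, required=(".shp", ".shx", ".dbf")):
    """List the mandatory shapefile extensions that the archive does not contain."""
    found = count_extensions(zip_path)
    return [ext for ext in required if ext not in found]
```

An empty list from `missing_shapefile_parts` suggests the archive holds a complete shapefile; it does not validate the contents of the individual files.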

    Best,

    Andrew

  • Herve
    Herve Partner, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer, Registered Posts: 58 Partner

    The file is a shapefile (as per the "Working with Shapefiles and US Census Data in DSS" tutorial).

    I did select "Shapefile" in the format/schema tab.

  • spielman641
    spielman641 Registered Posts: 1 ✭✭

    Any news on this error? I'm having the same problem

  • NathanMooreVenn
    NathanMooreVenn Partner, Registered Posts: 1 Partner
    edited August 22

    I got a reply from Dataiku support after following the tutorial mentioned above.

    Unfortunately, shapefiles are not supported on Dataiku Cloud.
    For now, the workaround is to convert the shapefiles to GeoJSON. You can use any online converter or custom code.
    If you use custom code, the exact steps are:
    1) Create a code env with geopandas if you don't have one.
    2) Upload the zip file to a managed folder and uncompress it.
    3) Create a Python recipe with the following code. Don't forget to replace the folder IDs and the filename.

    import os
    import tempfile

    import dataiku
    import geopandas as gpd

    # Managed folder holding the uncompressed shapefile components
    support_buur_folder_shape_convert = dataiku.Folder("your_input_folder_id")

    # Define the shapefile name (without extension)
    input_shapefile_name = "filename"

    # Create a temporary directory
    temp_dir = tempfile.mkdtemp()

    # List files in the Dataiku Folder
    file_paths = support_buur_folder_shape_convert.list_paths_in_partition()

    # Copy all the shapefile related files to the temporary directory
    for file_path in file_paths:
        file_name = os.path.basename(file_path)
        temp_file_path = os.path.join(temp_dir, file_name)
        with support_buur_folder_shape_convert.get_download_stream(file_path) as stream:
            with open(temp_file_path, "wb") as f:
                f.write(stream.read())

    # Read the shapefile only after every component (.shp, .shx, .dbf, ...) has been copied
    shp_file_path = os.path.join(temp_dir, f"{input_shapefile_name}.shp")
    gdf = gpd.read_file(shp_file_path)

    # Save the GeoDataFrame to GeoJSON
    output_geojson_path = os.path.join(temp_dir, f"{input_shapefile_name}.geojson")
    gdf.to_file(output_geojson_path, driver="GeoJSON")

    # Upload the GeoJSON file to the output managed folder
    output_folder = dataiku.Folder("your_output_folder_id")
    output_folder.upload_file(f"{input_shapefile_name}.geojson", output_geojson_path)
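
Step 2 (uncompressing the zip) can also be done in code. The following is a minimal sketch using only the standard library; `extract_flat` is a hypothetical helper name, and it assumes the whole archive fits in memory as bytes (in a DSS recipe those bytes could come from reading the folder's download stream, as in the recipe above). It writes every archived file directly into the target directory and drops any sub-directory prefix, so the `.shp`, `.shx`, `.dbf`, and related files end up side by side.

```python
import io
import os
import zipfile


def extract_flat(zip_bytes, dest_dir):
    """Extract every file in the archive into dest_dir, dropping sub-directories.

    Returns the list of paths that were written.
    """
    extracted = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Keep only the final path component so files land next to each other
            name = os.path.basename(info.filename)
            if not name:
                continue
            target = os.path.join(dest_dir, name)
            with archive.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())
            extracted.append(target)
    return extracted
```

Flattening is a design choice for this use case: geopandas expects the shapefile components in the same directory, and using only the base name also prevents archive entries from writing outside `dest_dir`. Files with the same name in different sub-directories would overwrite each other.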
    
    