Hi,
As the message says, DSS won't let you read Parquet data directly from an HTTP dataset. You need to download the Parquet files to an HDFS-friendly location first:
- create a managed folder (+ New dataset > Folder) on an HDFS, S3, Azure or GCS connection
- use a Download recipe to download the Parquet files to that folder
- make a new HDFS, S3, Azure or GCS dataset (same type as the folder) pointing to the location of the folder
Note that:
1) you'll need your DSS to have a working Hadoop integration in place, because Parquet is a file format with strong ties to Hadoop. If you have a Hadoop distribution installed on the machine, run the `bin/dssadmin install-hadoop-integration` command from the DSS data dir. If you don't have a Hadoop distribution, grab the Hadoop standalone jar package from our download site and install it with `bin/dssadmin install-hadoop-integration -standaloneArchive /path/to/where/you/downloaded/it` (a DSS restart is needed afterwards)
2) if using S3/Azure/GCS, you'll need to set the "HDFS interface" field on the DSS connection to something other than "None". If using Azure, it must be ABFS.
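
If the new dataset still fails to read after these steps, it can be worth checking that the HTTP source really served Parquet files and not, say, an HTML error page. Here is a minimal sketch of such a check in plain Python, not part of DSS. It assumes you have a local copy of one of the downloaded files, and the helper name `looks_like_parquet` is made up for illustration. It relies only on the fact that an unencrypted Parquet file begins and ends with the 4-byte marker `PAR1`, so it does not validate the file's contents.

```python
from pathlib import Path

# Magic bytes found at the start and end of an unencrypted Parquet file
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes.

    Hypothetical helper for illustration: a quick sanity check only,
    it does not parse the footer or the data pages.
    """
    path = Path(path)
    # Leading magic (4) + footer length (4) + trailing magic (4) = 12 bytes minimum
    if path.stat().st_size < 12:
        return False
    with path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # 2 = seek relative to the end of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

For example, `looks_like_parquet("/path/to/a/local/copy.parquet")` (placeholder path) returning False would suggest the problem is with the downloaded files rather than with the Hadoop integration or the connection settings.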