Reading a file after partitioning

Registered Posts: 2 ✭✭✭✭

Hello,

I have a filesystem organized this way:

/folder/YEAR/MONTH/DDHH

I tried to partition at the DDHH level, with one folder per partition. Since this is not a 'regular' structure (such as %Y/%M/%DD/.*), I defined the partitioning as %Y/%M/%{dimension_2}/.*, which produces 718 partitions of one JSON file each.

After this operation, I get an error when reading a file from a specific partition:

Error in pull background thread, aborting push

org.codehaus.jackson.JsonParseException: Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens

I checked the file: when I load it as a single dataset (without partitioning), I can read it without any problem.

Any suggestions?
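The exception mentions a character with code 0, which is a NUL byte. One way to confirm whether the file on disk really contains such bytes (they show up, for instance, when ASCII text is stored as UTF-16) is to scan it in binary mode. This is only a diagnostic sketch: `find_nul_bytes` is a hypothetical helper, not part of any Dataiku API, and the path in the usage comment is a placeholder.

```python
def find_nul_bytes(path, limit=10, chunk_size=1 << 20):
    """Return the offsets of up to `limit` NUL (0x00) bytes found in the file."""
    offsets = []
    position = 0  # absolute offset of the current chunk within the file
    with open(path, "rb") as handle:
        while len(offsets) < limit:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            start = 0
            while len(offsets) < limit:
                index = chunk.find(b"\x00", start)
                if index == -1:
                    break
                offsets.append(position + index)
                start = index + 1
            position += len(chunk)
    return offsets


# Usage (placeholder path): an empty list means no NUL byte was found.
# print(find_nul_bytes("/path/to/one/partition/file.json"))
```

If the list is empty for the file that fails, the NUL bytes are probably not in the file itself, which points back to how the partitioned dataset selects and reads its files.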

Many thanks in advance!

Answers

  • Dataiker Posts: 355 Dataiker
    It is possible to combine several partitioning dimensions in a single path component of the partitioning pattern, for example "/%Y/%M/%D%H/.*". Note that the pattern must end with "/.*" to catch all files in the folder defined by the rest of the pattern.
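To see how a pattern such as "/%Y/%M/%D%H/.*" lines up with the folder layout from the question, here is a minimal sketch that translates the pattern into a regular expression and extracts the time dimensions from a path. It is an illustration only, not Dataiku's actual implementation: `pattern_to_regex` and `match_partition` are hypothetical helpers, and the digit widths (4 for the year, 2 for the others) are assumptions.

```python
import re

# Assumed widths for each time placeholder (illustrative, not taken from DSS).
TOKEN_REGEX = {
    "%Y": r"(?P<year>\d{4})",
    "%M": r"(?P<month>\d{2})",
    "%D": r"(?P<day>\d{2})",
    "%H": r"(?P<hour>\d{2})",
}


def pattern_to_regex(pattern):
    """Translate a partitioning pattern ending in '/.*' into a compiled regex."""
    if not pattern.endswith("/.*"):
        raise ValueError("the pattern must end with '/.*'")
    prefix = pattern[: -len("/.*")]
    pieces = []
    # Split on the placeholders while keeping them in the result.
    for part in re.split(r"(%[YMDH])", prefix):
        pieces.append(TOKEN_REGEX.get(part, re.escape(part)))
    return re.compile("".join(pieces) + "/.*")


def match_partition(pattern, path):
    """Return the extracted dimensions as a dict, or None if the path does not match."""
    match = pattern_to_regex(pattern).fullmatch(path)
    return match.groupdict() if match else None


# Usage with a made-up file under /YEAR/MONTH/DDHH/:
print(match_partition("/%Y/%M/%D%H/.*", "/2016/03/0714/data.json"))
```

In this sketch the day and hour are read from the same path component, which is what lets each DDHH folder map to its own partition.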
