Synchronization between Google Cloud Storage (BigQuery CSV compatible) and BigQuery failed

otassel
otassel Registered Posts: 9 ✭✭✭✭

Hello,

Synchronization between Google Cloud Storage (BigQuery CSV compatible) and BigQuery failed with this error: "Input CSV files are not splittable and at least one of the files is larger than the maximum allowed size. Size is: 12178247589. Max allowed size is: 4294967296."

Here are the files on Google Cloud Storage:

How can I solve this issue? How can I generate files on Google Cloud Storage that are no larger than 4 GB each (that is, increase the number of generated files) in order to respect the maximum CSV file size reported in the error?

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    Hi,

    At the moment, you'll need to cheat a bit to force DSS to generate more files:

    * Use a "Filter/Sampling" recipe, and set the sampling method to first records with 999999999999 records (a count high enough that no rows are actually dropped)
    * Go to the settings of the output dataset > Advanced, and set "write bucketing" to a high enough value (probably at least 30 in your case)
  • otassel
    otassel Registered Posts: 9 ✭✭✭✭
    Thanks Clément. I found another solution that saves one flow step compared with your answer: I increased Max Threads to 30 in the recipe's "Advanced" tab. The recipe then generated one file per thread.

    Thanks for your help
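
For a rough sense of how many output files are needed (whether via "write bucketing" or Max Threads), the two sizes in the error message can be compared directly. The sketch below is only an estimate: the function name `min_output_files` is made up for illustration, and it assumes the data is spread evenly across the output files, which is not guaranteed. That is one reason a value well above the minimum, such as the 30 used in this thread, is a safer choice.

```python
# Sizes copied from the error message quoted in the question (in bytes).
reported_size_bytes = 12_178_247_589
max_allowed_bytes = 4_294_967_296  # 4 GiB


def min_output_files(total_bytes: int, limit_bytes: int) -> int:
    """Smallest file count such that an even split stays within the limit.

    Hypothetical helper for illustration; uses integer ceiling division
    to avoid floating-point rounding.
    """
    return -(-total_bytes // limit_bytes)


min_files = min_output_files(reported_size_bytes, max_allowed_bytes)
print(f"At least {min_files} files are needed if the split is perfectly even")
```

In practice the split is rarely perfectly even, so the value chosen should leave some margin above this minimum.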