Synchronization between Google Cloud Storage (BigQuery CSV compatible) and BigQuery failed
otassel
Hello,
Synchronization between Google Cloud Storage (BigQuery CSV compatible) and BigQuery failed for this reason: "Input CSV files are not splittable and at least one of the files is larger than the maximum allowed size. Size is: 12178247589. Max allowed size is: 4294967296."
Here are the files on Google Cloud Storage: [screenshot of the bucket contents]
How can I solve this issue? How can I generate files on Google Cloud Storage of no more than 4 GB each (i.e. increase the number of generated files) in order to respect BigQuery's CSV file size limit for loads?
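In case it helps to diagnose, here is a minimal sketch that lists object sizes with the google-cloud-storage client and flags anything above the 4 GiB limit from the error message. The bucket name "my-bucket" and prefix "exports/" are placeholders, not the actual paths:

```python
from google.cloud import storage

# BigQuery rejects non-splittable (e.g. gzip-compressed) CSV files above 4 GiB.
MAX_BYTES = 4 * 1024 ** 3  # 4294967296, the limit quoted in the error message

client = storage.Client()
# "my-bucket" and "exports/" are hypothetical; substitute the real bucket/prefix.
for blob in client.list_blobs("my-bucket", prefix="exports/"):
    flag = "TOO LARGE" if blob.size > MAX_BYTES else "ok"
    print(f"{blob.name}\t{blob.size}\t{flag}")
```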
Answers
Hi,
At the moment, you'll need to cheat a bit to force DSS to generate more files:
* Use a "Filter/Sampling" recipe, and set "first records" sampling to 999999999999 records
* Go to the settings of the output dataset > Advanced, and set "write bucketing" to a high enough value (probably at least 30 in your case); see the sketch after this list for how the resulting smaller files load into BigQuery
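Once the export is written as several files under 4 GiB each, BigQuery can ingest them in a single load job via a wildcard URI. A minimal sketch, assuming hypothetical bucket, file pattern, and table names:

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # assumes the exported files carry a header row
    autodetect=True,       # infer the schema; set an explicit schema if known
)

# "my-bucket", "exports/part-*.csv.gz" and the table ID are placeholders.
load_job = client.load_table_from_uri(
    "gs://my-bucket/exports/part-*.csv.gz",
    "my-project.my_dataset.my_table",
    job_config=job_config,
)
load_job.result()  # waits for completion; each matched file must stay under 4 GiB
```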
Thanks Clément. I found another solution that saved me one flow step compared with your answer: I increased Max Threads to 30 in the recipe's "Advanced" tab. That way, the recipe generated one file per thread.
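For anyone hitting the same error, the minimum number of output files follows directly from the two sizes in the error message; a quick back-of-the-envelope check in Python (both numbers are taken from the error above):

```python
import math

total_size = 12178247589   # size reported in the error message
max_allowed = 4294967296   # 4 GiB limit per non-splittable file

# At least ceil(total / limit) files are needed; 30 leaves wide headroom.
min_files = math.ceil(total_size / max_allowed)
print(min_files)  # 3
```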
Thanks for your help