Some of our DSS users get the error message "java.lang.OutOfMemoryError: GC overhead limit exceeded" when trying to join 2 files with about 80k records and about 5k and 100 columns respectively. We traced the problem and think it might be solved by increasing the jek.xmx setting to 3 GB, but we are not completely sure what will happen if we change it. Will all JEK sessions be sized at 3 GB (thus lowering the maximum number of JEK sessions), or will this only set the maximum size to 3 GB?
A second question: we do not have direct access to the DSS setup; all changes need to be made through (Ansible) scripts. How should we proceed?
Thanks in advance.
As per the documentation page you linked, please try increasing the jek.xmx value to 3g. As stated in the documentation, this sets the size of JEK processes to 3 GB, so all JEKs will now be sized at 3 GB.
If you still experience issues, you might consider offloading the job to an external computation engine, such as a database or a cluster (Hadoop or Kubernetes).
As for your second question, this edit can be done via Ansible using the blockinfile module.
Note: This example is provided as is, without any support.
- name: Stop DSS
  shell: /path/to/dss/bin/dss stop >> /some/remote/log.txt
- name: Change xmx setting for JEKs
  blockinfile:
    dest: /path/to/dss/install.ini
    backup: yes
    content: |
      [javaopts]
      jek.xmx = 3g
- name: Regenerate DSS configs
  shell: /path/to/dss/bin/dssadmin regenerate-config >> /some/remote/log.txt
- name: Restart DSS
  shell: /path/to/dss/bin/dss start >> /some/remote/log.txt
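One caveat with blockinfile: it appends a marked block to the file, so if install.ini already contains a [javaopts] section, the file ends up with two [javaopts] headers. A possible alternative for the "Change xmx setting" task is the ini_file module, which sets the option inside the existing section or creates the section if it is missing. This is only a sketch. It assumes the community.general collection is installed on the Ansible control node, and it reuses the placeholder path from the example above.

```yaml
# Sketch only: assumes the community.general collection is available.
# Replaces the blockinfile task; the stop, regenerate-config and start
# tasks around it stay the same.
- name: Change xmx setting for JEKs
  community.general.ini_file:
    path: /path/to/dss/install.ini   # placeholder path, as above
    section: javaopts
    option: jek.xmx
    value: 3g
    backup: true
```

Unlike blockinfile, this task can be re-run with a different value and it will update the existing line in place.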
Architect @ Dataiku