We have three Hadoop clusters running HDP 2.6.2 with similar workloads.
One of them is heavily used by data scientists through Dataiku.
On this platform, the timeline server process suffers from a memory leak.
The leak seems to be outside the Java heap, as the process grows beyond the heap size.
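For anyone who wants to make the same comparison, here is a minimal sketch of one way to do it, assuming a Linux host where /proc/<pid>/status and /proc/<pid>/cmdline are readable and the heap limit is set with -Xmx on the command line. The function names are hypothetical and this is not taken from any Hadoop tooling. Note that resident memory normally exceeds -Xmx by some margin (metaspace, thread stacks, code cache), so only a gap that keeps growing points to an off-heap leak.

```python
import re

# Multipliers for the optional -Xmx unit suffix; no suffix means bytes.
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_rss_kib(status_text):
    """Return the resident set size (VmRSS) in KiB from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS not found in status text")
    return int(match.group(1))


def parse_xmx_kib(cmdline):
    """Return the last -Xmx value on the command line in KiB, or None if absent."""
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)", cmdline)
    if not matches:
        return None
    value, unit = matches[-1]  # the JVM honours the last -Xmx given
    return int(value) * _UNIT_BYTES[unit.lower()] // 1024


def rss_beyond_heap_kib(status_text, cmdline):
    """Return how far RSS exceeds the maximum heap, in KiB (negative if below)."""
    xmx = parse_xmx_kib(cmdline)
    if xmx is None:
        raise ValueError("no -Xmx option found on the command line")
    return parse_rss_kib(status_text) - xmx


def report(pid):
    """Read /proc for the given pid and return the RSS excess over -Xmx in KiB."""
    with open(f"/proc/{pid}/status") as f:
        status_text = f.read()
    with open(f"/proc/{pid}/cmdline") as f:
        cmdline = f.read()  # arguments are NUL-separated; the regex copes with that
    return rss_beyond_heap_kib(status_text, cmdline)
```

Calling report() with the timeline server's pid at regular intervals and logging the result would show whether the gap between resident memory and the heap limit keeps widening.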
We worked around the problem with an hourly restart of the timeline server service, but this causes failures for jobs that start during the restart: the resource manager tries to contact the timeline server and fails if a restart is in progress.
Have you already encountered this kind of problem, and do you know how we can resolve it?