Python Recipe execution - backend improvements
Hi Dataiku admins, I'm wondering whether this post/question could find its way to the backend engineers.
I often have to dig through the logs of failed Python jobs, and I've noticed that the Dataiku logging could be improved, because every line carries duplicate timestamps.
The duplication probably comes from the fact that, in addition to the timestamp added by the log4j settings, a second one is produced because the Python logging module is initialised with a 'basicConfig' (line 4 in the 'python-exec-wrapper') whose format includes asctime. It makes the lines quite long and redundant. Take a look at the log generated by your code below (and notice the typo, too):
[2020/03/05-16:51:53.039] [null-err-101] [INFO] [dku.utils] - 2020-03-05 16:51:53,037 INFO Running a DSS Python recipe locally, uinsetting env
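To illustrate where the inner timestamp comes from, here's a minimal sketch; this is not the actual wrapper code, just my assumption of what its basicConfig roughly looks like:

import logging

# Assumed child-side configuration: a basicConfig whose format includes asctime.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logging.info("Running a DSS Python recipe locally, unsetting env")
# The backend then captures the child's stderr and prefixes its own
# log4j-style timestamp, which is how a line ends up with two of them:
#   [2020/03/05-16:51:53.039] [null-err-101] [INFO] [dku.utils] - 2020-03-05 16:51:53,037 INFO ...
# Dropping asctime from the child-side format would remove the redundancy, e.g.:
#   logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")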
I should probably also mention that the 'python-exec-wrapper' launching each recipe is a little alarming to look at. There are quite a few best-practice violations in there: the argument handling (there are standard libraries to do that for you, as sketched below), imports scattered all over the place, the exception handling, etc. It doesn't look like it's had a code review in a while, nor does it look testable in its current state.
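For the argument handling, for example, argparse from the standard library would do most of the work. Purely a hypothetical sketch; the option names below are invented, not the wrapper's real ones:

import argparse
import sys

def parse_args(argv=None):
    # Hypothetical interface: the real wrapper's arguments may differ.
    parser = argparse.ArgumentParser(description="Run a DSS Python recipe")
    parser.add_argument("script", help="path of the recipe script to execute")
    parser.add_argument("--work-dir", default=".", help="working directory for the run")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
                        help="logging verbosity")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    # Placeholder for the actual recipe execution.
    print(f"Would run {args.script} in {args.work_dir} at level {args.log_level}",
          file=sys.stderr)

Something along those lines would also make the wrapper much easier to unit test, since parse_args can be exercised with a plain list of strings.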
Just wanted to put this out there without being too snarky; if there were a repo somewhere, I'd gladly submit a pull request...
Answers
Hi,
Thanks for your feedback; we'll have a look at that.