Technologies like Spark are invaluable for scaling, both in terms of data volume and overall computation load. However, they do come with significant entry barriers (even though Dataiku is working hard at reducing them).
In our experience, while there is no fixed threshold, you'll likely need to be processing multiple millions of records before you have to switch away from simple DSS engine processing.
Even if it is not the most fashionable choice, a very simple PostgreSQL database (you can get managed options very easily) will allow you to scale to significant levels of data and processing before you need to move to distributed options like Spark. You will often be able to handle tens of gigabytes on such a non-distributed infrastructure.
I'd say Java. It's not very fancy, it is fairly verbose and, yes, it is often boring to write.
However, there are tons of advantages to the language. First, of course, it's really fast, thanks to a virtual machine that is probably the best on earth, with more than 20 years of fine-tuning. But what I like most is this boring and not-fancy aspect.
In Java, writing code is not the neatest or most intellectually rewarding task. You don't get the warm-glow feeling of writing something in a neater, more poetic or more balanced way.
However, reading code in Java is extremely easy. There is no magic, no hidden behavior, no wondering "gee, what kind of variable is this", no worrying about "will this operator be overloaded?" or "will this innocuous looking thing actually perform a hidden network call?".
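To make that concrete, here is a small sketch of my own (the class and method names are hypothetical, chosen only for illustration). Every type is written out, and because Java has no user-defined operator overloading, adding two non-primitive values has to be a named method call that you can see on the page.

```java
import java.math.BigDecimal;
import java.util.List;

// Hypothetical example class, not taken from any real code base.
public class ExplicitSum {

    // The parameter and return types are spelled out, so a reader needs no
    // IDE or type inference to know what is being passed around.
    static BigDecimal total(List<BigDecimal> prices) {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal price : prices) {
            // No overloaded "+" here: addition on an object is an explicit
            // call to BigDecimal.add, visible as such in a plain-text diff.
            sum = sum.add(price);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<BigDecimal> prices =
                List.of(new BigDecimal("19.99"), new BigDecimal("5.01"));
        System.out.println(total(prices)); // prints 25.00
    }
}
```

It is more typing than the equivalent one-liner in a fancier language, but nothing in it depends on context that lives outside the file (this sketch assumes Java 9 or later for `List.of`).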
Let's face it, we actually spend more time reading code than writing code. And even if fancy languages have great IDE support, you're not always reading code in nice conditions, with your IDE well set up and performing type inference and cross-navigation for you. You may be reading a PR on GitHub, or frantically pulling up the code on your mobile phone while debugging a production issue on a train.
You want people other than you to be able to read and debug your code (else, you'll do all the debugging yourself 🙂 ). In my opinion, simplicity, absolute lack of surprise and readability trump the potential productivity gains of more advanced languages.
And my experience is that even if you can have reasonable compromises, more advanced languages tend to "reward" more complicated code that becomes less readable. It's a hard discipline. So while I do enjoy the magic of programming in Scala, I still prefer Java.
Of course, the answer would be very different if I were a data scientist (it would be more about snakes than a language whose name is a single letter). But I am not 🙂