Hybrid Python/R code environment
I am facing a problem where I need to create a custom R model in DSS because I can't find a corresponding algorithm in Python. While this can be done with code recipes, it is a bit inconvenient to track all the experiments needed during model development. I was therefore thinking of using a fictitious Python model that calls R in the background through packages like rpy2, along the lines of the sketch below.
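I haven't tested this inside DSS itself, but a minimal sketch of that idea, assuming rpy2 3.x and the usual pandas-to-R data.frame conversion, could look like this (fit_r_lm and the formula are just illustrative names):

```python
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.packages import importr

stats = importr("stats")  # base R package providing lm()

def fit_r_lm(df: pd.DataFrame, formula: str) -> dict:
    """Fit an R linear model on a pandas DataFrame and return its coefficients."""
    with localconverter(ro.default_converter + pandas2ri.converter):
        r_df = ro.conversion.py2rpy(df)  # pandas DataFrame -> R data.frame
    model = stats.lm(ro.Formula(formula), data=r_df)
    coefs = ro.r["coef"](model)          # named numeric vector of coefficients
    return dict(zip(coefs.names, list(coefs)))

# Example usage inside the "fictitious" Python model:
# coefficients = fit_r_lm(train_df, "y ~ x1 + x2")
```

The same pattern would let the Python model delegate scoring to R as well, so the whole experiment tracking stays in one place.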
For cases like this, I think it would be great to have the possibility to build hybrid Python/R code environments, and it should not be too difficult to modify the build process of the Docker containers accordingly.
It might also be worth considering, as an advanced option, letting users specify code environments as Dockerfiles or images inheriting from dku-spark-base-teunrfbekcqtcymgo9jmomdg:dss-9.0.4.
Comments
-
For reference: the earlier discussion with background on this idea is Hybrid code environment. It includes more context on where this request is coming from.
-
A cool approach might be a streamlined way to call other recipes directly from code. R functions could be wrapped in a recipe, then called from Python using the Dataiku API. If Dataiku's overhead were very low, this would resemble serverless and microservice architectures, with Dataiku acting as the glue logic to blend processes from all sorts of languages and frameworks.
In this case, instead of the recipe taking a dataset as input and producing a dataset as output, it would behave as a function: callable with input parameters or an observable data stream, and returning a value or a processed data stream to its caller. These functions could be exposed generically, regardless of their underlying implementation, through a wrapping Dataiku API, as in the hypothetical sketch below.
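To make that concrete, here is what such a recipe-as-function call might look like from the Python side. None of this exists in the current Dataiku API; call_recipe and its parameters are purely hypothetical names used only to illustrate the idea:

```python
# Hypothetical only: call_recipe() does not exist in today's Dataiku API.
# It stands for a possible future "recipe as function" facility where a recipe
# (here an R implementation) is invoked synchronously with parameters instead
# of reading and writing datasets.

def call_recipe(project_key: str, recipe_name: str, **params):
    """Hypothetical helper: run a recipe as a function call and return its result."""
    raise NotImplementedError("Sketch only - no such API exists today")

# A Python recipe could then use an R recipe as if it were a local function:
# result = call_recipe("MYPROJECT", "fit_survival_model",
#                      data=train_df, formula="Surv(time, event) ~ age + stage")
```

The appeal is that the caller would not need to know whether the wrapped recipe is implemented in R, Python, or anything else.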