Add MongoDB support to community edition

Community edition is pretty full-featured for what open source and academic projects may need, but MongoDB is currently excluded. Limiting access to Cassandra and Elasticsearch to paid editions makes sense, as those databases are primarily used in paid projects, but MongoDB is common among community projects and ideally would be available as a connector in Dataiku Community Edition. Kaggle datasets typically come in CSV and JSON formats. A typical Dataiku Community user working on a Kaggle project might load CSV files into Postgres and JSON files (especially those containing nested data) into MongoDB, which handles nested documents natively. Adding MongoDB support to the free edition would make Dataiku a better tool for open data communities and ultimately grow the Dataiku userbase, leading to more paid licenses down the road as community users recommend Dataiku to their companies.


One other thing I noticed that may be holding community edition back for community projects is the lack of export capability. Since community projects likely involve individual team members' self-hosted instances of Dataiku with only one user each, the ability to copy projects between Dataiku instances (that is, between different project members on different networks) could make Dataiku a much more useful collaborative tool for community data projects. I understand the idea of limiting multi-node setups to paid subscriptions (perhaps this could be protected instead by only allowing one license to run from a particular IP address / email / local network at a time), but the way that's currently being enforced prevents community users from sharing their projects with others. It may be a lost opportunity to expand Dataiku's userbase, since an easy way to get a new user to try DSS out is simply to have them receive a project from another user asking for collaboration.

This weekend I was working on a personal project and wanted to send it to a friend who's never used Dataiku before to get fresh eyes on it. When I realized I couldn't export my project, I had to do the next best thing: I wrote a Python script that uses the Dataiku API to export the settings of my recipes as JSON files, then sent him the JSON files to import. Unfortunately, my friend's first introduction to Dataiku will be a somewhat painful project import process (creating a Python recipe and using the API to read in a JSON file and generate recipes from it) just to gain access to some simple data transformation scripts.

I'd love to be able to use Dataiku as a collaborative tool for open data projects. And if my collaborators also end up liking Dataiku when working on community projects, they're likely to recommend it to their companies for their professional projects.
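For anyone curious what that kind of workaround looks like, here is a minimal sketch, not the original script. It assumes `project` is a project handle from the `dataikuapi` client (for example `dataikuapi.DSSClient(host, api_key).get_project(project_key)`, where the host, API key and project key are placeholders you supply), and that recipe settings expose `get_recipe_raw_definition()` and `get_payload()`. The function name `export_recipe_settings` and the layout of the output files are my own choices for illustration.

```python
import json
from pathlib import Path


def export_recipe_settings(project, out_dir):
    """Write one JSON file per recipe in `project`; return the paths written.

    `project` is assumed to behave like a dataikuapi project handle:
    list_recipes() yields dict-like items with a "name" key, and
    get_recipe(name).get_settings() returns the recipe settings object.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for item in project.list_recipes():
        name = item["name"]
        settings = project.get_recipe(name).get_settings()

        # Hypothetical file layout: the raw definition (type, inputs,
        # outputs, params) plus the payload (e.g. the code of a Python recipe).
        dump = {
            "name": name,
            "definition": settings.get_recipe_raw_definition(),
            "payload": settings.get_payload(),
        }

        target = out_path / f"{name}.json"
        target.write_text(json.dumps(dump, indent=2), encoding="utf-8")
        written.append(target)

    return written
```

The receiving side still has to read these files back and recreate each recipe through the API, which is the painful part described above.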

Edit 2:

The export and import features are already available in community edition after all. I had confused the bundling and export features, which it turns out are different.