Using Marimo with Dataiku

FlorentD

Jupyter Notebook is a widely used tool, and many people have established habits with it. Dataiku provides Jupyter Notebook as the standard tool for testing code, editing Code Recipes, and many other coding tasks. For coders, this integration is a cornerstone of creating a project or experimenting with new capabilities. Because Dataiku cannot offer the same level of integration for every tool, Jupyter Notebook is the default for this kind of work.

However, some users may want to experiment with tools that are not present by default. For them, stepping beyond the traditional Jupyter notebook environment can improve development speed and production robustness. Marimo offers a modern alternative that aligns with best practices in software engineering.

By using Marimo, Dataiku users can get a more resilient and maintainable coding environment. This lets them focus on data science itself rather than on the complexities of notebook state management.

In particular, Marimo offers:

  • Reactive execution: Marimo is a reactive notebook, so when you modify a variable in one cell, all cells that depend on it are automatically re-executed. You no longer have to remember to re-run the downstream cells yourself.
  • UI: Marimo provides native UI elements that react instantly, eliminating the need for complex callbacks and event handlers.
  • Pure Python storage: Marimo notebooks are stored as pure Python code. This makes them easy to version with Git, since diffs show code rather than outputs and execution counts. Because a notebook is plain Python, you can also reuse it as a code library, which eases the transition from testing and developing to delivering.
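To make the idea of reactive execution more concrete, here is a toy model written in plain Python. It is only an illustrative sketch and not how Marimo is implemented: the ToyNotebook class and its methods are hypothetical names, and the model simply assumes that a cell depends on another cell when it reads a variable that the other cell assigns.

```python
import ast


class ToyNotebook:
    """Toy model of reactive execution (illustrative only, not Marimo's code)."""

    def __init__(self):
        self.cells = {}    # cell name -> source code
        self.scope = {}    # variables shared by all cells
        self.run_log = []  # names of the cells, in the order they were executed

    @staticmethod
    def _names(source):
        """Return (names the cell assigns, names the cell only reads)."""
        defined, used = set(), set()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Name):
                target = defined if isinstance(node.ctx, ast.Store) else used
                target.add(node.id)
        return defined, used - defined

    def _affected(self, start):
        """Return `start` and its transitive dependents, in dependency order."""
        info = {name: self._names(src) for name, src in self.cells.items()}
        order, visited = [], set()

        def visit(name):
            if name in visited:
                return
            visited.add(name)
            for other in self.cells:
                # `other` depends on `name` if it reads a variable `name` assigns.
                if other != name and info[name][0] & info[other][1]:
                    visit(other)
            order.append(name)

        visit(start)
        return order[::-1]  # reversed post-order is a topological order

    def set_cell(self, name, source):
        """Create or edit a cell, then run it and every cell that depends on it."""
        self.cells[name] = source
        for cell in self._affected(name):
            exec(self.cells[cell], self.scope)
            self.run_log.append(cell)


nb = ToyNotebook()
nb.set_cell("a", "x = 2")
nb.set_cell("b", "y = x * 10")
nb.set_cell("c", "z = x + y")
nb.set_cell("a", "x = 5")  # editing cell "a" re-runs "b" and "c" as well
```

After the last call, the variables computed by cells "b" and "c" reflect the new value of x without either cell having been run by hand, which is the behavior the first bullet describes.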

There may be other reasons to use Marimo, and users may simply want to continue their existing habits.

Fortunately, Dataiku provides a way to interact with Marimo. There are two ways of using Marimo with Dataiku. If you are working outside of Dataiku, use your usual local environment and connect your notebook to the Dataiku instance. If you want to use Marimo within Dataiku, there is no native support, but it can be integrated cleanly thanks to the Code Studio feature.

Using Marimo outside Dataiku

To use Marimo outside of Dataiku, you need to be able to connect to a Dataiku instance. Once you have installed the dataiku packages (see the documentation for more details), obtained an API key to access the instance, and set up Marimo, using it is easy:

  • From your command line, launch Marimo:

marimo edit

  • Then edit your code as usual, for example:

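import dataiku

# Swap the commented line to point the notebook at another project.
#PROJECT_KEY = "PROGRAMMATICRAGWITHDATAIKUSLLMMESHANDLANGCHAIN"
PROJECT_KEY = "GENAIFULLUSECASE"

# Connect with the instance URL and API key configured for the dataiku package.
client = dataiku.api_client()
project = client.get_project(PROJECT_KEY)
project.list_datasets()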

This will connect to your Dataiku instance and list all datasets belonging to a project defined by the PROJECT_KEY variable, as shown in the screenshot below:

image-8dda8e0c85847-757e.png

If you swap which of the two PROJECT_KEY lines is commented out (#PROJECT_KEY = …), Marimo re-runs the cells that depend on PROJECT_KEY, as expected. You can see the result of this modification in the next screenshot.

image-a07cea90c2f758-820d.png
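The example above assumes that the dataiku package already knows how to reach your instance. One possible way to set this up from the notebook itself is sketched below. It assumes the instance URL and the API key are stored in the DKU_DSS_URL and DKU_API_KEY environment variables; read_dss_settings and connect are hypothetical helper names, and the host shown in the usage note is a placeholder. Other configuration methods exist, so check the documentation for the one that suits your setup.

```python
import os


def read_dss_settings(env=os.environ):
    """Return (url, api_key) read from the environment, or raise if one is missing."""
    url = env.get("DKU_DSS_URL", "").strip()
    api_key = env.get("DKU_API_KEY", "").strip()
    missing = [
        name
        for name, value in (("DKU_DSS_URL", url), ("DKU_API_KEY", api_key))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing connection settings: " + ", ".join(missing))
    return url.rstrip("/"), api_key


def connect(env=os.environ):
    """Point the dataiku package at a remote instance and return an API client."""
    # Imported lazily so that the settings check also works without the package.
    import dataiku

    url, api_key = read_dss_settings(env)
    dataiku.set_remote_dss(url, api_key)
    return dataiku.api_client()
```

In a Marimo cell, you could then write client = connect() instead of calling dataiku.api_client() directly, after exporting for instance DKU_DSS_URL=https://dss.example.com:11200 and your own API key in the shell that launches marimo edit.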

Using Marimo within Dataiku

As Marimo is not a native feature of Dataiku, we need to embed it into Dataiku. This is possible thanks to Code Studio. First, create a code environment with the marimo package installed and all other necessary packages for your notebook to run.  Then, create a Code Studio template with at least the following three blocks:

  1. “Add code Environment”: use the previously defined code environment.
image-d2e00eea019f2-ccb3.png
  2. “Add an Entry point”: this entry point is responsible for running Marimo. By default, Marimo opens a new web browser, which is not the desired behavior when integrating Marimo within Dataiku. Therefore, we need to create an entry point that exposes Marimo's port to the user.
image-1a0518dfe2b408-75e6.png
  3. “Terminal”: this block is not mandatory, but it lets the user run Marimo commands, such as converting a Jupyter Notebook to a Marimo notebook and vice versa.
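The exact launch command of the entry point is only visible in the screenshot, so the following is a hedged sketch of what such a launcher could look like rather than the configuration used here. It relies on the --headless, --host, and --port options of marimo edit; the build_marimo_command and launch helpers, the port 2718, and the use of the marimo folder as workspace are assumptions to adapt to your own template.

```python
import subprocess


def build_marimo_command(workspace, port=2718, host="0.0.0.0"):
    """Build a `marimo edit` command line that serves the editor without a browser."""
    return [
        "marimo", "edit",
        "--headless",         # do not open a local web browser
        "--host", host,       # listen on all interfaces so the port can be exposed
        "--port", str(port),  # must match the port exposed by the entry point
        workspace,            # folder that contains the Marimo notebooks
    ]


def launch(workspace="/home/dataiku/workspace/code_studio-versioned/marimo"):
    """Start Marimo and block until the server stops (assumes marimo is on PATH)."""
    subprocess.run(build_marimo_command(workspace), check=True)
```

If marimo is installed in a dedicated code environment, the first element of the command would have to be the full path to that environment's marimo executable instead of the bare name.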

You may need a script to convert your existing Jupyter notebooks into Marimo notebooks, like this one:

#!/bin/bash
# Convert every Jupyter notebook of the project into a Marimo notebook.

MARIMO_FOLDER="/home/dataiku/workspace/code_studio-versioned/marimo"

# Create the target folder if it does not exist yet.
mkdir -p "${MARIMO_FOLDER}"

# Read NUL-separated paths so that file names containing spaces are handled.
while IFS= read -r -d '' FILE
do
        NAME=$(basename "${FILE}" .ipynb)
        /opt/dataiku/python-code-envs/marimo/bin/marimo convert "${FILE}" -o "${MARIMO_FOLDER}/${NAME}.py"
done < <(find /home/dataiku/workspace/notebooks -iname "*.ipynb" -print0)


Once the Code Studio template has been created, you can create a Code Studio from it and start it, as shown in the following screenshot.

image-b4ee1408ed3a88-9ba9.png

You may need to convert your Jupyter Notebook first. Once everything is set up, you can use Marimo. If you need to bring your work back into the Dataiku instance, export your Marimo notebook to a Jupyter Notebook and synchronize your files with the Dataiku instance.
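The round trip described above can be sketched with two small helpers that build the corresponding Marimo commands. This is an illustration under assumptions: to_marimo_command and to_jupyter_command are hypothetical names, the commands assume marimo is on the PATH, and the folders in the usage note are the ones used by the conversion script.

```python
from pathlib import Path


def to_marimo_command(ipynb_path, marimo_folder):
    """Command converting a Jupyter notebook into a Marimo notebook (.py)."""
    source = Path(ipynb_path)
    target = Path(marimo_folder) / (source.stem + ".py")
    return ["marimo", "convert", str(source), "-o", str(target)]


def to_jupyter_command(marimo_path, notebooks_folder):
    """Command exporting a Marimo notebook back to a Jupyter notebook (.ipynb)."""
    source = Path(marimo_path)
    target = Path(notebooks_folder) / (source.stem + ".ipynb")
    return ["marimo", "export", "ipynb", str(source), "-o", str(target)]
```

For example, to_jupyter_command("/home/dataiku/workspace/code_studio-versioned/marimo/analysis.py", "/home/dataiku/workspace/notebooks") returns a command that you can pass to subprocess.run or type in the Terminal block, where analysis.py stands for one of your own notebooks.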

Conclusion

Using Marimo within Dataiku can improve the coding experience for users who wish to go beyond the traditional Jupyter Notebook environment. By adopting Marimo, Dataiku coders benefit from reactive execution, a responsive UI, and pure Python storage, which together streamline development and make code easier to maintain.

While using Marimo outside of Dataiku is straightforward, integrating it within the platform requires creating a Code Studio template that includes a code environment, an entry point for running Marimo, and an optional terminal for user interactions. With proper setup, including the conversion of Jupyter Notebooks, users can effectively leverage Marimo’s capabilities to deliver robust data science solutions while minimizing complexity. 

If you want to go further, you can browse this additional documentation:
