I have a Python project that contains various folders, packages, and modules. How can I move it to Dataiku? Is there any functionality available for that? Is there a way to create a complete Python project in Dataiku?
Could you please help me with the steps?
You can reuse your Python code through project libraries in DSS, which let you import any function or class declared there into your DSS code-based assets (recipes, notebooks, scenarios, etc.). If your Python project lives in a remote GitHub repository, you can even use Git references to pull it directly onto your DSS instance.
There are several ways to reuse and further integrate your Python code into DSS. You can use this section of the reference documentation as a starting point to get more information depending on your use case.
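As a rough illustration of what "import any function or class" means in practice, here is a minimal sketch. It assumes the library's Python source folder ends up on the import path, which is my understanding of how project libraries behave. The package name `myproject`, the module `cleaning`, and the function `normalize_name` are hypothetical, and a temporary folder stands in for the library so the snippet runs outside DSS.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the project library's Python source folder (hypothetical layout).
lib_root = Path(tempfile.mkdtemp())
package_dir = lib_root / "myproject"
package_dir.mkdir()

# A package is just a folder with an __init__.py and some modules.
(package_dir / "__init__.py").write_text("")
(package_dir / "cleaning.py").write_text(
    "def normalize_name(value):\n"
    "    # Trim surrounding whitespace and lowercase the string.\n"
    "    return value.strip().lower()\n"
)

# Assumption: DSS takes care of this step for project libraries.
# It is done by hand here only so the sketch runs anywhere.
sys.path.insert(0, str(lib_root))
importlib.invalidate_caches()

# This import line is what a recipe or notebook would contain.
from myproject.cleaning import normalize_name

print(normalize_name("  Alice "))
```

In a recipe or notebook, only the final import and the call would be needed; the rest just simulates the library folder.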
Thanks for this information. Let's say I have imported my project into the Library editor. How do I run my project, and how can I install packages like pandas and snowflake?
- To run your project, you will need to create a Flow and build its components. The Flow is a directed acyclic graph (DAG) which acts as the main item to orchestrate the tasks to run in DSS. To learn more about it, you can watch this introductory video and read the corresponding section of the reference documentation.
- To use additional Python packages, you will need to create a code environment and list the packages (and corresponding versions) you wish to use. After that, you can specify which code environment to use in any of your code recipes or notebooks.
Hope this helps.
Thanks Harizo! It worked for me.
I also want to know how to install a package. In plain Python we run pip install and then import the package in our code.
For example, we run pip install package-name and then import package-name in the Python file.
How can we achieve this in the Library editor in Dataiku?
To install a package, use Dataiku's code environments feature rather than installing it directly with pip. Under the hood, Dataiku uses pip and virtual environments to create and manage code environments, so end users don't have to handle that manually.
Once your code environment is up and ready, you will be able to import whatever function/class from it in your project library code, and have your code recipe / notebook running smoothly. Don't forget to point your recipe/notebook to the appropriate code environment in its settings.
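If an import fails, it can help to check whether the environment your recipe or notebook points to actually provides the modules your library needs. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical. It also assumes that "snowflake" means the `snowflake-connector-python` package, which is imported as `snowflake.connector`, a reminder that the pip name and the import name can differ.

```python
import importlib.util


def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent (e.g. "snowflake" for
            # "snowflake.connector") or when a module has no usable spec.
            found = False
        if not found:
            missing.append(name)
    return missing


# Import names, not pip names (assumed list, adjust to your own project).
required = ["pandas", "snowflake.connector"]
print(missing_modules(required))
```

An empty list means every module was found; anything listed should be added to the code environment's package list rather than installed by hand.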
Welcome to the Dataiku Community!
I personally use a mix of Python and Visual recipes in Dataiku Data Science Studio (DSS). I work with folks who are full-time Python data coders, typically working in Jupyter Notebooks, sometimes VS Code. We all have found good ways to work in DSS as the core of our collaboration hub. I suspect you might find a useful home in DSS.
This Coder course in the Dataiku Academy might also eventually be of interest to you.