How to create and store a "Main Table" used in several projects

Sv3n-Sk4 · ‎10-31-2022

Hello Everyone,

I am reaching out to get some advice.

The company where I work is trying to move a lot of programs to Dataiku, step by step. Many of these programs run on outdated tools and/or languages and are executed manually (almost every week).

The idea now is to centralize and automate everything we can.

At the moment, we are focusing on decommissioning Coheris Liberty (formerly Harry Pilot).

I don't know who in this community will know it but, to explain quickly, it helps to build SQL queries. You can “pre-code” a lot of variables and create small correspondence tables (I am pretty sure there is a better English word for this but I can't find it, sorry…), such as:

Code	Name
1	Green
2	Blue
3	Red
4	Yellow

The problem with switching to Dataiku is these tables: we have a lot of them. And some are more complicated, such as:

Code 1	Code 2	Code 3	Name
1	1	1	France
1	1	2	Germany
1	1	3	USA
1	2	1	Poland
1	2	3	Spain
2	1	1	Luxembourg

A lot of people (and code) use these already established “rules” and will continue to do so with Dataiku.

So I am wondering how to transpose them so that we can reach and use them quickly and easily. It's important that we can continue to update them occasionally.

One of the problems is that we are not able to upload an xls dataset from our computers to work on a project, so we can't just keep a file in a server folder and update and manage it there.

I am wondering whether creating a "very big" Python dictionary, or some sort of “main table” stored on a server reachable by Dataiku, could be a good idea.
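For illustration, the two example tables above could be written as Python dictionaries roughly like this. It is only a minimal sketch: the names COLOR_NAMES and COUNTRY_NAMES are hypothetical, and the composite table assumes the three codes are used together as one key.

```python
# Hypothetical names; the values are copied from the two example tables above.

# Simple table: one code -> one name.
COLOR_NAMES = {
    1: "Green",
    2: "Blue",
    3: "Red",
    4: "Yellow",
}

# Composite table: the tuple (Code 1, Code 2, Code 3) -> name.
COUNTRY_NAMES = {
    (1, 1, 1): "France",
    (1, 1, 2): "Germany",
    (1, 1, 3): "USA",
    (1, 2, 1): "Poland",
    (1, 2, 3): "Spain",
    (2, 1, 1): "Luxembourg",
}

print(COLOR_NAMES[3])
print(COUNTRY_NAMES[(1, 2, 1)])
# .get() avoids a KeyError when a combination of codes is not defined.
print(COUNTRY_NAMES.get((9, 9, 9), "Unknown"))
```

Tuples work as dictionary keys because they are immutable, which keeps a three-code table as flat as a one-code table.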

So that's why I am asking for help. What would you do in this situation?

Thanks a lot for reading.

PS: We are working on Dataiku version 9, but we will move to version 11 in 6 months.

Operating system used: Windows

ZachM · ‎10-31-2022

Hi @Sv3n-Sk4,

It's possible to share a dataset between multiple projects. You could create the "main tables" in one project using any format that DSS supports (SQL, S3, etc), and then share them with any projects that need it.

For information on how to share datasets, see Shared objects.

Thanks,

Zach

Sv3n-Sk4 · ‎10-31-2022

Hi @ZachM,

Thanks a lot for your answer 🙂

I did know that a dataset can be used in many projects, but is it, in your opinion, the best way to do what I want to do?

Is the solution of a shared library with Python dictionaries not usable?

If I create a "main table", do you think it's better to create one table with a lot of columns (Code 1 / Name 1; Code 2 / Name 2; etc.)?

The goal is to be able to use conditions in one dataset, based on another dataset, to name individuals depending on their code.

For example:

If the value of a column = 3, then replace it with the name corresponding to code 3 from the right main dataset.
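The replacement described above could look roughly like this in plain Python. This is a sketch under assumptions, not a Dataiku API: replace_codes, individuals and the "color" column are hypothetical names, and the rows are simple dictionaries standing in for a dataset.

```python
def replace_codes(rows, column, lookup, default="Unknown"):
    """Return new rows where row[column] is replaced by its name in lookup."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row[column] = lookup.get(row[column], default)
        result.append(new_row)
    return result


# Lookup built from the color table in the first post.
color_names = {1: "Green", 2: "Blue", 3: "Red", 4: "Yellow"}

# Hypothetical rows of the dataset to rename.
individuals = [
    {"id": "A", "color": 3},
    {"id": "B", "color": 1},
    {"id": "C", "color": 7},  # code missing from the lookup table
]

renamed = replace_codes(individuals, "color", color_names)
print(renamed)
```

Codes that are missing from the lookup fall back to the default instead of raising an error, which makes gaps in the main table easy to spot.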

Thanks a lot again for your quick answer !!

ZachM · ‎10-31-2022

Hi @Sv3n-Sk4,

For your use case, using a shared library would probably work better than a dataset since the tables would be easier to access that way.

As an alternative, you could use global variables, which can be accessed via Python.

You can set global variables by going to Administration > Settings > Variables:

You can access them in Python from any project like this:

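import json

import dataiku

# Global variables are returned as a dict; the "code_table" value is parsed from JSON
variables = dataiku.get_custom_variables()
code_table = json.loads(variables["code_table"])
# Prints the name stored under code "1" ("blue" in this example)
print(code_table["1"])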

For more information about variables, see Variables.
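One detail worth keeping in mind with this approach: JSON object keys are always strings, so numeric codes have to be converted before the lookup, and composite codes need some encoding convention. The sketch below runs without DSS. The raw strings stand in for the value of a global variable and use the tables from the first post rather than the actual contents of the variable above, and the "-" separator and the country_name helper are assumptions made for illustration.

```python
import json

# Stand-in for the string value of a global variable (hypothetical content,
# based on the color table from the first post).
raw_value = '{"1": "Green", "2": "Blue", "3": "Red", "4": "Yellow"}'
code_table = json.loads(raw_value)

code = 3
# JSON object keys are always strings, so convert the code before the lookup.
print(code_table[str(code)])

# Composite codes need an encoding convention, here the codes joined with "-".
raw_countries = '{"1-1-1": "France", "1-1-2": "Germany", "1-2-1": "Poland"}'
countries = json.loads(raw_countries)


def country_name(code1, code2, code3):
    """Return the name for a three-part code, or None if it is not defined."""
    key = "-".join(str(c) for c in (code1, code2, code3))
    return countries.get(key)


print(country_name(1, 2, 1))
```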

Thanks,

Zach

Sv3n-Sk4 · ‎11-01-2022

Thanks again for your time @ZachM.

I will explore the Python dictionary route. However, I am not sure it will be very readable for my colleagues, as they are not fluent in Python; the dictionary would be a big one, and it won't be easy to update if needed.

Creating variables will have the same issue, as it will be hard for everyone to follow.

I will try to find a way that is usable and easy for everyone.

I am starting to learn about APIs. I am not sure whether I can create one where I would store all my prepared code (or my existing tables) and fetch it when needed (not sure if it's possible, not sure I know how, and not sure my company will allow it).

What I can see now is that there doesn't seem to be a perfect solution for my problem. I will need to find the best compatible one 😉

Thanks again!

CoreyS · ‎11-01-2022

Hi @Sv3n-Sk4, the Product Ideas board is here to let you share and exchange your ideas on how to improve Dataiku, so please feel free to use it if you think there is an opportunity! Here are some resources to help get you started:

How to suggest Dataiku ideas
Participating on the Product Ideas board
Suggest an idea

I hope this helps!

Sv3n-Sk4 · ‎11-01-2022

Thanks @CoreyS !

Will have a look 🙂

Marlan · ‎11-01-2022

Hi @Sv3n-Sk4,

I'm not sure which option I'd choose if I were you. But I certainly would consider the option of putting an editable dataset in a central project and sharing that to other projects that need it.

Here is a link to the editable dataset documentation: https://doc.dataiku.com/dss/latest/connecting/editable-datasets.html

Marlan
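If the central dataset is laid out in "long" format (one row per code, with a column naming the table it belongs to), a single dataset can hold all the correspondence tables, and each project can rebuild the lookups it needs. The sketch below is only an illustration: the column names table, code and name and the helper build_lookups are hypothetical, and the rows are hard-coded so it runs outside DSS. Inside DSS they would instead be read from the shared dataset, for example through a dataframe obtained with dataiku.Dataset(...).get_dataframe().

```python
from collections import defaultdict


def build_lookups(rows):
    """Group long-format rows into {table_name: {code: name}}."""
    lookups = defaultdict(dict)
    for row in rows:
        # Codes are kept as strings so that simple and composite codes
        # can live in the same column.
        lookups[row["table"]][str(row["code"])] = row["name"]
    return dict(lookups)


# Stand-in for the rows of the shared editable dataset (hypothetical layout).
rows = [
    {"table": "color", "code": "1", "name": "Green"},
    {"table": "color", "code": "2", "name": "Blue"},
    {"table": "country", "code": "1-1-1", "name": "France"},
    {"table": "country", "code": "1-1-2", "name": "Germany"},
]

lookups = build_lookups(rows)
print(lookups["color"]["2"])
print(lookups["country"]["1-1-2"])
```

Compared with one wide table (Code 1 / Name 1; Code 2 / Name 2; etc.), adding a new correspondence table in this layout only means adding rows, not columns.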

Sv3n-Sk4 · ‎11-03-2022

Thanks @Marlan !

I think it's gonna take some time to translate the solution into a big editable dataset, but it seems to be the easiest and most understandable way to do it for the whole team.

🙂
