Hello Everyone,
I am reaching out to get some advice.
The company where I work is trying to move a lot of programs to Dataiku, step by step. Many of these programs run on outdated tools and/or languages and are executed manually (almost every week).
The idea now is to centralize and automate everything we can.
At this time, we are focusing on the decommissioning of Coheris Liberty (formerly Harry Pilot).
I don't know who in this community will know it, but to explain quickly, it helps to build SQL queries. You can "pre-code" a lot of variables and create small correspondence tables (pretty sure there is an English word for this but I can't find it, I am sorry...) such as:
Code | Name |
1 | Green |
2 | Blue |
3 | Red |
4 | Yellow |
The problem with switching to Dataiku is these tables: we have a lot of them, and some are more complicated, such as:
Code 1 | Code 2 | Code 3 | Name |
1 | 1 | 1 | France |
1 | 1 | 2 | Germany |
1 | 1 | 3 | USA |
1 | 2 | 1 | Poland |
1 | 2 | 3 | Spain |
2 | 1 | 1 | Luxembourg |
A lot of people (and code) are using these already established "rules" and will continue to do so with Dataiku.
The thing is, I am wondering how to carry them over so that we can reach and use them quickly and easily. It's important that we can continue to update them occasionally.
One of the problems is that we are not able to upload an xls dataset from our computers to work on a project, so we can't just have a file that we update and manage in one server folder.
I am wondering whether creating a "very big" Python dictionary, or some sort of "main table" stored on a server reachable by Dataiku, could be a good idea.
So that's why I am asking for help. What would you do in this situation?
Thanks a lot for reading.
PS: We are working on Dataiku version 9, but we will move to version 11 in 6 months.
Operating system used: Windows
Hi @Sv3n-Sk4,
I'm not sure which option I'd choose if I were you. But I certainly would consider putting an editable dataset in a central project and sharing it with the other projects that need it.
Here is a link to the editable dataset documentation: https://doc.dataiku.com/dss/latest/connecting/editable-datasets.html
Marlan
Hi @Sv3n-Sk4,
It's possible to share a dataset between multiple projects. You could create the "main tables" in one project using any format that DSS supports (SQL, S3, etc.), and then share them with any projects that need them.
For information on how to share datasets, see Shared objects.
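As a rough sketch of how such a shared table might be used once its rows are available in Python: the rows below stand in for the three-code table from the question, and the column names (`code1`, `code2`, `code3`, `name`) and the helper `add_names` are made up for illustration, not part of Dataiku. Inside DSS the same result could be obtained without code by joining on the three code columns in a Join recipe.

```python
# Rows as they might come out of the shared lookup dataset (values from the question).
country_rows = [
    {"code1": 1, "code2": 1, "code3": 1, "name": "France"},
    {"code1": 1, "code2": 1, "code3": 2, "name": "Germany"},
    {"code1": 1, "code2": 2, "code3": 1, "name": "Poland"},
]


def add_names(records, lookup_rows, keys=("code1", "code2", "code3")):
    """Left-join `records` to `lookup_rows` on the key columns."""
    # Index the lookup rows by their tuple of codes for fast access.
    index = {tuple(row[k] for k in keys): row["name"] for row in lookup_rows}
    # Unmatched records keep their data and get a name of None.
    return [
        {**rec, "name": index.get(tuple(rec[k] for k in keys))}
        for rec in records
    ]


records = [
    {"id": "A", "code1": 1, "code2": 1, "code3": 2},
    {"id": "B", "code1": 9, "code2": 9, "code3": 9},  # no matching codes
]
result = add_names(records, country_rows)
print(result[0]["name"])  # Germany
print(result[1]["name"])  # None
```

Keeping unmatched records with an empty name, rather than dropping them, makes missing codes easy to spot when the lookup table is updated later.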
Thanks,
Zach
Hi @ZachM,
Thanks a lot for your answer!
I did know I could use one dataset in many projects, but is it, in your opinion, the best way to do what I want to do?
Is the solution of a shared library with Python dictionaries not usable?
If I create a "main table", do you think it's better to create one table with a lot of columns (Code 1 / Name 1; Code 2 / Name 2; etc.)?
The goal is to be able to use a condition in one dataset, based on another dataset, to name individuals depending on the code.
For example:
If the value of a column = 3, then replace it with the name corresponding to code 3 in the right main dataset.
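For illustration only, the replacement described above could be sketched in plain Python like this. The dictionary reuses the colour table from the first post; `replace_codes` and the `"UNKNOWN"` fallback are hypothetical names chosen for the example.

```python
# Single-code table from the first post.
color_names = {1: "Green", 2: "Blue", 3: "Red", 4: "Yellow"}


def replace_codes(values, mapping, unknown="UNKNOWN"):
    """Replace each code with its name; codes missing from the table get `unknown`."""
    return [mapping.get(value, unknown) for value in values]


column = [3, 1, 4, 7]  # 7 is not in the table
print(replace_codes(column, color_names))
# ['Red', 'Green', 'Yellow', 'UNKNOWN']
```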
Thanks a lot again for your quick answer!!
Hi @Sv3n-Sk4,
For your use case, using a shared library would probably work better than a dataset since the tables would be easier to access that way.
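A minimal sketch of what such a shared library module might contain, assuming a hypothetical module named `lookup_tables.py` and a hypothetical `lookup` helper (neither is a Dataiku API). Tuple keys cover the tables with several code columns.

```python
# Hypothetical contents of a shared module, e.g. lookup_tables.py
TABLES = {
    "color": {1: "Green", 2: "Blue", 3: "Red", 4: "Yellow"},
    "country": {
        (1, 1, 1): "France",
        (1, 1, 2): "Germany",
        (2, 1, 1): "Luxembourg",
    },
}


def lookup(table, key, default=None):
    """Return the name for `key` in the named table, or `default` if absent."""
    if table not in TABLES:
        raise KeyError(f"Unknown lookup table: {table!r}")
    return TABLES[table].get(key, default)


# In a recipe this would follow `from lookup_tables import lookup`.
print(lookup("color", 2))                   # Blue
print(lookup("country", (2, 1, 1)))         # Luxembourg
print(lookup("country", (9, 9, 9), "n/a"))  # n/a
```

Keeping every table in one registry means a new table is added in a single place, at the cost of colleagues having to edit Python to update it.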
As an alternative, you could use global variables, which can be accessed via Python.
You can set global variables by going to Administration > Settings > Variables:
You can access them in Python from any project like this:
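import json

import dataiku

# "code_table" is assumed to be stored as a JSON object, e.g. {"1": "blue"},
# which is why it is parsed with json.loads.
variables = dataiku.get_custom_variables()
code_table = json.loads(variables["code_table"])

# Prints "blue"
print(code_table["1"])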
For more information about variables, see Variables.
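One detail worth noting if the multi-code tables go into a variable this way: JSON object keys must be strings, so a key made of several codes has to be encoded somehow. The sketch below uses a `|` separator, which is only an assumption for the example; any separator that never appears in the codes would do.

```python
import json

# Part of the three-code table from the question, keyed by (code1, code2, code3).
country_names = {(1, 1, 1): "France", (1, 2, 3): "Spain"}

# JSON object keys must be strings, so join the codes with a separator.
encoded = json.dumps({"|".join(map(str, key)): name for key, name in country_names.items()})
print(encoded)  # {"1|1|1": "France", "1|2|3": "Spain"}

# Reading it back: split each key and rebuild the tuple of integer codes.
decoded = {
    tuple(int(part) for part in key.split("|")): name
    for key, name in json.loads(encoded).items()
}
print(decoded[(1, 2, 3)])  # Spain
```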
Thanks,
Zach
Thanks again for your time, @ZachM.
I will explore the Python dictionary route. However, I am not sure it will be very readable for my colleagues, as they are not fluent in Python, and the dictionary would be a big one that won't be easy to update if needed.
Creating variables will have the same issue, as it will be hard for everyone to follow.
I will try to find a way that is usable and easy for everyone.
I am starting to learn about APIs. I am not sure whether I can create one where I would store all my prepared code (or my existing tables) and fetch it when needed (not sure if I can, not sure I know how, and not sure my company will allow it).
What I can see now is that there doesn't seem to be a perfect solution for my problem. I will need to find the best compatible one.
Thanks again!
Hi @Sv3n-Sk4, the Product Ideas board is here to let you share and exchange your ideas on how to improve Dataiku, so please feel free to use it if you think there is an opportunity! Here are some resources to help get you started:
How to suggest Dataiku ideas
Participating on the Product Ideas board
Suggest an idea
I hope this helps!
Thanks @CoreyS!
Will have a look.
Thanks @Marlan!
I think it's gonna take some time to translate the existing tables into a big editable dataset, but it seems to be the easiest and most understandable way to do it for the whole team.