Single vs. multiple Design node implementation
1. If we go with multiple Design nodes in Dataiku:
- What are the pros and cons?
- What about access across Design nodes to projects, data points, recipes, and code?
2. What are the pros and cons of handling the load with a single Design node on a large EC2 instance?
Answers
Hi Riti,
It is not possible to have multiple Design Nodes managing the same pool of projects. In other words, you don't scale Dataiku by creating a "cluster" of Design Nodes.
In practice, to perform computation at scale on large amounts of data, Dataiku relies on the concept of computation pushdown, where the heavy lifting is not executed on the Dataiku host but elsewhere, for example:
* directly on the underlying database if you are dealing with SQL tables (e.g. PostgreSQL, MySQL, Snowflake, etc.),
* on a Kubernetes cluster, for code-based workloads (Python, R), Spark jobs or webapp backends.
Since the Dataiku host in this setup neither stores the data nor performs the large-scale computation, it acts only as an orchestrator and thus doesn't require an excessive amount of resources.
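To make the pushdown idea more concrete, here is a minimal sketch in plain Python. It does not use the Dataiku API: an in-memory SQLite database stands in for the remote database, and the table `sales` and all function names are hypothetical, chosen only for illustration. It compares pulling every row to the orchestrating host with letting the database do the aggregation and returning only the result.

```python
import sqlite3


def build_demo_db(n_rows=10_000):
    """Create an in-memory SQLite table standing in for a remote SQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    rows = [("north" if i % 2 == 0 else "south", i) for i in range(n_rows)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def totals_without_pushdown(conn):
    """Fetch every row to the client and aggregate there (heavy on the host)."""
    totals = {}
    rows_transferred = 0
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0) + amount
        rows_transferred += 1
    return totals, rows_transferred


def totals_with_pushdown(conn):
    """Let the database aggregate; only one row per region reaches the client."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    rows = conn.execute(query).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    local_totals, n_local = totals_without_pushdown(conn)
    pushed_totals, n_pushed = totals_with_pushdown(conn)
    print("same result:", local_totals == pushed_totals)
    print("rows moved to host:", n_local, "vs", n_pushed)
```

Both functions return the same totals, but the pushdown variant moves only one row per region to the host. That is the reason the orchestrating machine can stay modest in size while the database (or, for code and Spark workloads, a Kubernetes cluster) carries the load.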
You can find a more illustrative explanation in this video:
Hope this helps!
Best,
Harizo