Dataiku on AWS EC2

seungsulee
Level 2

Hi, I created an AWS EC2 instance of type t3.medium.

I installed Dataiku using Fleet Manager and it works well.

 

My question: after I installed and used Dataiku on the t3.medium instance, another EC2 instance was created.

Its name is 'design-dataiku_DBC (DSS managed by FM)' and its type is m5.4xlarge.

 

Q1. Why was the 'design-dataiku_DBC (DSS managed by FM)' instance created?

 

Q2. Is the EC2 type (t3.medium) related to how Dataiku runs ETL jobs and to the quality of trained models?

If yes, how are they related?

If no, how can I speed up running jobs and model training?

 

sergeyd
Dataiker

Hi @seungsulee 

If I am reading this right, t3.medium is the FM instance that manages your DSS instances, while the design-dataiku_DBC (m5.4xlarge) instance is the actual DSS instance that you created with FM.

You can choose what EC2 instance type you want to create in the DSS creation wizard: 

Screenshot 2022-05-02 at 11.53.52.png
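To compare candidate instance types before picking one in the wizard, something like the following sketch might help. It assumes you have `boto3` installed and AWS credentials configured; the function names `summarize_instance_types` and `fetch_specs` are hypothetical helpers for illustration, not part of Dataiku or Fleet Manager. It only reads the public EC2 `describe_instance_types` data and says nothing about which types the wizard actually offers.

```python
def summarize_instance_types(response):
    """Reduce an EC2 describe_instance_types response to vCPU/RAM/GPU rows."""
    rows = []
    for it in response.get("InstanceTypes", []):
        # GpuInfo is only present for GPU-backed instance types.
        gpus = sum(g.get("Count", 0) for g in it.get("GpuInfo", {}).get("Gpus", []))
        rows.append({
            "type": it["InstanceType"],
            "vcpus": it["VCpuInfo"]["DefaultVCpus"],
            "mem_gib": it["MemoryInfo"]["SizeInMiB"] / 1024,
            "gpus": gpus,
        })
    # Smallest machine first, so the difference in size is easy to see.
    return sorted(rows, key=lambda r: r["vcpus"])


def fetch_specs(instance_types, region_name):
    """Query AWS for the given types (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the summarizer works without it

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.describe_instance_types(InstanceTypes=instance_types)
    return summarize_instance_types(response)


if __name__ == "__main__":
    # The region here is an assumption; use the one your FM runs in.
    for row in fetch_specs(["t3.medium", "m5.4xlarge"], "us-east-1"):
        print(row)
```

For the two types in this thread, the result should show the FM instance (t3.medium) as much smaller than the DSS instance (m5.4xlarge), which is expected since FM only manages instances and does not run jobs.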


seungsulee
Level 2
Author

Hi,

 

I understand what you mean.

Then I have another question.

 

My actual DSS instance type is m5.4xlarge, and this is my instance's machine type. (I understand this.)

So, does this machine type affect the speed of running ETL jobs and training models?

 

Do you have any docs to figure out what kind of performance I usually get when I use an instance?
(For example, what size of data can a given instance handle?)

 

sergeyd
Dataiker

Yes, this is where your jobs and model trainings are running.

As for performance, this is highly case-specific and depends on the actual task. Sometimes it depends directly on the instance type (how much RAM/CPU it has), but it can also depend on the data type, storage, network, etc.
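As a rough first check of the RAM/CPU side, you could look at what the DSS machine actually has. This is a minimal sketch, assuming a Linux host where `/proc/meminfo` is readable; `parse_mem_total_gib` and `describe_host` are hypothetical helper names, and the numbers alone will not tell you whether a given job is limited by CPU, memory, storage, or network.

```python
import os


def parse_mem_total_gib(meminfo_text):
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB (KiB)
            return kib / (1024 ** 2)
    raise ValueError("MemTotal not found")


def describe_host(meminfo_path="/proc/meminfo"):
    """Report the vCPU count and total RAM of the machine running this script."""
    with open(meminfo_path) as f:
        mem_gib = parse_mem_total_gib(f.read())
    return {"vcpus": os.cpu_count(), "mem_gib": round(mem_gib, 1)}


if __name__ == "__main__":
    print(describe_host())
```

Run on the DSS instance itself (not the FM instance), this should report values close to the advertised size of the instance type, slightly lower for memory because the kernel reserves some of it.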

seungsulee
Level 2
Author

Then could I use a GPU instance when I create a Dataiku instance?

I can't find any GPU instance type in the machine type list.

 

I want to use a GPU instance when I train models.

seungsulee
Level 2
Author

 

Hi,

I looked up AWS EC2 and SageMaker pricing, as shown in the image below.

(The first image is EC2, the second is SageMaker.)

 

For EC2:

m5.4xlarge - 6.944 USD per hour.

(It seems this EC2 type = Dataiku's instance machine type.)

But I'm not sure of the relationship between the EC2 type and ETL/ML speed. (This is my main question.)

 

For SageMaker:

There is 'ml.m5.4xlarge', and its vCPU and memory are the same as EC2's m5.4xlarge. (Cost is 1.133 USD.)

(Is the SageMaker instance type related to Dataiku or not?)

 

I believe that you have grasped the intent of my question.

Please give me sufficient answers to my questions, with references.

 

 

ec2.png

 
