Dataiku on AWS EC2

seungsulee
seungsulee Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 9 Partner

Hi, I created an AWS EC2 instance of type t3.medium.

I installed Dataiku using Fleet Manager and it works well.

My question: after I installed and used Dataiku on the t3.medium instance, another EC2 instance was created.

Its name is 'design-dataiku_DBC (DSS managed by FM)' and its type is m5.4xlarge.

Q1. Why was the 'design-dataiku_DBC (DSS managed by FM)' instance created?

Q2. Does the EC2 instance type (t3.medium) affect Dataiku's ETL jobs and the quality of model training?

If yes, how are they related?

If no, how can I speed up jobs and model training?

Answers

  • Sergey
    Sergey Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 365 Dataiker

    Hi @seungsulee

    If I am reading this right, the t3.medium is the FM (Fleet Manager) instance that manages your DSS instances, while design-dataiku_DBC (m5.4xlarge) is the actual DSS instance that you created with FM.

    You can choose which EC2 instance type to use in the DSS creation wizard:

    Screenshot 2022-05-02 at 11.53.52.png
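
    The split described above (a small FM instance plus a separate, larger DSS instance) can be checked from the EC2 side by grouping instances by type. The following is a minimal sketch, not part of the original answer. It assumes data shaped like the `Reservations` list returned by boto3's `describe_instances`; the sample instance IDs and the name "fleet-manager" are made up for illustration, and pagination is ignored.

```python
from collections import defaultdict


def summarize_instances(reservations):
    """Return {instance_type: [name, ...]} from describe_instances-style data."""
    summary = defaultdict(list)
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            # The display name shown in the EC2 console is stored in the "Name" tag.
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            name = tags.get("Name", instance.get("InstanceId", "<unnamed>"))
            summary[instance["InstanceType"]].append(name)
    return dict(summary)


def fetch_reservations():
    """Fetch live data; needs boto3 and AWS credentials (not called here)."""
    import boto3  # imported lazily so the sketch runs without boto3 installed
    return boto3.client("ec2").describe_instances()["Reservations"]


# Hand-written sample mirroring the two instances described in this thread.
# The instance IDs and the "fleet-manager" name are hypothetical.
sample = [
    {"Instances": [{"InstanceId": "i-0aaa", "InstanceType": "t3.medium",
                    "Tags": [{"Key": "Name", "Value": "fleet-manager"}]}]},
    {"Instances": [{"InstanceId": "i-0bbb", "InstanceType": "m5.4xlarge",
                    "Tags": [{"Key": "Name",
                              "Value": "design-dataiku_DBC (DSS managed by FM)"}]}]},
]

for instance_type, names in summarize_instances(sample).items():
    print(instance_type, names)
```

    To run this against a real account, pass `fetch_reservations()` instead of `sample`. The instance whose name ends in "(DSS managed by FM)" is the one where jobs actually run.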

  • seungsulee

    Hi,

    I understand what you mean.

    I have another question.

    My actual DSS instance is an m5.4xlarge, and that is its machine type. (I understand this.)

    So, does this machine type affect the speed of ETL jobs and model training?

    Do you have any docs that show what kind of performance I can typically expect from a given instance type?
    (For example, what size of data it can handle.)

  • Sergey

    Yes, this is where your jobs and model training run.

    As for performance, it varies case by case and depends heavily on the actual task. Sometimes it depends directly on the instance type (how much RAM and CPU it has), but it can also depend on the data type, storage, network, and so on.
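
    One way to make the RAM side of this concrete is a back-of-the-envelope check of whether a table would fit in the instance's memory. The sketch below is only a rough heuristic and not a Dataiku sizing rule: it assumes a dense, all-numeric table at 8 bytes per value, uses the 64 GiB of RAM that AWS lists for m5.4xlarge, and reserves an assumed 50% of that as headroom. It applies only to jobs that load the data into the instance's own memory.

```python
GIB = 1024 ** 3

# m5.4xlarge: 16 vCPUs and 64 GiB of RAM per the AWS instance type listing.
INSTANCE_RAM_GIB = 64


def estimated_in_memory_gib(rows, columns, bytes_per_value=8):
    """Rough size of a dense numeric table held fully in memory."""
    return rows * columns * bytes_per_value / GIB


def fits_in_memory(rows, columns, ram_gib=INSTANCE_RAM_GIB, headroom=0.5):
    """True if the table fits in the share of RAM left after headroom.

    headroom=0.5 is an assumed safety margin for the OS, DSS itself, and
    temporary copies made during training; it is not a Dataiku figure.
    """
    return estimated_in_memory_gib(rows, columns) <= ram_gib * headroom


print(fits_in_memory(10_000_000, 50))    # 10M rows x 50 numeric columns
print(fits_in_memory(500_000_000, 50))   # 500M rows x 50 numeric columns
```

    Text columns, storage throughput, and network speed are not modeled here, so the result is a starting point for choosing an instance type rather than a performance prediction.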

  • seungsulee

    Hi,

    I looked up AWS EC2 and SageMaker pricing, as shown in the images below.

    (The first image is EC2, the second is SageMaker.)

    For EC2:

    m5.4xlarge: 6.944 USD per hour.

    (It seems this EC2 type is the machine type of the Dataiku instance.)

    But I'm not sure about the relationship between the EC2 instance type and ETL/ML speed. (This is my main question.)

    For SageMaker:

    There is an 'ml.m5.4xlarge' whose vCPU and memory are the same as those of the EC2 m5.4xlarge. (The cost is 1.133 USD.)

    (Is the SageMaker instance type relevant to Dataiku or not?)

    I believe you have grasped the intent of my question.

    Please give me thorough answers to my questions, with references.

    ec2.png

  • seungsulee

    Then, can I use a GPU instance when I create a Dataiku instance?

    I can't find a GPU instance type in the machine type list.

    I want to use a GPU instance when I train models.
