Design Node Launch Failure

ARATA
ARATA Partner, Registered Posts: 10 Partner

I have set up Dataiku on AWS and configured it to automatically start and stop the Fleet Manager and design nodes using AWS functionality.

Yesterday, both the EC2 instances and DSS started successfully.

However, today the EC2 instances started successfully, but DSS failed to start.

For reference, the version of Dataiku being used is 14.0.0.

Issue

First, upon checking the Fleet Manager, the design node’s status was marked as InErr. In the Events log, the following message was recorded:

pi-initialization-failed - 2025-07-08 09:40:17  
{  
  "physicalInstanceId": "pi-xxx",  
  "ec2InstanceId": "i-xxx",  
  "error": {  
    "errorType": "<class 'dataikufmagent.exception.FmAgentException'>",  
    "message": "Location /data is mounted on unexpected device /dev/nvme0n1 instead of requested device /dev/nvme1n1",  
    "detailedMessage": "<class 'dataikufmagent.exception.FmAgentException'>: Location /data is mounted on unexpected device /dev/nvme0n1 instead of requested device /dev/nvme1n1",  
    "stackTrace": []  
  }  
}

Next, I attempted to stop the design node and re-provision it.

The re-provisioning failed, and the following message was logged in the Events:

pi-initialization-failed - 2025-07-08 10:02:42  
{  
  "physicalInstanceId": "pi-xxx",  
  "ec2InstanceId": "i-xxx",  
  "error": {  
    "errorType": "<class 'dataikufmagent.exception.FmAgentException'>",  
    "message": "Device has unexpected filesystem",  
    "detailedMessage": "<class 'dataikufmagent.exception.FmAgentException'>: Device has unexpected filesystem",  
    "stackTrace": []  
  }  
}

After repeating the stop and re-provisioning actions several times, DSS finally started successfully.

Questions

  1. Were my response and actions in handling the error messages described above appropriate?
  2. What could be the possible root causes of this issue?

Additional Information

For reference, the design node in question has previously had its data volume replaced on AWS (detaching and attaching a different data volume).

Thank you in advance for your assistance.

Operating system used: AlmaLinux (9.6)

Answers

  • Han_Han
    Han_Han Dataiker, Registered Posts: 28 Dataiker

    Hi @ARATA

    Thanks for reaching out!

    I think the best way to get support for your situation is to open a support ticket directly. That way the technical support team can look into it quickly.

    Han

  • yonghyun
    yonghyun Registered Posts: 31 ✭✭✭

    Additional Information

    For reference, the design node in question has previously had its data volume replaced on AWS (detaching and attaching a different data volume).

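    As you mentioned above, it appears that this operation is what caused the problem.

    The error indicates that, during EC2 instance initialization, the /data path was mounted on a different device (/dev/nvme0n1) instead of the expected device (/dev/nvme1n1), so initialization failed.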
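
    To confirm this kind of mismatch by hand, one option is to look at which device currently backs /data on the instance. The following is only a minimal sketch, not the Fleet Manager agent's actual check: the helper names device_for_mount and check_mount are hypothetical, and it assumes a Linux host where /proc/mounts lists one mount per line as "device mount_point fstype ...". The device names are the ones from the error message above.

```python
from typing import Optional


def device_for_mount(mounts_text: str, mount_point: str) -> Optional[str]:
    """Return the device mounted at mount_point, or None if nothing is mounted there.

    mounts_text is expected to be in /proc/mounts format:
    "<device> <mount_point> <fstype> <options> <dump> <pass>" per line.
    """
    device = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            # Keep the last match: a later mount shadows an earlier one.
            device = fields[0]
    return device


def check_mount(mounts_text: str, mount_point: str, expected_device: str) -> str:
    """Describe whether mount_point sits on expected_device (hypothetical helper)."""
    actual = device_for_mount(mounts_text, mount_point)
    if actual is None:
        return f"{mount_point} is not mounted"
    if actual != expected_device:
        return f"{mount_point} is mounted on {actual}, expected {expected_device}"
    return f"{mount_point} is mounted on {expected_device} as expected"


# Usage on the instance itself (assumes Linux):
#   with open("/proc/mounts") as f:
#       print(check_mount(f.read(), "/data", "/dev/nvme1n1"))
```

    Note that on Nitro-based EC2 instances the NVMe device names (/dev/nvme0n1, /dev/nvme1n1, ...) are assigned by the driver and their order is not guaranteed to stay the same across reboots, so a check based on the device name alone can pass one day and fail the next.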

  • yonghyun
    yonghyun Registered Posts: 31 ✭✭✭

    If you directly manipulate a node launched by Fleet Manager from the AWS side, it becomes inconsistent with the Fleet Manager database.
