Assistance with Integrating YOLOv5 in Dataiku

Isaac Emmanuel

Hello Dataiku Community,

I'm currently working on a project where I need to use the YOLOv5 algorithm to detect potholes in images. I have a dataset with labeled images ready, and I’m trying to set up YOLOv5 within a Dataiku Python recipe for model training. However, I’ve encountered several issues along the way, and I’m hoping to get some advice from anyone who has worked with YOLO or similar custom packages in Dataiku.

What I’ve Done So Far:

  1. Uploaded the YOLOv5 Package: I initially tried to install YOLOv5 directly from GitHub in my Dataiku code environment, but this led to packaging errors. As a workaround, I manually downloaded the YOLOv5 repository, zipped it, and uploaded it to Dataiku as a custom package.
  2. Code Environment Setup: I added the essential dependencies (torch, opencv-python, PyYAML, etc.) to the code environment. However, I ran into problems when trying to reference the manually uploaded YOLOv5 package.
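
For the manually uploaded copy, one pattern that is sometimes used with repositories that are not pip-installable is to put the unpacked repository root on `sys.path` at the top of the recipe, so that its top-level packages can be imported directly. The sketch below is only an illustration under assumptions: `add_repo_to_path` is a hypothetical helper, and the path in the usage comment is a placeholder for wherever the unzipped YOLOv5 folder actually lives on the server.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Prepend a local source checkout to sys.path (hypothetical helper).

    After this call, top-level packages inside repo_dir can be imported
    as if they were installed. Returns the resolved path that was added.
    """
    repo = Path(repo_dir).resolve()
    if not repo.is_dir():
        raise FileNotFoundError(f"Repository folder not found: {repo}")
    repo_str = str(repo)
    # Avoid adding the same entry twice when the recipe is re-run
    # in a long-lived interpreter (for example a notebook kernel).
    if repo_str not in sys.path:
        sys.path.insert(0, repo_str)
    return repo_str


# Usage (placeholder path, adjust to your setup):
# add_repo_to_path("/path/to/unzipped/yolov5")
```

Because the repository root goes first on `sys.path`, generic package names inside it can shadow other modules with the same name, so it is worth checking for clashes with the rest of the code environment.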

How have other users successfully integrated YOLOv5 or similar custom models into Dataiku? Are there best practices for handling dependencies and paths in code environments that I might be missing?

Answers

  • Turribeach

    It would be best if you posted the exact errors you get so we can try to help. Every ML package is different, so there isn't a single approach to solving download/setup issues.

  • Isaac Emmanuel

    The error I get with git+https://github.com/ultralytics/yolov5.git while trying to install the package in the code environment:

    Collecting git+https://github.com/ultralytics/yolov5.git (from -r /opt/dataiku/dataiku_12/dataiku_design/dss_data/tmp/pip-requirements-install/req7585685395633335032.txt (line 1))
      Cloning https://github.com/ultralytics/yolov5.git to /tmp/pip-req-build-6pjo2_lq
      Running command git clone --filter=blob:none --quiet https://github.com/ultralytics/yolov5.git /tmp/pip-req-build-6pjo2_lq
      Resolved https://github.com/ultralytics/yolov5.git to commit 15c40626f5fb1ca58f213e82b7dd429b4e5aa370
      Installing build dependencies: started
      error: subprocess-exited-with-error
      × Getting requirements to build wheel did not run successfully.
      │ exit code: 1
      ╰─> [14 lines of output]
          error: Multiple top-level packages discovered in a flat-layout: ['data', 'models', 'segment', 'classify'].
          To avoid accidental inclusion of unwanted files or directories,
          setuptools will not proceed with this build.
          If you are trying to create a single distribution with multiple packages
          on purpose, you should not rely on automatic discovery.
          Instead, consider the following options:
          1. set up custom discovery (`find` directive with `include` or `exclude`)
          2. use a `src-layout`
          3. explicitly set `py_modules` or `packages` with a list of names
          To find more information, look for "package discovery" on setuptools docs.
     [
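
    The message says that setuptools found several package-like directories at the root of the repository and refuses to guess which ones belong in the distribution. To see what it is reacting to in a local checkout, a rough stdlib-only approximation of that check could look like the sketch below. This is not setuptools' actual discovery code (which also applies its own list of excluded names), and `top_level_package_candidates` is a hypothetical helper name.

```python
from pathlib import Path


def top_level_package_candidates(repo_dir):
    """List root-level directories whose names look like importable packages.

    Rough approximation only: keeps directories whose names are valid
    Python identifiers and do not start with an underscore, so folders
    such as .git or __pycache__ are skipped.
    """
    root = Path(repo_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir()
        and entry.name.isidentifier()
        and not entry.name.startswith("_")
    )


# Usage (placeholder path, adjust to your setup):
# print(top_level_package_candidates("/path/to/unzipped/yolov5"))
```

    If this returns more than one name, the repository is laid out as a collection of scripts and packages rather than as a single installable package, which matches the flat-layout error above.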
    
