ModuleNotFoundError: No module named 'sklearn'

ASten1
ASten1 Partner, Registered Posts: 19 Partner

Hi everybody,

To build my project I created a Python 3.6 code environment and installed the required packages, including scikit-learn. Despite that, I'm facing a problem that seems strange to me. In my recipes I can import the library and its submodules, but when I create an API endpoint in the API Designer and try to run test queries, I get the error in the title of this discussion. I have set my environment in the API settings, so I don't understand why it doesn't work.

Here is the history of this environment in a bit more detail. I first created the environment with the default scikit-learn version, 0.23, the newest. With that version, the API Designer gave me the error above when it tried to execute:

from sklearn.ensemble.partial_dependence import partial_dependence

Searching online suggested a version problem, so I created another environment with scikit-learn 0.22 and the error disappeared. I was able to run my test queries and then push the API to the API Deployer, where my service was deployed and working perfectly.

Recently I tried to re-run those test queries in the API Designer and the same error appeared, this time when executing:

from sklearn.model_selection import KFold

I then tried the same import in my recipe, with the same environment, and it works without any problem. I don't understand where the problem is; any help would be really appreciated.

P.S.: I also tried creating an empty API endpoint with a new environment that, besides the required default packages, has scikit-learn installed, and the same error appears, so I suppose it doesn't depend on my work.
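
When an import works in a recipe but fails in an API endpoint, one thing worth checking is whether both really run on the same interpreter. The snippet below is a minimal sketch of such a check, assuming you can execute arbitrary Python in both places; the helper name `describe_runtime` is hypothetical and not part of any Dataiku API.

```python
import importlib.util
import sys


def describe_runtime(module_name="sklearn"):
    """Report which interpreter is running and whether a top-level module is visible to it."""
    # find_spec returns None when a top-level module cannot be located
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,   # path of the running Python binary
        "prefix": sys.prefix,           # root of the active environment
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(key, "=", value)
```

Running it once in the recipe and once in the endpoint code, then comparing `executable` and `module_location`, shows whether the two contexts resolve scikit-learn from the same place.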

Best Answer

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker
    Answer ✓

    (sorry, forgot about the second question along the way)

    Yes, the DSS code around ML assumes sklearn in a given version range. Given the size of the ML codebase, it is hard to say exactly what requires these specific versions rather than a later one. You can always try newer versions of sklearn and check whether they fail for the particular models you are building, but that is of course not supported by Dataiku.

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    Can you do an "Update" of the code env with "rebuild env" checked, then verify in the "Installed packages" tab that the installed scikit-learn is the right version (i.e. scikit-learn>=0.20,<0.21, as you see when you use "add sets of packages" in the code env)?
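
The same range check can also be done from code, for example inside a notebook that uses the code env. This is only a sketch: the helper names are hypothetical, the version parsing is deliberately simple, and the default bounds just mirror the `scikit-learn>=0.20,<0.21` constraint quoted above.

```python
import re


def parse_version(text):
    """Turn '0.20.4' into (0, 20, 4); a suffix such as 'rc1' is ignored."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def in_supported_range(version, low="0.20", high="0.21"):
    """True when low <= version < high (bounds are assumptions taken from the thread)."""
    return parse_version(low) <= parse_version(version) < parse_version(high)


def installed_sklearn_version():
    """Return the installed scikit-learn version string, or None if it cannot be imported."""
    try:
        import sklearn
    except ImportError:
        return None
    return sklearn.__version__


if __name__ == "__main__":
    found = installed_sklearn_version()
    if found is None:
        print("scikit-learn is not installed in this environment")
    else:
        print(found, "in range:", in_supported_range(found))
```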

  • ASten1
    ASten1 Partner, Registered Posts: 19 Partner

    Hi,

    I've done what you asked, and now scikit-learn is no longer present in my installed packages, along with other packages I had installed. Do you have any idea why that happened?

    Thank you!

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Did you install them manually from a Python notebook or via the command line? If they are listed in the "packages to install" tab, only a failure to build the code env should prevent them from actually being installed.

  • ASten1
    ASten1 Partner, Registered Posts: 19 Partner

    Sorry, I hadn't understood what the rebuild does. I hadn't listed the packages, so of course rebuilding gave a default environment without additional packages. I added the recommended set of packages, which includes scikit-learn 0.20.4, plus joblib, one of the packages my code requires.

    They are correctly installed, and I can see them in the installed packages. When I run the queries now it tells me that the Dev server is running, which is good because before it wasn't, but it still gives an error:

    ModuleNotFoundError: No module named 'sklearn.ensemble._forest'

    At this point I have another question: is a scikit-learn version >=0.20,<0.21 required to work with sklearn on Dataiku?

  • ASten1
    ASten1 Partner, Registered Posts: 19 Partner

    Looking deeper in the logs, I actually have another error, raised while handling the above exception:

    TypeError: Object of type 'FrameSummary' is not JSON serializable

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    sklearn.ensemble._forest was indeed added in sklearn v0.22 (it was called sklearn.ensemble.forest before). This means that:

    - either your code explicitly calls or imports it

    - or the model you are trying to use was built in a code environment with sklearn >= 0.22 and you're now trying to read it in a code env with sklearn < 0.22, which is not possible because of how pickle works. You'll need to retrain the model in a code env with sklearn 0.20.4
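
To illustrate the pickle point: a pickle stores the import path of each object's class, not the class itself, so loading fails when that path no longer exists. The sketch below reproduces the effect with the standard library only. The module name `demo_forest_new` and the class `Forest` are made up for the demonstration and merely stand in for something like sklearn.ensemble._forest.

```python
import pickle
import sys
import types

# Build a throwaway module that plays the role of the "newer" library layout.
module_name = "demo_forest_new"  # hypothetical name, for illustration only
module = types.ModuleType(module_name)


class Forest:
    def __init__(self, n_trees):
        self.n_trees = n_trees


# Make the class look as if it were defined inside the throwaway module.
Forest.__module__ = module_name
Forest.__qualname__ = "Forest"
module.Forest = Forest
sys.modules[module_name] = module

# "Train and save" in the environment where the module exists.
payload = pickle.dumps(Forest(10))

# Simulate an environment where that module path does not exist.
del sys.modules[module_name]

try:
    pickle.loads(payload)
    error_name = None
except ModuleNotFoundError as exc:
    error_name = type(exc).__name__
    print("Loading failed:", exc)
```

The serialized bytes contain the module path as text, which is why the model has to be retrained (or at least re-saved) in an environment whose library layout matches the one used at load time.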

  • ASten1
    ASten1 Partner, Registered Posts: 19 Partner

    You were right, now it works, thank you very much! So, as a conclusion, I assume that working with Dataiku requires the right version of scikit-learn?

    Regards

  • piyushmittal
    piyushmittal Registered Posts: 1 ✭✭✭

    Hi, I am facing a similar issue with my on-premise conda environment linked to Dataiku. I am using scikit-learn=0.24.2, which I believe is the latest version, but the error still persists. Could you please suggest a solution?
