Use Case 5: Churn Prediction - Error implementing Python code in tutorial

UserBird Dataiker, Alpha Tester Posts: 535 Dataiker

I am new to DSS and am attempting to advance my knowledge through the tutorials. I got to the lecture 'create a scikit-learn (python) model', but when copying and pasting in the code that is to be output to the folder 'model_scikit', I'm receiving this error:

'Job failed : Error in Python process: : min_samples_split must be at least 2 or in (0, 1], got 1'

When I change the value of min_samples_split (e.g. to [1.0, 3, 10], or even just [2]), I get a different error:

'Job failed : Error in Python process: : Dataset None cannot be used : declare it as input or output of your recipe'

Any ideas?




    Alex_Reutter Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer Posts: 105 ✭✭✭✭✭✭✭

    You have the right idea. As the error message says, min_samples_split must be either an integer of at least 2 or a float in (0.0, 1.0], so the integer 1 is rejected. The float 1.0 is accepted, so the line of code in Teachable should read:

    "min_samples_split": [1.0, 3, 10],
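To make the rule concrete, here is a minimal pure-Python sketch of the check described by the error message. It is not scikit-learn's own code: the helper names `is_valid_min_samples_split` and `effective_min_samples` are made up for illustration, and the rounding of the float case follows the scikit-learn documentation (a float is treated as a fraction of the number of samples, rounded up).

```python
import math
from numbers import Integral, Real


def is_valid_min_samples_split(value):
    """Hypothetical helper mirroring the rule in the error message:
    an int must be >= 2, a float must lie in (0.0, 1.0]."""
    if isinstance(value, bool):  # bools are ints in Python; treat as invalid here
        return False
    if isinstance(value, Integral):
        return value >= 2
    if isinstance(value, Real):
        return 0.0 < value <= 1.0
    return False


def effective_min_samples(value, n_samples):
    """Hypothetical helper: the sample count a node needs before it may split."""
    if not is_valid_min_samples_split(value):
        raise ValueError(
            f"min_samples_split must be an int >= 2 or a float in (0.0, 1.0], got {value!r}"
        )
    if isinstance(value, Integral):
        return int(value)
    # Float case: a fraction of the training set, rounded up.
    # The floor of 2 is an assumption (a split needs at least two samples).
    return max(2, math.ceil(value * n_samples))


# The original grid value 1 fails; the corrected grid [1.0, 3, 10] passes.
checked = [(v, is_valid_min_samples_split(v)) for v in [1, 1.0, 3, 10]]
```

Note that 1.0 and 1 mean different things: as a float, 1.0 asks for 100% of the samples, so with 500 training rows `effective_min_samples(1.0, 500)` gives 500, while the integer 3 simply means three samples.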

    The error that you get after setting min_samples_split correctly suggests that your recipe code is reading from a dataset named "None"; that is, the input section looks something like this:

    # Recipe inputs
    df = dataiku.Dataset("None").get_dataframe()

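    rather than naming "train" as the input dataset. Make sure that train is declared as an input of the recipe and that the code refers to it by that name.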
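As a rough sketch of what the corrected input section could look like, the snippet below reads from "train" and fails early if the name is missing or is the literal placeholder "None". The functions `check_dataset_name` and `load_recipe_input` are hypothetical helpers, not part of the tutorial or of the dataiku package; only `dataiku.Dataset(...).get_dataframe()` comes from the recipe code above, and it runs only inside DSS.

```python
def check_dataset_name(name):
    """Hypothetical guard: fail early on a missing or placeholder dataset name."""
    if name is None or str(name).strip() in ("", "None"):
        raise ValueError(
            f"Recipe input dataset name is {name!r}; expected a real dataset such as 'train'"
        )
    return name


def load_recipe_input(name="train"):
    """Read the recipe input into a pandas DataFrame (works only inside DSS)."""
    import dataiku  # provided by the DSS Python environment

    # "train" must also be declared as an input of the recipe in the Flow.
    return dataiku.Dataset(check_dataset_name(name)).get_dataframe()
```

Inside the recipe, `df = load_recipe_input("train")` would then replace the line that reads from "None".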