Looking forward to the Intro to Statistics

tgb417
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron

I'm looking forward to the Intro to Statistics course in the ML Practitioner learning path.

I'm finding it a bit odd to be taking the Interactive Visual Statistics course without that prior course.

I keep asking myself: how will I use a Student's t-test or a two-sample Kolmogorov-Smirnov test, for example, in building machine learning models?

Three questions for Dataiku staff:

  • Will the Intro to Statistics class have good descriptions of the use cases for these specific Statistical Tests in building valid models?
  • This class does not appear to be available in the legacy course catalog. Is that correct, or am I missing something?
  • Is there any further update on when it will become available?

--Tom

Answers

  • Alex_Reutter
    Alex_Reutter Alpha Tester, Dataiker Alumni, Dataiku DSS Core Designer Posts: 105 ✭✭✭✭✭✭✭

    Hi Tom!

    The statistics are typically used for exploration before model building, rather than within the model-building step itself. (ML Basics comes first in the path because it's the core of the course.) There is a plan to include use cases for the tests.
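
    As a rough illustration of that exploratory use (a sketch, not how DSS implements it): a two-sample Kolmogorov-Smirnov statistic can flag a feature whose distribution differs between two samples, for example training data versus newly collected data, before any model is fit. The function name `ks_two_sample_statistic` and the sample values are made up for illustration; in practice `scipy.stats.ks_2samp` computes the same statistic and also returns a p-value.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Return the two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those points is enough.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)  # share of sample_a that is <= x
        ecdf_b = bisect_right(b, x) / len(b)  # share of sample_b that is <= x
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Hypothetical feature values (assumed data): training set vs. newly collected records.
train_ages = [23, 25, 31, 35, 38, 41, 44, 52]
new_ages = [45, 48, 51, 55, 58, 60, 63, 67]

same = ks_two_sample_statistic(train_ages, train_ages)
shifted = ks_two_sample_statistic(train_ages, new_ages)
print(f"identical samples: D = {same:.3f}")
print(f"shifted samples:   D = {shifted:.3f}")
```

    A statistic near 0 suggests the two samples follow similar distributions; a value near 1 suggests the feature has shifted and deserves a closer look before modeling.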

    That's correct: this isn't in the legacy catalog! This and the Intro to ML are entirely new material. Previously, our tutorials stuck closely to the model of "this is how to do a particular task using DSS, see elsewhere for more general knowledge on the topic"; as we build out Academy for more users with varying levels of statistics / data science background, we're creating more general courses.

    Right now, I still can't say much more than "soon" for timing, but we will post to the community when new courses are available.

    Best,
    Alex

  • tgb417

    @Alex_Reutter,

    Or for that matter anyone here in the community.

    Do you have any suggestions for courses (maybe outside the Dataiku Academy) that provide practical knowledge about using more standard statistics as part of the model-building process? In particular, I'd like to build an intuitive understanding of the value of, and use cases for, the types of univariate analyses, bivariate analyses, and statistical tests that have been introduced in the new statistics worksheet feature.

    I think I get the value of a correlation matrix in finding potential silver bullets, or information leaking into the model we are building. And if some target fits a "standard" distribution, this may provide a better way to model that target. Some ML models make specific assumptions about the distribution of features and target variables.
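
    To make the leakage check concrete, here is a minimal sketch, assuming numeric features and a numeric target: compute each feature's Pearson correlation with the target and flag anything suspiciously close to ±1. The feature names, the data, and the 0.95 threshold are all made up for illustration; a high correlation is only a prompt to investigate where the column comes from, not proof of leakage.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def flag_suspicious_features(features, target, threshold=0.95):
    """Return names of features whose |correlation| with the target is at or above the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Hypothetical data: "refund_amount" is assumed to be derived from the target, so it leaks.
target = [10.0, 20.0, 30.0, 40.0, 50.0]
features = {
    "customer_age": [34.0, 51.0, 27.0, 45.0, 39.0],
    "refund_amount": [1.1, 2.0, 3.1, 3.9, 5.0],
}

suspicious = flag_suspicious_features(features, target)
print(suspicious)
```

    The same idea extends to a full correlation matrix, where near-duplicate features show up as off-diagonal values close to ±1.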

    I guess I'm looking for answers to why these specific statistical methods are important, and how I should be integrating them into my practice.

    --Tom
