Looking forward to the Intro to Statistics

tgb417 · August 2020

I'm looking forward to the Intro to Statistics course in the ML Practitioner learning path.

I'm finding it a bit odd to be taking the Interactive Visual Statistics course without that prior course.

I find myself constantly asking: how will I use a Student t-test or a two-sample Kolmogorov-Smirnov test, for example, in building machine learning models?

Three questions for Dataiku staff:

Will the Intro to Statistics class have good descriptions of the use cases for these specific Statistical Tests in building valid models?
This class does not appear to be available in the legacy course catalog. Is that correct, or am I missing something?
Is there any further update on when it will become available?

--Tom

Alex_Reutter · August 2020

Hi Tom!

The statistics are typically used for exploration before model building (ML Basics comes first in the path because it's the core of the course), rather than within the model building step itself. We plan to include use cases for the tests.
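As a rough illustration of that exploratory use, the sketch below checks whether a numeric feature looks the same in two samples (say, a training set and newer data) before any model is built. It is only a sketch under stated assumptions: the data is synthetic, the helper names `ks_two_sample` and `welch_t` are made up for this example, the p-value uses the asymptotic Kolmogorov approximation, and the t statistic is Welch's unequal-variance variant rather than the classic pooled Student version. In practice a library routine such as SciPy's `scipy.stats.ks_2samp` would be the more usual choice.

```python
import math
import random
import statistics


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for a two-sample Kolmogorov-Smirnov test."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the two ECDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution: only an approximation for small samples.
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return d, 1.0  # the series converges slowly here and the p-value is ~1
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def welch_t(a, b):
    """Welch's t statistic for a difference in means (unequal variances allowed)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


# Synthetic data (an assumption for illustration): one feature from three samples.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(500)]
same = [rng.gauss(0.0, 1.0) for _ in range(500)]     # same distribution as train
shifted = [rng.gauss(1.0, 1.0) for _ in range(500)]  # mean moved by one std dev

d_same, p_same = ks_two_sample(train, same)
d_shift, p_shift = ks_two_sample(train, shifted)
t_shift = welch_t(train, shifted)

print(f"train vs same:    D={d_same:.3f}  p~{p_same:.3f}")
print(f"train vs shifted: D={d_shift:.3f}  p~{p_shift:.3g}  t={t_shift:.1f}")
```

A large D with a small p-value for a feature would suggest that the two samples differ on it, which is worth investigating before trusting a model trained on one and scored on the other.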

That's correct, this isn't in the legacy catalog! This and the intro to ML are entirely new material. Previously our tutorials had stuck closely to the model of "this is how to do a particular task using DSS, see elsewhere for more general knowledge on the topic"; as we build out Academy for more users with varying levels of statistics / data science background, we're creating more general courses.

Right now, I still can't say much more than "soon" for timing, but we will post to the community when new courses are available.

Best,
Alex

tgb417 · August 2020

@Alex_Reutter,

Or for that matter anyone here in the community.

Do you have any suggestions about courses (maybe outside the Dataiku Academy) that provide practical knowledge about using more standard statistics as part of the model building process? In particular, I'd like to build an intuitive understanding of the value of, and use cases for, the types of univariate analyses, bivariate analyses, and statistical tests that have been introduced in the new statistics worksheet feature.

I think I get the value of a Correlation Matrix in finding potential silver bullets, or information leaking into the model we are building. And if some target fits a "standard" distribution, that may suggest a better way to model that particular target. Some of the ML models make specific assumptions about the distribution of features and target variables.
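To make the leakage idea concrete, here is a minimal sketch of a correlation screen: compute the Pearson correlation of each numeric feature with the target and flag anything that looks too good to be true. Everything in it is an assumption for illustration: the data is synthetic, the column names (`honest_signal`, `leaky_copy`, `pure_noise`) and the helper `flag_suspicious` are hypothetical, and the 0.95 threshold is arbitrary. It is not how the DSS statistics worksheet computes its correlation matrix.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def flag_suspicious(features, target, threshold=0.95):
    """Return {name: r} for features whose |r| with the target exceeds threshold."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return {name: r for name, r in scores.items() if abs(r) > threshold}


# Synthetic data (an assumption for illustration).
rng = random.Random(42)
n = 300
honest = [rng.gauss(0, 1) for _ in range(n)]
target = [h + rng.gauss(0, 1) for h in honest]  # target depends partly on `honest`

features = {
    "honest_signal": honest,                               # real but imperfect predictor
    "leaky_copy": [t + rng.gauss(0, 0.01) for t in target],  # derived from the outcome
    "pure_noise": [rng.gauss(0, 1) for _ in range(n)],     # unrelated to the target
}

flagged = flag_suspicious(features, target)
for name, r in flagged.items():
    print(f"{name}: r={r:.4f}  (check how this column was produced)")
```

A flagged column is not proof of leakage. It is a prompt to ask whether that value would actually be known at prediction time. Note also that Pearson correlation only captures linear relationships, so a clean screen does not rule leakage out.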

I guess I'm looking for answers to why these specific statistical methods are important, and how I should be integrating them into my practice.

--Tom
