I'm looking forward to the Intro to Statistics course in the ML Practitioner learning path.
I'm finding it a bit odd to be taking the Interactive Visual Statistics course without that prior course.
I keep asking myself how I would use, for example, a Student's t-test or a two-sample Kolmogorov-Smirnov test when building machine learning models.
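Here is one illustration of where these tests can show up in a modeling workflow. It is an assumption on my part and not something the course describes. You split a numeric candidate feature by the value of a binary target and check whether its distribution differs between the two classes. If it does, the feature probably carries signal. The same comparison can be run between training data and newer scoring data to look for drift. This is a minimal pure-Python sketch that only computes the test statistics. The feature data is synthetic, and `ks_statistic`, `ks_critical_value`, and `student_t` are hypothetical helper names. For full tests with p-values you would normally reach for `scipy.stats.ks_2samp` and `scipy.stats.ttest_ind`.

```python
import math
import random
import statistics
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: D = max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    d = 0.0
    # Both empirical CDFs only jump at observed values, so the largest gap
    # between them is found by evaluating at every observed value.
    for x in a + b:
        f_a = bisect_right(a, x) / n  # fraction of sample_a that is <= x
        f_b = bisect_right(b, x) / m  # fraction of sample_b that is <= x
        d = max(d, abs(f_a - f_b))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value for D; reject 'same distribution' if D exceeds it."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


def student_t(sample_a, sample_b):
    """Student's two-sample t statistic (pooled variance, assumes equal variances)."""
    n, m = len(sample_a), len(sample_b)
    pooled_var = ((n - 1) * statistics.variance(sample_a)
                  + (m - 1) * statistics.variance(sample_b)) / (n + m - 2)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return diff / math.sqrt(pooled_var * (1 / n + 1 / m))


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical feature values split by a binary target (synthetic data).
    feature_class_0 = [random.gauss(0.0, 1.0) for _ in range(200)]
    feature_class_1 = [random.gauss(0.5, 1.0) for _ in range(200)]
    d = ks_statistic(feature_class_0, feature_class_1)
    crit = ks_critical_value(len(feature_class_0), len(feature_class_1))
    print(f"KS D = {d:.3f} (critical value at alpha=0.05: {crit:.3f})")
    print(f"Student t = {student_t(feature_class_0, feature_class_1):.2f}")
    if d > crit:
        print("Feature distribution differs between classes: likely informative.")
```

The KS test compares whole distributions, so it can pick up differences in spread or shape. The t-test only looks at the means.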
Three questions for Dataiku staff.
--Tom
Hi Tom!
Statistics are typically used for exploratory analysis before model building, rather than within the model-building step itself. (ML Basics comes first in the path because it's the core of the course.) We plan to include use cases for the tests.
That's correct, this isn't in the legacy catalog! This course and Intro to ML are entirely new material. Previously, our tutorials stuck closely to the model of "this is how to do a particular task using DSS; see elsewhere for more general knowledge on the topic". As we build out the Academy for more users with varying levels of statistics and data science background, we're creating more general courses.
Right now, I still can't say much more than "soon" for timing, but we will post to the community when new courses are available.
Best,
Alex
Or for that matter anyone here in the community.
Do you have any suggestions for courses (maybe outside the Dataiku Academy) that provide practical knowledge about using more standard statistics as part of the model-building process? In particular, I'd like to build an intuitive understanding of the value of, and use cases for, the univariate analyses, bivariate analyses, and statistical tests that have been introduced in the new statistics worksheet feature.
I think I understand the value of a correlation matrix in finding potential "silver bullet" features, or information leaking into the model we are building. And if a target fits a "standard" distribution, that may suggest a better way to model it. Some ML models make specific assumptions about the distribution of the features and the target variable.
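This is a minimal sketch of the leakage check described above. It assumes plain Python lists for the columns and a binary target. The column names, the data, and the 0.95 threshold are all hypothetical. The code computes each feature's Pearson correlation with the target and flags any feature that is almost perfectly correlated with it. A flag is a prompt to check whether that value would actually be known at prediction time. It is not proof of leakage. Pearson's r only captures linear relationships, so a low value does not rule leakage out.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric columns (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return math.nan  # correlation is undefined for a constant column
    return cov / math.sqrt(ss_x * ss_y)


def flag_suspicious_features(columns, target, threshold=0.95):
    """Return {feature_name: r} for features whose |r| with the target is >= threshold."""
    flagged = {}
    for name, values in columns.items():
        r = pearson(values, target)
        # NaN comparisons are False, so constant columns are never flagged.
        if abs(r) >= threshold:
            flagged[name] = r
    return flagged


if __name__ == "__main__":
    # Hypothetical churn data: "refund_issued" is only recorded after a customer
    # churns, so it leaks the target into the features.
    churned = [0, 1, 0, 1, 1, 0, 0, 1]
    columns = {
        "tenure_months": [24, 3, 36, 5, 12, 48, 30, 8],
        "support_calls": [1, 4, 0, 2, 5, 1, 2, 3],
        "refund_issued": [0, 1, 0, 1, 1, 0, 0, 1],
    }
    for name, r in flag_suspicious_features(columns, churned).items():
        print(f"Check for leakage: {name} (r = {r:.2f})")
```

In practice the correlation matrix in the statistics worksheet gives you the same numbers. The threshold is a judgment call, and domain knowledge decides whether a flagged feature is a leak or just a strong, legitimate predictor.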
I guess I'm looking for answers to why these specific statistical methods are important. How should I be integrating them into my practice?
--Tom