Ask Me Anything on DSS 7 with Sunny Porinju

sunnyporinju
sunnyporinju Dataiker, Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 5 Dataiker

Hi Dataiku Community! This is Sunny and I am here to answer your questions about Dataiku DSS 7. We recently launched DSS 7 with a number of new features to help improve your data journey.

I know you may have a lot of questions about this release (and maybe future releases) so I will do my best between now and April 3rd to answer all of your questions and hear all of your feedback.

Before getting started, if you are not familiar with DSS 7, please review this blog post, which provides a high-level overview, as well as the DSS 7.0 Release notes. Also, if you are new to AMAs, please review the Ask Me Anything Guidelines and the Dataiku Community Guidelines.

To participate, simply hit reply and craft your question. Be sure to tag me, @sunnyporinju, so I can be notified of your post. I’ll be keeping an eye out as well, so not to worry if you forget to tag me. (But it’s good practice!)

Let the questions begin!

Sunny Porinju is a Senior Product Marketing Manager at Dataiku.

Best Answer

  • dimitri
    dimitri Dataiker, Product Ideas Manager Posts: 33 Dataiker
    Answer ✓

    Hi @tgb417,


    The new Git integration for projects allows you to synchronize your DSS project with any hosted Git service, including GitHub, GitLab, or Bitbucket.
    That being said, heavy artifacts like datasets or saved models can't be versioned in Git, so the Git integration can't be used to share a project across multiple DSS instances. The appropriate way to do that is to use project exports.
    However, once the project has been synced with a remote Git repository, the source code can be cloned outside of DSS. Other people can then edit the code recipes or the project libraries from any code editor and push the changes to the remote Git repository, and DSS can pull those changes to update the project.


    The Git integration at the project-libraries level doesn't allow pushing changes to a remote Git repository. For now, pushing is only possible through the project-level Git integration. So at this point you can use the DSS Git integration to import libraries from open source projects, but you can't use DSS to contribute back to them.
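
The edit-outside-DSS round trip described above can be sketched roughly as follows. This is only an illustration under stated assumptions: a local bare repository stands in for the hosted remote, the file path `recipes/compute_scores.py` and the commit identity are hypothetical placeholders, and nothing here reflects the actual layout of a synced DSS project. It uses only standard `git` commands driven from Python.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd=None):
    """Run a git command and return its standard output."""
    command = [
        "git",
        "-c", "user.name=Example Editor",        # placeholder identity
        "-c", "user.email=editor@example.com",   # placeholder identity
        "-c", "commit.gpgsign=false",
        *args,
    ]
    result = subprocess.run(
        command, cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def edit_outside_dss(workdir):
    """Clone the remote, change one code file, and push the commit back."""
    workdir = Path(workdir)
    remote = workdir / "remote.git"   # stand-in for GitHub/GitLab/Bitbucket
    clone = workdir / "local-clone"
    git("init", "--bare", str(remote))
    git("clone", str(remote), str(clone))

    # Hypothetical path: the real layout of a synced project may differ.
    recipe = clone / "recipes" / "compute_scores.py"
    recipe.parent.mkdir(parents=True, exist_ok=True)
    recipe.write_text("print('edited outside DSS')\n")

    git("add", ".", cwd=clone)
    git("commit", "-m", "Edit recipe from an external editor", cwd=clone)
    git("push", "origin", "HEAD", cwd=clone)

    # Commit subjects now stored on the remote, i.e. what a later pull would fetch.
    return git("log", "--all", "--format=%s", cwd=remote).strip()


if shutil.which("git"):
    with tempfile.TemporaryDirectory() as tmp:
        pushed_subject = edit_outside_dss(tmp)
else:
    pushed_subject = None  # git is not installed, so nothing was run
```

With a real project, the `init --bare` step would be replaced by cloning the repository that the DSS project is already synced with; the pull on the DSS side is then done from the project's Git integration.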

Answers

  • Sean
    Sean Dataiker, Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer Posts: 168 Dataiker

    @sunnyporinju
    Can you guys comment on the “how” of the localized feature importance? Does it use LIME, or Shapley values? Is this available when Spark is the execution engine? Or is it just applicable in local execution mode?

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron

    Can I use the new Git integration to share projects and libraries through GitHub and other Git-based tools like Bitbucket?

    I would like to build DSS projects in one instance of DSS and have colleagues at other non-profits be able to pick up the project via GitHub or Bitbucket.

    I also contribute to a few open-source projects. Can I use the Git integration to develop a Python library in my instance of DSS and post my pull request to GitHub?

  • sunnyporinju
    sunnyporinju Dataiker, Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 5 Dataiker

    @SeanA
    Thanks for the question. Individual Prediction Explanations are based on either ICE (Individual Conditional Expectation) or Shapley values.

    The computation runs in memory. Like other in-memory recipes, it can be executed in a container.
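
For readers unfamiliar with the two techniques named above, here is a minimal, self-contained sketch of what each one computes. It is not Dataiku's implementation: the toy `model`, the feature names, and the all-zero baseline are assumptions made purely for illustration, and the Shapley computation is the exact (exponential-cost) version rather than the sampling approximations used in practice.

```python
from itertools import permutations


def model(row):
    """Toy scoring function standing in for a trained model (hypothetical)."""
    return 2.0 * row["age"] + 3.0 * row["income"] + row["age"] * row["income"]


def shapley_values(predict, row, baseline):
    """Exact Shapley values for one row, relative to a baseline row.

    Features that have not yet been "switched on" keep their baseline value.
    Each feature's value is its average marginal contribution over all
    orderings in which features can be switched on.
    """
    features = list(row)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for ordering in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in ordering:
            current[name] = row[name]
            score = predict(current)
            totals[name] += score - previous
            previous = score
    return {name: total / len(orderings) for name, total in totals.items()}


def ice_curve(predict, row, feature, grid):
    """ICE: vary one feature over a grid while the rest of the row stays fixed."""
    return [predict({**row, feature: value}) for value in grid]


row = {"age": 1.0, "income": 2.0}
baseline = {"age": 0.0, "income": 0.0}

phi = shapley_values(model, row, baseline)
curve = ice_curve(model, row, "age", [0.0, 1.0, 2.0])

print(phi)    # {'age': 3.0, 'income': 7.0}
print(curve)  # [6.0, 10.0, 14.0]
```

The Shapley values sum to the difference between the prediction for the row and the prediction for the baseline (10.0 here), which is what makes them usable as per-feature attributions. The ICE curve instead shows how the prediction for this one row would move if a single feature changed.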

  • casper
    casper Registered Posts: 42 ✭✭✭✭

    Can we get an update on when Pandas will be updated? In 2018 there was an indication that it would be "nearby".

  • sunnyporinju
    sunnyporinju Dataiker, Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 5 Dataiker

    Hi @casper,

    We did update Pandas in 2018. We are looking into the next update.

  • taraku
    taraku Dataiker, Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 53 Dataiker

    Hi @sunnyporinju, my question is: for SharePoint support, what does DSS ingest?

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron

    Is library-level sharing, for the use case of contributing to open source, under consideration?

  • dimitri
    dimitri Dataiker, Product Ideas Manager Posts: 33 Dataiker

    Yes, the ability to commit and push to git repositories, at the project libraries level, is being considered. This would enable users to contribute to any open source libs directly from DSS.

  • GCase
    GCase Dataiker, PartnerAdmin, Registered Posts: 27 Dataiker

    @sunnyporinju
    a question from a prospect who saw the new Statistics component.
    "I see that you are doing univariate and bivariate analysis and those are pretty simple. I'm interested to know on the Statistical Tests, PCA, Fit Curves, and Correlation Matrix did Dataiku leverage specific packages to do these (like you have done with Python Sci-Kit for ML) or did Dataiku implement these on their own?"

  • sunnyporinju
    sunnyporinju Dataiker, Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 5 Dataiker

    Hi @taraku,

    The SharePoint support gives DSS users read and write access for files and lists on SharePoint.

  • FredericT
    FredericT Dataiker Posts: 3 Dataiker

    Hi @GCase,

    Statistical features are built using well-known Python packages such as scipy, scikit-learn, numpy and statsmodels.

  • bencoffey
    bencoffey Dataiker Alumni, Registered Posts: 5 ✭✭✭✭✭

    Hi @sunnyporinju,

    For the interpretability of models, does it work for all visual models?

  • sunnyporinju
    sunnyporinju Dataiker, Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 5 Dataiker

    Hi @bencoffey,

    Yes, it works for all visual models.

  • CoreyS
    CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭

    Just a reminder that you have until this Friday, April 3rd to get all of your questions in!
