Ask Me Anything on DSS 7 with Sunny Porinju

sunnyporinju · March 2020

Hi Dataiku Community! This is Sunny and I am here to answer your questions about Dataiku DSS 7. We recently launched DSS 7 with a number of new features to help improve your data journey.

I know you may have a lot of questions about this release (and maybe future releases) so I will do my best between now and April 3rd to answer all of your questions and hear all of your feedback.

Before getting started, if you are not familiar with DSS 7, I ask that you review this blog post, which provides a high-level overview, as well as the DSS 7.0 release notes. Also, if you are new to AMAs, please review the Ask Me Anything Guidelines and the Dataiku Community Guidelines.

To participate, simply hit reply and craft your question. Be sure to tag me, @sunnyporinju, so I can be notified of your post. I’ll be keeping an eye out as well, so not to worry if you forget to tag me. (But it’s good practice!)

Let the questions begin!

Sunny Porinju is a Senior Product Marketing Manager at Dataiku.

dimitri · March 2020

Hi @tgb417,

The new Git integration for projects allows you to synchronize your DSS project with any hosted Git service, including GitHub, GitLab, or Bitbucket.
That being said, heavy artifacts like datasets or saved models can't be versioned in Git, so the Git integration can't be used to share a complete project across multiple DSS instances. The appropriate way to do that is to use project exports.
However, once the project has been synced with a remote Git repository, the source code can be cloned outside of DSS. Other people can then edit the code recipes or the project libraries in any code editor and push their changes to the remote repository, and DSS can pull those changes to update the project.

The Git integration at the project-libraries level doesn't allow pushing changes to a remote Git repository. For now, pushing is only possible through the project-level Git integration. So at the moment you can use the DSS Git integration to import libraries from open-source projects, but you can't use DSS to contribute back to them.

Sean · March 2020

@sunnyporinju Can you guys comment on the “how” of the localized feature importance? Does it use LIME, or Shapley values? Is this available when Spark is the execution engine? Or is it just applicable in local execution mode?

tgb417 · March 2020

Can I use the new Git integration to share projects and libraries through GitHub and other Git-based tools like Bitbucket?

I would like to build DSS projects in one instance of DSS and have colleagues at other non-profits be able to pick up the project via GitHub or Bitbucket.

I also contribute to a few open-source projects. Can I use the Git integration to develop a Python library in my instance of DSS and post my pull request to GitHub?

sunnyporinju · March 2020

@SeanA Thanks for the question. Individual Prediction Explanations are based either on ICE (Individual Conditional Expectations) or on Shapley values.

It runs in memory. Like other in-memory recipes, it can be executed in a container.
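For readers unfamiliar with the two methods named above, here is a minimal Python sketch of what ICE and Shapley values compute for a single row. It is not a description of how DSS implements them: the toy model `predict`, the feature names, the all-zero baseline, and the helper functions are all hypothetical, and the Shapley computation is the exact brute-force definition, which is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def predict(row):
    """Hypothetical toy model: linear terms plus one interaction."""
    return 2.0 * row["age"] + 1.0 * row["income"] + 0.5 * row["age"] * row["tenure"]


def ice_curve(predict, row, feature, grid):
    """ICE for one row: vary a single feature over a grid, keep the rest fixed."""
    return [(v, predict({**row, feature: v})) for v in grid]


def shapley_values(predict, row, baseline):
    """Exact Shapley values for one row.

    Assumption: a feature that is 'absent' from a coalition takes its
    value from `baseline` (other conventions exist).
    """
    features = list(row)
    n = len(features)

    def value(subset):
        # Prediction when only the features in `subset` take the row's values.
        mixed = {f: (row[f] if f in subset else baseline[f]) for f in features}
        return predict(mixed)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


row = {"age": 3.0, "income": 1.0, "tenure": 2.0}
baseline = {"age": 0.0, "income": 0.0, "tenure": 0.0}

phi = shapley_values(predict, row, baseline)
curve = ice_curve(predict, row, "age", [0.0, 1.0, 2.0])
```

In this toy case the Shapley values sum to the difference between the prediction for the row and the prediction for the baseline, and the interaction term is split evenly between `age` and `tenure`. The ICE curve instead shows how this one row's prediction moves as `age` alone changes.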

casper · March 2020

Can we get an update on when Pandas will be updated? Back in 2018 there was an indication that it would be "nearby".

sunnyporinju · March 2020

Hi @casper,

We did update Pandas in 2018. We are looking into the next update.

taraku · March 2020

Hi @sunnyporinju, my question is: for SharePoint support, what does DSS ingest?

tgb417 · March 2020

Is library-level sharing, for the use case of contributing to open source, under consideration?

dimitri · March 2020

Yes, the ability to commit and push to Git repositories at the project-libraries level is being considered. This would enable users to contribute to any open-source library directly from DSS.

GCase · March 2020

@sunnyporinju A question from a prospect who saw the new Statistics component:
"I see that you are doing univariate and bivariate analysis and those are pretty simple. I'm interested to know on the Statistical Tests, PCA, Fit Curves, and Correlation Matrix did Dataiku leverage specific packages to do these (like you have done with Python Sci-Kit for ML) or did Dataiku implement these on their own?"