User Highlight - Mani

CoreyS · ‎07-22-2020

A community thrives on the basis of its great members - so let's take a look at some of them, shall we? From time to time we will highlight a prominent member of the Community and share their story and DSS accomplishments!

Meet Mani - known here as @neomatrix369. We sat down with him and had a chat about his story with Dataiku DSS.

1) How did you find Dataiku and get started with DSS?

I first heard about Dataiku around 2018, when someone who was using it talked about it, and I decided to download it and play with it. I only really started working with it in 2019, and more so in 2020. At that time I was also learning about various AI/ML/DL topics, particularly about data, so data-related tools became my primary focus. I started gathering information about them on this page, which led to creating a dedicated page for Dataiku. That in turn led me to attend a number of events and talks organised by the Dataiku Community team, including the EGG London conference in 2019.

I was using DSS regularly, as the Free/Community edition is elaborate enough that I could do many of my analyses and experiments in it and get quick results.

This also helped while I was learning about data topics and diving deep into things like Data Preparation, Data Cleaning, Feature Engineering, and the like. DSS offers many features that help a Data Scientist/Machine Learning Engineer accomplish these tasks easily.

I wanted to learn how to do these things and get to know these topics in depth - I also recollect that while doing this I was preparing for a talk on data, which you can find here. So my quest for practical knowledge led to looking for tools, learning them, using them and writing about them on my GitHub repo.

One of the other reasons I got interested in DSS is that it is Java/JVM based and supports multiple programming languages when creating notebooks or plugins.

2) What's your favorite DSS feature?

I like the summarized visualizations (also the Visual Exploratory Data Analysis section) shown when a dataset is loaded. You get two views of it, one at the column level and the other at the table level. I have seen that in the latest version (version 7.0) this is expanded into a separate section with many more visualizations. But I have a few more favorites in DSS which I often look at for another perspective on the dataset I’m using:

The Lab section is great: it helps create quick models for validation purposes
The AutoML wizard is also very useful
The post-model-training analysis sections

3) Tell us about your projects!

I’m working on multiple projects but to name a few:

https://github.com/neomatrix369/awesome-graal: gathers curated resources about GraalVM: a polyglot JVM from Oracle Labs. Covers a lot of resources related to polyglot programming.
https://github.com/neomatrix369/awesome-ai-ml-dl: a GitHub repo of curated links gathered from various sources and shared with the community. There are code snippets, examples, notebooks, presentations, and many other learning materials available on many AI/ML/DL topics.

I also created an example of how to run DSS via a Docker container on GraalVM (see this example).

Better NLP library: https://bit.ly/better-nlp-launch - this is a project that sprang out of the above project (https://github.com/neomatrix369/awesome-ai-ml-dl) while I was reading a book called Machine Learning is Fun.

I recently developed an NLP library which will eventually become part of the Better NLP library. It’s called NLP Profiler. What is it? Think of pandas’s describe(), but for analysing a text column. By default, describe() only summarises numeric columns, extracting descriptive statistics about the numerical data in the dataset, and there isn’t anything available to generate the same for text data in this manner. So I went ahead and wrote one, although at the moment it does only basic analysis and is not yet equipped to handle data at scale, among many other small features I’d like to add. It is a work in progress, and I have been using it on multiple occasions, one of them being a Kaggle task (see my kernel).
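To make the "describe(), but for a text column" idea concrete, here is a minimal sketch using only the Python standard library. It is not NLP Profiler's actual API: the function names text_features and describe_text, the chosen features, and the sample reviews are all hypothetical and picked purely for illustration.

```python
import statistics


def text_features(text):
    """Compute simple per-row features for one text value (hypothetical feature set)."""
    words = text.split()
    return {
        "char_count": len(text),
        "word_count": len(words),
        "digit_count": sum(ch.isdigit() for ch in text),
    }


def describe_text(texts):
    """Summarise each feature across a text column, loosely in the spirit of describe()."""
    rows = [text_features(t) for t in texts]
    if not rows:
        return {}
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        summary[name] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Made-up sample column, for illustration only.
reviews = ["Great tool", "Ran 3 models in 10 minutes", "ok"]
summary = describe_text(reviews)
for name, stats in summary.items():
    print(name, stats)
```

Each text value is first turned into a few numeric features, and those features are then summarised the way a numeric column would be. A real profiler would likely add many more features (sentiment, spelling, language, and so on) and work directly on a DataFrame column.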

Below are a couple of private DSS projects I have been working on while competing in online DS/ML competitions:


DSS helped me create submissions and compare them with my other submissions, created manually or with the help of a colleague working with me on this competition. We found that for this specific dataset our results didn’t differ much, but DSS offered a more systematic way to load the data, process it, set up the model, train it, and generate the submission dataset really quickly. It was great to see how, on a low-spec laptop, DSS would still run through everything seamlessly without the machine cranking up, as it would if we did the same thing via a Jupyter notebook (which is not meant to be used for that anyway).

March 2020: Liverpool Ion Switching

This project involves time-series data. I ended up creating a simple ensemble model mainly composed of tree-based models.

I’m also involved in two other projects:

Virgilio: I write and review guides on how to get started with various topics and areas of AI/ML/DL (see the GitHub repo: https://github.com/virgili0/Virgilio). I am also cross-linking resources between Virgilio and https://github.com/neomatrix369/awesome-ai-ml-dl.
MWML: this project overlaps with my own project, i.e. https://github.com/neomatrix369/awesome-ai-ml-dl, and has a wider, active community; the links shared there are also managed by the community. We are also cross-linking resources between MWML and https://github.com/neomatrix369/awesome-ai-ml-dl.

I also gave two talks back to back, at the end of June and the beginning of July:

https://bit.ly/nn-things-java-dev-ai-ml-dl (video will be out soon)
https://bit.ly/backend-dev-to-ml-slides

These talks cover the bigger picture of the AI/ML/DL world I live in, my journey, and my perspectives. They also show how I focus on tool development, problem-solving, learning, and many better practices and techniques for Software Engineers, Data Scientists, and Machine Learning Engineers.

Do you have a Dataiku story? Share how you came to use Dataiku DSS or an interesting goal you accomplished with it!

neomatrix369 · ‎07-22-2020

Thank you, DSS Community team, it's a privilege to be highlighted here 🎉

tgb417 · ‎07-22-2020

@neomatrix369

Great to meet you. Love the use of GitHub to self-document and share your data science journey. Have you completed the fast.ai program? If so, how did it go?

—Tom

neomatrix369 · ‎07-22-2020

@tgb417 Tom, thanks for your feedback and I'm glad you like what I shared.

I was going to start working on the fast.ai course but TBH got distracted, although it's on my list and I will get there eventually. In the meantime, I got a good idea of and feel for their philosophy and how it overlaps with mine.

So that's not an answer to your question, but when I know, I will share.

MichaelG · ‎07-22-2020

@neomatrix369 our pleasure! And a pleasure to have you here in the Community with us.
