
DSS plugin wishlist


Hi everyone,

Like many of you, I sometimes create DSS plugins for my projects, tackling specific business problems. I often wonder if any of these things would be useful to others, so I thought it might be fun to start this plugin wish list where we can all share ideas for plugins that would make our lives easier, and potentially the lives of lots of other data scientists and analysts out there using DSS!

Who knows, if several of us feel that one particular plugin/concept would be useful we could try and write it collaboratively, then share it with the whole community.

To kick things off, something I have been thinking of working on is a plugin that handles data imports into Google Analytics. I use GA a lot in my work, and one feature of the product allows you to upload your own data into it - for example results of an ML model built in DSS. I am currently using another piece of software to do this, but would love to bring this functionality into DSS.

There is a Google API for data imports, so I should be able to write some Python code to handle this process. Parameters would be required for the source dataset, the destination dataset (inside Google Analytics) and some identifiers for the GA account.
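A rough sketch of what that Python code might look like. It assumes the Management API v3 `uploadData` call exposed through `google-api-python-client`; the helper names, the column mapping and every ID are hypothetical, and authentication is left out entirely.

```python
import csv
import io


def rows_to_ga_csv(rows, column_map):
    """Serialise dict rows to CSV text with Google Analytics style headers.

    column_map maps source column names to GA dimension/metric names
    (for example "ga:dimension1"); both sides are hypothetical here.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(column_map.values())
    for row in rows:
        writer.writerow(row[source] for source in column_map)
    return buffer.getvalue()


def upload_to_ga(service, account_id, web_property_id, data_source_id, csv_text):
    """Upload CSV text to a GA custom data source.

    Assumes `service` is an authorised Management API v3 client. The import
    is lazy so the CSV helper above works without the Google client library.
    """
    from googleapiclient.http import MediaIoBaseUpload

    media = MediaIoBaseUpload(
        io.BytesIO(csv_text.encode("utf-8")),
        mimetype="application/octet-stream",
        resumable=False,
    )
    request = service.management().uploads().uploadData(
        accountId=account_id,
        webPropertyId=web_property_id,
        customDataSourceId=data_source_id,
        media_body=media,
    )
    return request.execute()
```

In a plugin, the three IDs would come from the recipe parameters and `rows` from the source dataset; the CSV headers have to match the schema defined on the GA data source.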

Would anyone else find this useful? 🙂

So, over to you - which plugin would make your DSS experience (and that of others!) even more awesome?

 

9 Replies
Neuron
Neuron

@ben_p 

I find the idea of a Google Analytics upload plugin interesting, although I don’t personally have a need for one right now.

What kind of data do you see as useful to upload? Could you share a bit more about the use case?

I’ll have a think about an example I might find useful. The idea of working together on a plugin is interesting to me as a learning / development opportunity.

--Tom
Neuron
Neuron

@ben_p 

I was thinking about this subject today.  I really would like a flexible record linkage / record deduplication plugin.

This plugin would not do simplistic exact matches or even single-column soundex-style matches. It would use probabilistic record matching on multiple incomplete columns in order to reduce the duplicate count in the data sets that I confront, typically customer records entered by a variety of folks over time and through a number of different channels.

It might build on the Python Record Linkage Toolkit, although there are other libraries out there as well.
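To make the idea of matching on multiple incomplete columns concrete, here is a minimal standard-library sketch. It is a weighted string-similarity score used as a stand-in, not a true probabilistic model and not the toolkit's API; the column names and weights are made up for illustration.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """Return a similarity in [0, 1], or None when either value is missing."""
    if not a or not b:
        return None
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a, rec_b, weights):
    """Weighted average similarity over the columns present in both records.

    Columns missing on either side are skipped rather than counted as a
    mismatch, so incomplete records can still be compared.
    """
    total = 0.0
    weight_used = 0.0
    for column, weight in weights.items():
        similarity = field_similarity(rec_a.get(column), rec_b.get(column))
        if similarity is None:
            continue
        total += weight * similarity
        weight_used += weight
    return total / weight_used if weight_used else 0.0


# Hypothetical columns and weights.
WEIGHTS = {"name": 2.0, "email": 3.0, "phone": 1.0}
```

Pairs scoring above some threshold would be flagged as likely duplicates; choosing that threshold and the weights is the part a real probabilistic approach would learn from the data.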

This article seems to give a reasonable description of the challenge.  

--Tom
Neuron
Neuron
Author

Interesting @tgb417! This is definitely a plugin that I would use, I've tried to do some fuzzy matching in DSS but from memory I could only use the built-in solution on a small dataset, so something which could be applied at scale would be great!

Neuron
Neuron

@ben_p ,

My understanding of record linkage at scale is that you have to do a bit of blocking: grouping the records you actually want to compare, rather than comparing every record directly with every other record, which as you point out would be a real scaling challenge. There are a variety of approaches to this. Here are the indexing methods supported by the Python Record Linkage Toolkit:

  • recordlinkage.index.Full,
  • recordlinkage.index.Block,
  • recordlinkage.index.SortedNeighbourhood
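To show why blocking helps, here is a small pure-Python illustration of full indexing versus blocking on a key. It is only a sketch of the concept, not the toolkit's implementation, and the records and the `postcode` blocking key are invented.

```python
from collections import defaultdict
from itertools import combinations


def full_pairs(records):
    """Every record against every other record: n * (n - 1) / 2 pairs."""
    return list(combinations(range(len(records)), 2))


def blocked_pairs(records, key):
    """Only pair up records that share the same blocking key value."""
    blocks = defaultdict(list)
    for index, record in enumerate(records):
        blocks[key(record)].append(index)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Invented example data.
records = [
    {"name": "Ann Lee", "postcode": "AB1"},
    {"name": "Anne Lee", "postcode": "AB1"},
    {"name": "Bob Ray", "postcode": "CD2"},
    {"name": "Rob Ray", "postcode": "CD2"},
]

all_candidates = full_pairs(records)
blocked_candidates = blocked_pairs(records, key=lambda r: r["postcode"])
```

With these four records the full index yields six candidate pairs and blocking on postcode yields two. The trade-off is that a true duplicate with a mistyped blocking key is never compared, which is the gap sorted-neighbourhood indexing tries to narrow.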

How might we work together on this challenge/opportunity?

--Tom

We don't have a prioritized list, but we want to use or develop plugins for:

  • Jira
  • Smartsheet
  • InfluxDB

If anybody has something in those areas, it would be great if you could share! If not, I'll let you know when we have something in those areas.

Neuron
Neuron
Author

Hey @Ignacio_Toledo ,

I can see a Jira plugin being useful to a wide audience! What would be the key things you would want such a plugin to do?

Ben


The key things are:

  • Extract the history of one or more Jira Tickets: transitions, assignee, etc.
  • Access the work log data.
  • Get the current status of one or more Jira Tickets.

The plugin will use the user input to filter the results according to their needs.

The outputs are datasets that can be used not only for creating charts (a feature Jira already has) but also for enrichment with datasets from other sources (such as Smartsheet, CMMS, Confluence, software logs in Elasticsearch, etc.)

Apparently there is also a need to automate the production of the reports and charts that Jira can produce. I'm no expert in Jira, but people say they can't find an easy way to automatically regenerate and export the charts or reports they need to produce weekly or monthly.
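As an illustration of the ticket-history output, here is a minimal sketch that flattens one issue into dataset rows. It assumes the JSON shape returned by the Jira REST API when an issue is fetched with the changelog expanded; the function name and output column names are hypothetical, and fetching and authentication are not shown.

```python
def changelog_to_rows(issue):
    """Flatten one Jira issue payload into one row per changed field."""
    rows = []
    histories = issue.get("changelog", {}).get("histories", [])
    for history in histories:
        author = (history.get("author") or {}).get("displayName")
        for item in history.get("items", []):
            rows.append(
                {
                    "issue_key": issue.get("key"),
                    "changed_at": history.get("created"),
                    "author": author,
                    "field": item.get("field"),
                    "from": item.get("fromString"),
                    "to": item.get("toString"),
                }
            )
    return rows


# Invented payload in the assumed shape.
sample_issue = {
    "key": "OPS-12",
    "changelog": {
        "histories": [
            {
                "created": "2020-10-01T09:00:00.000+0000",
                "author": {"displayName": "A. User"},
                "items": [
                    {"field": "status", "fromString": "Open", "toString": "In Progress"},
                    {"field": "assignee", "fromString": None, "toString": "B. User"},
                ],
            }
        ]
    },
}

rows = changelog_to_rows(sample_issue)
```

Filtering on `field == "status"` gives the transitions and `field == "assignee"` the assignment history, and the rows could then be written to a DSS dataset and joined with other sources.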

Cheers!

Dataiker
Dataiker

Hi @ben_p

I've actually been working on a Google Analytics plugin, in my case to fetch data into a DSS dataset. Here's what I have so far: https://github.com/nmartorell/google-analytics

I'm relatively new to Google Analytics, and wasn't aware that it was possible to upload your own data to it. I see now that this is possible via the Google APIs, so I'll look into adding this functionality to the plugin (although feel free to try it out as well!).

I'll keep you posted!

Ned

Neuron
Neuron
Author

Hey @NedM that looks like a great plugin!

I've managed to put together an internal plugin for GA data imports this week. I'm going to work on cleaning up a public version of the code, then we can see if we think it would be a cool feature to integrate into your plugin!
