DSS plugin wishlist
Hi everyone,
Like many of you I sometimes create DSS plugins for my projects, tackling specific business problems. I often wonder if any of these things would be useful to others, so thought it might be fun to start this plugin wish list where we can all share ideas for plugins that would make our lives easier, and potentially the lives of lots of other data scientists and analysts out there using DSS!
Who knows, if several of us feel that one particular plugin/concept would be useful we could try and write it collaboratively, then share it with the whole community.
To kick things off, something I have been thinking of working on is a plugin that handles data imports into Google Analytics. I use GA a lot in my work, and one feature of the product allows you to upload your own data into it - for example results of an ML model built in DSS. I am currently using another piece of software to do this, but would love to bring this functionality into DSS.
There is a Google API for data imports, so I should be able to write some Python code to handle this process. Parameters would be required for the source dataset, the destination dataset (inside Google Analytics) and some identifiers for the GA account.
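As a rough sketch of the first step, assuming the import is CSV-based: the snippet below collects the kinds of parameters mentioned above and serialises rows from a source dataset into CSV bytes ready to be uploaded. The names `GAImportConfig` and `rows_to_csv_bytes`, the identifier values and the example columns are all hypothetical, and the actual API call is deliberately left out.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class GAImportConfig:
    """Hypothetical plugin parameters: GA identifiers plus the destination dataset."""
    account_id: str
    web_property_id: str
    custom_data_source_id: str  # assumed to identify the destination dataset in GA


def rows_to_csv_bytes(rows, columns):
    """Serialise a list of dict rows into UTF-8 CSV bytes with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Keep only the requested columns; missing values become empty strings.
        writer.writerow({col: row.get(col, "") for col in columns})
    return buffer.getvalue().encode("utf-8")


# Example: model scores keyed by a user identifier (all values are made up).
config = GAImportConfig("123456", "UA-123456-1", "abcDEF")
rows = [
    {"user_id": "u1", "score": 0.91, "ignored": "x"},
    {"user_id": "u2", "score": 0.12},
]
payload = rows_to_csv_bytes(rows, ["user_id", "score"])
print(payload.decode("utf-8"))
```

In a real plugin the rows would come from the DSS source dataset and `payload` would be handed to whichever upload call the Google API exposes, using the identifiers held in `config`.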
Would anyone else find this useful?
So, over to you - which plugin would make your DSS experience (and that of others!) even more awesome?
Answers
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron
A Google Analytics upload plugin sounds interesting, although I don’t personally have a need for one right now.
What kind of data do you see as useful to upload? Could you share a bit more about the use case?
I’ll have a think about an example I might find useful. The idea of working together on a plugin is interesting to me as a learning / development opportunity.
-
tgb417
I was thinking about this subject today. I really would like a flexible record linkage / record deduplication plugin.
This plugin would not do simplistic exact matches or even single-column Soundex-style matches. It would use probabilistic record matching on multiple incomplete columns to reduce the duplicate count in the data sets I deal with, typically customer records entered by a variety of people over time and through a number of different channels.
It might build on the Python Record Linkage Toolkit, although there are other libraries out there as well.
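To make the idea a little more concrete, here is a minimal sketch of multi-column fuzzy scoring using only the Python standard library. It is a simplified stand-in for true probabilistic matching, not the Python Record Linkage Toolkit's API: the function names, field weights and sample records are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """Return a 0..1 similarity for two values, or None if either is missing."""
    if not a or not b:
        return None
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a, rec_b, weights):
    """Weighted average similarity over the fields present in both records."""
    total, weight_sum = 0.0, 0.0
    for field, weight in weights.items():
        sim = field_similarity(rec_a.get(field), rec_b.get(field))
        if sim is None:
            continue  # incomplete column: skip it rather than penalise the pair
        total += weight * sim
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Hypothetical weights and records, for illustration only.
WEIGHTS = {"name": 0.5, "email": 0.3, "city": 0.2}
a = {"name": "Jon Smith", "email": "jon.smith@example.com", "city": "Boston"}
b = {"name": "John Smith", "email": "", "city": "boston"}
c = {"name": "Maria Garcia", "email": "mg@example.com", "city": "Austin"}

print("a vs b:", round(match_score(a, b, WEIGHTS), 3))
print("a vs c:", round(match_score(a, c, WEIGHTS), 3))
```

Pairs scoring above some threshold would be flagged as likely duplicates. A proper probabilistic approach would estimate the weights from the data instead of hard-coding them.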
This article seems to give a reasonable description of the challenge.
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
We don't have a prioritized list, but we want to use or develop plugins for:
- Jira
- Smartsheet
- InfluxDB
If anybody has something in those areas, it would be great if you could share! If not, I'll let you know when we have something in those areas.
-
Hi @ben_p,
I've actually been working on a Google Analytics plugin, in my case to fetch data into a DSS dataset. Here's what I have so far: https://github.com/nmartorell/google-analytics
I'm relatively new to Google Analytics, and wasn't aware that it was possible to upload your own data to it. I see now that this is possible via the Google APIs, so I'll look into adding this functionality to the plugin (although feel free to try it out as well!).
I'll keep you posted!
Ned
-
ben_p Neuron 2020, Registered, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant Posts: 143 ✭✭✭✭✭✭✭
Hey @NedM, that looks like a great plugin!
I've managed to put together an internal plugin for GA data imports this week. I'm going to work on cleaning up a public version of the code, then we can see if we think it would be a cool feature to integrate into your plugin!
-
ben_p
Hey @Ignacio_Toledo,
I can see a Jira plugin being useful to a wide audience! What would be the key things you would want such a plugin to do?
Ben
-
ben_p
Interesting, @tgb417! This is definitely a plugin that I would use. I've tried to do some fuzzy matching in DSS, but from memory I could only use the built-in solution on a small dataset, so something that could be applied at scale would be great!
-
tgb417
@ben_p,
My understanding is that record linkage at scale requires a bit of blocking: grouping the records you actually want to compare, rather than comparing every record directly with every other record, which as you point out would be a real scaling challenge. There are a variety of approaches to this. Here are indexing methods supported by the Python Record Linkage Toolkit:
- recordlinkage.index.Full
- recordlinkage.index.Block
- recordlinkage.index.SortedNeighbourhood
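To illustrate the blocking idea without depending on any library, here is a minimal standard-library sketch. It does not use the toolkit's classes listed above. The blocking key (first letter of surname plus postcode), the function names and the sample records are all assumptions made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def block_key(record):
    """Hypothetical blocking key: first letter of surname plus postcode."""
    surname = (record.get("surname") or "").strip().lower()
    postcode = (record.get("postcode") or "").strip()
    return (surname[:1], postcode)


def candidate_pairs(records):
    """Yield index pairs only for records that share a blocking key."""
    blocks = defaultdict(list)
    for idx, record in enumerate(records):
        blocks[block_key(record)].append(idx)
    for members in blocks.values():
        # Compare records within a block only, never across blocks.
        yield from combinations(members, 2)


# Made-up sample records.
records = [
    {"surname": "Smith", "postcode": "02110"},
    {"surname": "Smyth", "postcode": "02110"},
    {"surname": "Garcia", "postcode": "73301"},
    {"surname": "smith ", "postcode": "02110"},
]
pairs = list(candidate_pairs(records))
full = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {full}")
```

The trade-off is that two true duplicates whose blocking keys differ (a mistyped postcode, say) are never compared, so the choice of key matters a lot.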
How might we work together on this challenge/opportunity?
-
Ignacio_Toledo
The key things are:
- Extract the history of one or more Jira Tickets: transitions, assignee, etc.
- Access the work log data.
- Get the current status of one or more Jira Tickets.
The plugin will use the user input to filter the results according to their needs.
The outputs are datasets that can be used not only for creating charts (a feature that Jira already has) but also for enrichment with datasets coming from other sources (like Smartsheet, CMMS, Confluence, software logs in Elasticsearch, etc.)
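As a rough sketch of the history extraction, the snippet below flattens one ticket's changelog into dataset-style rows and filters them by field. It assumes the issue JSON has the shape returned by Jira's REST API when the changelog is expanded (a `changelog.histories` list whose entries carry `created`, `author` and `items`). The function names and the hand-written sample issue are hypothetical, and no API call is made.

```python
def changelog_to_rows(issue):
    """Flatten one issue's changelog into one row per changed field."""
    rows = []
    histories = issue.get("changelog", {}).get("histories", [])
    for history in histories:
        author = (history.get("author") or {}).get("displayName")
        for item in history.get("items", []):
            rows.append({
                "issue_key": issue.get("key"),
                "changed_at": history.get("created"),
                "author": author,
                "field": item.get("field"),
                "from": item.get("fromString"),
                "to": item.get("toString"),
            })
    return rows


def filter_rows(rows, fields):
    """Keep only changes to the fields the user asked for."""
    return [row for row in rows if row["field"] in fields]


# Hand-written sample issue, for illustration only.
issue = {
    "key": "PROJ-1",
    "changelog": {"histories": [
        {"created": "2021-03-01T09:00:00.000+0000",
         "author": {"displayName": "Alice"},
         "items": [{"field": "status", "fromString": "To Do", "toString": "In Progress"}]},
        {"created": "2021-03-02T10:30:00.000+0000",
         "author": {"displayName": "Bob"},
         "items": [{"field": "assignee", "fromString": "Alice", "toString": "Bob"},
                   {"field": "status", "fromString": "In Progress", "toString": "Done"}]},
    ]},
}
rows = changelog_to_rows(issue)
for row in filter_rows(rows, {"status"}):
    print(row["changed_at"], row["from"], "->", row["to"])
```

Rows in this shape could be written straight to a DSS dataset and joined with data from the other sources.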
Apparently there is also a need to automate the production of the reports and charts that Jira can produce. I'm not an expert in Jira, but people say they can't find an easy way to automatically regenerate and export the charts or reports they need to produce weekly or monthly.
Cheers!
-
Hi @NedM, can we discuss the plugin you developed offline? I am also interested in pulling our company's GA data. Thanks.
-
tgb417