I am trying to sync an HDFS dataset to an MS SQL Server dataset using a scenario, so that rows are appended whenever the source dataset changes. While doing this, I want an incident ID to be generated so that a ticket can be tracked for each incident. The table will also be reused later for ML feedback.
I am stuck on finding a way to uniquely identify each ticket. Can someone help me? Also, could a Dataiku expert help me with a few new approaches I am trying out in Dataiku?
Hi @sasidharp. There are many possible solutions to your problem, so a little more context would be useful.
For example, does your HDFS dataset contain any column, or group of columns, that could be used as an index? Say you have a timestamp column whose value is different for every row: then you could use that column as a unique index.
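Before relying on a column as an index, it is worth checking that it really is unique. Here is a minimal sketch in plain Python (not a Dataiku API) that tests whether a set of columns is unique across rows; the `event_ts` and `category` column names and the sample rows are hypothetical.

```python
from typing import Iterable, Mapping, Sequence


def is_unique_key(rows: Iterable[Mapping[str, object]], key_columns: Sequence[str]) -> bool:
    """Return True if no two rows share the same values in key_columns."""
    seen = set()
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key in seen:
            return False  # duplicate found, so these columns can't serve as an index
        seen.add(key)
    return True


# Hypothetical sample rows; "event_ts" and "category" are illustrative column names.
rows = [
    {"event_ts": "2021-03-01T10:00:00", "category": "disk"},
    {"event_ts": "2021-03-01T10:05:00", "category": "network"},
    {"event_ts": "2021-03-01T10:10:00", "category": "disk"},
]

print(is_unique_key(rows, ["event_ts"]))  # True: every timestamp is distinct
print(is_unique_key(rows, ["category"]))  # False: "disk" appears twice
```

Keep in mind that a column that is unique today may stop being unique as more data arrives, so the check is a sanity test rather than a guarantee.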
Or maybe you have duplicate timestamps but also a category column, and no two rows share both the same timestamp and the same category: then you could combine both columns into a unique index.
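Once you have a unique column combination, one way to turn it into an incident ID is to hash the key values, so the same source row always gets the same ID and re-running the scenario does not create duplicate tickets. This is a minimal sketch using Python's standard `hashlib`, not a built-in DSS feature; the `INC-` prefix, the 12-character length, the column names and the `incident_id`/`rows_to_append` helpers are all hypothetical, and loading the IDs already present in the SQL Server table is left out.

```python
import hashlib
from typing import Dict, List, Mapping, Sequence, Set


def incident_id(row: Mapping[str, object], key_columns: Sequence[str]) -> str:
    """Derive a stable ID from the key columns: same key values give the same ID on every run."""
    # Join with a separator unlikely to appear in the data, so ("a", "bc") != ("ab", "c").
    raw = "\x1f".join(str(row[col]) for col in key_columns)
    return "INC-" + hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]


def rows_to_append(
    source: List[Dict[str, object]], existing_ids: Set[str], key_columns: Sequence[str]
) -> List[Dict[str, object]]:
    """Tag each source row with an incident_id and keep only IDs not already in the target."""
    seen = set(existing_ids)  # copy so the caller's set is not modified
    new_rows = []
    for row in source:
        rid = incident_id(row, key_columns)
        if rid not in seen:
            new_rows.append({**row, "incident_id": rid})
            seen.add(rid)
    return new_rows


# Hypothetical rows: duplicate timestamps, but (event_ts, category) is unique.
keys = ["event_ts", "category"]
source = [
    {"event_ts": "2021-03-01T10:00:00", "category": "disk"},
    {"event_ts": "2021-03-01T10:00:00", "category": "network"},
]

first_run = rows_to_append(source, set(), keys)
already_synced = {r["incident_id"] for r in first_run}
second_run = rows_to_append(source, already_synced, keys)
print(len(first_run), len(second_run))  # 2 0
```

Truncating the hash to 12 hex characters keeps the ID readable but makes collisions possible (if unlikely), so use more characters if the table will grow large. If you only need a sequential number rather than a reproducible one, an IDENTITY column on the SQL Server table is another option, although it will not tell you which rows were already synced.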
It is quite possible that there is a DSS function or feature that does exactly what you want and that I don't know about, but this is how I would approach the problem you describe.
I hope this helps! And if there is a better solution, I think others will post more information.