Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hey DSS Community!
I would like to know how to remove the duplicate number of lines and taking into account two reference columns?
It seems to me that it is necessary to use a python formula but I am not sufficiently trained in this language. Is this the only possibility?
Thanks in advance: D
Depending on what kind of result you're trying to achieve you may try:
- "Distinct" recipe which allows to select the columns that you want to be distinct. In this case the output schema will only consist of the columns you chose
- "Group" recipe where you group on selected keys and chose for example "First" for the rest of the columns. In this case the output schema will contain all of the columns.
Thanks for your feedback!
I think the "Dinstict" recipe is what I want to do.
I had already tried it and when I left it, the recipe overloaded a lot of data from my dataset.
I tried without adding anything and adding filters but the results are the same 😞
Hi @Amarilla ,
I find @Andrey 's "Group by" suggestion very helpful for removing and doublechecking duplicates. Group on the column that needs to hold distinct values, the "concat" option gives you a column in which reference-values can be concatenated.. Make sure to set a suitable separator and possibly check "concat distinct". See the attached picture for a clue where to find those options.
Hope this helps!
Hi @Amarilla ,
Last screenshot was part of data i cannot share here, so a better one is attached now. It shows the screen you see when setting up the GroupBy-processor. Under "Per Field aggregations" the columns of the dataset show up. As you can see i selected a number of columns by activating the Avg-option, those will show up in the resulting dataset.
Because grouping is a form of aggregation, you should specify what to do with the values in your columns. @Andrey suggested "First" , which just returns the first value the processor encounters. It depends on your specific data-wishes what to select here, so play around with it a little to get the result you want. After all : you can always go back and change what you specified earlier.
Hope this helps, have fun!