Duplicate rows need to remove or replace value

ESoto
Level 3

My rows repeat the same information over and over because I now have two columns that contain a computer name. One of the columns holds different computer names (it comes from another database), so the results are duplicated once for each computer name that differs.

I need to get rid of the duplicates. For example, if a user has only 3 applications, those 3 lines repeat over and over again. I cannot paste any screenshots for security reasons, but here is an example for one user (the other database shows all the devices a user has but knows nothing about the applications used):

computer name    device name    application       usage session
sseisssss        sseisssss      photoshop 2023    3
                 xdeddede       photoshop 2023    3

Basically, because there is a different computer name, the output repeats that the user has photoshop 2023, which I do not want. Because the row repeats, it also duplicates the number of sessions in which the user used the application. I tried the Distinct recipe, but I need to keep all the rows and columns, and it only outputs the distinct column. I am also not sure how to accomplish this in the Group recipe.

I do see that there is a Python Function option to remove the duplicates of one column. How would I do it for all columns? Any help would be appreciated, as I cannot complete my analysis without solving this. Thank you.


Operating system used: Windows 11 Enterprise

4 Replies
Turribeach

I don't really know how you want to deduplicate this data. But assuming you want to keep the first row, you can take max(computer name), remove device name as a column, and group by all the other columns to get distinct values. A Group recipe should do this.
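For reference, the same idea could be sketched in pandas roughly as follows. This is only a sketch under assumptions: the DataFrame `df` and its values are made up from the example in the question, and a blank computer name is assumed to be stored as an empty string. It is not the Group recipe itself.

```python
import pandas as pd

# Hypothetical sample built from the example in the question.
# Assumption: the blank computer name is an empty string.
df = pd.DataFrame({
    "computer name": ["sseisssss", ""],
    "device name": ["sseisssss", "xdeddede"],
    "application": ["photoshop 2023", "photoshop 2023"],
    "usage session": [3, 3],
})

# Drop the column that causes the repetition, then group by the
# remaining columns and keep the max computer name per group.
deduped = (
    df.drop(columns=["device name"])
      .groupby(["application", "usage session"], as_index=False)
      .agg({"computer name": "max"})
)

print(deduped)
```

Note that this collapses the two rows into one, so the device name information is lost.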

ESoto
Level 3
Author

I would rather not remove the device name column, because I need it to show that there may be other devices that are not being accounted for now but should be in the future. So I really just want to remove the duplicates in the application and usage session columns, while the device name column stays in place.

Turribeach

Show us how you expect the deduplicated rows to look. What exact values and columns do you expect to see? How would you have a row with two device names?

ESoto
Level 3
Author

I basically would need this, so it is actually accurate to the information:

 

computer name    device name    application       usage session
sseisssss        sseisssss      photoshop 2023    3
                 xdeddede

 

Or, if the duplicate cannot be removed, then I need the 3 replaced with a different value:

 

computer name    device name    application       usage session
sseisssss        sseisssss      photoshop 2023    3
                 xdeddede       photoshop 2023    N/A

 

That way it keeps the information that the user has multiple devices without compromising the integrity of the data. I do not mind having to manually change the 3 to N/A wherever I see it repeating; I would just need a way to do this other than exporting and doing it in Excel.
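Both of the layouts above could be produced without Excel, for instance in a Python step with pandas. The following is a minimal sketch, not a confirmed solution: the `user` column, the sample values, and the choice of `user` plus `application` as the key that defines a repeat are all assumptions for illustration, and the real key columns would need to be adjusted to the actual dataset.

```python
import pandas as pd

# Hypothetical sample; the "user" column is an assumption so that
# repeats are detected per user rather than across the whole dataset.
df = pd.DataFrame({
    "user": ["u1", "u1"],
    "computer name": ["sseisssss", ""],
    "device name": ["sseisssss", "xdeddede"],
    "application": ["photoshop 2023", "photoshop 2023"],
    "usage session": [3, 3],
})

# True for every repeat of a (user, application) pair after the first row.
dup = df.duplicated(subset=["user", "application"], keep="first")

# Option A: keep every row, but blank the repeated application and
# turn the repeated session count into a missing value.
option_a = df.copy()
option_a["application"] = option_a["application"].mask(dup, "")
option_a["usage session"] = option_a["usage session"].astype("Int64").mask(dup)

# Option B: keep the application, replace the repeated count with "N/A".
option_b = df.copy()
option_b["usage session"] = (
    option_b["usage session"].astype(object).mask(dup, "N/A")
)

print(option_a)
print(option_b)
```

With option A the usage session column stays numeric, so summing it skips the missing values and counts each session only once. Option B turns the column into mixed text and numbers, which reads well in a report but would need converting back before any arithmetic.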
