I want to keep only data from the first occurring row when there are duplicates in certain column within the same row.
Suppose we have a dataframe like this:
df <- data.frame(
A = c(99, 99, 99, 100, 101),
B = c("apple", "orange", "apple", "banana", "apple"),
C = c("eaten", "eaten", "fresh", "eaten", "fresh")
) # create sample dataframe
I want to drop the second and third rows because I only want to keep the first record for cases duplicated with 99, while retaining the rest.
This is how the code looks like with the use of distinct(), where the duplicates are removed after the first row of its occurrence.
df_unique <- df %>%
distinct(A, .keep_all = TRUE) # Remove duplicate rows based on A column