[SOLVED] How to combine function argument with group_by in R

Issue

This content is from Stack Overflow. Question asked by Letícia Lara.

I would like to use group_by() inside my custom function, with the column names passed to group_by() supplied through a function argument.

See a hypothetical example of what my data would look like:

library(dplyr)  # provides %>%, group_by(), mutate(), across(), distinct()

data <- data.frame(ind = rep(c("A", "B", "C"), 4),
                   gender = rep(c("F", "M"), each = 6),
                   value = sample(1:100, 12))

And this is the result I would like to have:

result <- data %>%
   group_by(ind, gender) %>%
   mutate(value = mean(value)) %>%
   distinct()

This is how I tried to write my function:

myFunction <- function(data, set_group, variable){
   result <- data %>%
      group_by(get(set_group)) %>%
      mutate(across(all_of(variable), ~ mean(.x, na.rm = TRUE))) %>%
      distinct()
}

result3 <- myFunction(data, set_group = c("ind", "gender"), variable = c("value"))
result3

I want to let the user supply as many set_group columns and as many variable columns as needed. I tried using get(), all_of() and mget() within group_by(), but none of them worked.
Does anyone know how I can code this?

Thank you!



Solution

We can use across() with all_of() inside group_by(), which accepts a character vector of column names:

myFunction <- function(data, set_group, variable){
    data %>%
      group_by(across(all_of(set_group))) %>%
      mutate(across(all_of(variable), ~ mean(.x, na.rm = TRUE))) %>%
      ungroup() %>%
      distinct()
}

Testing:

> myFunction(data, set_group = c("ind", "gender"), variable = c("value"))
# A tibble: 6 × 3
  ind   gender value
  <chr> <chr>  <dbl>
1 A     F       43.5
2 B     F       87.5
3 C     F       67.5
4 A     M       13  
5 B     M       43.5
6 C     M       37.5
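As a side note, the mutate() followed by distinct() in the function above could probably be collapsed into a single summarise() call. The sketch below is not part of the original answer: my_summary and the toy data frame are hypothetical names and values chosen for illustration, and it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical variant of myFunction: summarise() returns one row per group,
# so no separate ungroup()/distinct() step is needed.
my_summary <- function(data, set_group, variable) {
  data %>%
    group_by(across(all_of(set_group))) %>%
    summarise(across(all_of(variable), ~ mean(.x, na.rm = TRUE)),
              .groups = "drop")
}

# Small fixed data set (illustrative values, not the data from the question)
toy <- data.frame(ind = rep(c("A", "B"), 4),
                  gender = rep(c("F", "M"), each = 4),
                  value = c(10, 20, 30, 40, 50, 60, 70, 80))

out <- my_summary(toy, set_group = c("ind", "gender"), variable = "value")
print(out)
```

Two differences from the mutate()/distinct() version are worth keeping in mind: summarise() keeps only the grouping columns and the summarised columns, and it orders the rows by the grouping keys rather than by first appearance.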

Another option is to convert the column names to symbols with rlang::syms() and splice them into group_by() with the !!! operator:

myFunction <- function(data, set_group, variable){
    data %>%
      group_by(!!! rlang::syms(set_group)) %>%
      mutate(across(all_of(variable), ~ mean(.x, na.rm = TRUE))) %>%
      ungroup() %>%
      distinct()
}

Testing:

> myFunction(data, set_group = c("ind", "gender"), variable = c("value"))
# A tibble: 6 × 3
  ind   gender value
  <chr> <chr>  <dbl>
1 A     F       43.5
2 B     F       87.5
3 C     F       67.5
4 A     M       13  
5 B     M       43.5
6 C     M       37.5
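To see what the symbol-based version is doing, it may help to look at the pieces separately. This is a minimal sketch with a made-up toy data frame (the names toy, group_syms, by_syms and by_across are illustrative only): rlang::syms() turns the character vector into a list of symbols, and !!! splices that list into group_by() as if the columns had been typed by hand.

```r
library(dplyr)

# Toy data made up for illustration
toy <- data.frame(ind = c("A", "B", "A", "B"),
                  gender = c("F", "F", "M", "M"),
                  value = c(1, 2, 3, 4))
set_group <- c("ind", "gender")

# syms() converts each string to a symbol; the result is a list of symbols
group_syms <- rlang::syms(set_group)

# !!! splices the list, so this behaves like group_by(ind, gender)
by_syms <- toy %>% group_by(!!!group_syms)
by_across <- toy %>% group_by(across(all_of(set_group)))

# Both approaches should report the same grouping columns
same_groups <- identical(group_vars(by_syms), group_vars(by_across))
print(same_groups)
```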

NOTE: get() retrieves a single object by name; for multiple objects, mget() can be used. However, it is better to use the tidyverse functions shown above.


This question was asked on Stack Overflow by Letícia Lara and answered by akrun. It is licensed under the terms of CC BY-SA 2.5 - CC BY-SA 3.0 - CC BY-SA 4.0.
