[SOLVED] Reshape from wide to long with multiple columns that have different naming patterns

Issue

This content is from Stack Overflow. Question asked by ryorlets.

I have a longitudinal data set in wide format, with > 2500 columns. Almost all columns begin with ‘W1_’ or ‘W2_’ to indicate the wave (i.e., time point) of data collection. In the real data, there are > 2 waves. The data look like this:

# Populate wide format data frame
person <- c(1, 2, 3, 4)
W1_resp_sex <- c(1, 2, 1, 2)
W2_resp_sex <- c(1, 2, 1, 2)
W1_edu <- c(1, 2, 3, 4)
W2_q_2_1 <- c(0, 1, 1, 0)

wide <- as.data.frame(cbind(person, W1_resp_sex, W2_resp_sex, W1_edu, W2_q_2_1))
wide
#>   person W1_resp_sex W2_resp_sex W1_edu W2_q_2_1
#> 1      1           1           1      1        0
#> 2      2           2           2      2        1
#> 3      3           1           1      3        1
#> 4      4           2           2      4        0

I need to reshape from wide to long format. I tried pivot_longer(), but ran into the issues below. How do I fix them?
(Note: I prefer not to use data.table.)

  1. The variables have different naming patterns. (How can I correctly specify names_pattern?)
  2. There are multiple value columns, but in my attempt all values end up under the single ‘sex’ column.
  3. Creating a column with ‘NA’ when a variable was only collected in one wave (i.e., if it was only collected in wave 2, I want a column with W1_varname in which all values are NA).

# Re-load wide format data
person <- c(1, 2, 3, 4)
W1_resp_sex <- c(1, 2, 1, 2)
W2_resp_sex <- c(1, 2, 1, 2)
W1_edu <- c(1, 2, 3, 4)
W2_q_2_1 <- c(0, 1, 1, 0)
wide <- as.data.frame(cbind(person, W1_resp_sex, W2_resp_sex, W1_edu, W2_q_2_1))

# Load package
pacman::p_load(tidyr)

# Reshape from wide to long 
long <- wide %>%
  pivot_longer(
    cols = starts_with('W'),
    names_to = 'Wave',
    names_prefix = 'W',
    names_pattern = '(.*)_',
    values_to = 'sex',
    values_drop_na = TRUE
  )
long
#> # A tibble: 16 × 3
#>    person Wave     sex
#>     <dbl> <chr>  <dbl>
#>  1      1 1_resp     1
#>  2      1 2_resp     1
#>  3      1 1          1
#>  4      1 2_q_2      0
#>  5      2 1_resp     2
#>  6      2 2_resp     2
#>  7      2 1          2
#>  8      2 2_q_2      1
#>  9      3 1_resp     1
#> 10      3 2_resp     1
#> 11      3 1          3
#> 12      3 2_q_2      1
#> 13      4 1_resp     2
#> 14      4 2_resp     2
#> 15      4 1          4
#> 16      4 2_q_2      0

Created on 2022-09-18 by the reprex package (v2.0.1)



Solution

You want to reshape only the variables that were measured in both waves. You can find them by tabulating the column names with the wave prefix stripped and keeping the names that occur twice.

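# Strip the 3-character wave prefix ("W1_"), then keep columns whose stem occurs in both waves
stems <- substring(names(wide), 4)
v <- which(stems %in% names(which(table(stems[-1]) == 2)))
reshape2::melt(data=wide, id.vars=1, measure.vars=v)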
#   person    variable value
# 1      1 W1_resp_sex     1
# 2      2 W1_resp_sex     2
# 3      3 W1_resp_sex     1
# 4      4 W1_resp_sex     2
# 5      1 W2_resp_sex     1
# 6      2 W2_resp_sex     2
# 7      3 W2_resp_sex     1
# 8      4 W2_resp_sex     2
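
The question also asked how to do this with pivot_longer() and how to get NA where a variable was collected in only one wave. The following is a minimal sketch of an alternative approach, not part of the answer above. It assumes every wave column is named W<number>_<variable>, and it uses the special ".value" entry in names_to so that each variable becomes its own column while the wave number moves into a 'Wave' column.

```r
library(tidyr)

# Same example data as in the question
wide <- data.frame(
  person      = c(1, 2, 3, 4),
  W1_resp_sex = c(1, 2, 1, 2),
  W2_resp_sex = c(1, 2, 1, 2),
  W1_edu      = c(1, 2, 3, 4),
  W2_q_2_1    = c(0, 1, 1, 0)
)

# Assumption: all wave columns follow the pattern W<digits>_<variable name>
long <- pivot_longer(
  wide,
  cols          = matches("^W[0-9]+_"),
  names_to      = c("Wave", ".value"),   # first group -> Wave, second -> column names
  names_pattern = "^W([0-9]+)_(.*)$"
)
long
```

The result has one row per person and wave (8 rows here), with columns person, Wave, resp_sex, edu and q_2_1. Variables collected in only one wave are filled with NA for the other wave, so edu is NA in wave 2 and q_2_1 is NA in wave 1. Note that Wave is returned as a character column.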


This question was asked on Stack Overflow by ryorlets and answered by jay.sf. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
