Finding Function Calls in R Code

Yihui Xie 2023-01-01

Yesterday I learned an unexpected but interesting use of the highr package from a GitHub issue. This package is intended for syntax highlighting R code, but the user wanted to identify function calls in a given piece of R code. He first syntax highlighted the code, and then looked for the LaTeX command \kwd{} in the result. I told him that this task could be done with getParseData(), but there are a few edge cases. For example:

getParseData(parse(text = c('lapply(1:10, paste)')))
   line1 col1 line2 col2 id parent                token   text
1      1    1     1    6  1      3 SYMBOL_FUNCTION_CALL lapply
2      1    7     1    7  2     20                  '('      (
4      1    8     1    8  4      5            NUM_CONST      1
6      1    9     1    9  6     10                  ':'      :
7      1   10     1   11  7      8            NUM_CONST     10
9      1   12     1   12  9     20                  ','      ,
14     1   14     1   18 14     16               SYMBOL  paste
15     1   19     1   19 15     20                  ')'      )

In this case, lapply was correctly identified as SYMBOL_FUNCTION_CALL, but paste was not (instead, it was identified as a SYMBOL). We can try to evaluate the symbol and check if it is a function:

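find_funs = function(code) {
  # keep.source = TRUE is needed in non-interactive sessions (e.g., Rscript),
  # otherwise getParseData() returns NULL
  d = getParseData(parse(text = code, keep.source = TRUE))
  # tokens that the parser has already marked as function calls
  f = d[d$token == 'SYMBOL_FUNCTION_CALL', 'text']
  # look up the remaining symbols in the caller's environment and keep those
  # that turn out to be functions
  for (s in d[d$token == 'SYMBOL', 'text']) {
    tryCatch({
      ev = eval(as.symbol(s), parent.frame())
      if (is.function(ev)) f = c(f, s)
    }, error = function(e) NULL)  # ignore symbols that cannot be found
  }
  f
}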

Then find_funs('lapply(1:10, paste)') can find both lapply and paste.

One caveat is that this approach doesn’t run the code: it only parses it and looks up each symbol in the current session, so by default it won’t be able to recognize functions in add-on packages that have not been attached. One way to address this problem is to detect library() or require() calls and search for possible function names in those packages. This won’t be totally robust (e.g., for the case library(x, character.only = TRUE), in which the package name is only known at run time). Another way is to actually evaluate the code before trying to detect if a symbol is a function, which is more expensive.
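The first idea (detecting library() or require() calls) could be sketched roughly as follows. The helper name find_pkg_funs is hypothetical, and the sketch rests on a simplifying assumption: the package name is the token right after the opening parenthesis of library() or require(). Note that requireNamespace() loads (but does not attach) each package it finds, and that cases like character.only = TRUE are simply skipped.

```r
# Hypothetical helper: find SYMBOL tokens that are functions exported from
# packages mentioned in library()/require() calls
find_pkg_funs = function(code) {
  d = getParseData(parse(text = code, keep.source = TRUE))
  d = d[d$terminal, ]  # keep tokens only; rows are in source order
  # assumption: the package name is the token right after "library(" or "require("
  i = which(d$token == 'SYMBOL_FUNCTION_CALL' & d$text %in% c('library', 'require'))
  i = i[i + 2 <= nrow(d)]
  pkgs = d$text[i + 2][d$token[i + 2] %in% c('SYMBOL', 'STR_CONST')]
  pkgs = gsub('^["\']|["\']$', '', pkgs)  # strip quotes from library("pkg")
  # keep only packages that are installed; this loads their namespaces
  pkgs = Filter(function(p) requireNamespace(p, quietly = TRUE), unique(pkgs))
  syms = unique(d$text[d$token == 'SYMBOL'])
  f = character()
  for (p in pkgs) {
    ex = getNamespaceExports(p)
    ns = asNamespace(p)
    # a symbol counts if it is exported from the package and is a function
    f = c(f, Filter(function(s) s %in% ex && is.function(get(s, envir = ns)), syms))
  }
  unique(f)
}

find_pkg_funs(c('library(tools)', 'sapply(my_files, file_ext)'))
```

Here file_ext should be recognized as a function from the tools package even if tools has not been attached in the current session, whereas my_files is ignored because tools exports nothing by that name.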

Another edge case is function calls in glue::glue(), e.g., str_to_title and as.character in the following case:

glue("This number is {str_to_title(as.character(123))}")

We can certainly detect glue calls and try to parse the glue templates. I’m not interested in going that far, so I’ll just stop here.
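For readers who do want to go a little further, here is a rough sketch of the idea. The helper name find_glue_funs is hypothetical, and the sketch makes several simplifying assumptions: it scans every string constant rather than only the arguments of glue calls, it only handles non-nested {...} pairs, and it silently skips anything inside the braces that fails to parse (e.g., templates containing escaped quotes). It does not use the glue package itself.

```r
# Hypothetical helper: find function calls inside {...} in string constants
find_glue_funs = function(code) {
  d = getParseData(parse(text = code, keep.source = TRUE))
  strs = d$text[d$token == 'STR_CONST']
  # extract non-nested {...} pairs from the raw source text of the strings
  m = unlist(regmatches(strs, gregexpr('\\{[^{}]*\\}', strs)))
  if (length(m) == 0) return(character())
  exprs = substr(m, 2, nchar(m) - 1)  # drop the surrounding braces
  f = character()
  for (x in exprs) {
    # parse each template expression on its own; skip it if parsing fails
    p = tryCatch(
      getParseData(parse(text = x, keep.source = TRUE)),
      error = function(e) NULL
    )
    if (!is.null(p)) f = c(f, p$text[p$token == 'SYMBOL_FUNCTION_CALL'])
  }
  unique(f)
}

find_glue_funs('glue("This number is {str_to_title(as.character(123))}")')
```

For the example above, this should pick up str_to_title and as.character. The outer glue call is not included because only the template contents are inspected.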

Besides getParseData(), I guess codetools::makeCodeWalker() might also work. I first learned about it from Kohske ten years ago.
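A minimal sketch of what the code walker approach might look like, assuming the codetools package is installed (it normally ships with R as a recommended package). The helper name find_calls is hypothetical. The walker overrides the call handler to record the name of every call whose head is a symbol, and overrides the leaf handler to do nothing (the default one prints each leaf).

```r
# Hypothetical helper: collect the names of all calls by walking the parsed code
find_calls = function(code) {
  found = character()
  w = codetools::makeCodeWalker(
    call = function(e, w) {
      # record the name of the function being called, if it is a plain symbol
      if (is.symbol(e[[1]])) found <<- c(found, as.character(e[[1]]))
      # then descend into the call's elements (skipping empty arguments)
      for (ee in as.list(e)) if (!missing(ee)) codetools::walkCode(ee, w)
    },
    leaf = function(e, w) NULL  # symbols and constants: nothing to do
  )
  for (e in parse(text = code, keep.source = FALSE)) codetools::walkCode(e, w)
  unique(found)
}

find_calls('lapply(1:10, paste)')
```

Since this works on the language objects instead of the parse data, operators such as : are reported as calls too, and paste is still not found here because it is only passed as an argument, so the same lookup trick would be needed for it.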