Yesterday I learned an unexpected but interesting use of the highr package
from a GitHub issue. The package is intended for syntax-highlighting R code, but
the user wanted to identify the function calls in a piece of R code. What he did
was to syntax-highlight the code first, and then look for the LaTeX command
`\kwd{}` in the result. I told him that this task could be done with
`getParseData()`, but there were a few edge cases. For example:
```r
getParseData(parse(text = c('lapply(1:10, paste)')))
```

```
   line1 col1 line2 col2 id parent                token   text
1      1    1     1    6  1      3 SYMBOL_FUNCTION_CALL lapply
2      1    7     1    7  2     20                  '('      (
4      1    8     1    8  4      5            NUM_CONST      1
6      1    9     1    9  6     10                  ':'      :
7      1   10     1   11  7      8            NUM_CONST     10
9      1   12     1   12  9     20                  ','      ,
14     1   14     1   18 14     16               SYMBOL  paste
15     1   19     1   19 15     20                  ')'      )
```
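In this case, `lapply` was correctly identified as `SYMBOL_FUNCTION_CALL`, but
`paste` was not: it was identified as a plain `SYMBOL`, because the parser only
marks a symbol as a function call when it is in the function position of a call
(as in `paste(...)`), and here `paste` is merely passed as an argument to
`lapply()`. We can try to evaluate the symbol and check if it is a function: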
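```r
find_funs = function(code) {
  # keep.source = TRUE keeps the parse data even when the option keep.source
  # is FALSE (its default in non-interactive sessions such as Rscript)
  d = getParseData(parse(text = code, keep.source = TRUE))
  # symbols that the parser already knows are function calls
  f = d[d$token == 'SYMBOL_FUNCTION_CALL', 'text']
  # evaluate the other symbols in the caller's environment and keep those
  # that are functions; undefined symbols throw errors and are skipped
  for (s in d[d$token == 'SYMBOL', 'text']) {
    tryCatch({
      ev = eval(as.symbol(s), parent.frame())
      if (is.function(ev)) f = c(f, s)
    }, error = function(e) NULL)
  }
  f
}
```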
Then `find_funs('lapply(1:10, paste)')` can find both `lapply` and `paste`.
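For reuse, a slightly tidier variant might look like the sketch below. The name `find_funs2()`, its `envir` argument, and the switch from `eval(as.symbol(s))` to `get()` are illustrative assumptions, not part of the original function; it also drops duplicate names. Passing an environment in which some code has already been evaluated lets it see functions defined by that code.

```r
# find_funs2() is an illustrative variant of find_funs(); the `envir`
# argument and the use of get() are assumptions, not from the original
find_funs2 = function(code, envir = parent.frame()) {
  # keep.source = TRUE keeps the parse data even in Rscript, where
  # getOption('keep.source') defaults to FALSE
  d = getParseData(parse(text = code, keep.source = TRUE))
  # symbols that the parser already marks as function calls
  f = d$text[d$token == 'SYMBOL_FUNCTION_CALL']
  # other symbols: keep those bound to a function as seen from `envir`;
  # get() errors on undefined symbols, which are treated as non-functions
  syms = unique(d$text[d$token == 'SYMBOL'])
  is_fun = vapply(syms, function(s) {
    obj = tryCatch(get(s, envir = envir), error = function(e) NULL)
    is.function(obj)
  }, logical(1))
  unique(c(f, syms[is_fun]))
}

# `letters` is a character vector, so only `toupper` is added
find_funs2('sapply(letters, toupper)')  # returns c("sapply", "toupper")

# functions defined by previously evaluated code can be found via `envir`
env = new.env()
eval(parse(text = 'my_fun = function(x) x + 1'), env)
find_funs2('sapply(1:3, my_fun)', envir = env)  # returns c("sapply", "my_fun")
```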
One caveat is that this approach doesn't evaluate the code but only parses it,
so it won't recognize functions from add-on packages that the code itself loads,
unless those packages happen to be attached in the current session already. One
way to address this problem is to detect `library()` or `require()` calls and
search for possible function names in those packages. This won't be totally
robust (e.g., it fails for `library(x, character.only = TRUE)`, where the package
name is stored in a variable). Another way is to actually evaluate the code
before trying to detect if a symbol is a function, which is more expensive.
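The `library()`/`require()` detection could be sketched roughly as follows. `find_pkgs()` is a hypothetical helper: it walks the parsed expressions, takes the first argument of each `library()` or `require()` call as the package name, and gives up whenever `character.only` is set to anything other than `FALSE`. It does not handle namespaced calls such as `base::library()`, a package name that is not the first argument, or `requireNamespace()`.

```r
# Hypothetical helper: collect package names passed to library()/require()
find_pkgs = function(code) {
  pkgs = character()
  walk = function(e) {
    if (!is.call(e)) return(invisible())
    fn = e[[1]]
    if (is.symbol(fn) && as.character(fn) %in% c('library', 'require') &&
        length(e) >= 2) {
      args = as.list(e)[-1]
      co = args$character.only
      # with character.only = TRUE the name lives in a variable; give up
      if (is.null(co) || identical(co, FALSE)) {
        p = args[[1]]
        if (is.symbol(p) || is.character(p)) pkgs <<- c(pkgs, as.character(p))
      }
    }
    # recurse only into sub-calls (this also skips empty args as in x[, 1])
    for (i in seq_along(e)) if (is.call(e[[i]])) walk(e[[i]])
  }
  for (e in parse(text = code, keep.source = FALSE)) walk(e)
  unique(pkgs)
}

find_pkgs('library(stringr); require("glue"); x = 1')
# returns c("stringr", "glue")
```

The returned names could then be checked against `getNamespaceExports()` for each package (which requires the package to be installed) to decide whether a plain `SYMBOL` might be a function from that package.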
Another edge case is function calls inside `glue::glue()`, e.g., `str_to_title`
and `as.character` in the following case, where the R parser only sees a string
constant (`STR_CONST`):

```r
glue("This number is {str_to_title(as.character(123))}")
```

We can certainly detect `glue` calls and try to parse the glue templates. I'm
not interested in going that far, so I'll just stop here.
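For anyone who does want to go further, here is a rough sketch of the template-parsing half. `template_funs()` is a hypothetical helper that takes a template string (not a `glue()` call), extracts the `{...}` chunks with a regular expression, and parses each chunk with `getParseData()`. It assumes glue's default `{`/`}` delimiters and does not handle escaped (`{{ }}`) or nested braces.

```r
# Hypothetical helper: function calls inside the {} chunks of a glue-style
# template; assumes default delimiters, no escaped or nested braces
template_funs = function(template) {
  # extract "{...}" chunks that contain no other braces
  m = regmatches(template, gregexpr('\\{[^{}]*\\}', template))[[1]]
  exprs = substr(m, 2, nchar(m) - 1)  # strip the surrounding braces
  f = character()
  for (x in exprs) {
    # chunks that fail to parse yield NULL and contribute nothing
    d = tryCatch(getParseData(parse(text = x, keep.source = TRUE)),
                 error = function(e) NULL)
    f = c(f, d$text[d$token == 'SYMBOL_FUNCTION_CALL'])
  }
  unique(f)
}

template_funs("This number is {str_to_title(as.character(123))}")
# returns c("str_to_title", "as.character")
```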
Besides `getParseData()`, I guess `codetools::makeCodeWalker()` might also work.
I first learned about it from Kohske ten years ago.
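A rough sketch of the codetools route could record the function name at the head of each call using `codetools::makeCodeWalker()` and `codetools::walkCode()` (codetools ships with R as a recommended package). `find_calls_cw()` is a hypothetical name, and the `make.names()` filter that drops operators and reserved words such as `:` and `if` is an assumption about what counts as a function name here. Like `SYMBOL_FUNCTION_CALL`, it only sees functions in call position, so `paste` in `lapply(1:10, paste)` is still missed, and so are namespaced calls such as `stringr::str_to_title()`.

```r
# Hypothetical sketch: collect the symbols at the head of every call
find_calls_cw = function(code) {
  f = character()
  w = codetools::makeCodeWalker(
    call = function(e, w) {
      if (is.symbol(e[[1]])) f <<- c(f, as.character(e[[1]]))
      # same traversal as makeCodeWalker's default `call` handler
      for (ee in as.list(e)) if (!missing(ee)) codetools::walkCode(ee, w)
    },
    leaf = function(e, w) NULL  # ignore bare symbols and constants
  )
  for (e in parse(text = code, keep.source = FALSE)) codetools::walkCode(e, w)
  f = unique(f)
  # drop operators and reserved words, e.g. `:`, `{`, `if`, `-`
  f[make.names(f) == f]
}

find_calls_cw('if (TRUE) sum(abs(-1))')  # returns c("sum", "abs")
```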