Friday, June 28, 2013

Testing function arguments in GNU R

Recently I read a nice post on ensuring that proper arguments are passed to a function using the GNU R class system. However, I often need a more lightweight solution for repetitive function argument testing.
The alternative idea is to test function arguments against a specified pattern given in a string. The pattern I use has the form:
([argument type][required argument length])+.
The allowed argument types are:
  b     logical
  i     integer
  d     double
  n     numeric
  c     complex
  s     character
  f     function
  l     list
  a     any type
and argument lengths are:
  ?     0 or 1
  *     0 or more
  +     1 or more
  n     exactly n (a positive integer written as digits, e.g. 2)

For example, the pattern "n2f?s+" means that the function requires three arguments: a numeric vector of length 2, a function (or nothing, i.e. NULL), and a character vector of any positive length.
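
To see how such a pattern decomposes, here is a minimal sketch that splits a rule into its type and length specifiers using only base R regular expressions. It is an illustration, not the parser used by the function given below, and the variable names (tokens, types, counts) are my own.

```r
rule <- "n2f?s+"

# Each token is one type letter followed by ?, *, + or a positive integer
tokens <- regmatches(rule,
                     gregexpr("[bidncsfla]([?*+]|[1-9][0-9]*)", rule))[[1]]

types  <- substr(tokens, 1, 1)   # "n" "f" "s"
counts <- substring(tokens, 2)   # "2" "?" "+"

data.frame(type = types, count = counts)
```

Note that this sketch silently skips characters that do not fit the pattern, so it is not a substitute for proper rule validation.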

The function that performs checking of the arguments against the pattern is given here:

is.valid <- function(rule, ..., empty.null = TRUE) {
    err <- function(type, position, required, found) {
        e <- FALSE
        attr(e, "type") <- type
        attr(e, "position") <- position
        attr(e, "required") <- required
        attr(e, "found") <- found
        return(e)
    }

    # stringr provides str_locate_all, invert_match and str_sub
    stopifnot(require(stringr, quietly = TRUE))

    arglist <- list(...)

    # rule must be a single nonempty string
    if (!is.character(rule) ||
        (length(rule) != 1) ||
        (nchar(rule) == 0)) {
        stop("improper rule: rule must be a nonempty string")
    }
    type <- str_locate_all(rule, "[bidncsfla]")[[1]]
    count <- invert_match(type)
    type <- str_sub(rule, type[,1], type[,2])
    count <- str_sub(rule, count[,1], count[,2])
    if (count[1] != "") {
        stop("improper rule: rule must start with type [bidncsfla]")
    }
    count <- count[-1]
    if (length(count) != length(type)) {
        stop("improper rule: number of type and count specifiers must be equal")
    }
    for (i in seq_along(count)) {
        if ((!count[i] %in% c("?", "*", "+")) &&
            (regexpr("^[1-9][0-9]*$", count[i]) == -1)) {
                stop(paste("improper rule: unrecognized count",
                           count[i]))
        }
    }

    if (length(type) != length(arglist)) {
        stop("improper rule: number of type specifiers must be equal to number of variables")
    }

    for (i in seq_along(count)) {
        # count specifiers were already validated in the loop above
        if ((count[i] %in% c("?", "*")) &&
            is.null(arglist[[i]]) &&
            empty.null) {
            next
        }
        if (!switch(type[i],
               b = is.logical(arglist[[i]]),
               i = is.integer(arglist[[i]]),
               d = is.double(arglist[[i]]),
               n = is.numeric(arglist[[i]]),
               c = is.complex(arglist[[i]]),
               s = is.character(arglist[[i]]),
               f = is.function(arglist[[i]]),
               l = is.list(arglist[[i]]),
               a = TRUE)) {
            return(err("type", i, type[i], typeof(arglist[[i]])))
        }
        if (count[i] == "?") {
            if (length(arglist[[i]]) > 1) {
                return(err("count", i, count[i],
                           length(arglist[[i]])))
            }
        } else if (count[i] == "*") {
            # nothing to do - always met
        } else if (count[i] == "+") {
            if (length(arglist[[i]]) == 0) {
                return(err("count", i, count[i],
                           length(arglist[[i]])))
            }
        } else {
            if (length(arglist[[i]]) != as.integer(count[i])) {
                return(err("count", i, as.integer(count[i]),
                           length(arglist[[i]])))
            }
        }
    }
    return(TRUE)
}

Its first argument, rule, is the pattern that the arguments should meet. The arguments to check are passed after it. For example, in the following calls:

is.valid("b1i1d1n1c1s1f1l1a1",
          T, 1L, 1.0, 1, 1i, "1", sin, list(1), raw(1)) # TRUE

is.valid("b1i1d1n1c1s1f1l1a1",
          T, 1L, 1.0, 1, 1i, "1", sin, list(1), raw(2)) # FALSE

is.valid("b1i1d1n1c1s1f1l1a1",
          T, 1L, 1.0, 1, 1i, "1", sin, 1, raw(1))       # FALSE

the first returns TRUE, while the second and third return FALSE (the second because the last argument has an improper length, the third because the eighth argument has an improper type). When FALSE is returned, it carries attributes (type, position, required and found) with diagnostic information on the validation error.
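
As a rough sketch of how these attributes can be used, the helper below turns a failed result into a readable message. The name describe.failure is hypothetical and not part of the function above, and res is a hand-built stand-in for what the second call should return, so the snippet runs on its own.

```r
# Hypothetical helper: formats the diagnostic attributes of a validation result
describe.failure <- function(res) {
    if (isTRUE(res)) {
        return("all arguments are valid")
    }
    sprintf("argument %d: %s mismatch (required %s, found %s)",
            attr(res, "position"), attr(res, "type"),
            attr(res, "required"), attr(res, "found"))
}

# Stand-in for the annotated FALSE returned when the 9th argument has length 2
res <- structure(FALSE, type = "count", position = 9L,
                 required = 1L, found = 2L)

describe.failure(res)
```

Because the result is still a plain logical value, it can also be used directly in a condition such as if (!res) stop(describe.failure(res)).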

Additionally, the function has an optional parameter empty.null. It influences how arguments of length 0 are handled (these are allowed when the "?" or "*" argument length constraint is used). If it is set to TRUE (the default), then NULL is accepted as a valid argument under these constraints, whatever the required type. You can see this in action in the following code:

is.valid("l?", NULL)                                  # TRUE
is.valid("l?", NULL, empty.null = FALSE)              # FALSE
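
The special case is needed because NULL fails the type tests that the function relies on, even though its length is 0. A short base R sketch of that behaviour (the name null.checks is mine):

```r
# How the type and length predicates treat NULL and empty vectors
null.checks <- c(is.list      = is.list(NULL),
                 is.character = is.character(NULL),
                 is.function  = is.function(NULL))
null.checks            # all FALSE: NULL fails each of these type tests

length(NULL)           # 0
is.list(list())        # TRUE: an empty list has the right type and length 0
length(character(0))   # 0: an empty character vector also has length 0
```

So with empty.null = FALSE, a slot such as "l?" can still be left empty, but only by passing an empty value of the right type such as list().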

In summary, the is.valid function is a simple and compact way to check the type and length of passed arguments, and it can be considered an alternative to the approach proposed in the post I mentioned above.
