Interactive visualization of non-linear logistic regression decision boundaries with Shiny

(skip to the shiny app)

Model building is very often an iterative process that involves multiple steps of choosing an algorithm and hyperparameters, evaluating that model / cross validation, and optimizing the hyperparameters.

I find a great aid in this process, for classification tasks, is not only to keep track of the accuracy across models, but also to have some visual aid to note which data points are systematically misclassified and why. Is there a lot of noise? Does the model require a non-linear classifier?

My desire for visualizing the results are stymied by (a) high-dimensional data (for which we have no choice but to reduce dimensionality) and (b) the cost of task switching between tweaking the hyperparameters and re-running the plot. Unless I'm using two monitors, I can't even see the plots change in real-time.

Well... Enter Shiny.

Shiny is an R package from RStudio and other open source contributors that makes it incredibly easy to create interactive web applications from R analyses. With Shiny, I can add dropdown menus and sliders to choose algorithms or features and control hyperparameters and visualize the changes to the model in real-time right from a web browser (all in pure R and no Javascript or CSS).

Further, I can deploy this web app easily (and for free) so I can share it with my friends and colleagues.

For a first real foray into Shiny, I chose to visualize the decision boundaries of logistic regression classifiers. I chose logistic regression because I'm taking Andrew Ng's excellent Machine Learning course on Coursera, and reimplementing the algorithms in R (from GNU Octave / Matlab) and it was our last homework assignment.

The implementation of logistic regression and the visualization of the decision boundaries proved to be difficult for two reasons:

(a) The residuals of logistic regression aren't normally distributed and there exists no closed form solution that returns the coefficients that maximize the likelihood function. This means that we have to provide R's 'optim' higher-order function with a custom-written function to be minimized or maximized (we will be minimizing the cost function) and a function that returns the gradient (the differentiation of that function at that location). And...

(b) Although a linear combination of the predictor variables (a first degree polynomial hypothesis) has a linear decision boundary, adding ("faking") higher-degree polynomial features results in non-linear decision boundaries; awesome for classification, un-awesome for visualization.

crummy linear fit to circular data

crummy linear fit to circular data


great quadratic non-linear fit to circular data

great quadratic non-linear fit to circular data

The two datasets we will be using were generated using make_circles and make_moons from scikit-learn's 'datasets' module. These will both require non-linear hypothesis to achieve any kind of better-than-chance classification.

These are the supporting functions to add polynomial features, compute the hypothesis function, compute the cost function, and return the gradient:

add.poly.features <- function(x.mat, degree=2){
  new.mat <- matrix(1, nrow=nrow(x.mat))
  for (i in 1:degree){
    for (j in 0:i){
      new.mat <- cbind(new.mat, (x.mat[,1]^(i-j) * (x.mat[,2]^j)))
    }
  }
  return(new.mat)
}

hypothesis.function <- function(param.vec, x.mat){
  zed <- x.mat %*% matrix(param.vec)
  return(1 / (1 + exp(-zed)))
}

get.gradient <- function(param.vec, x.mat, y.vec, lambda=0){
  m <- nrow(x.mat)
  modtheta <- param.vec
  modtheta[1] <- 0
  the.hyp <- hypothesis.function(param.vec, x.mat)
  gradient <- (t(x.mat) %*% (the.hyp - y.vec) + lambda*modtheta) / m
  return(gradient)
}

cost.function <- function(param.vec, x.mat, y.vec, lambda=0){
  m <- nrow(x.mat)
  the.hyp <- hypothesis.function(param.vec, x.mat)
  cost <- (((t(-y.vec) %*% log(the.hyp)) - (t(1-y.vec) %*% log(1-the.hyp))) / m) +
    ((lambda / (2*m)) * sum(param.vec[2:length(param.vec)] ^ 2))
  
  return(cost)
}

Finally, this is the code that finds the optimal coefficients and plots the resulting hypothesis (this is wrapped in the reactive "renderPlot" Shiny function so it can be updated every time the Shiny controls are changed)

if(input$pattern=="moon")
  da.dataset <- moon
else
  da.dataset <- circle

da.lambda <- input$lambda
da.degree <- input$degree

design.mat <- add.poly.features(da.dataset[,c(1,2)], degree=da.degree)

result <- optim(par=rep(0, ncol(design.mat)),
                cost.function, 
                get.gradient,
                x.mat=design.mat,
                y.vec=as.matrix(da.dataset[,3]),
                lambda=da.lambda,
                method=input$opt)

predictions <- hypothesis.function(result$par, design.mat)
accuracy <- paste0(round(sum(round(predictions) ==
                                   da.dataset[,3]) / 3, 2), "%")

thex1 <- da.dataset[,1]
thex2 <- da.dataset[,2]
somex <- seq(min(thex1), max(thex1), by=.05)
somex2 <- seq(min(thex2), max(thex2), length.out=length(somex))

z <- matrix(0, nrow=length(somex), ncol=length(somex))

for (i in 1:length(somex)){
  for (j in 1:length(somex)){
    keep <- add.poly.features(t(matrix(c(somex[i], somex2[j]))), da.degree)
    z[i, j] <- as.matrix(keep) %*% result$par
  }
}

plot(da.dataset$X2 ~ da.dataset$X1,  pch=20, 
     col=c("red","green3")[da.dataset$Y+1],
     xlab="X1", ylab="X2")
title(paste("Degree:", da.degree,
            " -  Lambda:", da.lambda,
            "     -      Accuracy:", accuracy))

contour(somex, t(somex2), z, nlevels=1, add=TRUE, drawlabels=FALSE)

Notice that the classification dataset, the degree of the hypothesized polynomial, the regularization hyperparameter (lambda), and the optimization method are parameterized. We will control these options from the Shiny app.

Put all together, code looks like it does in this GitHub repo and yields this Shiny app.

Shiny app screeshot

Shiny app screeshot

Is it just me, or is what you can do with Shiny amazing?

In future iterations of my Shiny visualization of classification endeavors, I plan to:

  • add support for more classification algorithms and their respective relevant hyper parameters
  • use file upload to plot custom datasets
  • and use dimensionality reduction automatically for datasets with more than two 'true' features

Until then, shine on you crazy diamond.

share this: facebooktwittergoogle_plusredditpinterestlinkedintumblrmail

4 Comments

  1. Maxim K. July 25, 2014 1:15 pm Reply

    I felt immediately that this is not a coincidence that your post appears right after week 4 of dr. Ng's course has been completed :)

    I am just curious as to why use optimization algorithms, as in R you have built-in functions to estimate logistic regression (e.g. glm()).

    • [email protected] July 25, 2014 1:19 pm Reply

      Oh, haha, the reason is I wanted to rewrite the Octave homework from the class in R and it would have been cheating to use glm :)
      I feel like I understand a lot more about optimization algorithms now, even more than from just implementing the homework in Octave

    • [email protected] July 25, 2014 1:26 pm Reply

      Also, isn't the class great?!

Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">