The performance gains from switching R's linear algebra libraries

What is often forgotten in the so-called data analysis "language wars" is that, across most of these languages, many common computations are performed using outsourced dynamically linked math libraries. For example, R, Python's NumPy, Julia, Matlab, and Mathematica all make heavy use of the BLAS linear algebra API. As a result, R can't be properly faulted (or praised) for how slowly (or rapidly) it performs Cholesky decomposition since (a) the R core team wasn't responsible for the algorithm's implementation, and (b) neither are other languages' contributors for theirs.

For code that deals predominantly with numerical computing and constructs from linear algebra, language battles become more of a petty and vacuous squabble over subjective preferences in syntax than a substantive discourse that encourages true innovation and improvement. That being said, R is the best, and if you disagree you should feel bad.

Two posts ago, I asserted that, for speed-up purposes, recompilation of R was usually unnecessary and that other lower-hanging fruit should be taken before resorting to recompilation. We've already seen that, for certain problems, parallelizing your code is a great and relatively easy-to-implement speed-up option. Another great option is to swap out R's linear algebra libraries for faster ones. Since these libraries are linked to at run-time (as opposed to being included statically at compile-time), using these alternative libraries does not require recompilation.

The topic of swapping BLAS implementations has already been covered well in these blog posts by Nathan VanHoudnos and Zachary Mayer, as well as in this paper by Dirk Eddelbuettel, but I thought I’d throw in my thoughts and results, too.

For my comparisons, I pitted OpenBLAS and Apple’s Accelerate Framework's implementation of BLAS against each other and the BLAS that comes with R by default. I wanted to try others too, but I either had an extraordinarily difficult time compiling them, or was unwilling to shell out money for a proprietary library (here's looking at you, Intel). Of particular interest to me was trying out the ATLAS library.

I originally thought that testing ATLAS would be redundant because I was given to understand that Apple Accelerate’s "vecLib" was a hand-tuned version of ATLAS for Apple processors. After looking further into it, I discovered that this is no longer the case. Apple asserts in line 632 of cblas.h that "The Apple BLAS is no longer based on ATLAS". Unfortunately, nothing I tried would get ATLAS to compile. C'est la vie.

The benchmarking script I used to test these implementations can be furnished from
It records the time to completion of a large series of various computations including matrix creation, inversion, and multiplication. By default, it performs each test 3 times and takes the trimmed geometric mean of the times to completion.
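To make the "trimmed geometric mean" concrete, here is a minimal Python sketch of one way such a summary could be computed. It is not the benchmarking script's actual code: the function name, the `trim` parameter, and the choice to drop one value from each end are all assumptions made for illustration.

```python
import math

def trimmed_geometric_mean(times, trim=1):
    """Drop the `trim` smallest and `trim` largest timings, then return
    the geometric mean of what is left.

    Hypothetical helper: the real benchmark may trim differently.
    """
    if trim < 0 or len(times) <= 2 * trim:
        raise ValueError("not enough timings left after trimming")
    if any(t <= 0 for t in times):
        raise ValueError("timings must be positive")
    kept = sorted(times)[trim:len(times) - trim]
    # Geometric mean computed via logs to avoid overflow on long runs.
    return math.exp(sum(math.log(t) for t in kept) / len(kept))

# Made-up timings (seconds) for three runs of one test.
runs = [1.9, 2.0, 7.5]
summary = trimmed_geometric_mean(runs)
```

With three runs and one value trimmed from each end, this reduces to the median run, which is why a single slow outlier (like the 7.5 above) doesn't drag the summary around.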

These are the results of the total time elapsed (for all tests) among the three libraries.

[Figure: Comparison of BLAS implementation performance]

As you can see, both Accelerate and OpenBLAS blew R's default BLAS implementation out of the water, with Accelerate marginally outperforming OpenBLAS. The last line of the output from the UNIX "time" command gives us a clue as to why this might be the case:

R --slave 56.10 user 1.65s system 122% cpu 47.162 total

Judging by the above-100-percent CPU usage of both Accelerate and OpenBLAS (and how hot my laptop got), I'd wager that the primary source of the improvement is Accelerate's and OpenBLAS's ability to use multiprocessing. If you have more than one core available, this is something that you might want to look into.
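The CPU percentage in that "time" line is just CPU time (user plus system) divided by wall-clock time, so anything over 100% means more than one core was busy on average. A small Python sketch of the arithmetic follows; the function name and the regular expressions are my own, and they assume the zsh-style layout shown above with the total given in plain seconds.

```python
import re

def cpu_utilization(time_line):
    """Return (user + system) / total for a zsh-style `time` summary line.

    Hypothetical helper: assumes fields look like '56.10 user',
    '1.65s system' and '47.162 total' (seconds, optional 's' suffix).
    """
    def field(name):
        match = re.search(r"([\d.]+)s?\s+" + name, time_line)
        if match is None:
            raise ValueError("no '%s' field in: %r" % (name, time_line))
        return float(match.group(1))

    return (field("user") + field("system")) / field("total")

line = "R --slave 56.10 user 1.65s system 122% cpu 47.162 total"
ratio = cpu_utilization(line)
percent = int(ratio * 100)  # truncated, like the figure reported in the line
```

For the line above, 56.10 + 1.65 seconds of CPU time over 47.162 seconds of wall-clock time works out to the reported 122%.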


The state of package management on Mac OS X

It's that time again; I suspect that Mavericks will be released in the next few weeks, so I get the once-every-year-(or-so) chance to experiment and modify the hell out of my OS X installation because I'll just do a fresh install soon anyway. This time around I'm experimenting with package managers.

I've actually tried really hard to avoid ever having to use them. I started using Slackware in high school, and after some brief experimentation (in college) with Ubuntu, I took up OS X as my main OS. But, since building from source code is somewhat of a nightmare on a Mac--at least compared to what I was used to--I started to look into package management solutions.

The terrain was difficult to navigate. It seemed like people had some really strong opinions on which one was the best and which ones were on their way out. Since I didn't know whom to believe, I just stuck to manual building. But, since I'm going to get a tabula rasa in a few weeks, I thought I'd take this opportunity to document my exploration of this terrain and present my findings in the most impartial manner that I'm capable of.

Before I start, I want to make a few things clear. (1) There is some disagreement on what actually constitutes a package manager. Here, I'm referring broadly to any centralized software installation framework that tracks or resolves dependencies, whether it builds from source or not. (2) I haven't had the time to become an expert on all of the managers I audited, so keep that in mind. (3) Not only are all of these package managers open source, but many of them have robust configuration options, so I'll be talking mostly about default behavior from the perspective of a new user.

If my old editions of O'Reilly books discussing Mac software are any indication, MacPorts and Fink were the two best options available. Then Homebrew came on the scene and a lot of people seem to be raving about it. I started off with the intention of only trying out these three but in the course of my research, I learned about two others that I wanted to give a chance.

To see a table summary of my findings, you can just scroll down to the end of this post.

Rudix is a binary-only package manager that attempts a "hassle-free" way of getting Unix programs on a Mac. It doesn't have many packages available yet, but it has no trouble at all installing and uninstalling the ones that it does offer. For example, their 'Go' installation was the most painless installation of a language that I've ever experienced. My complaints are that (a) the binaries go directly to /usr/local, so they are not sandboxed, and (b) the man files for these tools were not installed with the binaries.

MacPorts was one of the most recommended package management solutions that I came across in my research. It also probably attracted the most flak. It was modeled on FreeBSD's Ports system, so it's a source-building manager. What I liked about MacPorts was that the installation was painless (it updated my PATH for me!), the compiled binaries were sandboxed in /opt/local, and the wealth of packages available was hard not to love.

An interesting thing about MacPorts is that it eschews Apple-supplied libraries and links sources against its own. A benefit of this is that it can ensure a consistent experience across OS X versions and whatever whimsical decisions Apple may choose to make in the future. The drawback to this approach is that building what appears, prima facie, to be a small package may require an extraordinarily large number of huge programs and libraries to be built as dependencies.
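To see why a "small" package can balloon like this, it helps to think of the build as a walk over the whole dependency graph, not just the direct dependencies. The Python sketch below is only an illustration: the package names and the `deps` graph are made up, it is not how MacPorts itself is implemented, and it assumes the graph has no cycles.

```python
def build_order(package, deps):
    """Return every package that has to be built for `package`,
    listed so that dependencies come before the things that need them.

    Hypothetical helper; assumes the dependency graph is acyclic.
    """
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        order.append(name)  # appended only after all of its dependencies

    visit(package)
    return order

# Made-up dependency graph: one tool with two direct dependencies.
deps = {
    "small-tool": ["libfoo", "libbar"],
    "libfoo": ["libbaz"],
    "libbar": ["libbaz", "big-runtime"],
    "libbaz": [],
    "big-runtime": [],
}
order = build_order("small-tool", deps)
```

Here the tool lists only two dependencies, but five packages end up being built. When the manager refuses to reuse system libraries, the nodes at the bottom of that graph are exactly the big ones.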

Fink is modeled after Debian's dpkg and apt-get. Having used Debian-based distros in the past, I was excited to see what Fink had to offer. Like apt-get, Fink can install binaries or build from source. Unlike apt-get, though, a completely different command ("fink") is used to build from source than to install the binaries. This was somewhat confusing. Furthermore, there is no binary installer for 10.6 to 10.8, so installation was a bit harrowing. Once it was installed, though, and I got used to the separate commands and its differences from "apt-get", I was pleased that my PATH was automatically updated and that the installed binaries were appropriately sandboxed.

As I mentioned above, a lot of people are really excited about Homebrew. It is being developed with the intention of correcting (what it perceives to be) MacPorts' shortcomings. From what I can tell, it tries really hard to work with OS X's existing frameworks and libraries. For this reason, Homebrew is probably a good choice for someone who is using it to install the occasional tool on a single-user system.

A neat thing about Homebrew is that it is written very simply in Ruby. Its "recipes" to install packages are easy-to-read Ruby scripts. They are also very easy to modify and the community encourages upstream development.

Something not-so-neat about Homebrew is that it is publicly antagonistic towards MacPorts. This is probably something that only I care about, though.

Again, I started with the intention of only auditing Fink, Homebrew and MacPorts. When I learned about pkgsrc, I thought that it was too obscure to be a serious contender and I was considering not looking into it further. I am so glad that, for completeness' sake, I decided to try it out because I have virtually only good things to say about it.

pkgsrc started as NetBSD's package management solution. Given NetBSD's dedication to portability, it is perhaps not a surprise that their package manager would attempt to follow suit. It has now been adapted for use on over a dozen different operating systems. Among these are AIX, Solaris, HP-UX, GNU/Linux, Windows (via Cygwin and Interix) and, of course, OS X. It is the default manager on DragonFly BSD and was even the default manager on a now-discontinued GNU/Linux distro, Bluewall Linux. It is similar to (and, indeed, was forked from) FreeBSD's ports system.

I don't think many Mac power-users know that this is an option for them, which is a shame because it turned out to be my favorite. After following some fairly simple steps, a mature and sophisticated package manager with over 8,000 packages is at your disposal.

Probably the best thing about pkgsrc from the perspective of Mac users is a tool called pkgin. It's an apt-like tool for installing binaries from pkgsrc. Installing strange Unix tools on OS X *cannot* be easier.

The only caveat I should mention is that I haven't tested installing Python with it because I'm still too far away from Mavericks to risk botching my environment that badly. I suspect that it would cause issues because pkgsrc, being a NetBSD project, can't be as aware of OS X framework idiosyncrasies as a Mac-specific package manager can.

I'd like to write more on this topic, but this post is getting unwieldy. I plan to talk more about pkgsrc and OS X in another post but, for this one, I'll conclude with the "too-long-didn't-read" version of my journey through package-manager-land.

| Category | Rudix | MacPorts | Fink | Homebrew | pkgsrc / pkgin |
| --- | --- | --- | --- | --- | --- |
| Twitter | @rudix4mac (updates often) | @macports (last tweet in July) | @finkmac (hasn't had an update since 2010) | @machomebrew (very active) | @pkgsrc (last tweet in September) |
| Year project started | 2005 | 2002 | 2001 | 2009 | Support for Darwin added in 2001 |
| Number of packages | 488 (but `rudix available \| wc -l` says 351) | 17,680 (but `port list \| wc -l` says 17,686) | 7,951 (`apt-cache search . \| wc -l` says 209 stable binary .debs) | 2,498. `brew search \| wc -l` says 2,591. This is not counting various extra "taps" | 8,884 binaries for OS X (according to `pkgin available \| wc -l`) |
| Source/binary/both? | Binary only | Traditionally only source | Option for both | Source, but also binaries through "bottles" | Both. Traditional pkgsrc will do both, but using only pkgin will grab the binaries |
| Language written in | Python | Tcl | Perl (front-end) | Ruby | C |
| GUI options | Not really... but there's an internet package browsing option | Currently three | Two: Fink Commander and Phynchronicity | Nope, but there's an online package browser at Braumeister.org | An online package browser, but no others that I can find |
| Default prefix | Directly to /usr/local | /opt/local | /sw | /usr/local/Cellar. Programs symlink to /usr/local/bin | /usr/pkg |
| PowerPC support | Not anymore | Yes, because it is built from source | Yes | Not traditionally, but there are forks available that might provide this functionality | Not unless you build from source |
| Latest GCC available | Not available | | | | 4. binary available but pkgsrc has 4.8 |
| Python stuff | Not available | Py27 and 33 and a lot of great packages | Py23 and 33 and a lot of great packages | Py27 and 33. I couldn't find any packages, but the Python installs pip and easy_install | Py27 and 33 and a lot of great packages (see warning above) |
| Installation of package manager | Very easy and fast | Very easy and fast | Nightmarish (no binary installer for 10.6 - 10.8) | Easy as pie | Very easy and fast with these instructions |
| Uninstallation of package manager | Easy and painless | Hell-ish | Very easy and fast | Relatively easy if you follow this gist | Not sure; probably just `rm -rf`-ing the /usr/pkg and /usr/pkgsrc directories |
| Installation of packages | Extremely easy | Slow, since it builds from source | The source builds are understandably slow, but the binaries are (obviously) quick | Source compilation is obviously slow. I've had some linking issues sometimes. | Trivially easy |
| Uninstallation of packages | Easy and painless | Easy | Easy and fast | Very easy | Trivially easy |
| Community support | Not very much is required | Great | Not so great | Very very good | A few websites have some great documentation, but OS X-specific information is hard to find |
| Development | Git. Primarily led by one person. 5 contributors. | Subversion. Very happening. Many many developers. | Git. 14 GitHub contributors. Commits are infrequent. | Git. Most vibrant. Over 3,000 contributors. "Recipes" for compilation are easily modified and you are encouraged to submit pull requests. This project is very easy to contribute to. | pkgsrc is CVS. pkgin is Git. pkgsrc is well backed by the NetBSD Foundation. |