Damn the torpedoes, full speed ahead: making the switch to Python 3

Python 3 has been out since 2008 (and realistically usable since 2009).
In spite of this four year availability period, Python 3 use has yet to see widespread adoption, particularly among groups in the scientific community. In the company of data scientists/statisticians, when someone says they've written their own Python code to perform some task, it's usually assumed that they are talking about Python 2; it is Python 3 that requires the version number qualification.

There are members of the community (I used to include myself in this category) that are really happy with Python(2) and hope that if they ignore Python 3 it will just go away.

It won't though. The fact of the matter is that "Python 2.x is legacy, Python 3.x is the present and future of the language."

So how do we get people to adopt Python 3? In my opinion, there are three key strategies:

  • go softer on Python 3 denialists, perhaps with a Python 2.8 (Guido said this will not happen)
  • go harder on Python 3 denialists by discontinuing 2.7 maintenance
  • serve as an example to programmers (especially new ones) by switching your default python interpreter to python3.



As you, dear reader, can probably tell from my wording, I personally favor strategy 3.

Part of that solution involves vendors shipping Python 3 by default. We are making some progress in this regard (Arch GNU/Linux now has python sym-linked to python3, and Fedora and Ubuntu have stated that they will follow suit), but we still have a lot of work to do. A huge step forward would be if Apple ships macs with Python 3. Current macs use 2.7 which wasn't, finally, released until 2010. This means that they could have used Python 3 instead. That would have really shook things up because a lot of my friends and colleagues in my field just use the Apple-supplied Python interpreter for analytics (vis-a-vis SciPy Superpack). The extension of 2.7 support until 2020 will unfortunately afford Apple the opportunity to be lackadaisical in its porting to Python 3 because they might only do so when upstream maintenance ends (and maybe not even then).

The other part of strategy 3 involves personally serving as an example by using Python 3 as your default interpreter.

But I can't rationally will that more people do this if I am unwilling to do this myself. While it's true that my big open-source Python project meant for widespread public consumption was very carefully made Python 3 compatible, I noticed that the code on this blog is often Python 3 incompatible. This is primarily because the python code I quickly whip up is run through my default Python 2 interpreter. My obstinance to switch to Python 3 by default is helping to contribute to Python 3's slow adoption and implicitly serving notice that it's ok to still use Python 2.

But I no longer want to be party to this transition quagmire (and the ASCII-normative cultural hegemony). Because of this, I recently took the plunge and switched my default Python stack to Python 3. Damn the torpedoes, full speed ahead!

I was, perhaps, in a better position to do this than some because all of my most used third-party Python packages have already been ported; this includes the SciPy ecosystem (NumPy, SciPy, pandas, scikit-learn), IPython, lxml, networks, BeautifulSoup, and requests. It was really easy for me to ditch the Apple-supplied Python interpreter in favor for MacPorts' build of Python 3.4. I was even able to install most of my favorite third-party packages using MacPorts (the ones that weren't available I pip-installed as "user" to not muck up the MacPorts installation prefix). The only hard part about the switch was that almost all of my system python code stopped working; everything from my own system utilities to the battery-life indicator in my tmux panes.

While fixing all of this wayward code, I took notice of the incompatibilities that caused me the most trouble:

  • changes to exception handling
  • changing xrange() to range()
  • changing raw_input() to input()
  • changing my print statements into functions
  • explicitly requesting relative module imports
  • wrapping map() function calls in "list()" because map() now returns an iterator (I actually just re-wrote the code to use list comprehensions.)



But by far the incompatibility that caused the most heart-ache were I/O changes, and this is mostly due to the way Python 3 handles unicode.

There is far too much to say about Unicode in Python 3 to provide a detailed explanation in this post (if you're interested, I've put some great learning resources at the end of this post) but, essentially, Python no longer allows you to willy-nilly mix 8-bit string data and text objects. If you find yourself asking "Why am I having such trouble with my porting?" than the answer may be that you were playing fast-and-loose with mixing (what probably always should have been) incompatible types: unicode strings and 8-bit string data.

For this reason Python 2 was easier to deal with for programmers/scientists who dealt with largely numbers or ASCII-only text, but this won't cut it anymore. It's unreasonable and culturally chauvinistic to require that non-English speaking programmers misspell (or transliterate) their names and variables just to contribute to a non-internationalized codebase.

Before you learn anymore hacks to get unicode working in Python 2.x, consider switching to Python 3. Basically, the new I/O/string paradigm in Python 3 comes down to

  • understanding unicode and utf-8
  • decoding input as early as you can
  • encoding output as late as you can
  • only working with unicode strings within the program (no bytes!)



If you're reading this blog, you're probably a data scientist/statistician (or my parents) and you can almost assuredly make the switch to Python 3 since virtually all of our most prized packages have been ported (even nltk has a branch that works with Python 3). Just to make sure, you can use this tool that tracks the Python 3-readiness of some popular packages.

Final notes:
This post was in no way meant to shame programmers/scientists who still choose to use Python 2.x. I completely understand and sympathize with those that have very large Python 2 codebases to maintain or are locked into a particular Python version because of their company policy or their clients' needs.

If you are interested using Python 3, though, but want to approach it cautiously, consider using the 2to3 conversion tool to see how the code needs to change. Another great strategy is to use the various __future__ imports to ease the transition. Something like:

from __future__ import division, absolute_import, print_function, unicode_literals

At the least, you call python using the "-3" flag to see possible problems and incompatibilities. You can do this by changing your shebang line to something like

#!/usr/bin/env python -3


or create the following shell alias

alias python="python -3"

Resources for learning Python 3 I/O and unicode:


Other notes from Python 3 apologists:


share this: facebooktwittergoogle_plusredditpinterestlinkedintumblrmail

2 Responses

  1. Matthew Blott June 3, 2014 / 5:14 am

    I posted this on another thread but thought I'd share my frustration here as well :-)

    ...

    I'd like to add my tuppence worth to this debate. I'm a seasoned C# developer and wanted an open source language in my arsenal as I find myself doing more open source and Linux related stuff these days. I've looked at Node and Ruby but decided on Python as it seems to have the biggest community and is good for other things aside from web programming. I've been using it for a few months but I had no idea Python 3 adoption was stagnant. I've been using Python 3 because like most people learning a new language I thought it made sense to go with the latest version. With most languages and software you have older supported repos but you don't have everyone using a code base that's over five years old! It really doesn't take long before you start hitting problems with Python 3 and as a newbie I've found it incredibly frustrating doing small fixes and hacks just to get stuff running - one thing that appealed to me about Python was I thought I would find it more productive than C#! Having invested time learning Python 3 I'm reluctant to start using Python 2 - it seems like a retrograde step. And if I'm going to learn something new why would I want to learn something that's years out of date?

    So, that's been my experience and I am now taking another look at Ruby. It's a shame as Python has some great libraries and frameworks, especially for the web which is my main area. But languages need to evolve or they die and I don't want to invest time in something that doesn't seem like it's moving forward.

    • [email protected] June 3, 2014 / 10:00 am

      I feel your frustration, especially if you're new to the python ecosystem. But I have a few thoughts.

      First, I wouldn't say that Python 3 adoption is stagnant, it's just not happening as fast as I (and many others) would like. Second, if you're running into Python 3 problems for web development, check out Django; Flask; or CherryPy, all of which are Python 3 compatible. I, personally, am all about Web.py.

      So I think Python *is* moving forward, and eventually the Python 3 denialists will get hip to this. So it's my hope that you stick with Python :)

      If you decide to go another route, though, in lieu of Ruby, you might want to check out Go; I think it's going to make a big splash soonish. Although, if you mostly do web stuff, there's probably nothing better than Rails.

      Good luck!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">