Using Python decorators to be a lazy programmer: a case study

Decorators are considered one of the more advanced features of Python, and they are often the last topic in a Python class or introductory book. They are, unfortunately, also a topic that trips up many beginning or even intermediate Python programmers. Those who stick it out and work through it, though, will be handsomely rewarded for their hard work.

As those in the know will tell you, decorators are tools to make your Python code beautiful, more concise, well-written, and elegant--but did you know you can use decorators to be a lazy bum of a programmer?!

I just recently whipped up a script that I'm using to help me expand and organize my photo library of my favorite pieces of art. This script takes the URL of an art piece from artsy.net, downloads the image of the artwork, and renames the downloaded file to follow a filename template (given as a CLI arg) based on the artist's name, the title of the piece, and the date of completion. For example, if we wanted to download an image of the Fountain and have the file automatically named Marcel Duchamp - Fountain - 1917.jpg, we could use the following command:

python artsy-dl.py "https://www.artsy.net/artwork/marcel-duchamp-fountain-1" \
                   "%a - %t - %d"

because %a is automatically replaced with the artist's name (that is extracted from the webpage), %t is automatically replaced with the title of the piece, and %d is the date of the piece.

In this post, I'll be demonstrating the use of decorators to the end of doing the bare minimum, and how I saved myself from having to write tedious (but important) error-checking code. But before that, you, dear reader, have to be clear on what a decorator is. The section that follows is perhaps the greatest intro to decorators ever.

(If you are already familiar with decorators, you can skip to the section called "The problem"--though you may want to at least skim this section.)

What are decorators?

Here's the skinny on Python decorators. Grokking decorators necessitates an intuitive understanding of three concepts:

  • People often speak of functions as being "first-class citizens" in Python. By this they mean that functions are values that can be assigned to variables, returned from (other) functions, and passed as an argument to (still other) functions.
  • When a function (we'll call it outer) returns another function (we'll call this inner), the inner function "closes over" (remembers) variables defined in the enclosing scope (outer's scope). If the returned function is stored in a variable and called at a later time, it still remembers the variable(s) from the enclosing scope--even if it is called long after the outer function has finished running and its variables would otherwise be lost. The inner function that is returned is known as a "closure".
  • A language that supports closures affords us the unique opportunity to easily add or modify the behavior of a function a by creating a function b that takes function a as an argument, and returns a function c which does something, and then calls function a. Function c can now be used in place of function a--it is essentially function a plus some extra functionality: it is a "decorated" version of function a.
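To make the three ideas above a little more tangible before the main example, here is a minimal sketch. The names make_greeter and shout are made up purely for illustration; they are not part of the artsy script.

```python
def make_greeter(greeting):
    # 'greeting' lives in make_greeter's scope...
    def greet(name):
        # ...but greet still remembers it after make_greeter has returned
        return "{}, {}!".format(greeting, name)
    return greet


def shout(func):
    # takes a function as an argument and returns a new function wrapping it
    def louder(name):
        return func(name).upper()
    return louder


say_hello = make_greeter("Hello")   # a function stored in a variable (a closure)
print(say_hello("Marcel"))
#> Hello, Marcel!

loud_hello = shout(say_hello)       # a "decorated" version of say_hello
print(loud_hello("Marcel"))
#> HELLO, MARCEL!
```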

In order to concretize these concepts, let's see an example of a decorator complete with an illustration of the motivation behind creating it and the cognitive steps taken toward its finished state. Unlike some decorator tutorials, this lesson will not patronize you, dear reader, by designing an overly simple decorator with no practical worth (it's always been my thought that this pedagogical strategy most often backfires). Instead, we'll create, you and I, a decorator of actual utility.

Suppose we wanted to time the execution of a function. Wanting something with a little more precision than a stopwatch, we decide to use the time module:

import time

def sleep_for_a_second():
    time.sleep(1)

start_time = time.time()
sleep_for_a_second()
end_time = time.time()

print("It took {0:.2f} seconds".format(end_time-start_time))
#> It took 1.00 seconds

This is ok, but if we want to time the execution of many different functions, this will result in a lot of repeated code. Being champions of the DRY principle, we decide it would be better to put this in a function:

def time_a_function(func):
    start_time = time.time()
    func()
    end_time = time.time()
    print("It took {0:.2f} seconds".format(end_time-start_time))

def sleep_for_a_second():
    time.sleep(1)

def sleep_for_two_seconds():
    time.sleep(2)

time_a_function(sleep_for_a_second)
#> It took 1.00 seconds
time_a_function(sleep_for_two_seconds)
#> It took 2.00 seconds

time_a_function is a function that takes the function we want to time as an argument.

We just timed two functions. Notice how we can now time an arbitrary number of functions with no extra code.

But there's an issue with this approach. We've hitherto been timing functions that take no arguments. How would we time a function that takes one or more arguments?

def sleep_for_n_seconds(n):
    time.sleep(n)

time_a_function(sleep_for_n_seconds(5))
#> TypeError: 'NoneType' object is not callable

Nope. Before, we were passing the variable that holds the function to time_a_function, but the above incantation evaluates sleep_for_n_seconds(5) and passes its None return value to time_a_function. Because time_a_function can't call None (it's not a function), we get an error. So how are we going to time sleep_for_n_seconds?

The solution is to make a function that takes a function that returns a function that takes an argument (n) and performs the function and times it and use the returned function in place of the original (whew!).

In other words:

def timer_decoration(func):
    def new_fn(n):
        start_time = time.time()
        func(n)
        end_time = time.time()
        print("It took {0:.2f} seconds".format(end_time-start_time))
    return new_fn

def sleep_for_n_seconds(n):
    time.sleep(n)

sleep_for_n_seconds = timer_decoration(sleep_for_n_seconds)
sleep_for_n_seconds(3)
#> It took 3.01 seconds

Study this code carefully. You've just written a decorator.

If you are confused, it--as always with programming--helps if you type the code out yourself (no copy-and-pasting!) and play around with it.

Though it's not terribly unwieldy otherwise, Python gives us a nice elegant way to tag a particular function with a decorator so that the function is automatically decorated (i.e. it doesn't require us to replace the original function ourselves).

@timer_decoration
def sleep_for_n_seconds(n):
    time.sleep(n)

# now the reassignment of 'sleep_for_n_seconds' is unnecessary
sleep_for_n_seconds(3)
#> It took 3.00 seconds

But what happens if we try to decorate the original sleep_for_a_second function?

@timer_decoration
def sleep_for_a_second():
    time.sleep(1)

sleep_for_a_second()
#> TypeError: new_fn() missing 1 required positional argument: 'n'

sleep_for_a_second is now expecting 1 argument :(. We can generalize our decorator to handle functions that take an arbitrary number of arguments with *args and **kargs...

def timer_decoration(func):
    def new_fn(*args, **kargs):
        start_time = time.time()
        func(*args, **kargs)
        end_time = time.time()
        print("It took {0:.2f} seconds".format(end_time-start_time))
    return new_fn

@timer_decoration
def sleep_for_n_seconds(n):
    time.sleep(n)

@timer_decoration
def sleep_for_a_second():
    time.sleep(1)

@timer_decoration
def sleep_for_k_seconds(k=1):
    time.sleep(k)

sleep_for_n_seconds(3)
#> It took 3.00 seconds
sleep_for_a_second()
#> It took 1.00 seconds
sleep_for_k_seconds(k=4)
#> It took 4.00 seconds

Ace!

Finally, let's rewrite our decorator to support returning the return value of the decorated function.

def timer_decoration(func):
    def new_fn(*args, **kargs):
        start_time = time.time()
        ret_val = func(*args, **kargs)
        end_time = time.time()
        print("It took {0:.2f} seconds".format(end_time-start_time))
        return ret_val
    return new_fn

@timer_decoration
def sleep_for_a_second_p():
    time.sleep(1)
    return True

print(sleep_for_a_second_p())
#> It took 1.01 seconds
#> True

Note that our decorator is now generalized enough to be used with any function... no matter what its return type is... no matter what arguments it takes...

It doesn't matter what the function is, the function's behavior remains the same except now it is "decorated" with functionality that times it.

This is just one example of a decorator with obvious generalized utility. You can also use decorators to perform memoization, static-typing-like type enforcement of function signatures, automatically retry functions that failed, and simulate non-strict evaluation.
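As a taste of one of those other uses, here is a rough sketch of a memoizing decorator built the same way as timer_decoration. It assumes the decorated function takes only positional, hashable arguments; the fib function is just a stand-in for something expensive. (The standard library's functools.lru_cache does this job more robustly.)

```python
def memoize(func):
    cache = {}  # closed over by new_fn; maps argument tuples to results

    def new_fn(*args):
        # only call the original function the first time we see these args
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return new_fn


@memoize
def fib(n):
    # naive recursion; hopelessly slow for n=80 without the cache
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(80))
#> 23416728348467685
```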

The problem

To review, I wrote a script that takes the URL of an artwork on artsy.net, downloads the image, and then names the file in accordance with a user-supplied format string that uses info about the artwork. With the help of the requests, lxml, and wget modules, a script to do this can be coded relatively quickly. The problem, though--which is common for scripts that talk to the web that don't do error-checking--is that the script is brittle. Without error-checking, any malformed URL, network interruption, invalid output path, or weird edge case like an artwork without a title, will result in an unsightly error message and lengthy stack trace. Besides being aesthetically objectionable, if anyone else is using your script, you will look like an incompetent software engineer. So you have to bite the bullet and error-check.

The problem with error-checking is:

  • If all possible errors are checked for separately and individually and handled appropriately (this is good practice), it will result in code often many times longer than the original code. Only a small fraction of the code will be the actual interesting logic of the program--most of it will now be mindless conditionals.
  • It's difficult for someone without training (me) to anticipate every possible error.
  • It takes a lot of work and I'm lazy.

So my usual M.O. is to wrap each component in a try/except block (with no specificity in the exception), print an error message, and terminate execution...

try:
    <brittle code>
except:
    sys.exit("<brittle code> broke")

Except I don't even do that. Instead of wrapping each component in a try/except with its own error message, I just wind up try/excepting main once. This cuts down typing and carpal tunnel is a real thing...

def main():
    <literally everything>

try:
    main()
except:
    sys.exit("whoopsie daisy")

Decorators to the rescue

Okkkaaaayyyyy... if we absolutely must do some modicum of error checking around each component (so the user has some kind of clue as to why the script failed) we can write a decorator to do this for us. The following is an excerpt of the script as of commit d6be4956543:

...
# the decorator
def cop_out(f):
    def inner(*args, **kargs):
        try:
            return f(*args, **kargs)
        except:
            sys.exit("\nThe function <{}> failed\n".format(f.__name__))
    return inner

@cop_out
def get_command_line_arguments():
    return sys.argv[1], sys.argv[2]

@cop_out
def download_webpage(url):
    r = requests.get(url)
    if r.status_code != 200:
        raise Exception
    return r

@cop_out
def parse_webpage(requests_object):
    return lxml.html.fromstring(requests_object.text)
....

Note how we use f.__name__ to get the name of the function that was decorated. This allows us to add the support for specialized (at the function level, at least) error messages for free!
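One wrinkle worth knowing about: f.__name__ works inside the decorator because f is still the original function, but the decorated function itself now reports the wrapper's name. If that matters to you, the standard library's functools.wraps copies the original name (and docstring) onto the wrapper. A small sketch, with the decorator names plain and wrapped invented for the comparison:

```python
import functools


def plain(f):
    def inner(*args, **kargs):
        return f(*args, **kargs)
    return inner


def wrapped(f):
    @functools.wraps(f)  # copies f's __name__, docstring, etc. onto inner
    def inner(*args, **kargs):
        return f(*args, **kargs)
    return inner


@plain
def download_webpage(url):
    return url


@wrapped
def parse_webpage(text):
    return text


print(download_webpage.__name__)
#> inner
print(parse_webpage.__name__)
#> parse_webpage
```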

Now, if the user calls the script with too few arguments, the program will print The function <get_command_line_arguments> failed. If you give it a real URL but not to an artsy.net artwork, it'll say The function <extract_artist_name_from_webpage> failed. If you give it a made-up URL, it'll say The function <download_webpage> failed, etc...

Sure, beyond the function level, you don't know why it failed, but anything is better than nothing and your users shouldn't be so bossy and entitled.
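If you ever feel a pang of guilt about that, a slightly less lazy variant could pass along the exception's own text. This is only a sketch, not what the script does: cop_out_with_reason is a made-up name, and get_command_line_arguments takes argv as a parameter here just so the example can be run on its own.

```python
import sys


def cop_out_with_reason(f):
    def inner(*args, **kargs):
        try:
            return f(*args, **kargs)
        except Exception as e:  # unlike a bare except, this lets Ctrl-C through
            sys.exit("\nThe function <{}> failed: {}: {}\n".format(
                f.__name__, type(e).__name__, e))
    return inner


@cop_out_with_reason
def get_command_line_arguments(argv):
    return argv[1], argv[2]


try:
    get_command_line_arguments(["artsy-dl.py"])  # too few arguments
except SystemExit as e:
    # sys.exit raises SystemExit; we catch it only to show the message
    message = str(e)
    print(message.strip())
#> The function <get_command_line_arguments> failed: IndexError: list index out of range
```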

But one more thing... if you looked at the code, you'll notice that my function names are descriptive... maybe too long and descriptive. The use of prose-like descriptive function names (certainly by the standards of Haskell programmers) was no accident. Although it may seem like an uncharacteristically diligent and conscientious decision on my part, it was actually to facilitate further laziness. Consider the following tweak to the decorator:

def cop_out(f):
    def inner(*args, **kargs):
        try:
            return f(*args, **kargs)
        except:
            message = f.__name__.replace("_", " ")
            sys.exit("\nFailed to {}\n".format(message))
    return inner

Consider how this generates error messages that appear to be individualized...

$ python artsy-dl.py "https://www.artsy.net/artwork/jean-michel-basquiat-untitled-33211237"
Failed to get command line arguments

$ python3.4 artsy-dl.py "https://www.artsy.net/artwork/jean-michel-BAHHSKIIIAAHT-untitled-33211237" \
                        "%a/%t - %d"
Failed to download webpage

$ python2.7 artsy-dl.py "https://www.artsy.net/artist/jean-michel-basquiat" \
                        "%a/%t - %d"
Failed to extract artist name from webpage

So there you have it! Decorators can be used for legitimate, elegant solutions but can also be employed--virtually for free--to give the illusion that you are a caring software engineer and meticulous with your error checking.

PS

If you're a potential employer or client, I'm just kidding--I'm very diligent about error checking. This piece is satire. I promise.


Sending text messages at random times using python

Given my interest in applying statistics and analytics to most (if not all of the) quantifiable aspects of my life, it should come as no surprise to anyone who knows me that when I learned about self-tracking, and the associated 'Quantified Self' movement, I wanted to get started right away.
And...
Given my interest in making life harder than it needs to be, it makes sense that I would eschew existing self-tracking tools and build my own. A neat side-effect of this obstinance is getting to learn new things.

The basic idea is at random times during the day, I fill out a survey that I designed for myself including questions such as: "How happy are you right now?", "How much energy would you say that you have right now?", and "Where are you right now?".

The most reliable and fastest way to get in touch with me is to send a text message. So, sending myself text messages at random times during the day is the best way to prompt me to fill out this self-tracking survey.

To make it easier (and, therefore, more likely that I'll fill it out) the content of the text message should be a link to the survey on the web. And in order to add flexibility to when I have to fill out the survey form but also preserve the randomness of the sampling, the timestamp of the time the text message was sent should be included as a url parameter so that it can be stored in the database along with the answers to the survey.
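As a rough illustration of the timestamp-as-URL-parameter idea, the standard library can do the escaping for us. This sketch assumes Python 3 (for urllib.parse), and the survey address and the make_survey_link name are placeholders, not the real ones.

```python
from datetime import datetime
from urllib.parse import urlencode

# hypothetical survey address; substitute your own
SURVEY_URL = "http://example.com/survey/"


def make_survey_link(timestamp):
    # urlencode percent-escapes ':' and turns spaces into '+'
    return SURVEY_URL + "?" + urlencode({"timestamp": str(timestamp)})


print(make_survey_link(datetime(2014, 6, 1, 14, 30, 5)))
#> http://example.com/survey/?timestamp=2014-06-01+14%3A30%3A05
```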

The service that sends these text messages runs on a Debian GNU/Linux EC2 instance that also hosts the form I fill out and the database that the answers are dumped to.

Before we get to the code, I should explain the modules that we will need for this task, and my rationale for choosing them.

logging
Trying to debug a scheduled task or workflow is a living hell without proper and verbose logging. Since this must be run in the background (and not tied to a particular terminal emulator) simple print statements will not do. The more elegant, scalable, and extensible solution is to use Python's excellent 'logging' module.

smtplib
While there are a few different ways to send text messages (SMS) using Python, the solution I settled on is to use the 'smtplib' standard library module to send an email to an SMS gateway. This gateway will convert the email into a text message sent to my phone. smtplib is needed to send the email message.

apscheduler
Although cron (or equivalently [?] Windows Scheduling Service) should be the tool of choice when scheduling commands to be run at specific times that never change, the fact that the text messages have to be sent at different times every day requires another solution. Probably the most elegant and cross-platform solution is to use the advanced python scheduling library, apscheduler. The Python standard library comes with a similar module, sched, but apscheduler is more advanced with its scheduling capability and its ability to persistently store tasks in a database that survives process restart. (It supports storage in SQLite, PostgreSQL, MongoDB, Redis, MySQL, Oracle, MS-SQL, Firebird, and Sybase). But, unlike its standard library counterpart, it needs to be pip installed.

We will divide this task up into two python scripts: one that gets run once a day, computes n random times, schedules a text message for each of those times, and then sends the messages (we will call this send_daily_texts.py), and one that is started once and then runs send_daily_texts.py at midnight every day (we will call this run_everyday.py).

send_daily_texts.py

#!/usr/bin/python -tt

import random
import sys
import logging
import smtplib
import email.utils
from email.mime.text import MIMEText
from datetime import datetime, timedelta, date
from apscheduler.schedulers.blocking import BlockingScheduler

# create logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
handler = logging.FileHandler('send_daily_texts.log')
handler.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.info("[{}] - send_daily_texts was run".format(datetime.now()))

# the number of times to schedule and send text messages
# are provided as a command line argument
n = int(sys.argv[1])

logger.info("[{}] - going to choose {} random times".format(datetime.now(), n))

# we need to parse today's date to properly
# schedule the text message sending
dadate = datetime.now()
year = dadate.year
month = dadate.month
day = dadate.day

# the lower bound is 8 o' clock
lower_bound = datetime(year, month, day, 8, 0, 0)
logger.info("[{}] - the lower bound is {}".format(datetime.now(), lower_bound))

# the upper bound is 11 o' clock PM
upper_bound = datetime(year, month, day, 23, 0, 0)
logger.info("[{}] - the upper bound is {}".format(datetime.now(), upper_bound))

sched = BlockingScheduler()
logger.info("[{}] - Created blocking scheduler".format(datetime.now()))

wherefrom = 'YOUEMAILACCOUNTYOCREATE AT gmail DOT com'
whereto = 'YOURPHONENUMBER AT YOURSMSGATEWAY DOT com'
gmail_pw = 'YOURGMAILPASSWORD'

def encode_timestamp(timestamp):
    return str(timestamp).replace(" ", "+").replace(":", "%3A")

def make_message(timestamp, wherefrom, whereto):
    slug = "http://THELINKURL/?timestamp={}".format(encode_timestamp(timestamp))
    msg = MIMEText(slug)
    msg['To'] = email.utils.formataddr(('Recipient', whereto))
    msg['From'] = email.utils.formataddr(('Author', wherefrom))
    msg['Subject'] = 'Time for the survey!'
    return msg

def send_text(should_exit=False):
    logger.info('[{}] - trigger triggered, going to send text'.format(datetime.now()))
    logger.info('[{}] - attempting to connect to gmail'.format(datetime.now()))
    server = smtplib.SMTP("smtp.gmail.com", 587)
    server.starttls()
    server.login(wherefrom, gmail_pw)
    logger.info('[{}] - successfully connected to gmail'.format(datetime.now()))
    timestamp = datetime.now()
    msg = make_message(timestamp, wherefrom, whereto)
    # sendmail() needs the message as a string, not a MIMEText object
    damsg = msg.as_string()
    logger.info('[{}] - going to send message {} to {}'.format(datetime.now(),
                                                               damsg.replace('\n', '<br>'),
                                                               whereto))
    ret = server.sendmail(wherefrom, [whereto], damsg)
    server.quit()
    if should_exit:
        logger.info('[{}] - finished... going to exit'.format(datetime.now()))
        sched.shutdown(wait=False)

def random_time(start, end):
    sec_diff = int((end-start).total_seconds())
    secs_to_add = random.randint(0, sec_diff)
    return start + timedelta(seconds=secs_to_add)

def get_n_random_times(n, start, end):
    times = []
    for i in range(0, n):
        times.append(random_time(start, end))
    times.sort()
    return times

times = get_n_random_times(n, lower_bound, upper_bound)
logger.info("[{}] - Received {} times to schedule".format(datetime.now(),
                                                         len(times)))

for ind, atime in enumerate(times):
    if ind == (n-1):
        sched.add_job(send_text, 'date', run_date=atime,
                      kwargs={"should_exit": True})
        logger.info("[{}] - added last task at {}".format(datetime.now(),
                                                         atime))
    else:
        sched.add_job(send_text, 'date', run_date=atime)
        logger.info("[{}] - added task at {}".format(datetime.now(),
                                                     atime))

sched.start()
logger.info("[{}] - everything is done".format(datetime.now()))

Before I describe "run_everyday.py" there are a few things I should note about the snippet above.

When I originally wrote this script, the text messages wouldn't send even though the logger indicated that they had. I assumed this was because gmail rejected them for not looking enough like email messages. In order to correct this, I needed to use the email.mime.text module to add the standard email headers to the message to be sent.
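To see what those headers actually look like, you can build a message without sending anything and print it. The addresses and link below are placeholders for illustration only.

```python
import email.utils
from email.mime.text import MIMEText

# placeholder addresses and link, not real accounts
msg = MIMEText("http://example.com/survey/")
msg['To'] = email.utils.formataddr(('Recipient', '5555555555@example.com'))
msg['From'] = email.utils.formataddr(('Author', 'me@example.com'))
msg['Subject'] = 'Time for the survey!'

# as_string() renders the headers plus the body as one string,
# which is the form that smtplib's sendmail() expects
raw = msg.as_string()
print(raw)
```

The printed text should start with the MIME and To/From/Subject header lines, followed by a blank line and then the link.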

Since I am only interested in experience sampling my waking life, I didn't want to fill out the survey during hours when I am normally asleep. I had to make sure that I set 8 o'clock (8am) and 23 o'clock (11pm) as my lower and upper bounds, respectively.

Finally, if you decide to cannibalize this code, make sure you change the values for 'wherefrom', 'whereto', and 'gmail_pw'. The format of the SMS gateway address you should use depends upon your mobile carrier. My particular SMS gateway is my 10 digit phone number @vtext.com. Yours will likely be different--consult this list.

run_everyday.py

#!/usr/bin/python -tt

import sys
import logging
from datetime import datetime
from subprocess import Popen, PIPE
from apscheduler.schedulers.blocking import BlockingScheduler

def run_daily_surveys(thelogger):
    thelogger.info("[{}] - Trigger triggered".format(datetime.now()))
    thelogger.info("[{}] - Going to run daily script".format(datetime.now()))
    p = Popen('./send_daily_texts.py 3', shell=True, stdout=PIPE, stderr=PIPE)
    out, err = p.communicate()
    if p.returncode:
        thelogger.error("[{}] - Failed to run daily script".format(datetime.now()))
        sys.exit("Failed to run daily script")
    thelogger.info("[{}] - Ran daily script".format(datetime.now()))

def main():
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler('run_everyday.log')
    handler.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    logger.info("[{}] - run_everyday.py was run".format(datetime.now()))
    
    sched = BlockingScheduler()
    logger.info("[{}] - blocking scheduler was created".format(datetime.now()))
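    # note: an 'interval' trigger counts 24 hours from when the scheduler
    # starts, so start this script at midnight for midnight runs
    sched.add_job(run_daily_surveys, 'interval', days=1, args=[logger])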
    logger.info("[{}] - everyday task added, going to start the scheduler".format(datetime.now()))
    sched.start()
    return 0

if __name__ == '__main__':
    STATUS = main()
    sys.exit(STATUS)

I've been running these tasks for about a week now, and it's working great!

My next couple of blog posts will be about server-side code and architecture to support my self-tracking project.


Damn the torpedoes, full speed ahead: making the switch to Python 3

Python 3 has been out since 2008 (and realistically usable since 2009).
In spite of this four year availability period, Python 3 use has yet to see widespread adoption, particularly among groups in the scientific community. In the company of data scientists/statisticians, when someone says they've written their own Python code to perform some task, it's usually assumed that they are talking about Python 2; it is Python 3 that requires the version number qualification.

There are members of the community (I used to include myself in this category) that are really happy with Python(2) and hope that if they ignore Python 3 it will just go away.

It won't though. The fact of the matter is that "Python 2.x is legacy, Python 3.x is the present and future of the language."

So how do we get people to adopt Python 3? In my opinion, there are three key strategies:

  • go softer on Python 3 denialists, perhaps with a Python 2.8 (Guido said this will not happen)
  • go harder on Python 3 denialists by discontinuing 2.7 maintenance
  • serve as an example to programmers (especially new ones) by switching your default python interpreter to python3.

As you, dear reader, can probably tell from my wording, I personally favor strategy 3.

Part of that solution involves vendors shipping Python 3 by default. We are making some progress in this regard (Arch GNU/Linux now has python sym-linked to python3, and Fedora and Ubuntu have stated that they will follow suit), but we still have a lot of work to do. A huge step forward would be if Apple shipped Macs with Python 3. Current Macs use 2.7, which wasn't, finally, released until 2010. This means that they could have used Python 3 instead. That would have really shaken things up because a lot of my friends and colleagues in my field just use the Apple-supplied Python interpreter for analytics (vis-a-vis SciPy Superpack). The extension of 2.7 support until 2020 will unfortunately afford Apple the opportunity to be lackadaisical in its porting to Python 3 because they might only do so when upstream maintenance ends (and maybe not even then).

The other part of strategy 3 involves personally serving as an example by using Python 3 as your default interpreter.

But I can't rationally will that more people do this if I am unwilling to do this myself. While it's true that my big open-source Python project meant for widespread public consumption was very carefully made Python 3 compatible, I noticed that the code on this blog is often Python 3 incompatible. This is primarily because the python code I quickly whip up is run through my default Python 2 interpreter. My obstinance to switch to Python 3 by default is helping to contribute to Python 3's slow adoption and implicitly serving notice that it's ok to still use Python 2.

But I no longer want to be party to this transition quagmire (and the ASCII-normative cultural hegemony). Because of this, I recently took the plunge and switched my default Python stack to Python 3. Damn the torpedoes, full speed ahead!

I was, perhaps, in a better position to do this than some because all of my most used third-party Python packages have already been ported; this includes the SciPy ecosystem (NumPy, SciPy, pandas, scikit-learn), IPython, lxml, networkx, BeautifulSoup, and requests. It was really easy for me to ditch the Apple-supplied Python interpreter in favor of MacPorts' build of Python 3.4. I was even able to install most of my favorite third-party packages using MacPorts (the ones that weren't available I pip-installed as "user" to not muck up the MacPorts installation prefix). The only hard part about the switch was that almost all of my system python code stopped working; everything from my own system utilities to the battery-life indicator in my tmux panes.

While fixing all of this wayward code, I took notice of the incompatibilities that caused me the most trouble:

  • changes to exception handling
  • changing xrange() to range()
  • changing raw_input() to input()
  • changing my print statements into functions
  • explicitly requesting relative module imports
  • wrapping map() function calls in "list()" because map() now returns an iterator (I actually just re-wrote the code to use list comprehensions.)
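For anyone who hasn't bumped into these yet, here is a small side-by-side sketch of a few of the items above, with the Python 2 spelling left in comments and the Python 3 spelling as live code. The values are made up for illustration.

```python
# Python 2: print "hello"
print("hello")

# Python 2: for i in xrange(3): ...
total = 0
for i in range(3):
    total += i

# Python 2: except ValueError, e:
try:
    int("not a number")
except ValueError as e:
    error_text = str(e)

# Python 2: answer = raw_input("> ")
# Python 3: answer = input("> ")      (left commented out; it would wait for typing)

# Python 2: import sibling_module     (implicit relative import)
# Python 3: from . import sibling_module   (only valid inside a package)

# Python 2: map() returned a list; in Python 3 it returns an iterator
squares_iter = map(lambda x: x * x, [1, 2, 3])
squares = [x * x for x in [1, 2, 3]]   # the list comprehension rewrite

print(list(squares_iter) == squares)
#> True
```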

But by far the incompatibility that caused the most heart-ache was the set of I/O changes, and this is mostly due to the way Python 3 handles unicode.

There is far too much to say about Unicode in Python 3 to provide a detailed explanation in this post (if you're interested, I've put some great learning resources at the end of this post) but, essentially, Python no longer allows you to willy-nilly mix 8-bit string data and text objects. If you find yourself asking "Why am I having such trouble with my porting?" then the answer may be that you were playing fast-and-loose with mixing (what probably always should have been) incompatible types: unicode strings and 8-bit string data.

For this reason Python 2 was easier to deal with for programmers/scientists who dealt with largely numbers or ASCII-only text, but this won't cut it anymore. It's unreasonable and culturally chauvinistic to require that non-English speaking programmers misspell (or transliterate) their names and variables just to contribute to a non-internationalized codebase.

Before you learn any more hacks to get unicode working in Python 2.x, consider switching to Python 3. Basically, the new I/O/string paradigm in Python 3 comes down to

  • understanding unicode and utf-8
  • decoding input as early as you can
  • encoding output as late as you can
  • only working with unicode strings within the program (no bytes!)
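In practice, the list above might look something like the following sketch. The byte string is a made-up example of UTF-8 data arriving from a file or socket.

```python
# bytes as they might arrive from a file or socket (UTF-8 encoded)
incoming = b"Ren\xc3\xa9 Magritte"

# decode as early as you can: bytes -> str
name = incoming.decode("utf-8")

# 14 bytes, but only 13 characters, because the accented e takes two bytes
print(len(incoming), len(name))
#> 14 13

# inside the program, work only with str
label = name.upper()

# encode as late as you can: str -> bytes, right before writing it out
outgoing = label.encode("utf-8")
print(outgoing)
#> b'REN\xc3\x89 MAGRITTE'
```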



If you're reading this blog, you're probably a data scientist/statistician (or my parents) and you can almost assuredly make the switch to Python 3 since virtually all of our most prized packages have been ported (even nltk has a branch that works with Python 3). Just to make sure, you can use this tool that tracks the Python 3-readiness of some popular packages.

Final notes:
This post was in no way meant to shame programmers/scientists who still choose to use Python 2.x. I completely understand and sympathize with those that have very large Python 2 codebases to maintain or are locked into a particular Python version because of their company policy or their clients' needs.

If you are interested in using Python 3, though, but want to approach it cautiously, consider using the 2to3 conversion tool to see how the code needs to change. Another great strategy is to use the various __future__ imports to ease the transition. Something like:

from __future__ import division, absolute_import, print_function, unicode_literals

At the least, you can call python using the "-3" flag to see possible problems and incompatibilities. You can do this by changing your shebang line to something like

#!/usr/bin/env python -3


or create the following shell alias

alias python="python -3"

Resources for learning Python 3 I/O and unicode:

Other notes from Python 3 apologists:
