Recipe 1.10. Filtering a String for a Set of Characters
Credit: Jürgen Hermann, Nick Perkins, Peter
Cogolo
Problem
Given a set of characters to keep, you need to build a filtering
function that, applied to any string s,
returns a copy of s that contains only
characters in the set.
Solution
The translate method of string objects is fast and
handy for all tasks of this ilk. However, to call
translate effectively to solve this
recipe's task, we must do some advance preparation.
The first argument to translate is a translation
table: in this recipe, we do not want to do any translation, so we
must prepare a first argument that specifies "no
translation". The second argument to
translate specifies which characters we want to
delete: since the task here says that
we're given, instead, a set of characters to
keep (i.e., to not delete),
we must prepare a second argument that gives the set
complement, deleting all characters we must not
keep. A closure is the best way to do this advance preparation just
once, obtaining a fast filtering function tailored to our exact
needs:
import string
# Make a reusable string of all characters, which does double duty
# as a translation table specifying "no translation whatsoever"
allchars = string.maketrans('', '')
def makefilter(keep):
    """ Return a function that takes a string and returns a partial copy
        of that string consisting of only the characters in 'keep'.
        Note that `keep' must be a plain string.
    """
    # Make a string of all characters that are not in 'keep': the "set
    # complement" of keep, meaning the string of characters we must delete
    delchars = allchars.translate(allchars, keep)
    # Make and return the desired filtering function (as a closure)
    def thefilter(s):
        return s.translate(allchars, delchars)
    return thefilter
if __name__ == '__main__':
    just_vowels = makefilter('aeiouy')
    print just_vowels('four score and seven years ago')
    # emits: ouoeaeeyeaao
    print just_vowels('tiger, tiger burning bright')
    # emits: ieieuii
Discussion
The key to understanding this recipe
lies in the definitions of the maketrans function
in the string module of the Python Standard
Library and in the translate method of string
objects. The translate method returns a copy of the string
you call it on, replacing each character in it with the corresponding
character in the translation table passed in as the first argument
and deleting the characters specified in the second argument.
maketrans is a utility function to create
translation tables. (A translation table is a string
t of exactly 256 characters: when you pass
t as the first argument of a
translate method, each character
c of the string on which you call the
method is translated in the resulting string into the character
t[ord(c)].)
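To make the table lookup concrete, here is a minimal sketch. It assumes Python 3, where the two-argument translate described here is available on bytes objects (bytes.maketrans and bytes.translate) rather than on str. The sample data and variable names are illustrative only.

```python
# Assumption: Python 3, using bytes, whose translate(table, delete)
# mirrors the two-argument string method this recipe describes.
identity = bytes.maketrans(b'', b'')     # 256 bytes, identity[i] == i
table = bytes.maketrans(b'ab', b'xy')    # like identity, but a->x and b->y

s = b'abcabc'
translated = s.translate(table)

# Iterating over bytes yields integers, so table[c] here plays the
# role of t[ord(c)] in the text.
by_hand = bytes(table[c] for c in s)

# The second argument lists the bytes to delete; the remaining bytes
# are then mapped through the table.
trimmed = s.translate(table, b'c')

print(translated, by_hand, trimmed)
```

The table-driven result and the by-hand lookup agree, which is all the translation table amounts to: one indexed lookup per character.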
In this recipe, efficiency is maximized by splitting the filtering
task into preparation and execution phases. The string of all
characters is clearly reusable, so we build it once and for all as a
global variable when this module is imported. That way, we ensure
that each filtering function uses the same string-of-all-characters
object, not wasting any memory. The string of characters to delete,
which we need to pass as the second argument to the
translate method, depends on the set of characters
to keep, because it must be built as the "set
complement" of the latter: we must tell
translate to delete every character that we do not
want to keep. So, we build the delete-these-characters string in the
makefilter factory function. This building is done
quite rapidly by using the translate method to
delete the "characters to keep"
from the string of all characters. The translate
method is very fast, as are the construction and execution of these
useful little resulting functions. The test code that executes when
this recipe runs as a main script shows how to build a filtering
function by calling makefilter, bind a name to the
filtering function (by simply assigning the result of calling
makefilter to a name), then call the filtering
function on some strings and print the results.
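The two phases can be sketched for Python 3 as follows. This is an adaptation under stated assumptions, not the recipe's own code: string.maketrans and the two-argument str.translate are not available in Python 3, so the sketch works on bytes, where equivalent calls exist. The name delchars_for is a hypothetical helper, added only to expose the intermediate delete string.

```python
# Assumption: Python 3 bytes stand in for the recipe's plain strings.
allchars = bytes.maketrans(b'', b'')   # built once, shared by all filters

def delchars_for(keep):
    """Hypothetical helper: every byte value that is not in keep."""
    return allchars.translate(allchars, keep)

def makefilter(keep):
    # Preparation phase: runs once per filtering function
    delchars = delchars_for(keep)
    def thefilter(s):
        # Execution phase: a single translate call per string
        return s.translate(allchars, delchars)
    return thefilter

just_vowels = makefilter(b'aeiouy')
filtered = just_vowels(b'four score and seven years ago')
print(filtered)
print(len(delchars_for(b'aeiouy')))    # 256 byte values minus the 6 kept
```

Because delchars is computed when makefilter is called, each later call to the returned function pays only for the translate itself.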
Incidentally, calling a filtering function with
allchars as the argument puts the set of characters
being kept into a canonic string form, alphabetically sorted and
without duplicates. You can use this idea to code a very simple
function to return the canonic form of any set of characters
presented as an arbitrary string:
def canonicform(s):
""" Given a string s, return s's characters as a canonic-form string:
alphabetized and without duplicates. """
return makefilter(s)(allchars)
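As a quick check of this idea, the sketch below ports canonicform to Python 3 bytes (an assumption, since the recipe's string.maketrans is unavailable there) and compares the result with sorting the set of byte values directly. Note that "alphabetized" here means ordered by character code, so a space sorts before letters and uppercase letters sort before lowercase ones.

```python
# Assumption: Python 3 bytes stand in for the recipe's plain strings.
allchars = bytes.maketrans(b'', b'')

def makefilter(keep):
    delchars = allchars.translate(allchars, keep)
    def thefilter(s):
        return s.translate(allchars, delchars)
    return thefilter

def canonicform(s):
    # Filtering the string of all characters leaves exactly the kept
    # ones, in character-code order and without duplicates.
    return makefilter(s)(allchars)

sample = b'hello world'
canonic = canonicform(sample)
reference = bytes(sorted(set(sample)))   # same result, computed directly
print(canonic, reference)
```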
The Solution uses a def statement to make the
nested function (closure) it returns, because def
is the most normal, general, and clear way to make functions. If you
prefer, you could use lambda instead, changing the
def and return statements in
function makefilter into just one return
lambda statement.
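One way such a lambda-based variant could look is sketched below. It again assumes Python 3 bytes rather than the Python 2 strings of the Solution, and it is intended to behave the same as the def version.

```python
# Assumption: Python 3 bytes stand in for the recipe's plain strings.
allchars = bytes.maketrans(b'', b'')

def makefilter(keep):
    delchars = allchars.translate(allchars, keep)
    # A single return lambda replaces the nested def and its return
    return lambda s: s.translate(allchars, delchars)

just_vowels = makefilter(b'aeiouy')
vowels_only = just_vowels(b'tiger, tiger burning bright')
print(vowels_only)
```

The lambda still closes over delchars, so the preparation work is done once per filter, exactly as with the nested def.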