Authors: Mark Lutz

Tags: #COMPUTERS / Programming Languages / Python

BOOK: Programming Python

The queue Module

You can synchronize your threads’ access to shared resources with
locks, but you often don’t have to. As mentioned, realistically scaled
threaded programs are often structured as a set of producer and consumer
threads, which communicate by placing data on, and taking it off of, a
shared queue. As long as the queue synchronizes access to itself, this
automatically synchronizes the threads’ interactions.

The Python queue module implements this storage device. It provides a
standard queue data structure: a first-in first-out (FIFO) list of
Python objects, in which
items are added on one end and removed from the other. Like normal
lists, the queues provided by this module may contain any type of Python
object, including both simple types (strings, lists, dictionaries, and
so on) and more exotic types (class instances, arbitrary callables like
functions and bound methods, and more).

Unlike normal lists, though, the queue object is automatically
controlled with thread lock acquire and release operations, such that
only one thread can modify the queue at any given point in time. Because
of this, programs that use a queue for their cross-thread communication
will be thread-safe and can usually avoid dealing with locks of their
own for data passed between threads.
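
As a minimal sketch of these two points (this is not one of the book's example files, and the names shared and worker are made up for illustration), the following queues objects of mixed types, then lets several threads add to the same queue without any lock of their own:

```python
import queue
import threading

shared = queue.Queue()                # unbounded FIFO queue

# Any Python object can be queued; items come back in insertion order
for item in ('spam', [1, 2], {'a': 1}, len):
    shared.put(item)
fifo = [shared.get() for _ in range(4)]

def worker(idnum, count):
    # No explicit lock here: put() synchronizes access to the queue itself
    for n in range(count):
        shared.put((idnum, n))

threads = [threading.Thread(target=worker, args=(i, 1000)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(shared.qsize())                 # 4000: no puts were lost
```

The same loop run against a shared counter variable instead of a queue would generally need a lock around each update.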

Like the other tools in Python’s threading arsenal, queues are
surprisingly simple to use. The script in Example 5-14, for instance,
spawns two
consumer threads that watch for data to appear on the shared queue and
four producer threads that place data on the queue periodically after a
sleep interval (each of their sleep durations differs to simulate a
real, long-running task). In other words, this program runs 7 threads
(including the main one), 6 of which access the shared queue in
parallel.

Example 5-14. PP4E\System\Threads\queuetest.py

"producer and consumer threads communicating with a shared queue"

numconsumers = 2                  # how many consumers to start
numproducers = 4                  # how many producers to start
nummessages  = 4                  # messages per producer to put

import _thread as thread, queue, time
safeprint = thread.allocate_lock()    # else prints may overlap
dataQueue = queue.Queue()             # shared global, infinite size

def producer(idnum):
    for msgnum in range(nummessages):
        time.sleep(idnum)
        dataQueue.put('[producer id=%d, count=%d]' % (idnum, msgnum))

def consumer(idnum):
    while True:
        time.sleep(0.1)
        try:
            data = dataQueue.get(block=False)
        except queue.Empty:
            pass
        else:
            with safeprint:
                print('consumer', idnum, 'got =>', data)

if __name__ == '__main__':
    for i in range(numconsumers):
        thread.start_new_thread(consumer, (i,))
    for i in range(numproducers):
        thread.start_new_thread(producer, (i,))
    time.sleep(((numproducers-1) * nummessages) + 1)
    print('Main thread exit.')

Before I show you this script’s output, I want to highlight a few
points in its code.

Arguments versus globals

Notice how the queue is assigned to a global variable; because of
that, it is shared by all of the spawned threads (all of them run in
the same process and in the same global scope). Since these threads
change an object instead of a variable name, it would work just as
well to pass the queue object in to the threaded functions as an
argument. The queue is a shared object in memory, regardless of how it
is referenced (see queuetest2.py in the examples tree for a full
version that does this):

dataQueue = queue.Queue()             # shared object, infinite size

def producer(idnum, dataqueue):
    for msgnum in range(nummessages):
        time.sleep(idnum)
        dataqueue.put('[producer id=%d, count=%d]' % (idnum, msgnum))

def consumer(idnum, dataqueue): ...

if __name__ == '__main__':
    for i in range(numconsumers):
        thread.start_new_thread(consumer, (i, dataQueue))
    for i in range(numproducers):
        thread.start_new_thread(producer, (i, dataQueue))

Program exit with child threads

Also notice how this script exits when the main thread does, even
though consumer threads are still running in their infinite loops. This
works fine on Windows (and most other platforms): with the basic
_thread module, the program ends silently when the main thread does.
This is why we’ve had to sleep in some examples to give threads time to
do their work, but is also why we do not need to be concerned about
exiting while consumer threads are still running here.

In the alternative threading module, though, the program will not exit
if any spawned threads are running, unless they are set to be daemon
threads. Specifically, the entire program exits when only daemon
threads are left. Threads inherit a default initial daemonic value
from the thread that creates them. The initial thread of a Python
program is considered not daemonic, though alien threads created
outside this module’s control are considered daemonic (including some
threads created in C code). To override inherited defaults, a thread
object’s daemon flag can be set manually. In other words, nondaemon
threads prevent program exit, and programs by default do not exit until
all threading-managed threads finish.
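
To make the inheritance rule concrete, here is a small sketch of my own (the function names child and parent are hypothetical, not from the examples tree). It records the default daemon flag of a thread created by the main thread, and of one created inside a daemon thread:

```python
import threading

def child():
    pass

def parent(results):
    # A thread created here inherits the daemon flag of the creating thread
    t = threading.Thread(target=child)
    results.append(t.daemon)
    t.start()
    t.join()

results = []
main_default = threading.Thread(target=child).daemon   # created by main thread

p = threading.Thread(target=parent, args=(results,))
p.daemon = True                       # override; must be set before start()
p.start()
p.join()

print(main_default, results)          # False [True]
```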

This is either a feature or nonfeature, depending on your program: it
allows spawned worker threads to finish their tasks in the absence of
join calls or sleeps, but it can prevent programs like the one in
Example 5-14 from shutting down when they wish. To make this example
work with threading, use the following alternative code (see
queuetest3.py in the examples tree for a complete version of this, as
well as thread-count-threading.py, also in the tree, for a case where
this refusal to exit can come in handy):

import threading, queue, time

def producer(idnum, dataqueue): ...
def consumer(idnum, dataqueue): ...

if __name__ == '__main__':
    for i in range(numconsumers):
        thread = threading.Thread(target=consumer, args=(i, dataQueue))
        thread.daemon = True  # else cannot exit!
        thread.start()

    waitfor = []
    for i in range(numproducers):
        thread = threading.Thread(target=producer, args=(i, dataQueue))
        waitfor.append(thread)
        thread.start()

    for thread in waitfor: thread.join()    # or time.sleep() long enough here
    print('Main thread exit.')

We’ll revisit the daemons and exits issue in Chapter 10 while studying
GUIs; as we’ll see, it’s no different in that context, except that the
main thread is usually the GUI itself.

Running the script

Now, as coded in Example 5-14, the following is the output of this
example when run on my Windows machine. Notice that even though the
queue automatically coordinates the communication of data between the
threads, this script still must use a lock to manually synchronize
access to the standard output stream; queues synchronize data passing,
but some programs may still need to use locks for other purposes. As in
prior examples, if the safeprint lock is not used, the printed lines
from one consumer may be intermixed with those of another, because a
consumer may be paused in the middle of a print operation:

C:\...\PP4E\System\Threads> queuetest.py
consumer 1 got => [producer id=0, count=0]
consumer 0 got => [producer id=0, count=1]
consumer 1 got => [producer id=0, count=2]
consumer 0 got => [producer id=0, count=3]
consumer 1 got => [producer id=1, count=0]
consumer 1 got => [producer id=2, count=0]
consumer 0 got => [producer id=1, count=1]
consumer 1 got => [producer id=3, count=0]
consumer 0 got => [producer id=1, count=2]
consumer 1 got => [producer id=2, count=1]
consumer 1 got => [producer id=1, count=3]
consumer 1 got => [producer id=3, count=1]
consumer 0 got => [producer id=2, count=2]
consumer 1 got => [producer id=2, count=3]
consumer 1 got => [producer id=3, count=2]
consumer 1 got => [producer id=3, count=3]
Main thread exit.

Try adjusting the parameters at the top of this script to
experiment with different scenarios. A single consumer, for instance,
would simulate a GUI’s main thread. Here is the output of a
single-consumer run—producers still add to the queue in fairly random
fashion, because threads run in parallel with each other and with the
consumer:

C:\...\PP4E\System\Threads> queuetest.py
consumer 0 got => [producer id=0, count=0]
consumer 0 got => [producer id=0, count=1]
consumer 0 got => [producer id=0, count=2]
consumer 0 got => [producer id=0, count=3]
consumer 0 got => [producer id=1, count=0]
consumer 0 got => [producer id=2, count=0]
consumer 0 got => [producer id=1, count=1]
consumer 0 got => [producer id=3, count=0]
consumer 0 got => [producer id=1, count=2]
consumer 0 got => [producer id=2, count=1]
consumer 0 got => [producer id=1, count=3]
consumer 0 got => [producer id=3, count=1]
consumer 0 got => [producer id=2, count=2]
consumer 0 got => [producer id=2, count=3]
consumer 0 got => [producer id=3, count=2]
consumer 0 got => [producer id=3, count=3]
Main thread exit.

In addition to the basics used in our script, queues may be
fixed or infinite in size, and get and put calls may or may not block;
see the Python library manual for more details on queue interface
options. Since we just simulated a typical GUI structure, though,
let’s explore the notion a bit
further.
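
Before we do, here is a brief sketch of those interface options, using only standard queue calls; the variable names are my own. A maxsize argument bounds the queue, and the block and timeout arguments control whether put and get wait or raise an exception:

```python
import queue

small = queue.Queue(maxsize=2)        # fixed size: at most 2 items
small.put('a')
small.put('b')

try:
    small.put('c', block=False)       # same as small.put_nowait('c')
    was_full = False
except queue.Full:
    was_full = True                   # a blocking put would wait here instead

first = small.get()                   # returns at once: an item is available
second = small.get(timeout=1)         # waits at most 1 second for an item

try:
    small.get(block=False)            # same as small.get_nowait()
    was_empty = False
except queue.Empty:
    was_empty = True

print(was_full, first, second, was_empty)   # True a b True
```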

Preview: GUIs and Threads

We will return to threads and queues and see additional thread and
queue examples when we study GUIs later in this book. The PyMailGUI
example in Chapter 14, for instance, will make extensive use of thread
and queue tools introduced here and developed further in Chapter 10,
and Chapter 9 will discuss threading in the context of the tkinter GUI
toolkit once we’ve had a chance to study it. Although we can’t get into
code at this point, threads are usually an integral part of most
nontrivial GUIs. In fact, the activity model of many GUIs is a
combination of threads, a queue, and a timer-based loop.

Here’s why. In the context of a GUI, any operation that can block
or take a long time to complete must be spawned off to run in parallel
so that the GUI (the main thread) remains active and continues
responding to its users. Although such tasks can be run as processes,
the efficiency and shared-state model of threads make them ideal for
this role. Moreover, since most GUI toolkits do not allow multiple
threads to update the GUI in parallel, updates are best restricted to
the main thread.

Because only the main thread should generally update the display,
GUI programs typically take the form of a main GUI thread and one or
more long-running producer threads—one for each long-running task being
performed. To synchronize their points of interface, all of the threads
share data on a global queue: non-GUI threads post results, and the GUI
thread consumes them.

More specifically:

The main thread handles all GUI updates and runs a timer-based loop
that wakes up periodically to check for new data on the queue to be
displayed on-screen. In Python’s tkinter toolkit, for instance, the
widget after(msecs, func, *args) method can be used to schedule
queue-check events. Because such events are dispatched by the GUI’s
event processor, all GUI updates occur only in this main thread (and
often must, due to the lack of thread safety in GUI toolkits).

The child threads don’t do anything GUI-related. They just produce data
and put it on the queue to be picked up by the main thread.
Alternatively, child threads can place a callback function on the
queue, to be picked up and run by the main thread. It’s not generally
sufficient to simply pass in a GUI update callback function from the
main thread to the child thread and run it from there; the function in
shared memory will still be executed in the child thread, and
potentially in parallel with other threads.
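
The division of labor just described can be simulated without a GUI. The following is only a sketch under stated assumptions: a plain list stands in for the display, a sleep loop stands in for the GUI event loop, and the names callbacks, update_display, worker, and check_queue are all hypothetical. In tkinter, check_queue would instead reschedule itself with the widget after method:

```python
import queue
import threading
import time

callbacks = queue.Queue()             # producers post (function, args) pairs
display = []                          # stands in for GUI state: main thread only

def update_display(text):
    display.append(text)              # called only from the main thread

def worker(idnum):
    time.sleep(0.05 * idnum)          # simulate a long-running task
    callbacks.put((update_display, ('result from worker %d' % idnum,)))

def check_queue(max_per_call=5):
    # What a timer event handler would do on each wake-up: run a few queued
    # callbacks without blocking, then return control to the event loop
    for _ in range(max_per_call):
        try:
            func, args = callbacks.get(block=False)
        except queue.Empty:
            break
        func(*args)

workers = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in workers: t.start()

while len(display) < 3:               # stand-in for the GUI's event loop
    check_queue()
    time.sleep(0.01)

for t in workers: t.join()
print(display)
```

Because the workers only ever call put, every update_display call happens in the main thread, which is the property the GUI model depends on.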

Since threads are much more responsive than a timer event loop in
the GUI, this scheme both avoids blocking the GUI (producer threads run
in parallel with the GUI), and avoids missing incoming events (producer
threads run independent of the GUI event loop and as fast as they can).
The main GUI thread will display the queued results as quickly as it
can, in the context of a slower GUI event loop.

Also keep in mind that regardless of the thread safety of a GUI
toolkit, threaded GUI programs must still adhere to the principles of
threaded programs in general—access to shared resources may still need
to be synchronized if it falls outside the scope of the
producer/consumer shared queue model. If spawned threads might also
update another shared state that is used by the main GUI thread, thread
locks may also be required to avoid operation overlap. For instance,
spawned threads that download and cache email probably cannot overlap
with others that use or update the same cache. That is, queues may not
be enough; unless you can restrict threads’ work to queuing their
results, threaded GUIs still must address concurrent updates.
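
As a rough illustration of the cache case, and not the way any example in this book necessarily handles it, the sketch below guards a shared dictionary with a lock so that the check and the update happen as one step. The names cache, cachelock, fetch, and load are hypothetical:

```python
import threading

cache = {}                            # shared state outside the queue model
cachelock = threading.Lock()

def fetch(msgnum):
    return 'message %d' % msgnum      # stand-in for a slow download

def load(msgnum):
    # Check-then-update is two operations, so guard the pair with one lock
    with cachelock:
        if msgnum not in cache:
            cache[msgnum] = fetch(msgnum)
        return cache[msgnum]

threads = [threading.Thread(target=load, args=(n % 5,)) for n in range(20)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(cache))                  # [0, 1, 2, 3, 4]
```

Holding the lock during the fetch keeps the sketch short, but it also serializes downloads; a real program might prefer finer-grained locking.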

We’ll see how the threaded GUI model can be realized in code later in
this book. For more on this subject, see especially the discussion of
threaded tkinter GUIs in Chapter 9, the thread queue tools implemented
in Chapter 10, and the PyMailGUI example in Chapter 14.

Later in this chapter, we’ll also meet the multiprocessing module,
whose process and queue support offers new options for implementing
this GUI model using processes instead of threads; as such, they work
around the limitations of the thread GIL, but may incur extra
performance overheads that can vary per platform, and may not be
directly usable at all in threading contexts (the direct shared and
mutable object state of threads is not supported, though messaging is).
For now, let’s cover a few final thread fine points.

Thread Timers versus GUI Timers

Interestingly, the threading module exports a general timer function,
which, like the tkinter widget after method, can be used to run another
function after a timer has expired:

Timer(N.M, somefunc).start()   # after N.M seconds run somefunc

Timer objects have a start() method to set the timer as well as a
cancel() method to cancel the scheduled event, and they implement the
wait state in a spawned thread. For example, the following prints a
message after 5.5 seconds:

>>> import sys
>>> from threading import Timer
>>> t = Timer(5.5, lambda: print('Spam!'))    # spawned thread
>>> t.start()
>>> Spam!
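
The cancel() call is easy to try as well. In this short sketch (the list fired is just an illustrative stand-in for real work), two timers are started and one is canceled before it expires:

```python
from threading import Timer

fired = []

t1 = Timer(0.2, fired.append, args=('first',))
t2 = Timer(0.2, fired.append, args=('second',))
t1.start()
t2.start()
t2.cancel()                           # canceled before expiry: never runs

t1.join()                             # a Timer is a Thread, so join() waits
t2.join()
print(fired)                          # ['first']
```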

This may be useful in a variety of contexts, but it doesn’t quite apply
to GUIs: because the time-delayed function call is run in a spawned
thread, not in the main GUI thread, it should not generally perform GUI
updates. Because the tkinter after method is run from the main thread’s
event processing loop instead, it runs in the main GUI thread and can
freely update the GUI.

As a preview, for instance, the following displays a pop-up message
window in 5.5 seconds in the main thread of a tkinter GUI (you might
also have to run win.mainloop() in some interfaces):

>>> from tkinter import Tk
>>> from tkinter.messagebox import showinfo
>>> win = Tk()
>>> win.after(5500, lambda: showinfo('Popup', 'Spam!'))

The last call here schedules the function to be run once in the main
GUI thread, but it does not pause the caller during the wait, and so
does not block the GUI. It’s equivalent to this simpler form:

>>> win.after(5500, showinfo, 'Popup', 'Spam')

Stay tuned for much more on tkinter in the next part of this book, and
watch for the full story on its after timer events in Chapter 9 and the
roles of threads in GUIs in Chapter 10.
