Read Programming Python Online

Authors: Mark Lutz

Tags: #COMPUTERS / Programming Languages / Python

[42] There is even a common acronym for this today: LAMP, for the
Linux operating system, the Apache web server, the MySQL database
system, and the Python, Perl, and PHP scripting languages. It’s
possible, and even very common, to put together an entire
enterprise-level web server with open source tools. Python users
would probably also like to include systems like Zope, Django,
Webware, and CherryPy in this list, but the resulting acronym
might be a bit of a stretch.

Python Internet Development Options

Although many are
outside our scope here, there are a variety of ways that
Python programmers script the Web. Just as we did for GUIs, I want to
begin with a quick overview of some of the more popular tools in this
domain before we jump into the fundamentals.

Networking tools

As we’ll see in this chapter, Python comes with tools that
support basic networking, as well as implementation of custom types
of network servers. This includes sockets, but also the select call for
asynchronous servers, as well as higher-order and pre-coded
socket server classes. Standard library modules socket, select, and
socketserver support all these roles.

Client-side protocol tools

As we’ll see in the next chapter, Python’s
Internet arsenal also includes canned support for the
client side of most standard Internet protocols: scripts
can easily make use of FTP, email, HTTP, Telnet, and
more. Especially when wedded to desktop GUIs of the sort we met in
the preceding part of this book, these tools open the door to
full-featured and highly responsive Web-aware applications.

Server-side CGI scripting

Perhaps the simplest way to implement interactive website
behavior, CGI scripting is an application
model for running scripts on servers to process form data, take
action based upon it, and produce reply pages. We’ll use it later in
this part of the book. It’s supported by Python’s standard library
directly, is the basis for much of what happens on the Web, and
suffices for simpler site development tasks. Raw CGI scripting
doesn’t by itself address issues such as cross-page state retention
and concurrent updates, but CGI scripts that use devices like
cookies and database systems often can.

Web frameworks and clouds

For more demanding Web work,
frameworks can automate many of the low-level details
and provide more structured and powerful techniques for dynamic site
implementation. Beyond basic CGI scripts, the Python world is flush
with third-party web frameworks such as Django, a high-level framework
that encourages rapid development and clean, pragmatic design and
includes a dynamic database access API and its own server-side
templating language; Google App Engine, a “cloud
computing” framework that provides enterprise-level tools for use in
Python scripts and allows sites to leverage the capacity of Google’s
Web infrastructure; and TurboGears, an integrated
collection of tools including a JavaScript library, a template
system, CherryPy for web interaction, and SQLObject for accessing
databases using Python’s class model.

Also in the framework category are Zope, an open source web
application server and toolkit, written in and customizable with
Python, in which websites are implemented using a fundamentally
object-oriented model; Plone, a Zope-based website
builder which provides a workflow model (called a content management
system) that allows content producers to add their content to a
site; and other popular systems for website construction, including
Pylons, web2py, CherryPy, and Webware.

Many of these frameworks are based upon the now widespread MVC
(model-view-controller) structure, and most provide state retention
solutions that wrap database storage. Some make use of the
ORM (object relational mapping)
model we’ll meet in the next part of the book, which superimposes
Python’s classes onto relational database tables, and Zope stores
objects in your site in the ZODB
object-oriented database we’ll study in the next part as well.

Rich Internet Applications (revisited)

Discussed at the start of Chapter 7,
newer and emerging “rich Internet application” (RIA) systems such as
Flex, Silverlight, JavaFX, and pyjamas allow
user interfaces implemented in web browsers to be much more dynamic
and functional than HTML has traditionally allowed. These are
client-side solutions, based generally upon AJAX and JavaScript,
which provide widget sets that rival those of traditional “desktop”
GUIs and provide for asynchronous communication with web servers.
According to some observers, such interactivity is a major component
of the “Web 2.0” model.

Ultimately, the web browser is a “desktop” GUI application,
too, albeit one which is very widely available and which can be
generalized with RIA techniques to serve as a platform for rendering
other GUIs, using software layers that do not rely on a particular
GUI library. In effect, RIAs turn web browsers into extendable
GUIs.

At least that’s their goal today. Compared to traditional
GUIs, RIAs gain some portability and deployment simplicity, in
exchange for decreased performance and increased software stack
complexity. Moreover, much as in the GUI realm, there are already
competing RIA toolkits today which may add dependencies and impact
portability. Unless a pervasive frontrunner appears, using a RIA
application may require an install step, not unlike desktop
applications.

Stay tuned, though; like the Web at large, the RIA story is
still a work in progress. The emerging HTML5 standard, for instance,
while likely not to become prevalent for some years to come, may
obviate the need for RIA browser plug-ins eventually.

Web services: XML-RPC, SOAP

XML-RPC is a technology
that provides remote procedure calls to components
over networks. It routes requests over the HTTP
protocol and ships data back and forth packaged as XML text. To
clients, web servers appear to be simple functions; when function
calls are issued, passed data is encoded as XML and shipped to
remote servers using the Web’s HTTP transport mechanism. The net
effect is to simplify the interface to web servers in client-side
programs.

More broadly, XML-RPC fosters the notion of web services (reusable
software components that run on the Web) and is supported by Python’s
xmlrpc.client module, which handles the client side of this protocol,
and xmlrpc.server, which provides tools for the server side.
SOAP is a similar but generally heavier web services
protocol, available to Python in the third-party SOAPy and ZSI packages,
among others.
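
As a minimal sketch of the XML-RPC idea described above, the following runs a tiny server and client in one process. The add function, the use of 127.0.0.1, and the choice of an operating-system-assigned port are all assumptions made for illustration; a real service would normally run on a separate machine.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):
    """Hypothetical service function exported to remote callers."""
    return x + y

def run_demo():
    # Server side: register a function and serve requests in a thread.
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        # Client side: the call below is encoded as XML and sent over HTTP,
        # but to the caller it looks like an ordinary method call.
        with ServerProxy("http://127.0.0.1:%d" % port) as proxy:
            return proxy.add(2, 3)
    finally:
        server.shutdown()
        server.server_close()

result = run_demo()
print(result)
```

The client never builds XML or opens sockets itself; the proxy object handles both, which is the simplification the text refers to.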

CORBA ORBs

An earlier but comparable technology, CORBA
is an architecture for distributed programming, in
which components communicate across a network by routing calls
through an Object Request Broker (ORB). Python
support for CORBA is available in the third-party OmniORB
package, as well as the (still
available though not recently maintained) ILU system.

Java and .NET: Jython and IronPython

We also met
Jython and IronPython briefly at the start of Chapter 7,
in the context of GUIs. By
compiling Python scripts to Java bytecode, Jython
also allows Python scripts to be used in
any context that Java programs can. This includes web-oriented
roles, such as applets stored
on the server but run on the client when referenced within web
pages. The IronPython system also mentioned in Chapter 7
similarly offers
Web-focused options, including access to the Silverlight RIA
framework and its Moonlight implementation in the Mono system for
Linux.

Screen scraping: XML and HTML parsing tools

Though not
technically tied to the Internet, XML text often
appears in such roles. Because of its other roles, though, we’ll
study Python’s basic XML parsing support, as well as third-party
extensions to it, in the next part of this book, when we explore
Python’s text processing toolkit. As we’ll see, Python’s xml
package comes
with support for DOM, SAX, and ElementTree style XML parsing, and
the open source domain provides extensions for XPath and much more.
Python’s html.parser
library module also provides an HTML-specific parser, with a model
not unlike that of XML’s SAX technique. Such tools can be used in
screen scraping roles, to extract content of
web pages fetched with urllib.request tools.
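
To make the screen scraping role concrete, here is a minimal sketch that uses html.parser to pull link targets out of page text. The LinkCollector class name and the sample page string are hypothetical; in practice the text might come from a urllib.request fetch instead.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values of <a> tags as the parser walks the page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag, much like a SAX start-element event.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links

# Hypothetical page text used only for this demonstration.
page = ('<html><body><a href="http://python.org/">Python</a> '
        '<a name="x">no link</a></body></html>')
print(extract_links(page))
```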

Windows COM and DCOM

The PyWin32 package
allows Python scripts to communicate via COM
on Windows to perform feats such as editing Word
documents and populating Excel spreadsheets (additional tools
support Excel document processing). Though not related to the
Internet itself (and being arguably upstaged by .NET in recent
years), the distributed extension to COM, DCOM,
offers additional options for distributing applications over
networks.

Other tools

Other tools serve more specific roles. Among this crowd
are mod_python, a system which
optimizes the execution of Python server-scripts in the
Apache web server; Twisted, an asynchronous,
event-driven, networking framework written in
Python, with support for a large
number of network protocols and with precoded implementations of
common network servers; HTMLgen, a lightweight tool that
allows HTML code to be generated from a tree of Python objects that
describes a web page; and Python Server Pages (PSP),
a server-side templating technology that embeds Python
code inside HTML, runs it with request context to render part of a
reply page, and is strongly reminiscent of PHP, ASP, and JSP.

As you might expect given the prominence of the Web, there are more
Internet tools for Python than we have space to discuss here. For more on
this front, see the PyPI website at http://python.org/, or visit your
favorite web search engine (some of which are implemented using Python’s
Internet tools themselves).

Again, the goal of this book is to cover the fundamentals in an
in-depth way, so that you’ll have the context needed to use tools like
some of those above well, when you’re ready to graduate to more
comprehensive solutions. As we’ll see, the basic model of CGI scripting
we’ll meet here illustrates the mechanisms underlying all web development,
whether it’s implemented by bare-bones scripts, or advanced
frameworks.

Because we must walk before we can run well, though, let’s start at
the bottom here, and get a handle on what the Internet really is. The
Internet today rests upon a rich software stack; while tools can hide some
of its complexity, programming it skillfully still requires knowledge of
all its layers. As we’ll see, deploying Python on the Web, especially with
higher-order web frameworks like those listed above, is only possible
because we truly are “surfing on the shoulders of
giants.”

Plumbing the Internet

Unless you’ve been living in a cave for the last decade or two, you
are probably already familiar with the Internet, at least from a user’s
perspective. Functionally, we use it as a communication and information
medium, by exchanging email, browsing web pages, transferring files, and
so on. Technically, the Internet consists of many layers of abstraction
and devices—from the actual wires used to send bits across the world to
the web browser that grabs and renders those bits into text, graphics, and
audio on your computer.

In this book, we are primarily concerned with the programmer’s
interface to the Internet. This, too, consists of multiple layers:
sockets, which are programmable interfaces to the low-level connections
between machines, and standard protocols, which add structure to
discussions carried out over sockets. Let’s briefly look at each of these
layers in the abstract before jumping into programming details.

The Socket Layer

In simple terms,
sockets are a programmable interface to connections
between programs, possibly running on different computers of a network.
They allow data formatted as byte strings to be passed between processes
and machines. Sockets also form the basis and low-level “plumbing” of
the Internet itself: all of the familiar higher-level Net protocols,
like FTP, web pages, and email, ultimately occur over sockets. Sockets
are also sometimes called communications endpoints because they are the
portals through which programs send and receive bytes during a
conversation.

Although often used for network conversations, sockets may also be
used as a communication mechanism between programs running on the same
computer, taking the form of a general Inter-Process Communication (IPC)
mechanism. We saw this socket usage mode briefly in
Chapter 5. Unlike some IPC devices, sockets are
bidirectional data streams: programs may both send and receive data
through them.
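
The bidirectional, byte-oriented nature of sockets can be sketched with a connected pair living in a single process. This is only an illustration: socket.socketpair is used here as a convenient way to get two connected endpoints without a network, and the message contents are arbitrary.

```python
import socket

# socketpair() returns two sockets that are already connected to each other;
# both ends live in this one process purely for illustration.
left, right = socket.socketpair()
try:
    left.sendall(b"ping")          # data travels as byte strings
    request = right.recv(1024)     # the other endpoint receives it
    right.sendall(b"pong")         # the same socket can also send
    reply = left.recv(1024)
finally:
    left.close()
    right.close()

print(request, reply)
```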

To programmers, sockets take the form of a handful of calls
available in a library. These socket calls know how to send bytes
between machines, using lower-level operations such as the TCP network
transmission control protocol. At the bottom, TCP knows how to transfer
bytes, but it doesn’t care what those bytes mean. For the purposes of
this text, we will generally ignore how bytes sent to sockets are
physically transferred. To understand sockets fully, though, we need to
know a bit about how computers are named.

Machine identifiers

Suppose for just a
moment that you wish to have a telephone conversation
with someone halfway across the world. In the real world, you would
probably need either that person’s telephone number or a directory
that you could use to look up the number from her name (e.g., a
telephone book). The same is true on the Internet: before a script can
have a conversation with another computer somewhere in cyberspace, it
must first know that other computer’s number or name.

Luckily, the Internet defines standard ways to name both a
remote machine and a service provided by that machine. Within a
script, the computer program to be contacted through a socket is
identified by supplying a pair of values—the machine name and a
specific port number on that machine:

Machine names

A machine name may
take the form of either a string of numbers
separated by dots, called an IP address (e.g., 166.93.218.100),
or a more legible
form known as a domain name (e.g., starship.python.net).
Domain names are
automatically mapped into their dotted numeric address
equivalent when used, by something called a domain name server, a
program on the Net that serves the same purpose as your local
telephone directory assistance service. As a special case, the
machine name localhost, and
its equivalent IP address 127.0.0.1, always mean the same local
machine; this allows us to refer to servers running locally on
the same computer as their clients.

Port numbers

A port
number is an agreed-upon numeric identifier for a
given conversation. Because computers on the Net support a
variety of services, port numbers are used to name a particular
conversation on a given machine. For two machines to talk over
the Net, both must associate sockets with the same machine name
and port number when initiating network connections. As we’ll
see, Internet protocols such as email and the Web have standard
reserved port numbers for their connections, so clients can
request a service regardless of the machine providing it. Port
number 80, for example,
usually provides web pages on any web server machine.
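
As a small sketch of these two identifiers, the following resolves a machine name to its dotted numeric form and pairs it with a port number. The resolve helper and the port number 50007 are hypothetical choices for illustration, and the lookup of localhost assumes a conventionally configured machine.

```python
import socket

def resolve(machine_name):
    """Map a domain name (or dotted IP string) to a dotted IPv4 address."""
    return socket.gethostbyname(machine_name)

# A socket address is a (machine name, port number) pair.
# Port 50007 is an arbitrary, hypothetical choice for this sketch.
address = ("localhost", 50007)

local_ip = resolve(address[0])     # normally '127.0.0.1'
same_ip = resolve("127.0.0.1")     # a dotted address is returned unchanged
print(local_ip, same_ip, address[1])
```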

The combination of a machine name and a port number uniquely
identifies every dialog on the Net. For instance, an ISP’s computer
may provide many kinds of services for customers—web pages, Telnet,
FTP transfers, email, and so on. Each service on the machine is
assigned a unique port number to which requests may be sent. To get
web pages from a web server, programs need to specify both the web
server’s IP address or domain name and the port number on
which the server listens for web page requests.

If this sounds a bit strange, it may help to think of it in
old-fashioned terms. To have a telephone conversation with someone
within a company, for example, you usually need to dial both the
company’s phone number and the extension of the person you want to
reach. If you don’t know the company’s number, you can probably find
it by looking up the company’s name in a phone book. It’s almost the
same on the Net—machine names identify a collection of services (like
a company), port numbers identify an individual service within a
particular machine (like an extension), and domain names are mapped to
IP numbers by domain name servers (like a phone book).

When programs use sockets to communicate in specialized ways
with another machine (or with other processes on the same machine),
they need to avoid using a port number reserved by a standard
protocol—numbers in the range of 0 to 1023—but we first need to
discuss protocols to understand
why.

The Protocol Layer

Although sockets form the
backbone of the Internet, much of the activity that
happens on the Net is programmed with protocols,[43]
which are higher-level message models that run on top of
sockets. In short, the standard Internet protocols define a structured
way to talk over sockets. They generally standardize both message
formats and socket port numbers:

  • Message formats
    provide structure for the
    bytes exchanged over sockets during conversations.

  • Port numbers
    are reserved numeric
    identifiers for the underlying sockets over which messages are
    exchanged.

Raw sockets are still commonly used in many systems, but it is
perhaps more common (and generally easier) to communicate with one of
the standard higher-level Internet protocols. As we’ll see, Python
provides support for standard protocols, which automates most of the
socket and message formatting details.

Port number rules

Technically
speaking, socket port numbers can be any 16-bit integer
value between 0 and 65,535. However, to make it easier for programs to
locate the standard protocols, port numbers in the range of 0 to 1023
are reserved and preassigned to the standard higher-level protocols.
Table 12-1
lists the
ports reserved for many of the standard protocols; each gets one or
more preassigned numbers from the reserved range.

Table 12-1. Port numbers reserved for common protocols

Protocol           Common function    Port number    Python module
-----------------  -----------------  -------------  ------------------------
HTTP               Web pages          80             http.client, http.server
NNTP               Usenet news        119            nntplib
FTP data default   File transfers     20             ftplib
FTP control        File transfers     21             ftplib
SMTP               Sending email      25             smtplib
POP3               Fetching email     110            poplib
IMAP4              Fetching email     143            imaplib
Finger             Informational      79             n/a
SSH                Command lines      22             n/a: third party
Telnet             Command lines      23             telnetlib
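
The port number rules above can be sketched in a few lines. The helper names is_reserved and pick_free_port are hypothetical, and the second function relies on the common convention that binding to port 0 asks the operating system to choose a free port, which on typical systems falls outside the reserved range.

```python
import socket

RESERVED_PORTS = range(0, 1024)    # preassigned to standard protocols
ALL_PORTS = range(0, 65536)        # any 16-bit unsigned integer

def is_reserved(port):
    """True if port lies in the range set aside for standard protocols."""
    if port not in ALL_PORTS:
        raise ValueError("port must be between 0 and 65535")
    return port in RESERVED_PORTS

def pick_free_port():
    """Ask the operating system for a currently unused port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("localhost", 0))    # port 0 here means "any free port"
        return sock.getsockname()[1]

print(is_reserved(80), is_reserved(50007), pick_free_port())
```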

Clients and servers

To socket programmers,
the standard protocols mean that port numbers 0 to 1023
are off-limits to scripts, unless they really mean to use one of the
higher-level protocols. This is both by standard and by common sense.
A Telnet program, for instance, can start a dialog with any
Telnet-capable machine by connecting to its port, 23; without
preassigned port numbers, each server might install Telnet on a
different port. Similarly, websites listen for page requests from
browsers on port 80 by standard; if they did not, you might have to
know and type the HTTP port number of every site you visit while
surfing the Net.

By defining standard port numbers for services, the Net
naturally gives rise to a client/server architecture. On
one side of a conversation, machines that support standard protocols
perpetually run a set of programs that listen for connection requests
on the reserved ports. On the other end of a dialog, other machines
contact those programs to use the services they export.

We usually call the perpetually running listener program a
server and the connecting program a client.
Let’s use the familiar web browsing model
as an example. As shown in Table 12-1, the HTTP
protocol used by the Web allows clients and servers to talk over
sockets on port 80:

Server

A machine that hosts websites usually runs a web server
program that constantly listens for incoming connection
requests, on a socket bound to port 80. Often, the server itself
does nothing but watch for requests on its port perpetually;
handling requests is delegated to spawned processes or
threads.

Clients

Programs that wish to talk to this server specify the
server machine’s name and port 80 to initiate a connection. For
web servers, typical clients are web browsers like Firefox,
Internet Explorer, or Chrome, but any script can open a
client-side connection on port 80 to fetch web pages from the
server. The server’s machine name can also be simply “localhost”
if it’s the same as the client’s.

In general, many clients may connect to a server over sockets,
whether it implements a standard protocol or something more specific
to a given application. And in some applications, the notion of client
and server is blurred—programs can also pass bytes between each other
more as peers than as master and subordinate. An agent in a
peer-to-peer file transfer system, for instance, may at various times
be both client and server for parts of files transferred.

For the purposes of this book, though, we usually call programs
that listen on sockets servers, and those that
connect clients. We also sometimes call the
machines that these programs run on server and client
(e.g., a computer on which a web server
program runs may be called a web server machine,
too), but this has more to do with the physical than the
functional.
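
The listener and connector roles can be sketched with a one-shot echo exchange on the local machine. This is a simplified illustration, not a full server: the function names, the "Echo=>" prefix, and the use of a thread to host the server side in the same process are all assumptions made so the example is self-contained.

```python
import socket
import threading

def serve_once(listener):
    """Accept one client, echo its message back with a prefix, then stop."""
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"Echo=>" + data)

def run_demo(message=b"Hello network world"):
    # Server side: bind a socket to a machine name and port, then listen.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("localhost", 0))        # 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()

    # Client side: connect to the same machine name and port number.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect(("localhost", port))
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    listener.close()
    return reply

print(run_demo())
```

A real server would loop on accept and hand each connection to a spawned process or thread, as described above; this sketch handles a single client so that it terminates.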

Protocol structures

Functionally,
protocols may accomplish a familiar task, like reading
email or posting a Usenet newsgroup message, but they ultimately
consist of message bytes sent over sockets. The structure of those
message bytes varies from protocol to protocol, is hidden by the
Python library, and is mostly beyond the scope of this book, but a few
general words may help demystify the protocol layer.

Some protocols may define the contents of messages sent over
sockets; others may specify the sequence of control messages exchanged
during conversations. By defining regular patterns of communication,
protocols make communication more robust. They can also minimize
deadlock conditions—machines waiting for messages that never
arrive.

For example, the FTP protocol prevents deadlock by conversing
over two sockets: one for control messages only and one to transfer
file data. An FTP server listens for control messages (e.g., “send me
a file”) on one port, and transfers file data over another. FTP
clients open socket connections to the server machine’s control port,
send requests, and send or receive file data over a socket connected
to a data port on the server machine. FTP also defines standard
message structures passed between client and server. The control
message used to request a file, for instance, must follow a standard
format.
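
FTP needs a live server to demonstrate, so the following sketch uses HTTP instead to illustrate the same general point: a protocol is a set of formatted message bytes sent over a socket, which a library module can build and parse for you. The HelloHandler class, the fixed reply body, and the local test server are hypothetical scaffolding for this example only.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a fixed page."""
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):      # keep the demo quiet
        pass

def fetch_raw(port):
    """Speak HTTP by hand: write the request bytes, read the reply bytes."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(b"GET / HTTP/1.0\r\nHost: 127.0.0.1\r\n\r\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:              # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)

def fetch_with_library(port):
    """Let http.client format the request and parse the reply."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/")
    response = conn.getresponse()
    result = response.status, response.read()
    conn.close()
    return result

def run_demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch_raw(port), fetch_with_library(port)
    finally:
        server.shutdown()
        server.server_close()

raw_reply, (status, body) = run_demo()
print(raw_reply.split(b"\r\n")[0], status, body)
```

Both fetches exchange the same kind of bytes with the server; the library version simply hides the request line, headers, and reply parsing behind method calls.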
