Read Programming Python Online

Authors: Mark Lutz

Tags: #COMPUTERS / Programming Languages / Python

Programming Python (155 page)

BOOK: Programming Python
Passing Parameters in Hardcoded URLs

Earlier, we passed parameters to CGI scripts by listing them at the
end of a URL typed into the browser’s address field, in the query string
part of the URL after the ?. But there’s nothing sacred about the
browser’s address field. In particular, nothing stops us from using the
same URL syntax in hyperlinks that we hardcode or generate in web page
definitions.

For example, the web page from Example 15-14 defines three hyperlinks
(the text between the <A> and </A> tags), which trigger our original
tutor5.py script again (Example 15-12), but with three different
precoded sets of parameters.

Example 15-14. PP4E\Internet\Web\tutor5c.html

CGI 101

Common input devices: URL parameters


This demo invokes the tutor5.py server-side script again,
but hardcodes input data to the end of the script's URL,
within a simple hyperlink (instead of packaging up a form's
inputs). Click your browser's "show page source" button
to view the links associated with each list item below.

This is really more about CGI than Python, but notice that
Python's cgi module handles both this form of input (which is
also produced by GET form actions), as well as POST-ed forms;
they look the same to the Python CGI script. In other words,
cgi module users are independent of the method used to submit
data.

Also notice that URLs with appended input values like this
can be generated as part of the page output by another CGI script,
to direct a next user click to the right place and context; together
with type 'hidden' input fields, they provide one way to
save state between clicks.






This static HTML file defines three hyperlinks: the first two are
minimal and the third is fully specified, but all work similarly (again,
the target script doesn’t care). When we visit this file’s URL, we see
the page shown in Figure 15-17. It’s mostly just a page for launching
canned calls to the CGI script. (I’ve reduced the text font size here to
fit in this book: run this live if you have trouble reading it here.)

Figure 15-17. Hyperlinks page created by tutor5c.html

Clicking on this page’s second link creates the response page in
Figure 15-18. This link invokes the CGI script, with the name parameter
set to “Tom” and the language parameter set to “Python,” simply because
those parameters and values are hardcoded in the URL listed in the HTML
for the second hyperlink. As such, hyperlinks with parameters like this
are sometimes known as stateful links: they automatically direct the
next script’s operation. The net effect is exactly as if we had manually
typed the line shown at the top of the browser in Figure 15-18.

Figure 15-18. Response page created by tutor5.py (3)

Notice that many fields are missing here; the tutor5.py script is
smart enough to detect and handle missing fields and generate an unknown
message in the reply page. It’s also worth pointing out that we’re
reusing the Python CGI script again. The script itself is completely
independent of both the user interface format of the submission page and
the technique used to invoke it (a submitted form or a hardcoded URL
with query parameters). By separating such user interface details from
processing logic, CGI scripts become reusable software components, at
least within the context of the CGI environment.
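
As a rough illustration of the missing-field handling just described, the sketch below parses a query string and substitutes a placeholder for any absent field. It is not the tutor5.py code (which relies on the cgi module); it uses urllib.parse instead, and the function name read_inputs, the field list, and the placeholder text are assumptions made for this example.

```python
from urllib.parse import parse_qs

def read_inputs(query_string, fields=("name", "job", "language")):
    """Return one value per expected field, with a placeholder if absent.

    read_inputs and its default field list are hypothetical names chosen
    for this sketch; they do not come from tutor5.py.
    """
    parsed = parse_qs(query_string)      # maps each name to a list of values
    return {field: parsed.get(field, ["(unknown)"])[0] for field in fields}

# The same function serves a hardcoded link or a GET form submission,
# because both arrive at the script as a query string.
inputs = read_inputs("name=Tom&language=Python")
print(inputs)
```

In a live CGI script the query string would typically come from the QUERY_STRING environment variable rather than a literal.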

The query parameters in the URLs embedded in Example 15-14 were
hardcoded in the
page’s HTML. But such URLs can also be generated automatically by a CGI
script as part of a reply page in order to provide inputs to the script
that implements a next step in user interaction. They are a simple way
for web-based applications to “remember” things for the duration of a
session. Hidden form fields, up next, serve some of the same
purposes.
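
To make the idea of generated links concrete, here is a minimal sketch of a helper that builds a hyperlink with query parameters. The helper name stateful_link, the /cgi-bin/tutor5.py path, and the link label are assumptions for illustration only; urlencode and html.escape are standard library calls.

```python
from html import escape
from urllib.parse import urlencode

def stateful_link(script_url, params, label):
    """Build an HTML hyperlink whose URL carries params as a query string.

    stateful_link is a hypothetical helper written for this sketch.
    """
    query = urlencode(params)                    # escapes names and values
    href = escape(script_url + "?" + query, quote=True)   # & becomes &amp;
    return '<a href="%s">%s</a>' % (href, escape(label))

# The script path and label here are assumed values, not taken from the book.
link = stateful_link("/cgi-bin/tutor5.py",
                     {"name": "Tom", "language": "Python"},
                     "Send Tom, Python")
print(link)
```

Letting urlencode build the query string means values with spaces or punctuation are escaped consistently instead of being pasted into the URL by hand.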

Passing Parameters in Hidden Form Fields

Similar in spirit to the prior section, inputs for scripts can also be
hardcoded in a page’s HTML as hidden input fields. Such fields are not
displayed in the page, but are transmitted back to the server when the
form is submitted. Example 15-15, for instance, allows a job field to be
entered, but fills in name and language parameters automatically as
hidden input fields.

Example 15-15. PP4E\Internet\Web\tutor5d.html

CGI 101

Common input devices: hidden form fields


This demo invokes the tutor5.py server-side script again,
but hardcodes input data in the form itself as hidden input
fields, instead of as parameters at the end of URL hyperlinks.
As before, the text of this form, including the hidden fields,
can be generated as part of the page output by another CGI
script, to pass data on to the next script on submit; hidden
form fields provide another way to save state between pages.











When Example 15-15 is opened in a browser, we get the input page in
Figure 15-19.

Figure 15-19. tutor5d.html input form page

When submitting, we trigger our original tutor5.py script once again
(Example 15-12), but some of the inputs have been provided for us as
hidden fields. The reply page is captured in Figure 15-20.

Figure 15-20. Response page created by tutor5.py (4)

Much like the query parameters of the prior section, here again
we’ve hardcoded and embedded the next page’s inputs in the input page’s
HTML itself. Unlike query parameters, hidden input fields don’t show up
in the next page’s address. Like query parameters, such input fields can
also be generated on the fly as part of the reply from a CGI script.
When they are, they serve as inputs for the next page, and so are a sort
of memory—session state passed from one script to the next. To fully
understand how and why this is necessary, we need to next take a short
diversion into state retention alternatives.
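
Before moving on, here is a small sketch of how a script might generate such a form on the fly, with some inputs hidden and one left visible. The function name hidden_form, the action path, and the sample values are assumptions for illustration and are not taken from tutor5d.html.

```python
from html import escape

def hidden_form(action, hidden, visible_field):
    """Return HTML for a form with preset hidden inputs and one text input.

    hidden_form is a hypothetical helper written for this sketch.
    """
    lines = ['<form method="POST" action="%s">' % escape(action, quote=True)]
    for name, value in hidden.items():
        # Escape values so embedded quotes cannot break out of the attribute.
        lines.append('<input type="hidden" name="%s" value="%s">'
                     % (escape(name, quote=True), escape(str(value), quote=True)))
    lines.append('<input type="text" name="%s">' % escape(visible_field, quote=True))
    lines.append('<input type="submit" value="Submit">')
    lines.append('</form>')
    return "\n".join(lines)

# Assumed sample data: name and language are preset, job is typed by the user.
page = hidden_form("/cgi-bin/tutor5.py",
                   {"name": "Sue", "language": "Python"}, "job")
print(page)
```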

[58]
These are not necessarily magic numbers. On Unix machines,
mode 755 is a bit mask. The first 7 simply means that you (the
file’s owner) can read, write, and execute the file (7 in binary
is 111; each bit enables an access mode). The two 5s (binary 101)
say that everyone else (your group and others) can read and
execute (but not write) the file. See your system’s manpage on the
chmod command for more details.

[59]
Notice that the script does not generate the enclosing

and

tags included in the static
HTML file of the prior section. Strictly speaking, it should—HTML
without such tags is technically invalid. But because all commonly
used browsers simply ignore the omission, we’ll take some liberties
with HTML syntax in this book. If you need to care about such
things, consult HTML references for more formal details.

[60]
If your job description includes extensive testing of
server-side scripts, you may also want to explore Twill, a
Python-based system that provides a little language for scripting
the client-side interface to web applications. Search the Web for
details.

[61]
This technique isn’t unique to CGI scripts, by the way. In
Chapter 12, we briefly met systems that embed Python code inside
HTML, such as Python Server Pages. There is no good way to test
such code outside the context of the enclosing system without
extracting the embedded Python code (perhaps by using the
html.parser HTML parser that comes with Python, covered in
Chapter 19) and running it with a passed-in mock-up of the API
that it will eventually use.

Saving State Information in CGI Scripts

One of the most unusual aspects of the basic CGI model, and one of its
starkest contrasts to the GUI programming techniques we studied in the
prior part of this book, is that CGI scripts are stateless: each is a
standalone program, normally run autonomously, with no knowledge of any
other scripts that may run before or after. There is no notion of things
such as global variables or objects that outlive a single step of
interaction and retain context. Each script begins from scratch, with no
memory of where the prior left off.

This makes web servers simple and robust—a buggy CGI script won’t
interfere with the server process. In fact, a flaw in a CGI script
generally affects only the single page it implements, not the entire
web-based application. But this is a very different model from
callback-handler functions in a single process GUI, and it requires extra
work to remember things longer than a single script’s execution.

Lack of state retention hasn’t mattered in our simple examples so
far, but larger systems are usually composed of multiple user interaction
steps and many scripts, and they need a way to keep track of information
gathered along the way. As suggested in the last two sections, generating
query parameters on URL links and hidden form fields in input pages sent
as replies are two simple ways for a CGI script to pass data to the next
script in the application. When clicked or submitted, such parameters send
preprogrammed selection or session information back to another
server-side handler script. In a sense, the
content of the generated reply page itself becomes the memory space of the
application.

For example, a site that lets you read your email may present you
with a list of viewable email messages, implemented in HTML as a list of
hyperlinks generated by another script. Each hyperlink might include the
name of the message viewer script, along with parameters identifying the
selected message number, email server name, and so on—as much data as is
needed to fetch the message associated with a particular link. A retail
site may instead serve up a generated list of product links, each of which
triggers a hardcoded hyperlink containing the product number, its price,
and so on. Alternatively, the purchase page at a retail site may embed the
product selected in a prior page as hidden form fields.

In fact, one of the main reasons for showing the techniques in the
last two sections is that we’re going to use them extensively in the
larger case study in the next chapter. For instance, we’ll use generated
stateful URLs with query parameters to implement lists of dynamically
generated selections that “know” what to do when clicked. Hidden form
fields will also be deployed to pass user login data to the next page’s
script. From a more general perspective, both techniques are ways to
retain state information between pages—they can be used to direct the
action of the next script to be run.

Generating URL parameters and hidden form fields works well for
retaining state information across pages during a single session of
interaction. Some scenarios require more, though. For instance, what if we
want to remember a user’s login name from session to session? Or what if
we need to keep track of pages at our site visited by a user in the past?
Because such information must be longer lived than the pages of a single
session of interaction, query parameters and hidden form fields won’t
suffice. In some cases, the required state information might also be too
large to embed in a reply page’s HTML.

In general, there are a variety of ways to pass or retain state
information between CGI script executions and across sessions of
interaction:

URL query parameters

Session state embedded in generated reply pages

Hidden form fields

Session state embedded in generated reply pages

Cookies

Smaller information stored on the client that may span
sessions

Server-side databases

Larger information that might span sessions

CGI model extensions

Persistent processes, session management, and so on

We’ll explore most of these in later examples, but since this is a
core idea in server-side scripting, let’s take a brief look at each of
these in turn.

URL Query Parameters

We met these
earlier in this chapter: hardcoded URL parameters in
dynamically generated hyperlinks embedded in input pages produced as
replies. By including both a processing script name and input to it,
such links direct the operation of the next page when selected. The
parameters are transmitted from client to server automatically, as part
of a GET-style request.

Coding query parameters is straightforward—print the correctly
formatted URL to standard output from your CGI script as part of the
reply page (albeit following some escaping conventions we’ll meet later
in this chapter). Here’s an example drawn from the next chapter’s
webmail case study:

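script = "onViewListLink.py"
user = 'bob'
mnum = 66
pswd = 'xxx'
site = 'pop.myisp.net'
print('<a href="%s?user=%s&pswd=%s&mnum=%d&site=%s">View %s</a>'
      % (script, user, pswd, mnum, site, mnum))

The resulting URL will have enough information to direct the next
script when clicked:

<a href="onViewListLink.py?user=bob&pswd=xxx&mnum=66&site=pop.myisp.net">View 66</a>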

Query parameters serve as memory, and they pass information
between pages. As such, they are useful for retaining state across the
pages of a single session of interaction. Since each generated URL may
have different attached parameters, this scheme can provide context per
user-selectable action. Each link in a list of selectable alternatives,
for example, may have a different implied action coded as a different
parameter value. Moreover, users can bookmark a link with parameters, in
order to return to a specific state in an interaction.

Because their state retention is lost when the page is abandoned,
though, they are not useful for remembering state from session to
session. Moreover, the data appended as URL query parameters is
generally visible to users and may appear in server logfiles; in some
applications, it may have to be manually encrypted to avoid display or
forgery.

Hidden Form Input Fields

We met these in the prior section
as well: hidden form input fields that are attached to
form data and are embedded in reply web pages, but are not displayed in
web pages or their URL addresses. When the form is submitted, all the
hidden fields are transmitted to the next script along with any real
inputs, to serve as context. The net effect provides context for an
entire input form, not a particular hyperlink. An already entered
username, password, or selection, for instance, can be implied by the
values of hidden fields in subsequently generated pages.

In terms of code, hidden fields are generated by server-side
scripts as part of the reply page’s HTML and are later returned by the
client with all of the form’s input data. Previewing the next chapter’s
usage again:

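print('<form method=post action="%s/...">' % urlroot)      # next script under urlroot
print('<input type=hidden name=mnum value="%s">' % msgnum)
print('<input type=hidden name=user value="%s">' % user)
print('<input type=hidden name=site value="%s">' % site)
print('<input type=hidden name=pswd value="%s">' % pswd)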

Like query parameters, hidden form fields can also serve as a sort
of memory, retaining state information from page to page. Also like
query parameters, because this kind of memory is embedded in the page
itself, hidden fields are useful for state retention among the pages of
a single session of interaction, but not for data that spans multiple
sessions.

And like both query parameters and cookies (up next), hidden form
fields may be visible to users—though hidden in rendered pages and URLs,
their values still are displayed if the page’s raw HTML source code is
displayed. As a result, hidden form fields are not secure; encryption of
the embedded data may again be required in some contexts to avoid
display on the client or forgery in form submissions.
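
One way to guard against forgery, short of full encryption, is to attach a keyed digest to each embedded value and check it on return. The sketch below uses the standard hmac module for this; note that it signs rather than encrypts, so the value is still visible to the user. The names SECRET_KEY, sign, and verify are assumptions for this example, and a real key would have to be kept private on the server.

```python
import hashlib
import hmac

# Hypothetical server-side secret; never send this to the client.
SECRET_KEY = b"replace-with-a-real-server-side-secret"

def _digest(value):
    """Return a hex HMAC-SHA256 digest of value under SECRET_KEY."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def sign(value):
    """Return value with its digest appended, for use in a hidden field."""
    return "%s|%s" % (value, _digest(value))

def verify(signed):
    """Return the original value if its digest matches, else None."""
    value, sep, digest = signed.rpartition("|")
    if not sep:
        return None                       # no digest attached at all
    expected = _digest(value)
    # compare_digest avoids leaking how many characters matched.
    ok = hmac.compare_digest(digest.encode("utf-8"), expected.encode("utf-8"))
    return value if ok else None

token = sign("bob")                       # embed this as the hidden field value
print(verify(token))                      # the untampered value is accepted
print(verify(token.replace("bob", "eve", 1)))   # an edited value is rejected
```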

HTTP “Cookies”

Cookies, an extension to the HTTP protocol underlying the web model,
are a way for server-side applications to directly store information on
the client computer. Because this information is not embedded in the
HTML of web pages, it outlives the pages of a single session. As such,
cookies are ideal for remembering things that must span sessions.

Things like usernames and preferences, for example, are prime
cookie candidates—they will be available the next time the client visits
our site. However, because cookies may have space limitations, are seen
by some as intrusive, and can be disabled by users on the client, they
are not always well suited to general data storage needs. They are often
best used for small pieces of noncritical cross-session state
information, and websites that aim for broad usage should generally
still be able to operate if cookies are unavailable.

Operationally, HTTP cookies are strings of information stored on
the client machine and transferred between client and server in HTTP
message headers. Server-side scripts generate HTTP headers to request
that a cookie be stored on the client as part of the script’s reply
stream. Later, the client web browser generates HTTP headers that send
back all the cookies matching the server and page being contacted. In
effect, cookie data is embedded in the data streams much like query
parameters and form fields, but it is contained in HTTP headers, not in
a page’s HTML. Moreover, cookie data can be stored permanently on the
client, and so it outlives both pages and interactive sessions.

For web application developers, Python’s standard library includes
tools that simplify the task of sending and receiving cookies:
http.cookiejar does cookie handling for HTTP clients that talk to web
servers, and the module http.cookies simplifies the task of creating and
receiving cookies in server-side scripts. Moreover, the module
urllib.request we’ve studied earlier has support for opening URLs with
automatic cookie handling.

Creating a cookie

Web browsers
such as Firefox and Internet Explorer generally handle
the client side of this protocol, storing and sending cookie data. For
the purpose of this chapter, we are mainly interested in cookie
processing on the server. Cookies are created by sending special HTTP
headers at the start of the reply stream:

Content-type: text/html
Set-Cookie: foo=bar;
...

The full format of a cookie’s header is as follows:

Set-Cookie: name=value; expires=date; path=pathname; domain=domainname; secure

The domain defaults to the hostname of the server that set the
cookie, and the path defaults to the path of the document or script
that set the cookie—these are later matched by the client to know when
to send a cookie’s value back to the server. In Python, cookie
creation is simple; the following in a CGI script stores a
last-visited time cookie:

import http.cookies, time
cook = http.cookies.SimpleCookie()
cook['visited'] = str(time.time()) # a dictionary
print(cook.output()) # prints "Set-Cookie: visited=1276623053.89"
print('Content-type: text/html\n')

The SimpleCookie call here creates a dictionary-like cookie object
whose keys are strings (the names of the cookies), and whose values are
“Morsel” objects (describing the cookie’s value). Morsels in turn are
also dictionary-like objects with one key per cookie property: path and
domain, expires to give the cookie an expiration date (the default is
the duration of the browser session), and so on. Morsels also have
attributes; for instance, key and value give the name and value of the
cookie, respectively. Assigning a string to a cookie key automatically
creates a Morsel from the string, and the cookie object’s output method
returns a string suitable for use as an HTTP header; printing the object
directly has the same effect, due to its __str__ operator overloading.
Here is a more comprehensive example of the interface in action:

>>> import http.cookies, time
>>> cooks = http.cookies.SimpleCookie()
>>> cooks['visited'] = time.asctime()
>>> cooks['username'] = 'Bob'
>>> cooks['username']['path'] = '/myscript'
>>> cooks['visited'].value
'Tue Jun 15 13:35:20 2010'
>>> print(cooks['visited'])
Set-Cookie: visited="Tue Jun 15 13:35:20 2010"
>>> print(cooks)
Set-Cookie: username=Bob; Path=/myscript
Set-Cookie: visited="Tue Jun 15 13:35:20 2010"

Receiving a cookie

Now, when the
client visits the page again in the future, the cookie’s
data is sent back from the browser to the server in HTTP headers
again, in the form “Cookie: name1=value1; name2=value2 ...”. For
example:

Cookie: visited=1276623053.89

Roughly, the browser client returns all cookies that match the
requested server’s domain name and path. In the CGI script on the
server, the environment variable HTTP_COOKIE contains the raw cookie
data headers string uploaded from the client; it can be extracted in
Python as follows:

import os, http.cookies
cooks = http.cookies.SimpleCookie(os.environ.get("HTTP_COOKIE"))
vcook = cooks.get("visited")     # a Morsel dictionary
if vcook is not None:
    time = vcook.value           # the cookie's value string

Here, the SimpleCookie constructor call automatically parses the
passed-in cookie data string into a dictionary of Morsel objects; as
usual, the dictionary get method returns a default None if a key is
absent, and we use the Morsel object’s value attribute to extract the
cookie’s value string if sent.
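
The parsing step can be tried without a web server by handing SimpleCookie a header string directly. In this sketch the function name cookie_values and the sample header text are assumptions for illustration; in a CGI script the string would come from HTTP_COOKIE instead.

```python
from http.cookies import SimpleCookie

def cookie_values(raw_header):
    """Parse a raw Cookie header string into a plain name-to-value dict.

    cookie_values is a hypothetical helper; raw_header may be None when
    the client sent no cookies.
    """
    jar = SimpleCookie(raw_header or "")     # parses "name=value; ..." pairs
    return {name: morsel.value for name, morsel in jar.items()}

# Sample header text as a browser might send it (assumed values).
values = cookie_values("visited=1276623053.89; user=Brian")
print(values)
```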

Using cookies in CGI scripts

To help put these pieces together, Example 15-16 lists a CGI script
that stores a client-side cookie when first visited and receives and
displays it on subsequent visits.

Example 15-16. PP4E\Internet\Web\cgi-bin\cookies.py

"""
create or use a client-side cookie storing username;
there is no input form data to parse in this example
"""
import http.cookies, os
cookstr  = os.environ.get("HTTP_COOKIE")
cookies  = http.cookies.SimpleCookie(cookstr)
usercook = cookies.get("user")                     # fetch if sent

if usercook is None:                               # create first time
    cookies = http.cookies.SimpleCookie()          # print Set-cookie hdr
    cookies['user'] = 'Brian'
    print(cookies)
    greeting = '<p>His name shall be... %s</p>' % cookies['user']
else:
    greeting = '<p>Welcome back, %s</p>' % usercook.value

print('Content-type: text/html\n')                 # plus blank line now
print(greeting)                                    # and the actual html

Assuming you are running this chapter’s local web server from
Example 15-1, you can invoke this script with a URL such as
http://localhost/cgi-bin/cookies.py (type this in your browser’s address
field, or submit it interactively with the module urllib.request). The
first time you visit the script, the script sets the cookie within its
reply’s headers, and you’ll see a reply page with this message:

His name shall be... Set-Cookie: user=Brian

Thereafter, revisiting the script’s URL in the same browser
session (use your browser’s reload button) produces a reply page with
this message:

Welcome back, Brian

This occurs because the client is sending the previously stored
cookie value back to the script, at least until you kill and restart
your web browser—the default expiration of a cookie is the end of a
browsing session. In a realistic program, this sort of structure might
be used by the login page of a web application; a user would need to
enter his name only once per browser session.
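
If a cookie should instead survive browser restarts, the script can give it a lifetime when creating it. The following sketch sets the max-age and path properties on the Morsel; the one-week lifetime and the /cgi-bin path are assumed values chosen for illustration, not part of Example 15-16.

```python
from http.cookies import SimpleCookie

cookies = SimpleCookie()
cookies["user"] = "Brian"
cookies["user"]["max-age"] = 7 * 24 * 60 * 60   # assumed lifetime: one week, in seconds
cookies["user"]["path"] = "/cgi-bin"            # assumed path; limits where it is sent

header = cookies.output()                        # a Set-Cookie header line
print(header)                                    # headers first...
print("Content-type: text/html\n")               # ...then the blank line
```

Whether a browser honors the requested lifetime is up to the client, so scripts should still cope with the cookie being absent.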

Handling cookies with the urllib.request module

As mentioned earlier, the urllib.request module provides an interface
for reading the reply from a URL, but it uses the http.cookiejar module
to also support storing and sending cookies on the client. However, it
does not support cookies “out of the box.” For example, here it is in
action testing the last section’s cookie-savvy script; cookies are not
echoed back to the server when a script is revisited:

>>> from urllib.request import urlopen
>>> reply = urlopen('http://localhost/cgi-bin/cookies.py').read()
>>> print(reply)
b'<p>His name shall be... Set-Cookie: user=Brian</p>\n'
>>> reply = urlopen('http://localhost/cgi-bin/cookies.py').read()
>>> print(reply)
b'<p>His name shall be... Set-Cookie: user=Brian</p>\n'

To support cookies with this module properly, we simply need to
enable the cookie-handler class; the same is true for other optional
extensions in this module. Again, contacting the prior section’s
script:

>>> import urllib.request as urllib
>>> opener = urllib.build_opener(urllib.HTTPCookieProcessor())
>>> urllib.install_opener(opener)
>>>
>>> reply = urllib.urlopen('http://localhost/cgi-bin/cookies.py').read()
>>> print(reply)
b'<p>His name shall be... Set-Cookie: user=Brian</p>\n'
>>> reply = urllib.urlopen('http://localhost/cgi-bin/cookies.py').read()
>>> print(reply)
b'<p>Welcome back, Brian</p>\n'
>>> reply = urllib.urlopen('http://localhost/cgi-bin/cookies.py').read()
>>> print(reply)
b'<p>Welcome back, Brian</p>\n'

This works because urllib.request mimics the cookie behavior of a web
browser on the client: it stores the cookie when so requested in the
headers of a script’s reply, and adds it to headers sent back to the
same script on subsequent visits. Also just as in a browser, the cookie
is deleted if you exit Python and start a new session to rerun this
code. See the library manual for more on this module’s interfaces.
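
For scripts that need to look at what was stored, the cookie handler can be given an explicit jar object instead of the default hidden one. This is a minimal sketch under that assumption; the helper names fetch and stored_cookies are hypothetical, and fetch is only defined here, not called, since it needs a running server.

```python
import http.cookiejar
import urllib.request

# An explicit jar lets the program inspect cookies the handler collects.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

def fetch(url):
    """Read a URL through the cookie-aware opener (hypothetical helper)."""
    with opener.open(url) as reply:
        return reply.read()

def stored_cookies():
    """Return the cookies currently held in the jar as a name-to-value dict."""
    return {cookie.name: cookie.value for cookie in jar}

print(stored_cookies())      # empty until fetch() contacts a cookie-setting server
```

Using a private opener rather than install_opener keeps the cookie handling local to this code instead of changing the behavior of every urlopen call in the process.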

Although easy to use, cookies have potential downsides. For one,
they may be subject to size limitations (4 KB per cookie, 300 total,
and 20 per domain are one common limit). For another, users can
disable cookies in most browsers, making them less suited to critical
data. Some even see them as intrusive, because they can be abused to
track user behavior. (Many sites simply require cookies to be turned
on, finessing the issue completely.) Finally, because cookies are
transmitted over the network between client and server, they are still
only as secure as the transmission stream itself; this may be an issue
for sensitive data if the page is not using secure HTTP transmissions
between client and server. We’ll explore secure cookies and server
concepts in the next chapter.

For more details on the cookie modules and the cookie protocol
in general, see Python’s library manual, and search the Web for
resources. It’s not impossible that future mutations of HTML may
provide similar storage
solutions.
