Read Programming Python Online

Authors: Mark Lutz

Tags: #COMPUTERS / Programming Languages / Python

Programming Python (129 page)

BOOK: Programming Python
NNTP: Accessing Newsgroups

So far in this chapter, we have focused on Python’s FTP and email
processing tools and have met a handful of client-side scripting modules
along the way: ftplib, poplib, smtplib, email, mimetypes, urllib, and so
on. This set is representative of Python’s client-side library tools for
transferring and processing information over the Internet, but it’s not
at all complete.

A more or less comprehensive list of Python’s Internet-related
modules appears at the start of the previous chapter. Among other things,
Python also includes client-side support libraries for Internet news,
Telnet, HTTP, XML-RPC, and other standard protocols. Most of these are
analogous to modules we’ve already met—they provide an object-based
interface that automates the underlying sockets and message
structures.

For instance, Python’s nntplib module supports the client-side interface
to NNTP (the Network News Transfer Protocol), which is used for reading
and posting articles to Usenet newsgroups on the Internet. Like other
protocols, NNTP runs on top of sockets and merely defines a standard
message protocol; like other modules, nntplib hides most of the protocol
details and presents an object-based interface to Python scripts.

We won’t get into full protocol details here, but in brief, NNTP
servers store a range of articles on the server machine, usually in a
flat-file database. If you have the domain name or IP address of a machine
that runs an NNTP server program listening on the NNTP port, you can write
scripts that fetch or post articles from any machine that has Python and
an Internet connection. For instance, the script in Example 13-28 by
default fetches and displays the last 10 articles from Python’s Internet
newsgroup, comp.lang.python, from the news.rmi.net NNTP server at one of
my ISPs.

Example 13-28. PP4E\Internet\Other\readnews.py

"""
fetch and print usenet newsgroup posting from comp.lang.python via the
nntplib module, which really runs on top of sockets; nntplib also supports
posting new messages, etc.; note: posts not deleted after they are read;
"""
listonly = False
showhdrs = ['From', 'Subject', 'Date', 'Newsgroups', 'Lines']
try:
    import sys
    servername, groupname, showcount = sys.argv[1:]
    showcount = int(showcount)
except:
    import nntpconfig                       # local module assumed to define servername
    servername = nntpconfig.servername      # assign this to your server
    groupname  = 'comp.lang.python'         # cmd line args or defaults
    showcount  = 10                         # show last showcount posts

# connect to nntp server
print('Connecting to', servername, 'for', groupname)
from nntplib import NNTP
connection = NNTP(servername)
(reply, count, first, last, name) = connection.group(groupname)
print('%s has %s articles: %s-%s' % (name, count, first, last))

# request subject headers only, for the last showcount articles
fetchfrom = int(last) - (showcount-1)
(reply, subjects) = connection.xhdr('subject', '%s-%s' % (fetchfrom, last))

# show headers, get message hdr+body
for (id, subj) in subjects:                 # [-showcount:] if fetch all hdrs
    print('Article %s [%s]' % (id, subj))
    if not listonly and input('=> Display?') in ['y', 'Y']:
        reply, num, tid, list = connection.head(id)
        for line in list:
            for prefix in showhdrs:
                if line[:len(prefix)] == prefix:
                    print(line[:80])
                    break
        if input('=> Show body?') in ['y', 'Y']:
            reply, num, tid, list = connection.body(id)
            for line in list:
                print(line[:80])
    print()
print(connection.quit())

As with the FTP and email tools, the script creates an NNTP object and
calls its methods to fetch newsgroup information and articles’ header and
body text. The xhdr method, for example, loads selected headers from a
range of messages.
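The two pieces of bookkeeping the script does around its NNTP calls,
computing the article range to request and picking out the header lines to
display, can be sketched on their own without a server. The function names
here (article_range, select_headers) are hypothetical and not part of
readnews.py or nntplib; unlike the script's prefix test, this sketch
compares whole header field names and accepts either str or bytes lines.

```python
def article_range(first, last, showcount):
    """Return a 'low-high' range string covering the last showcount articles.

    first and last may be ints or digit strings, as reported by the server;
    the low end is clamped so it never falls below the first article.
    """
    first, last = int(first), int(last)
    low = max(first, last - (showcount - 1))
    return '%d-%d' % (low, last)


def select_headers(lines, wanted):
    """Keep only header lines whose field name appears in wanted."""
    wanted = {name.lower() for name in wanted}
    picked = []
    for line in lines:
        if isinstance(line, bytes):              # some versions return bytes
            line = line.decode('latin-1')
        if line[:1] in (' ', '\t'):              # skip folded continuation lines
            continue
        name, sep, _ = line.partition(':')
        if sep and name.strip().lower() in wanted:
            picked.append(line)
    return picked


if __name__ == '__main__':
    print(article_range('100', '250', 10))
    sample = [b'From: a@b.c', 'Subject: Hi', 'X-Trace: zz', 'Lines: 12']
    for line in select_headers(sample, ['From', 'Subject', 'Lines']):
        print(line[:80])
```

Keeping these steps in plain functions makes them easy to test offline
before pointing the script at a live news server.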

For NNTP servers that require authentication, you may also have to
pass a username, a password, and possibly a reader-mode flag to the NNTP
call. See the Python Library manual for more on other NNTP parameters and
object methods.
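As a rough sketch of such an authenticated connection, the helper below
passes the login details as keyword arguments to the NNTP call. The helper
name connect_with_login is hypothetical, and the server name and account
values are whatever your provider assigns. Note that nntplib was removed
from the standard library in Python 3.13, so this assumes an earlier Python
(the import is deferred into the function so that the definition itself
loads anywhere).

```python
def connect_with_login(servername, user, password, reader_mode=True):
    """Open an NNTP connection to a server that requires authentication.

    Sketch only: requires the nntplib module, which is absent in
    Python 3.13 and later.
    """
    from nntplib import NNTP                     # deferred: see note above
    return NNTP(servername,
                user=user,                       # account name at the server
                password=password,               # account password
                readermode=reader_mode)          # send 'mode reader' first

# hypothetical usage; substitute your own server and account:
# connection = connect_with_login('news.example.com', 'me', 'secret')
# print(connection.getwelcome())
# connection.quit()
```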

In the interest of space and time, I’ll omit this script’s outputs
here. When run, it connects to the server and displays each article’s
subject line, pausing to ask whether it should fetch and show the
article’s header information lines (headers listed in the variable
showhdrs only) and body text. We can also pass this script an explicit
server name, newsgroup, and display count on the command line to apply it
in different ways. With a little more work, we could turn this script into
a full-blown news interface. For instance, new articles could be posted
from within a Python script with code of this form (assuming the local
file already contains proper NNTP header lines):

# to post, say this (but only if you really want to post!)
connection = NNTP(servername)
localfile = open('filename') # file has proper headers
connection.post(localfile) # send text to newsgroup
connection.quit()
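To make "proper NNTP header lines" concrete, here is a minimal sketch that
builds such article text with the standard email package instead of reading
it from a file. The build_article helper, the addresses, and the misc.test
group are illustrative assumptions; servers generally expect at least From,
Newsgroups, and Subject headers, followed by a blank line and the body. The
actual post call is left commented out, and wrapping the bytes in a binary
file-like object is an assumption about what recent nntplib versions accept.

```python
import io
from email.message import EmailMessage


def build_article(sender, newsgroups, subject, body):
    """Return article text as bytes: header lines, a blank line, then the body."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['Newsgroups'] = newsgroups               # comma-separated group names
    msg['Subject'] = subject
    msg.set_content(body)                        # adds MIME headers for the text
    return msg.as_bytes()


article = build_article('Example Poster <poster@example.com>',
                        'misc.test',
                        'nntplib posting test',
                        'Just a test, please ignore.\n')
print(article.decode('ascii'))

postfile = io.BytesIO(article)                   # file-like wrapper for posting
# connection.post(postfile)                      # only if you really want to post!
```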

We might also add a tkinter-based GUI frontend to this script to
make it more usable, but we’ll leave such an extension on the suggested
exercise heap (see also the PyMailGUI interface’s suggested extensions at
the end of the next chapter—email and news messages have a similar
structure).

HTTP: Accessing Websites

Python’s standard library (the modules
that are installed with the interpreter) also includes
client-side support for HTTP—the Hypertext Transfer Protocol—a message
structure and port standard used to transfer information on the World Wide
Web. In short, this is the protocol that your web browser (e.g., Internet
Explorer, Firefox, Chrome, or Safari) uses to fetch web pages and run
applications on remote servers as you surf the Web. Essentially, it’s just
bytes sent over port 80.

To really understand HTTP-style transfers, you need to know some of
the server-side scripting topics covered in Chapter 15 (e.g., script
invocations and Internet address schemes), so this section may be less
useful to readers with no such background. Luckily, though, the basic HTTP
interfaces in Python are simple enough for a cursory understanding even at
this point in the book, so let’s take a brief look here.

Python’s standard http.client module automates much of the protocol
defined by HTTP and allows scripts to fetch web pages as clients much like
web browsers; as we’ll see in Chapter 15, http.server also allows us to
implement web servers to handle the other side of the dialog. For
instance, the script in Example 13-29 can be used to grab any file from
any server machine running an HTTP web server program. As usual, the file
(and descriptive header lines) is ultimately transferred as formatted
messages over a standard socket port, but most of the complexity is hidden
by the http.client module (see our raw socket dialog with a port 80 HTTP
server in Chapter 12 for a comparison).

Example 13-29. PP4E\Internet\Other\http-getfile.py

"""
fetch a file from an HTTP (web) server over sockets via http.client; the filename
parameter may have a full directory path, and may name a CGI script with ? query
parameters on the end to invoke a remote program; fetched file data or remote
program output could be saved to a local file to mimic FTP, or parsed with str.find
or html.parser module; also: http.client request(method, url, body=None, hdrs={});
"""
import sys, http.client
showlines = 6
try:
    servername, filename = sys.argv[1:]           # cmdline args?
except:
    servername, filename = 'learning-python.com', '/index.html'

print(servername, filename)
server = http.client.HTTPConnection(servername)   # connect to http site/server
server.putrequest('GET', filename)                # send request and headers
server.putheader('Accept', 'text/html')           # POST requests work here too
server.endheaders()                               # as do CGI script filenames

reply = server.getresponse()                      # read reply headers + data
if reply.status != 200:                           # 200 means success
    print('Error sending request', reply.status, reply.reason)
else:
    data = reply.readlines()                      # file obj for data received
    reply.close()                                 # show lines with eoln at end
    for line in data[:showlines]:                 # to save, write data to file
        print(line)                               # line already has \n, but bytes

Desired server names and filenames can be passed on the command line
to override hardcoded defaults in the script. You need to know something
of the HTTP protocol to make the most sense of this code, but it’s fairly
straightforward to decipher. When run on the client, this script makes an
HTTPConnection object to connect to the server, sends it a GET request
along with acceptable reply types, and then reads the server’s reply. Much
like raw email message text, the HTTP server’s reply usually begins with a
set of descriptive header lines, followed by the contents of the requested
file. The reply object returned by the connection’s getresponse method
exposes the status and headers, and doubles as a file-like object from
which we can read the downloaded data.
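The same request-and-reply dialog can be tried without an Internet
connection. The sketch below starts a throwaway server in a background
thread with http.server and fetches a page from it using the request call
named in the script's docstring, which bundles the putrequest, putheader,
and endheaders steps. HelloHandler, fetch, and the page content are all
hypothetical stand-ins, and port 0 simply asks the system for any free
port.

```python
import http.client
import http.server
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in server: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b'<html><body>Hello from ' + self.path.encode() + b'</body></html>\n'
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass                                     # keep the demo quiet


def fetch(host, port, path):
    """GET path from host:port; return (status, reason, content type, data)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request('GET', path, headers={'Accept': 'text/html'})
        reply = conn.getresponse()               # reply headers + file-like body
        return reply.status, reply.reason, reply.getheader('Content-Type'), reply.read()
    finally:
        conn.close()


server = http.server.HTTPServer(('127.0.0.1', 0), HelloHandler)
port = server.server_address[1]                  # the port actually assigned
threading.Thread(target=server.serve_forever, daemon=True).start()

status, reason, ctype, data = fetch('127.0.0.1', port, '/index.html')
print(status, reason, ctype)
print(data)                                      # bytes, as in the script's output

server.shutdown()
server.server_close()
```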

Let’s fetch a few files with this script. Like all Python
client-side scripts, this one works on any machine with Python and an
Internet connection (here it runs on a Windows client). Assuming that all
goes well, the first few lines of the downloaded file are printed; in a
more realistic application, the text we fetch would probably be saved to a
local file, parsed with Python’s html.parser module (introduced in
Chapter 19), and so on. Without arguments, the script simply fetches the
HTML index page at http://learning-python.com, a domain name I host at a
commercial service provider:

C:\...\PP4E\Internet\Other> http-getfile.py
learning-python.com /index.html
b'\n'
b' \n'
b'\n'
b"Mark Lutz's Python Training Services\n"
b'b'\n'
b'\n'
b'\n'
b'\n'
b'\n'

C:\...\PP4E\Internet\Other> http-getfile.py www.python.org index.html
www.python.org index.html
Error sending request 400 Bad Request

C:\...\PP4E\Internet\Other> http-getfile.py www.rmi.net /~lutz
www.rmi.net /~lutz
Error sending request 301 Moved Permanently

C:\...\PP4E\Internet\Other> http-getfile.py www.rmi.net /~lutz/index.html
www.rmi.net /~lutz/index.html
b'\n'
b'\n'
b'\n'
b"Mark Lutz's Book Support Site\n"
b'\n'
b'\n'

Notice the second and third attempts in this session: if the request
fails, the script receives and displays an HTTP error code from the server
(we forgot the leading slash on the second, and the “index.html” on the
third, both of which this server and interface require). With the raw HTTP
interfaces, we need to be precise about what we want.
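A 301 reply like the one in the session above normally carries a Location
header naming where the resource has moved, and http.client leaves it to
the script to act on it. The following is a minimal sketch of following
such redirects by hand, run against a local stand-in server so that it
needs no network. RedirectingHandler and get_following_redirects are
hypothetical names, and the sketch assumes redirects stay on the same host.

```python
import http.client
import http.server
import threading
from urllib.parse import urlsplit


class RedirectingHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in server: redirects a bare directory path to its index page."""

    def do_GET(self):
        if self.path == '/~lutz':
            self.send_response(301)
            self.send_header('Location', '/~lutz/index.html')
            self.send_header('Content-Length', '0')
            self.end_headers()
        elif self.path == '/~lutz/index.html':
            body = b'<html>index page</html>\n'
            self.send_response(200)
            self.send_header('Content-Type', 'text/html')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass                                     # keep the demo quiet


def get_following_redirects(host, port, path, max_hops=5):
    """GET path, following same-host redirects; return (status, data, hops)."""
    hops = []
    for _ in range(max_hops):
        conn = http.client.HTTPConnection(host, port, timeout=10)
        try:
            conn.request('GET', path)
            reply = conn.getresponse()
            data = reply.read()
            location = reply.getheader('Location')
        finally:
            conn.close()
        hops.append((reply.status, path))
        if reply.status in (301, 302, 303, 307, 308) and location:
            parts = urlsplit(location)           # assumption: same host and port
            path = parts.path or '/'
            if parts.query:
                path += '?' + parts.query
        else:
            return reply.status, data, hops
    raise RuntimeError('too many redirects')


server = http.server.HTTPServer(('127.0.0.1', 0), RedirectingHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

status, data, hops = get_following_redirects('127.0.0.1', port, '/~lutz')
print(status, hops)
print(data)

server.shutdown()
server.server_close()
```

Higher-level tools such as urllib.request follow redirects automatically;
the point here is only to show what the raw interface leaves to us.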

Technically, the string we call filename in the script can refer to
either a simple static web page file or a server-side program that
generates HTML as its output. Those server-side programs are usually
called CGI scripts, the topic of Chapters 15 and 16. For now, keep in mind
that when filename refers to a script, this program can be used to invoke
another program that resides on a remote server machine. In that case, we
can also specify parameters (called a query string) to be passed to the
remote program after a ? character.
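Query strings are easy to write by hand when the values are simple, but
values containing spaces or punctuation must be escaped. As a small sketch,
the standard urllib.parse module can build and decode them; the script_path
helper is a hypothetical convenience, and the script path is the one used
in this section's examples.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def script_path(script, **params):
    """Append params to a script path as an escaped ?name=value query string."""
    query = urlencode(params)
    return script + ('?' + query if query else '')


path = script_path('/cgi-bin/languages.py', language='Python')
print(path)                                  # /cgi-bin/languages.py?language=Python

tricky = script_path('/cgi-bin/languages.py', language='C++')
print(tricky)                                # /cgi-bin/languages.py?language=C%2B%2B

# decoding reverses the escapes; each name maps to a list of values
print(parse_qs(urlsplit(tricky).query))      # {'language': ['C++']}
```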

Here, for instance, we pass a language=Python parameter to a CGI script
we will meet in Chapter 15 (to make this work, we also need to first spawn
a locally running HTTP web server coded in Python using a script we first
met in Chapter 1 and will revisit in Chapter 15):

In a different window
C:\...\PP4E\Internet\Web> webserver.py
webdir ".", port 80

C:\...\PP4E\Internet\Other> http-getfile.py localhost /cgi-bin/languages.py?language=Python
localhost /cgi-bin/languages.py?language=Python
b'Languages\n'
b'Syntax\n'
b'Python\n'
b" print('Hello World') \n"
b'\n'
b'\n'

This book has much more to say later about HTML, CGI scripts, and
the meaning of the HTTP GET request used in Example 13-29 (along with
POST, one of two ways to format information sent to an HTTP server), so
we’ll skip additional details here.

Suffice it to say, though, that we could use the HTTP interfaces to
write our own web browsers and build scripts that use websites as though
they were subroutines. By sending parameters to remote programs and
parsing their results, websites can take on the role of simple in-process
functions (albeit, much more slowly and
indirectly).
