Read Programming Python Online

Authors: Mark Lutz

Tags: #COMPUTERS / Programming Languages / Python


And here is the console window output we get when uploading two
files in serial fashion; here again, uploads run in parallel threads, so
if we start a new upload before one in progress is finished, they
overlap in time:

C:\...\PP4E\Internet\Ftp\test> ..\putfilegui.py
Server Name => ftp.rmi.net
User Name? => lutz
Local Dir => .
File Name => sousa.au
Password? => xxxxxxxx
Remote Dir => .
Upload of "sousa.au" successful
Server Name => ftp.rmi.net
User Name? => lutz
Local Dir => .
File Name => about-pp.html
Password? => xxxxxxxx
Remote Dir => .
Upload of "about-pp.html" successful

Finally, we can bundle up both GUIs in a single launcher script
that knows how to start the get and put interfaces, regardless of which
directory we are in when the script is started, and independent of the
platform on which it runs. Example 13-9 shows this process.

Example 13-9. PP4E\Internet\Ftp\PyFtpGui.pyw

"""
spawn FTP get and put GUIs no matter what directory I'm run from; os.getcwd is not
necessarily the place this script lives; could also hardcode path from $PP4EHOME,
or guessLocation; could also do: [from PP4E.launchmodes import PortableLauncher,
PortableLauncher('getfilegui', '%s/getfilegui.py' % mydir)()], but need the DOS
console pop up on Windows to view status messages which describe transfers made;
"""
import os, sys
print('Running in: ', os.getcwd())
# PP3E
# from PP4E.Launcher import findFirst
# mydir = os.path.split(findFirst(os.curdir, 'PyFtpGui.pyw'))[0]
# PP4E
from PP4E.Tools.find import findlist
mydir = os.path.dirname(findlist('PyFtpGui.pyw', startdir=os.curdir)[0])
if sys.platform[:3] == 'win':
    os.system('start %s\\getfilegui.py' % mydir)
    os.system('start %s\\putfilegui.py' % mydir)
else:
    os.system('python %s/getfilegui.py &' % mydir)
    os.system('python %s/putfilegui.py &' % mydir)

Notice that we’re reusing the find utility from Chapter 6’s
Example 6-13 again here, this time to locate the home directory of the
script in order to build command lines. When run by launchers in the
examples root directory or command lines elsewhere in general, the
current working directory may not always be this script’s container. In
the prior edition, this script used a tool in the Launcher module
instead to search for its own directory (see the examples distribution
for that equivalent).
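The directory search can be approximated without the book's find module. What follows is a minimal sketch, not the findlist implementation itself (which is not shown in this section): find_first is a hypothetical helper that walks a tree with os.walk and returns the directory holding the first file with a matching name. The temporary directory tree exists only so the sketch can run anywhere.

```python
import os
import tempfile

def find_first(filename, startdir=os.curdir):
    """Return the absolute directory of the first file named `filename`
    at or below `startdir`, or None if nothing matches.
    (Hypothetical stand-in for the book's findlist utility.)"""
    for dirpath, dirnames, filenames in os.walk(startdir):
        if filename in filenames:
            return os.path.abspath(dirpath)
    return None

# Demo in a throwaway tree so the sketch does not depend on PP4E being installed
with tempfile.TemporaryDirectory() as root:
    home = os.path.join(root, 'Internet', 'Ftp')
    os.makedirs(home)
    open(os.path.join(home, 'PyFtpGui.pyw'), 'w').close()

    found = find_first('PyFtpGui.pyw', startdir=root)
    found_ok = (found == os.path.abspath(home))      # located the script's container
    missing = find_first('no-such-file.py', startdir=root)

print(found_ok, missing)
```

The returned directory plays the role of mydir in the launcher: it is the prefix used to build the command lines, whatever the current working directory happens to be.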

When this script is started, both the get and put GUIs appear as
distinct, independently run programs; alternatively, we might attach
both forms to a single interface. We could get much fancier than these
two interfaces, of course. For instance, we could pop up local file
selection dialogs, and we could display widgets that give the status of
downloads and uploads in progress. We could even list files available at
the remote site in a selectable listbox by requesting remote directory
listings over the FTP connection. To learn how to add features like
that, though, we need to move on to the next section.

Transferring Directories with ftplib

Once upon a time, I used Telnet to manage my website at my Internet
Service Provider (ISP). I logged in to the web server in a shell window,
and performed all my edits directly on the remote machine. There was
only one copy of a site’s files: the one on the machine that hosted it.
Moreover, content updates could be performed from any machine that ran a
Telnet client, which was ideal for people with travel-based careers. [49]

Of course, times have changed. Like most personal websites, today
mine are maintained on my laptop and I transfer their files to and from my
ISP as needed. Often, this is a simple matter of one or two files, and it
can be accomplished with a command-line FTP client. Sometimes, though, I
need an easy way to transfer the entire site. Maybe I need to download to
detect files that have become out of sync. Occasionally, the changes are
so involved that it’s easier to upload the entire site in a single
step.

Although there are a variety of ways to approach this task
(including options in site-builder tools), Python can help here, too:
writing Python scripts to automate the upload and download tasks
associated with maintaining my website on my laptop provides a portable
and mobile solution. Because Python FTP scripts will work on any machine
with sockets, they can be run on my laptop and on nearly any other
computer where Python is installed. Furthermore, the same scripts used to
transfer page files to and from my PC can be used to copy my site to
another web server as a backup copy, should my ISP experience an outage.
The effect is sometimes called a mirror: a copy of a remote site.

Downloading Site Directories

The following two scripts address these needs. The first,
downloadflat.py, automatically downloads (i.e., copies) by FTP all the
files in a directory at a remote site to a directory on the local
machine. I keep the main copy of my website files on my PC these days,
but I use this script in two ways:

  • To download my website to client machines where I want to make
    edits, I fetch the contents of my web directory of my account on my
    ISP’s machine.

  • To mirror my site to my account on another server, I run this
    script periodically on the target machine if it supports Telnet or
    SSH secure shell; if it does not, I simply download to one machine
    and upload from there to the target server.

More generally, this script (shown in Example 13-10) will download a
directory full of files to any machine with Python and sockets, from any
machine running an FTP server.

Example 13-10. PP4E\Internet\Ftp\Mirror\downloadflat.py

#!/usr/bin/env python
"""
###############################################################################
use FTP to copy (download) all files from a single directory at a remote
site to a directory on the local machine; run me periodically to mirror
a flat FTP site directory to your ISP account; set user to 'anonymous'
to do anonymous FTP; we could use try to skip file failures, but the FTP
connection is likely closed if any files fail; we could also try to
reconnect with a new FTP instance before each transfer: connects once now;
if failures, try setting nonpassive for active FTP, or disable firewalls;
this also depends on a working FTP server, and possibly its load policies.
###############################################################################
"""
import os, sys, ftplib
from getpass import getpass
from mimetypes import guess_type

nonpassive = False                        # passive FTP on by default in 2.1+
remotesite = 'home.rmi.net'               # download from this site
remotedir  = '.'                          # and this dir (e.g., public_html)
remoteuser = 'lutz'
remotepass = getpass('Password for %s on %s: ' % (remoteuser, remotesite))
localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
cleanall   = input('Clean local directory first? ')[:1] in ['y', 'Y']

print('connecting...')
connection = ftplib.FTP(remotesite)                 # connect to FTP site
connection.login(remoteuser, remotepass)            # login as user/password
connection.cwd(remotedir)                           # cd to directory to copy
if nonpassive:                                      # force active mode FTP
    connection.set_pasv(False)                      # most servers do passive

if cleanall:
    for localname in os.listdir(localdir):          # try to delete all locals
        try:                                        # first, to remove old files
            print('deleting local', localname)      # os.listdir omits . and ..
            os.remove(os.path.join(localdir, localname))
        except:
            print('cannot delete local', localname)

count = 0                                           # download all remote files
remotefiles = connection.nlst()                     # nlst() gives files list
                                                    # dir() gives full details
for remotename in remotefiles:
    if remotename in ('.', '..'): continue          # some servers include . and ..
    mimetype, encoding = guess_type(remotename)     # e.g., ('text/plain', 'gzip')
    mimetype  = mimetype or '?/?'                   # may be (None, None)
    maintype  = mimetype.split('/')[0]              # .jpg ('image/jpeg', None)
    localpath = os.path.join(localdir, remotename)
    print('downloading', remotename, 'to', localpath, end=' ')
    print('as', maintype, encoding or '')
    if maintype == 'text' and encoding is None:
        # use ascii mode xfer and text file
        # use encoding compatible with ftplib's
        localfile = open(localpath, 'w', encoding=connection.encoding)
        callback  = lambda line: localfile.write(line + '\n')
        connection.retrlines('RETR ' + remotename, callback)
    else:
        # use binary mode xfer and bytes file
        localfile = open(localpath, 'wb')
        connection.retrbinary('RETR ' + remotename, localfile.write)
    localfile.close()
    count += 1

connection.quit()
print('Done:', count, 'files downloaded.')

There’s not much that is new to speak of in this script, compared
to other FTP examples we’ve seen thus far. We open a connection with the
remote FTP server, log in with a username and password for the desired
account (as configured, this script logs in to a named account rather
than using anonymous FTP), and go to the desired remote directory. New
here, though, are loops to iterate over all the files in local and
remote directories, text-based retrievals, and file deletions:

Deleting all local files

This script has a cleanall option, enabled by an interactive prompt.
If selected, the script first deletes all the files in the local
directory before downloading, to make sure there are no extra files that
aren’t also on the server (there may be junk here from a prior
download). To delete local files, the script calls os.listdir to get a
list of filenames in the directory, and os.remove to delete each; see
Chapter 4 (or the Python library manual) for more details if you’ve
forgotten what these calls do.

Notice the use of os.path.join to concatenate a directory path and
filename according to the host platform’s conventions; os.listdir
returns filenames without their directory paths, and this script is not
necessarily run in the local directory where downloads will be placed.
The local directory defaults to the current directory (“.”), but can be
set differently with a command-line argument to the script.
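The deletion step can be tried in isolation. The following is a minimal sketch, assuming a hypothetical clean_directory helper and a temporary directory in place of a real download folder; unlike the script's bare except, it catches only OSError and reports the names it could not remove.

```python
import os
import tempfile

def clean_directory(localdir):
    """Try to delete every entry in localdir; return the names that failed.
    (Hypothetical helper; the script inlines this loop instead.)"""
    failed = []
    for localname in os.listdir(localdir):            # bare names, no directory path
        localpath = os.path.join(localdir, localname) # so join them to localdir first
        try:
            os.remove(localpath)
        except OSError:                               # e.g., a subdirectory or locked file
            failed.append(localname)
    return failed

with tempfile.TemporaryDirectory() as localdir:
    for name in ('index.html', 'logo.gif'):
        open(os.path.join(localdir, name), 'w').close()
    os.mkdir(os.path.join(localdir, 'subdir'))        # os.remove cannot delete directories

    failed = clean_directory(localdir)
    remaining = os.listdir(localdir)

print(failed, remaining)
```

Because os.remove refuses to delete directories, a subdirectory survives the cleanup, which matches the script's flat-directory assumption.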

Fetching all remote files

To grab all the files in a remote directory, we first need a list of
their names. The FTP object’s nlst method is the remote equivalent of
os.listdir: nlst returns a list of the string names of all files in the
current remote directory. Once we have this list, we simply step through
it in a loop, running FTP retrieval commands for each filename in turn
(more on this in a minute).

The nlst method is, more or less, like requesting a directory listing
with an ls command in typical interactive FTP programs, but Python
automatically splits up the listing’s text into a list of filenames. We
can pass it a remote directory to be listed; by default it lists the
current server directory. A related FTP method, dir, produces the line
strings of an FTP LIST command, printing them or passing each one to a
callback function if we supply one; its output is like typing a dir
command in an FTP session, and its lines contain complete file
information, unlike nlst. If you need to know more about all the remote
files, parse the lines produced by a dir method call (we’ll see how in a
later example).

Notice how we skip “.” and “..” current and parent directory
indicators if present in remote directory listings; unlike os.listdir,
some (but not all) servers include these, so we need to either skip
these or catch the exceptions they may trigger (more on this later when
we start using dir, too).
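The listing logic can be exercised offline. This is a minimal sketch that assumes a hypothetical StubFTP class standing in for a connected ftplib.FTP object; its filenames and LIST-style lines are made up for illustration. The real nlst and dir methods take the same optional arguments, and dir treats a trailing callable as a per-line callback.

```python
class StubFTP:
    """Hypothetical offline stand-in for a connected ftplib.FTP object."""

    def nlst(self, *args):
        # some servers include the '.' and '..' entries, some do not
        return ['.', '..', 'index.html', 'logo.gif']

    def dir(self, *args):
        # like ftplib, use a trailing callable as the line callback, else print
        callback = args[-1] if args and callable(args[-1]) else print
        sample = ('drwxr-xr-x  2 lutz users 4096 Jan 01 12:00 .',
                  '-rw-r--r--  1 lutz users 1024 Jan 01 12:00 index.html')
        for line in sample:                 # made-up listing lines
            callback(line)

def remote_names(connection):
    """Return nlst() names without the current/parent directory indicators."""
    return [name for name in connection.nlst() if name not in ('.', '..')]

connection = StubFTP()
names = remote_names(connection)

details = []
connection.dir(details.append)              # collect lines instead of printing them

print(names)
print(len(details), 'detail lines')
```

Passing a list's append method as the callback is one simple way to capture the detailed lines for later parsing.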

Selecting transfer modes with mimetypes

We discussed output file modes for FTP earlier, but now that we’ve
started transferring text, too, I can fill in the rest of this story. To
handle Unicode encodings and to keep line-ends in sync with the machines
that my web files live on, this script distinguishes between binary and
text file transfers. It uses the Python mimetypes module to choose
between text and binary transfer modes for each file.

We met mimetypes in Chapter 6 near Example 6-23, where we used it to
play media files (see the examples and description there for an
introduction). Here, mimetypes is used to decide whether a file is text
or binary by guessing from its filename extension. For instance, HTML
web pages and simple text files are transferred as text with automatic
line-end mappings, and images and tar archives are transferred in raw
binary mode.
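The mode decision can be checked on its own, with no FTP connection. The sketch below assumes a hypothetical transfer_mode function that applies the same test as the script: a file counts as text only if its main MIME type is text and it has no compression encoding. The filenames are illustrative.

```python
from mimetypes import guess_type

def transfer_mode(filename):
    """Return 'text' or 'binary' for a filename, guessing from its extension.
    (Hypothetical helper mirroring the script's inline test.)"""
    mimetype, encoding = guess_type(filename)       # either item may be None
    maintype = (mimetype or '?/?').split('/')[0]    # 'text/html' -> 'text'
    if maintype == 'text' and encoding is None:
        return 'text'
    return 'binary'                                 # images, archives, unknowns

modes = {name: transfer_mode(name)
         for name in ('index.html', 'logo.gif', 'notes.txt.gz', 'README')}

for name, mode in modes.items():
    print(name, '=>', mode)
```

Treating unknown and compressed files as binary is the safe default: a binary transfer never alters bytes, whereas a text transfer of non-text data could corrupt it.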

Downloading: text versus binary

For binary files, data is pulled down with the retrbinary method we
met earlier, and stored in a local file with binary open mode of wb.
This file open mode is required to allow for the bytes strings passed to
the write method by retrbinary, but it also suppresses line-end byte
mapping and Unicode encodings in the process. Again, text mode requires
encodable text in Python 3.X, and this fails for binary data like
images. This script may also be run on Windows or Unix-like platforms,
and we don’t want a \n byte embedded in an image to get expanded to \r\n
on Windows. We don’t use a chunk-size third argument for binary
transfers here, though; it defaults to a reasonable size if omitted.
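Both points about wb mode can be demonstrated without a server. This is a minimal sketch, assuming made-up byte chunks in place of the blocks that retrbinary would pass to the file's write method: bytes written in binary mode come back unchanged, while the same bytes are rejected by a text-mode file.

```python
import os
import tempfile

# made-up chunks, standing in for the blocks retrbinary hands to its callback
chunks = [b'GIF89a\n', b'\x00\xff\n\x80']

with tempfile.TemporaryDirectory() as localdir:
    localpath = os.path.join(localdir, 'logo.gif')

    with open(localpath, 'wb') as localfile:        # binary mode: no mapping, no encoding
        for chunk in chunks:
            localfile.write(chunk)
    with open(localpath, 'rb') as localfile:
        stored = localfile.read()

    try:
        with open(localpath, 'w') as textfile:      # text mode expects str, not bytes
            textfile.write(chunks[0])
        text_mode_error = None
    except TypeError as exc:
        text_mode_error = exc

print(stored == b''.join(chunks))
print(type(text_mode_error).__name__)
```

The embedded \n bytes survive intact on every platform, which is what an image or archive needs.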

For text files, the script instead uses the retrlines method, passing
in a function to be called for each line in the text file downloaded.
The text line handler function receives lines in str string form, and
mostly just writes the line to a local text file. But notice that the
handler function created by the lambda here also adds a \n line-end
character to the end of the line it is passed. Python’s retrlines method
strips the line-end characters from each line to sidestep platform
differences. By adding a \n, the script ensures the proper line-end
marker character sequence for the local platform on which this script
runs when written to the file (\n or \r\n).

For this auto-mapping of the \n in the script to work, of course, we
must also open text output files in w text mode, not in wb: the mapping
from \n to \r\n on Windows happens when data is written to the file. As
discussed earlier, text mode also means that the file’s write method
will allow for the str string passed in by retrlines, and that text will
be encoded per Unicode when written.
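The line-end mapping can be observed locally. The following is a minimal sketch, assuming a made-up list of lines in the form retrlines delivers them (line ends already stripped): the callback appends \n, the text-mode file maps it to the platform's own marker, and reading the file back in binary mode shows what was actually stored.

```python
import os
import tempfile

# lines as retrlines would pass them to the callback: no line-end characters
lines = ['<html>', '<body>hello</body>', '</html>']

with tempfile.TemporaryDirectory() as localdir:
    localpath = os.path.join(localdir, 'index.html')

    with open(localpath, 'w', encoding='latin-1') as localfile:   # text mode
        callback = lambda line: localfile.write(line + '\n')
        for line in lines:
            callback(line)

    with open(localpath, 'rb') as localfile:      # binary read: see the raw bytes
        raw = localfile.read()

# os.linesep is '\r\n' on Windows and '\n' on Unix-like platforms
expected = ''.join(line + os.linesep for line in lines).encode('latin-1')
print(raw == expected)
```

The script's code never mentions \r\n; the translation is entirely the work of the text-mode file object.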

Subtly, though, we also explicitly use the FTP connection object’s
Unicode encoding scheme for our text output file in open, instead of the
default. Without this encoding option, the script aborted with a
UnicodeEncodeError exception for some files in my site. In retrlines,
the FTP object itself reads the remote file data over a socket with a
text-mode file wrapper and an explicit encoding scheme for decoding;
since the FTP object can do no better than this encoding anyhow, we use
its encoding for our output file as well.

By default, FTP objects use the latin1 scheme for decoding text
fetched (as well as for encoding text sent) in the Python 3.1 used here;
Python 3.9 and later default to utf-8 instead. Either way, this can be
specialized by assigning to their encoding attribute. Our script’s local
text output file will inherit whatever encoding ftplib uses and so be
compatible with the encoded text data that it produces and passes.
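The reason for reusing the connection's encoding can be shown without connecting anywhere. This is a minimal sketch under two assumptions: the sample bytes are made up, and ascii is used as a stand-in for a narrower default encoding (the actual platform default that caused the failure is not named in the text). Text decoded with one scheme always encodes back with that same scheme, but may not fit another.

```python
import ftplib

connection = ftplib.FTP()                 # no host given, so no connection is made
default_encoding = connection.encoding    # version-dependent default
print('default:', default_encoding)

raw = b'caf\xe9'                          # made-up remote data with a non-ASCII byte
text = raw.decode('latin-1')              # what a latin-1 FTP object would pass along
roundtrip = text.encode('latin-1')        # same scheme: always succeeds

try:
    text.encode('ascii')                  # narrower scheme: cannot represent the text
    narrow_error = None
except UnicodeEncodeError as exc:
    narrow_error = exc

connection.encoding = 'utf-8'             # specialize by assigning to the attribute
print(roundtrip == raw, type(narrow_error).__name__, connection.encoding)
```

Opening the output file with encoding=connection.encoding keeps the write side in step with whatever scheme the read side used, including any value assigned this way.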

We could try to also catch Unicode exceptions for files outside the
Unicode encoding used by the FTP object, but exceptions leave the FTP
object in an unrecoverable state in tests I’ve run in Python 3.1.
Alternatively, we could use wb binary mode for the local text output
file and manually encode line strings with line.encode, or simply use
retrbinary and binary mode files in all cases, but both of these would
fail to map end-lines portably, which is the whole point of making text
distinct in this context.

All of this is simpler in action than in words. Here is the
command I use to download my entire book support website from my ISP
server account to my Windows laptop PC, in a single step:

C:\...\PP4E\Internet\Ftp\Mirror> downloadflat.py test
Password for lutz on home.rmi.net:
Clean local directory first? y
connecting...
deleting local 2004-longmont-classes.html
deleting local 2005-longmont-classes.html
deleting local 2006-longmont-classes.html
deleting local about-hopl.html
deleting local about-lp.html
deleting local about-lp2e.html
deleting local about-pp-japan.html
...lines omitted...
downloading 2004-longmont-classes.html to test\2004-longmont-classes.html as text
downloading 2005-longmont-classes.html to test\2005-longmont-classes.html as text
downloading 2006-longmont-classes.html to test\2006-longmont-classes.html as text
downloading about-hopl.html to test\about-hopl.html as text
downloading about-lp.html to test\about-lp.html as text
downloading about-lp2e.html to test\about-lp2e.html as text
downloading about-pp-japan.html to test\about-pp-japan.html as text
...lines omitted...
downloading ora-pyref4e.gif to test\ora-pyref4e.gif as image
downloading ora-lp4e-big.jpg to test\ora-lp4e-big.jpg as image
downloading ora-lp4e.gif to test\ora-lp4e.gif as image
downloading pyref4e-updates.html to test\pyref4e-updates.html as text
downloading lp4e-updates.html to test\lp4e-updates.html as text
downloading lp4e-examples.html to test\lp4e-examples.html as text
downloading LP4E-examples.zip to test\LP4E-examples.zip as application
Done: 297 files downloaded.