Redirecting Old Websites

Kimo Johnson

16 May 2009

I have been gone from Dartmouth for over a year, and my website there has
remained active but somewhat broken due to neglect. It turns out that some of
the blogs on that site appear on the first page of Google search results for
related keywords. When my Dartmouth site finally goes away, it would be a shame
if users encountered 404 errors when trying to access those pages. I did some
quick searching for the best way to redirect a page to a new location, and a 301
(permanent) redirect seems to be a good option. I wrote a Python script that
redirects all the HTML pages on my site to their new URLs, and it might be
useful for other people facing the same problem.

Script

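The script walks the full site searching for files with specific extensions, in
this case .html. For each file it performs a simple find and replace on the file
path, once to get the old page path and once to get the new URL, and writes a
redirect 301 line for that pair to an output file.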
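As a minimal sketch of that find-and-replace step, the function below builds one rule from one file path. The name redirect_line and the sample path are made up for illustration; the default values are the placeholders from the script settings.

```python
# Illustrative only: the same substitution the script applies to each file path.
def redirect_line(path, olddir="public_html", oldbase="/~name",
                  newsite="http://mynewdomain.com"):
    """Build one 'redirect 301' rule for a file found under olddir."""
    page = path.replace(olddir, oldbase)   # URL path on the old server
    url = path.replace(olddir, newsite)    # full URL on the new site
    return "redirect 301 %s %s\n" % (page, url)


# Hypothetical file path, as it would be found when walking public_html
print(redirect_line("public_html/blog/post.html"), end="")
# prints: redirect 301 /~name/blog/post.html http://mynewdomain.com/blog/post.html
```

Note that str.replace substitutes every occurrence of olddir, so a path that contains the directory name a second time would be rewritten in both places.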

You will need to change some of the variables to reflect your old and new sites.
These variables can be found under Script settings below.


#!/usr/bin/env python
#
# Filename: make301.py
# Last Modified: 5/16/2009
#

import sys
import os

#
# Script settings
#
extensions = ['.html']
olddir = 'public_html'
oldbase = '/~name'
newsite = 'http://mynewdomain.com'

#
# Process command line arguments
#
def processArgs(argv):
    argc = len(argv)
    if argc != 3:
        print("Usage: make301.py source_dir htfile")
        sys.exit()

    args = [s.strip() for s in argv[1:]]

    # Make sure directory exists
    if not os.path.exists(args[0]):
        print('Directory "%s" does not exist.' % args[0])
        sys.exit()

    return tuple(args)

#
# Add filename to list if the extension is in the list
# of extensions specified above.
#
def getAllFiles(files, root, names):
    for name in names:
        (base, ext) = os.path.splitext(name)
        if ext.lower() in extensions:
            files.append(os.path.join(root, name))

#
# Walk the site and write one redirect rule per matching file
#
def main(argv):
    (source_dir, htfile) = processArgs(argv)

    # Collect matching files from every directory under source_dir
    files = []
    for (root, dirs, names) in os.walk(source_dir):
        getAllFiles(files, root, names)

    # Write redirect rules to file
    with open(htfile, 'w') as fd:
        for f in files:
            page = f.replace(olddir, oldbase)
            url = f.replace(olddir, newsite)

            line = "redirect 301 %s %s\n" % (page, url)
            fd.write(line)

if __name__ == "__main__":
    main(sys.argv)

Usage

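To run the script, go to the directory above your web directory (in my case
public_html) and type the following. The first argument is the directory to
scan and the second is the file the redirect rules are written to: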


% python make301.py public_html .htaccess
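
Before putting the generated file in place, a quick sanity check may be worthwhile. The sketch below is a hypothetical helper, not part of make301.py. It assumes the file contains only lines written by the script, that no path contains spaces, and that the defaults match the placeholder settings used above.

```python
def find_bad_rules(lines, oldbase="/~name", newsite="http://mynewdomain.com"):
    """Return the lines that do not look like 'redirect 301 <old path> <new url>'."""
    bad = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split()
        ok = (len(parts) == 4
              and parts[0].lower() == "redirect"
              and parts[1] == "301"
              and parts[2].startswith(oldbase)
              and parts[3].startswith(newsite))
        if not ok:
            bad.append(line)
    return bad


# Made-up sample rules: the second one still has the local directory
# name where the old URL path should be.
rules = [
    "redirect 301 /~name/index.html http://mynewdomain.com/index.html\n",
    "redirect 301 public_html/a.html http://mynewdomain.com/a.html\n",
]
for line in find_bad_rules(rules):
    print("Suspicious rule: " + line.strip())
```

To check a real file, pass an open file object instead of the sample list, since iterating over a file yields its lines.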