Saturday, February 24, 2007

Backing up Blogger

Some time ago, I wrote about the relatively painless transition from the original Blogger to the new Blogger. I mentioned that I backed up my posts with wget, and intended to come back and write about the command I used.

It's only taken a month and a bit, but here it is.

wget --wait=3 -r --level=2 --span-hosts \, \
--timestamping --backup-converted \
--referer= \

Change "MY-BLOG" in all three places to whatever the name of your blog is.

For those who want an explanation of the command-line options:

  1. Be a good Internet citizen and don't hammer Blogger's servers (not to mention your own Internet link). Use the --wait=3 option to pause three seconds between downloads. If you like, you can also add a --random-wait option to make wget look more like a human browsing than a robot.

  2. The -r option tells wget to download recursively, following links rather than fetching just a single webpage; --level=2 limits how deep it goes, to two levels of links from the starting page.

  3. The --span-hosts option tells wget that it is okay to download from domains other than that of the original webpage; the --domains option lists the domains wget is allowed to download from.

  4. One of those domains is; that's where old Blogger stored your uploaded images. The other domain is, naturally, your own blog.

  5. We use the --timestamping option to make sure we only download new or updated files; --backup-converted ensures we back up local files on our hard disk before they are overwritten.

  6. The --referer option adds a "referer" header to the HTTP request. Some web servers play silly games if you don't include a referer, like sending you to the home page instead of the page you want.

  7. Last but not least, you need to tell wget where to start the downloading. That would be your blog's main page.
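To show how those seven pieces fit together, here is a rough sketch of the command wrapped in a small shell function. It is not the exact command from this post: the host names MY-BLOG.example.com and images.example.com are hypothetical placeholders, the http:// scheme is an assumption, and the DRY_RUN switch and the backup_blog name are my own additions for illustration. By default it only prints the command so you can check it before letting it loose.

```shell
#!/bin/sh
# Sketch only. Both host names are hypothetical placeholders:
# substitute your blog's host and the host that serves your images.
BLOG_HOST="${BLOG_HOST:-MY-BLOG.example.com}"
IMAGE_HOST="${IMAGE_HOST:-images.example.com}"

backup_blog() {
    # Build the argument list once so it can be printed or executed.
    set -- wget \
        --wait=3 --random-wait \
        -r --level=2 \
        --span-hosts "--domains=${IMAGE_HOST},${BLOG_HOST}" \
        --timestamping --backup-converted \
        "--referer=http://${BLOG_HOST}/" \
        "http://${BLOG_HOST}/"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: show the command without touching the network.
        echo "$*"
    else
        # DRY_RUN=0: actually run wget with the arguments built above.
        "$@"
    fi
}

backup_blog
```

Run it as is to see the command it would execute; set DRY_RUN=0 (along with BLOG_HOST and IMAGE_HOST) when you are happy with it. Keeping the blog host in one variable means the three places it appears cannot drift out of step.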

I don't expect this to work for new Blogger. Once I've worked out the changes needed to make it work, I'll post updated instructions.

The Leading Wedge said...

Thanks for posting. There seems to be a problem with the method, however. When I run it, it stops after index.html.

This one is similar, but seems to work better. I think it might be the robots thingy that does the trick: