Generate Static Version of a Site with “wget”
It was not that long ago that it was cool to create a dynamic site in Perl, C++, classic ASP, or ColdFusion (maybe some think it still is). What was also cool was to create a static version of the site to run in production. Many projects I worked on had a fully dynamic staging/preview site, plus a process to generate static HTML files from that environment and push them to production. This was done to help websites run faster. Apache and IIS were not as good as they are today, and if a site had a lot of traffic the dynamic scripts in many cases would not scale.
I recently had a request to create a static version of a dynamic site. I remembered there being several third-party tools, crawlers, etc. that were easy to set up and configure. They would run through a site and generate a nice self-contained static version. When I went to look for them I could not find them anywhere. They seemed to have just disappeared. I assume there is just not much demand for them anymore because web servers, languages, and techniques have become so much more efficient.
I still needed to create a static site and I really did not want to write my own crawler/screen scraper tool. I ran across a command-line program called “wget”. It is an older tool but it did exactly what I needed it to do. For a quick test I ran it against labs.ratchet.com. In about 20 minutes I had a static version of the site. The “wget” file generation was not perfect and required some global find/replaces (very easy with Notepad++ or Visual Studio) and some manual changes, but the process was not bad at all.
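If you would rather script the global find/replace step than use an editor, something like the sketch below might do. The function name replace_in_html and the strings in the usage comment are made up for illustration; the actual replacements will depend on what your generated files look like.

```shell
#!/bin/sh
# Hypothetical helper: replace one string with another in every .html file
# under a directory.
# Usage: replace_in_html DIR OLD NEW
# Caveats: OLD is treated as a basic regular expression by sed (so "." matches
# any character), and neither OLD nor NEW may contain the "|" delimiter.
replace_in_html() {
    dir=$1 old=$2 new=$3
    find "$dir" -type f -name '*.html' | while IFS= read -r file; do
        # Write to a temp file and move it into place; this avoids "sed -i",
        # whose syntax differs between GNU and BSD sed.
        sed "s|$old|$new|g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
}

# Example (illustrative only): turn absolute links into root-relative ones.
# replace_in_html ratchet.com 'http://labs.ratchet.com/' '/'
```

Run it on a copy of the generated files first, since the replacement happens in place.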
I ran the basic command line tool with the following parameters:
wget --html-extension --convert-links -r http://labs.ratchet.com -P ratchet.com
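For reference, here is the same idea spelled out with long-form options and wrapped in a small function. This is only a sketch: the name mirror_site is made up, and --page-requisites and --no-parent are extras that were not part of the command I ran, so treat them as suggestions to try rather than something I tested on this site.

```shell
#!/bin/sh
# Hypothetical wrapper around the command above, using long-form options.
# Usage: mirror_site URL OUTPUT_DIR
mirror_site() {
    url=$1 out_dir=$2
    # --recursive         follow links and download the pages they point to (-r)
    # --html-extension    save pages with an .html suffix
    #                     (newer wget releases document this as --adjust-extension)
    # --convert-links     rewrite links in the saved pages so they work locally
    # --page-requisites   also fetch the images, CSS and scripts each page needs
    #                     (not in the original command)
    # --no-parent         never ascend above the starting directory
    #                     (not in the original command)
    # --directory-prefix  directory to save everything under (-P)
    wget --recursive --html-extension --convert-links \
         --page-requisites --no-parent \
         --directory-prefix="$out_dir" "$url"
}

# Example:
# mirror_site http://labs.ratchet.com ratchet.com
```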
If you ever need a self-contained demo of a site, this tool should help get you there. I think it would also be handy in a batch or shell script if you needed to connect out to a site to get data, or if there were requirements to archive/snapshot sites.
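As a rough idea of what the archive/snapshot case might look like in a shell script, the sketch below mirrors a site into a directory named after the current date. The function name snapshot_site and the directory layout are assumptions for the sake of the example, and --wait=1 is an addition of mine to pause between requests.

```shell
#!/bin/sh
# Hypothetical snapshot helper: mirror a site into BASE_DIR/YYYY-MM-DD.
# Usage: snapshot_site URL BASE_DIR
snapshot_site() {
    url=$1 base_dir=$2
    stamp=$(date +%Y-%m-%d)
    target="$base_dir/$stamp"
    mkdir -p "$target" || return 1
    # --wait=1 pauses one second between requests to go easy on the server
    wget --recursive --html-extension --convert-links --wait=1 \
         --directory-prefix="$target" "$url"
}

# Example:
# snapshot_site http://labs.ratchet.com snapshots
```

A script like this could be run from cron or the Windows Task Scheduler to keep a dated history of a site.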
Here are links to the tool and some posts about it:
http://gnuwin32.sourceforge.net/packages/wget.htm
http://www.editcorp.com/Personal/Lars_Appel/wget/v1/wget_7.html
