The source file here (index.html) is probably typical of what the lazier programmers who write HTML produce. Not that there is anything especially bad about being lazy, but I do think it will catch up with you in the end.
There is a wonderful tool called HTML Tidy (or just tidy), which is probably the best tool for cleaning up HTML. If nothing else, use it to turn your HTML into XHTML, which parses more easily. The file C_index.html was produced on a Debian GNU/Linux machine by giving tidy the command-line options -o C_index.html -i -w -c -asxml. The switches read as: write the output to C_index.html, indent the output, wrap the output (at 68 columns), clean it, and output it as XHTML.
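For anyone who would rather script the tidy run than type it, here is a minimal Python sketch of the same invocation. It is only an illustration under some assumptions: the function name tidy_command is made up for this example, and I am assuming index.html is passed as the last argument, since the option list above does not show the input file. The switches themselves are copied as given. Note that -w is used without an explicit column, as in the text; how tidy treats that may depend on the version installed, so it is worth checking the wrapped output.

```python
import os
import shutil
import subprocess


def tidy_command(source="index.html", output="C_index.html"):
    """Build the argument list for the tidy run described in the text.

    The function name and the position of the input file are assumptions
    made for this sketch; the switches are the ones quoted in the text.
    """
    return [
        "tidy",
        "-o", output,  # write the result to this file
        "-i",          # indent the output
        "-w",          # wrap the output (no column given, as in the text)
        "-c",          # clean up presentational markup
        "-asxml",      # emit XHTML
        source,        # assumed input file
    ]


if __name__ == "__main__":
    cmd = tidy_command()
    print(" ".join(cmd))
    if shutil.which("tidy") and os.path.exists(cmd[-1]):
        # tidy exits with 1 on warnings and 2 on errors, so a non-zero
        # status is reported here rather than raised as an exception.
        result = subprocess.run(cmd)
        print("tidy exit status:", result.returncode)
    else:
        print("tidy or the input file was not found; command not run")
```

Keeping the command in a list rather than a single string avoids any shell quoting trouble if a file name ever contains spaces.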
The structure of this file was created in emacs, in HTML helper MMM Abbrev Fill mode with the expert sub-mode chosen. When I opened the new about.html file, emacs set up all the structure by itself. It isn't perfect: I had to manually change the email address from my locally accessible one to one that is accessible from the Internet.