Dan's Web Tips | Validators




Validators vs. Linters: What's The Difference?

TIP: Understand what a validator and a linter are, what the difference between the two is, and how they can be used to help you improve your Web development.

Many people are confused about the difference between "HTML Validators" and other Web page checkers. This confusion is aided and abetted by the frequent misuse of the term "validator" by authors and promoters of programs which are not validators. Actually, there is a big distinction between the two kinds of programs, though both can be useful to HTML authors seeking to avoid errors.

Validators

A validator is a program which checks the syntax of an HTML document against a rigorous specification, as defined in a Document Type Definition (DTD). HTML is actually an application of SGML (Standard Generalized Markup Language), and all SGML documents are supposed to conform to a DTD. There are several different standard HTML DTDs (and lots more non-HTML DTDs, such as those for other document types in SGML or the newer XML, Extensible Markup Language, which is a simplified form of SGML). HTML versions 2.0, 3.2, and 4.0 all have official DTDs: the 2.0 DTD was published by the IETF (in RFC 1866), while 3.2 and 4.0 are endorsed by the W3C (World Wide Web Consortium). (HTML 1.0 never had a formal spec, and is simply a term used vaguely to refer to the early forms of HTML in use prior to version 2.0.)

A validator uses one of these DTDs to determine if your page is syntactically correct under that particular spec. Which DTD is used is determined by your <!DOCTYPE> declaration, which should be at the beginning of each of your pages.

A validator is the only certain way of telling if the syntax of your pages is standards-compliant. It will tell you if some of the tags and attributes you're using are nonstandard extensions that aren't part of the DTD, and will also find syntax errors such as bad nesting and missing closing tags for elements that require them. It's a good idea to run your pages through a validator to find such errors, and fix the unintentional errors the validator finds. As for the "intentional" errors, like nonstandard tags and attributes you're using to get a particular visual effect, it's up to you whether to remove them to get your page to validate, or leave them in although they're nonstandard. In some cases, nonstandard elements will degrade gracefully on browsers that don't support them, so it's reasonably safe to keep them.

There are links to some online validators at the end of this article.

Linters/Checkers

There are a number of other programs, including online sites, standalone software, and features built into HTML editors, that check your pages for various forms of "correctness." Some of them are referred to as "validators," but if they don't use an SGML-type DTD to validate your site against, they're not true validators. (A "litmus test" is that if any purported "validator" passes your page as valid if it lacks a DOCTYPE declaration, then it's not a true validator.)

This is not to say that these "non-validator" programs, which can be referred to as linters or checkers, aren't useful. They will find various problems with Web pages such as syntax errors, elements with compatibility or accessibility problems, and in some cases will check your links for "404 Not Found" errors and your English text for misspellings. Sometimes a linter can find problems in your site that a validator wouldn't, if your code is valid in accordance with the specs but has other issues of concern that aren't addressed by the standards.

However, the output of a linter needs to be taken with a grain of salt, since it is not based on any formal standard, but just on the preferences, biases, and pet peeves of the program's author. For instance, if I wrote a linter (I haven't so far), I'd probably have it complain if you link to "index.html" instead of directly to the directory with "./". (See my discussion of this.) But that's just my preference; it doesn't violate any HTML or URL standard to do that the other way. (It's just less elegant as far as I'm concerned.)
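
To make the "pet peeve" idea concrete, here is a minimal sketch of what such a rule might look like in Python, using the standard library's html.parser module. The names IndexLinkLinter and lint_index_links are made up for this illustration, and the rule encodes nothing more than the preference described above; it has nothing to do with any HTML or URL standard.

```python
from html.parser import HTMLParser


class IndexLinkLinter(HTMLParser):
    """Hypothetical linter rule: collect links that name index.html explicitly."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Ignore any fragment or query string when examining the path.
                path = value.split("#", 1)[0].split("?", 1)[0]
                if path == "index.html" or path.endswith("/index.html"):
                    self.warnings.append(value)


def lint_index_links(html_text):
    """Return every href value that points at an explicit index.html."""
    linter = IndexLinkLinter()
    linter.feed(html_text)
    linter.close()
    return linter.warnings
```

Feeding it a page with links to "index.html", "./", and "docs/index.html#top" would flag the first and third and leave the second alone. A different author would pick different rules, which is exactly why linter output is a matter of opinion.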

The DOCTYPE Declaration

A validator determines which HTML standard to validate your document against by the DOCTYPE declaration at the beginning of your document. If the DOCTYPE is missing or incorrect, this will cause the validator to report errors, maybe weird ones like saying that <HTML> is an unknown tag. So you need to have the right DOCTYPE if you want your pages to validate.
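
As a rough illustration of that first step, the following Python sketch pulls the pieces out of a DOCTYPE at the start of a document. It is an assumption-laden simplification, not how any particular validator works: the function name parse_doctype is invented here, only the PUBLIC form shown in this article (plus the bare short form) is handled, and a real SGML parser is far more thorough.

```python
import re

# An optional XML declaration, then a DOCTYPE with a root element name and an
# optional PUBLIC identifier followed by an optional system identifier (DTD URL).
# Simplification: the SYSTEM keyword and internal subsets are not handled.
DOCTYPE_RE = re.compile(
    r'\s*(?:<\?xml[^>]*\?>\s*)?'
    r'<!DOCTYPE\s+(?P<root>[A-Za-z][\w.-]*)'
    r'(?:\s+PUBLIC\s+"(?P<public>[^"]*)"(?:\s+"(?P<system>[^"]*)")?)?'
    r'\s*>',
    re.IGNORECASE,
)


def parse_doctype(document):
    """Return (root, public_id, system_id), or None if no usable DOCTYPE leads the document."""
    match = DOCTYPE_RE.match(document)
    if match is None:
        return None
    return match.group("root"), match.group("public"), match.group("system")
```

A document that starts with plain <html>, or with a DOCTYPE whose quotes don't balance, yields None here, which corresponds to the case where a validator has no DTD to work from.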

In theory, browsers are able to use the DOCTYPE to determine what HTML version is used and possibly enable and disable various features accordingly, but in practice (until recently; see below) none actually do this, so the DOCTYPE is only of use to validators and does not affect the appearance of your pages in browsers. You still need to have it if you want to use validators, and some "HTML purists" also regard placing a DOCTYPE in their pages as a "political statement" indicating their support for standards in opposition to the random "tag soup" of the popular browsers.

In recent times, some browser versions have started using "DOCTYPE sniffing" to switch between a "quirks mode" that tries to stay compatible with the oddities of old browsers and a "standards mode" that follows the current standards better. Mozilla even has three modes: standards, almost-standards, and quirks. The spacing of images and tables is notably affected by this. Some newsgroup commentary has resulted, including debate about whether this is a good thing or a bad thing, and practical comments from developers who find their pages mysteriously work or fail depending on the DOCTYPE they use. While the so-called "purist camp" likes the idea of browser makers moving from quirk-compatibility to standards compliance, they have some misgivings about the "DOCTYPE sniffing" approach, since it seems to be done in a rather capricious way, matching irrelevant things like the DTD URL in the DOCTYPE to determine which mode to use, rather than showing a true and full understanding of the meaning of the DOCTYPE.
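
To show how mechanical this sniffing is, here is a simplified Python sketch. It is loosely modeled on the rules browsers eventually converged on, but it is incomplete and should not be taken as the behavior of any specific browser version; the function name rendering_mode and the short identifier lists are assumptions chosen for illustration.

```python
# Older public identifiers that select quirks mode (a small sample only).
QUIRKS_PREFIXES = (
    "-//IETF//DTD HTML 2.0//",
    "-//W3C//DTD HTML 3.2 Final//",
    "-//W3C//DTD HTML 4.0 Transitional//",
    "-//W3C//DTD HTML 4.0 Frameset//",
)

# Identifiers whose mode depends on whether a DTD URL accompanies them.
DEPENDS_ON_SYSTEM_ID = (
    "-//W3C//DTD HTML 4.01 Transitional//",
    "-//W3C//DTD HTML 4.01 Frameset//",
)

# Identifiers that select almost-standards mode regardless of the DTD URL.
ALMOST_STANDARDS_PREFIXES = (
    "-//W3C//DTD XHTML 1.0 Transitional//",
    "-//W3C//DTD XHTML 1.0 Frameset//",
)


def _starts_with_any(public_id, prefixes):
    lowered = public_id.lower()
    return any(lowered.startswith(prefix.lower()) for prefix in prefixes)


def rendering_mode(has_doctype, public_id=None, system_id=None):
    """Guess 'quirks', 'almost-standards', or 'standards' from the DOCTYPE parts."""
    if not has_doctype:
        return "quirks"
    public_id = public_id or ""
    if _starts_with_any(public_id, QUIRKS_PREFIXES):
        return "quirks"
    if _starts_with_any(public_id, DEPENDS_ON_SYSTEM_ID):
        # The same public identifier gives different modes with and without a URL.
        return "almost-standards" if system_id is not None else "quirks"
    if _starts_with_any(public_id, ALMOST_STANDARDS_PREFIXES):
        return "almost-standards"
    return "standards"
```

Note that nothing here looks at what the DTD actually says. The HTML 4.01 Transitional identifier lands in a different mode purely on the presence of the URL, which is the sort of string matching the purists object to.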

The syntax of the DOCTYPE is a little arcane, with various sections indicating what standard is being followed and what organization administers it. You don't need to construct DOCTYPEs of your own, though (unless you're creating new DTDs yourself, which is not really a good idea if you want to adhere to standards that others will be able to understand); you can just take the appropriate DOCTYPE for the standard you wish to follow and cut and paste it into your pages. Many people, especially those who were writing HTML for a while before they started validating their documents and are used to some "presentationalist" markup, are likely to find the most convenient DOCTYPE to be that of HTML 4.01 Transitional (approved by the W3C as a minor revision to the earlier 4.0):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

This DTD includes just about all the tags and attributes that were formerly regarded as Netscape or Internet Explorer extensions, so that most current pages can be made to validate without losing important features or sacrificing appearance. The "extended" tags and attributes that aren't in this DTD are probably not a good idea to use, because they're not consistently supported in both of the major browsers, let alone other browsers.

However, if you want "tighter", more logical code, with presentation moved to stylesheets instead of old-fashioned presentational tags, use the strict doctype:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

This document type excludes many presentational tags and attributes, sticking to pure logical structure (intended to be used along with stylesheets which give the visual recommendations for the document).

There is also a "Frameset" DOCTYPE:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">

This is to be used on a frameset document. The individual frames should use a regular DOCTYPE such as HTML 4.01 Transitional.

There are also various earlier DOCTYPEs such as 3.2 and 2.0. (3.0 was never approved, and shouldn't be used; it had various features that never got implemented in browsers.) If you want to be very conservative in your support for ancient browsers, you might try to validate your documents with a 2.0 or 3.2 DOCTYPE, but this usually isn't necessary if you're careful to make your use of newer features degrade gracefully.

And then there's the DOCTYPE for XHTML 1.0, which is a complete reformulation of HTML as an XML application, designed to be compatible with present browsers but with lots of new syntax rules. If you want to design your documents to this new standard, read the specs at the W3 Consortium site, then use one of these DOCTYPEs (now officially approved as a W3C Recommendation):

XHTML 1.0:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

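With XHTML, you are also encouraged to use an XML declaration at the very top of the document, above the DOCTYPE (it is strictly required only when the character encoding is something other than UTF-8 or UTF-16 and isn't indicated some other way, such as in the HTTP headers):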

<?xml version="1.0" encoding="UTF-8"?>

...but, unfortunately, this line seems to mess up some versions of MSIE for the Mac, making them display the document as plain text instead of HTML, despite it being sent with the standard text/html MIME type. (Well, some people try to serve XHTML pages with an XML MIME type such as application/xhtml+xml or text/xml, which is more technically accurate and puts some newer browsers in strict standard-checking mode, but that causes the page to fail completely in MSIE. But, then again, MSIE has always been known to ignore MIME types and do what it feels like doing.)

(The UTF-8 refers to the character encoding; be sure what you place here corresponds to the character encoding you actually are using. If you have nothing but plain ASCII characters with no special foreign characters or symbols, it doesn't really make much difference, but it will affect how you encode other characters.)
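
One quick sanity check along these lines is to confirm that a file's bytes actually decode under the encoding it declares. The Python sketch below does only that; the function names declared_encoding and decodes_as_declared are invented for this example, and it looks only at the XML declaration, not at HTTP headers or META tags.

```python
import re

# Captures the encoding name from an XML declaration at the very start of the bytes.
XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(raw):
    """Return the encoding named in the XML declaration, or None if there is none."""
    match = XML_DECL_ENCODING.match(raw)
    return match.group(1).decode("ascii") if match else None


def decodes_as_declared(raw):
    """True if the bytes decode cleanly under the declared encoding.

    Falls back to UTF-8 when nothing is declared. A clean decode is necessary
    but not sufficient: ISO-8859-1, for example, accepts any byte sequence.
    """
    encoding = declared_encoding(raw) or "utf-8"
    try:
        raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers encoding names Python does not recognize.
        return False
    return True
```

A file saved as ISO-8859-1 but labeled UTF-8 will fail this check as soon as it contains an accented character, while a pure ASCII file passes either way, matching the point above that the label matters only once you go beyond plain ASCII.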

An XHTML 1.1 document type has now been approved by W3C as a recommendation. This is the attempted future evolution of XHTML with the deprecated elements from earlier versions removed and a number of other changes; it's currently "not ready for prime time" because today's browsers don't support it very well, and the "transitional" elements included in earlier versions for compatibility with old browsers aren't present any more.

Just make sure that, when you decide whether to use HTML or XHTML, you pick one or the other and stick with it consistently, as syntax designed for one of these will cause validation errors in the other, even though browsers are usually sloppy enough in their interpretation of Web pages to muddle through even a bastardized half-and-half page. A sure sign of ignorant Web development is when XHTML syntax is used in a page with an HTML DOCTYPE, or vice versa, but such things are rampant on the Web these days. Often people paste together snippets of code into a Frankenstein's monster of a Web page that contains a mixture of both varieties. Bits of code provided from outside such as affiliate banners and tracking pixels are common offenders. Be sure you convert any of them you use into the proper syntax for the HTML variety you're using, and say "screw you!" to their makers if they insist on holding you to contract terms that demand you don't modify their code.
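
A couple of the most common mix-ups can be spotted with a crude check like the Python sketch below. To be clear, this is a linter-style heuristic and not a validator: it is based on regular expressions rather than a DTD, the function name mixed_syntax_warnings is invented for this example, and it catches only two telltale signs (XHTML-style empty tags under an HTML DOCTYPE, and uppercase tag names under an XHTML DOCTYPE).

```python
import re

DOCTYPE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)
# XHTML-style empty-element syntax such as <br /> or <img src="x.gif"/>.
SELF_CLOSING = re.compile(r"<[A-Za-z][^<>]*/>")
# The element name of any start or end tag.
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")


def mixed_syntax_warnings(document):
    """Return warnings about markup that clashes with the declared DOCTYPE family."""
    match = DOCTYPE.search(document)
    if match is None:
        return ["no DOCTYPE found"]
    body = document[match.end():]
    warnings = []
    if "XHTML" in match.group(0).upper():
        # XHTML element names must be lowercase.
        names = {name for name in TAG_NAME.findall(body) if name != name.lower()}
        for name in sorted(names):
            warnings.append("uppercase tag name in XHTML document: " + name)
    else:
        for tag in SELF_CLOSING.findall(body):
            warnings.append("XHTML-style empty tag in HTML document: " + tag)
    return warnings
```

Run against a pasted-in banner snippet plus your own DOCTYPE, something like this gives a quick hint about whether the snippet needs converting before you send the page to a real validator.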

HTML 5

HTML 5, much hyped already, isn't actually fully approved as of this writing, but it's got a doctype. They really took the "Keep It Simple, Stupid" maxim to heart this time: the new DOCTYPE is:

<!DOCTYPE html>

I don't know how they intend to distinguish future standards, should there ever be any. Since there's no actual version number in this string (it's distinguished from earlier HTML DOCTYPEs only because none of them were quite this short and simple), there doesn't seem to be any place to indicate HTML 5.1 or 6.0 once they exist.

ISO 15445

There's also an ISO standard HTML, having the benefit of the "weight" of a true standards body with much more clout than the W3 Consortium; it has as its DOCTYPE:

<!DOCTYPE HTML PUBLIC "ISO/IEC 15445:2000//DTD HTML//EN">

The W3C validator recognizes this now. The specification is similar to W3C's HTML 4.0 Strict. See this user's guide.

Bogus DOCTYPEs

Watch out for DOCTYPEs inserted or changed by WYSIWYG editors; many of them will put their favorite DOCTYPE in all your documents (replacing any other one you might have placed by hand), and this will often not be one that accurately describes the HTML code the editor is generating. In fact, some editor-generated DOCTYPEs don't even follow the proper syntax for the DOCTYPE declaration, and will cause validators to refuse to validate the document at all.

Links

Validators

Linters/Checkers

  • Link Exchange Site Inspector
  • Net Mechanic -- a very useful free site that checks the syntax and links of your site.
  • Doctor HTML
  • HTMLTidy -- cleans up HTML, reports errors, and suggests what DOCTYPE is appropriate for a page. (Command-line utility available for many platforms; open-source.)
  • TidyUI -- Windows user interface for HTMLTidy.
  • The (formerly misnamed) CSE Validator -- originally not really a validator, but a linter. Now it's actually got a genuine validator built in, but only as an optional extra feature; thus, its name is now only somewhat misleading instead of entirely wrong.


 


 

This page was first created 24 Sep 1998, and was last modified 12 Nov 2011.
Copyright © 1997-2011 by Daniel R. Tobias. All rights reserved.

webmaster@webtips.dan.info