Meta Data

Overview of Meta Tags

  • Meta tags are always located between the <head></head> tags and provide information about the document or instructions for devices / programs reading the document.
  • Meta tags are coded as <meta />, which means that they self-terminate (there is no closing tag).
  • Meta tags do not render on the page; the user could only see them by viewing the source code.

Attributes: Usage, Effect, and Values

  • charset: Used to communicate the character encoding in use for that document.
    • Values: the name of an encoding (e.g., utf-8)
  • content: Gives the information about the document or the instructions to the device / program reading the file. It is required whenever name or http-equiv is used and is not used alongside charset.
    • Values: text (value is up to you; depends on scenario)
  • http-equiv: Used for instructions to the device / program; typically these are for HTTP header information.
    • Values: text (specific value depends on scenario)
  • name: Defines the type of information about the document.
    • Values: text (specific value depends on scenario)
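
As a rough illustration of how these attributes pair up, the sketch below shows one tag of each kind. The content values are placeholders chosen for this example, not recommendations.

```html
<!-- charset is used on its own, with no content attribute -->
<meta charset="utf-8" />

<!-- name + content: information about the document -->
<meta name="description" content="Placeholder summary of this page" />

<!-- http-equiv + content: an instruction to the browser -->
<meta http-equiv="refresh" content="30" />
```

Each tag uses either charset, name, or http-equiv to say what kind of meta data it carries; the last two rely on content to supply the actual value.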

Describing a Document

  • The most common meta tag for describing a document is a written summary/overview of its content.
  • The following description could be displayed by a search engine in its results:
    <meta name="description" content="Website meta data overview written by Jason Withrow, with examples for common meta data scenarios" />
    
  • Important: The only thing you change here is the text inside the content attribute. Everything else stays the same!
  • Keep the content for your description meta tag short (approximately 155 characters or fewer) and, when writing it, try to imagine how it would read to someone scanning through a list of search engine results.
  • One final note is that the description should be about that particular page, not about the entire website. Your goal is to help users locate the exact page that meets their information needs, out of all the pages on the website and out of the entire World Wide Web.
  • You might also see sites using name="author" with the content attribute giving their name (name="copyright" is also used, with the value in the content attribute), but spiders indexing website content ignore those name values. You can create whatever meta tags you want, but spiders only care about ones they recognize.
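
To make the "one description per page" advice concrete, here is a minimal sketch for two pages on the same site. The page paths, wording, and author name are all hypothetical and exist only for illustration.

```html
<!-- Hypothetical page: /courses/html-basics.html -->
<meta name="description" content="Introductory HTML course covering page structure, links, images, and tables, with weekly hands-on exercises" />
<meta name="author" content="Jane Doe" />

<!-- Hypothetical page: /courses/css-layout.html -->
<meta name="description" content="CSS layout course covering the box model, flexbox, and grid, with worked examples for common page designs" />
<meta name="author" content="Jane Doe" />
```

Both descriptions stay well under 155 characters and describe only their own page. The author tags are harmless, but as noted above, spiders are unlikely to use them.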

Indicating a Content Type

  • This meta tag uses charset to declare the character encoding used by the document.
    <meta charset="utf-8" />
    
  • UTF-8 is a widely supported character encoding that can represent every character in Unicode, which covers nearly all written languages.
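
Browsers look for the encoding declaration near the very start of the file (within the first 1024 bytes), so one common habit is to make it the first tag inside the head. A small sketch, with a made-up title that contains a non-ASCII character:

```html
<head>
  <!-- Declared first so the browser knows the encoding before reading the title -->
  <meta charset="utf-8" />
  <title>Café Menu</title>
</head>
```

This only works if the file itself is actually saved as UTF-8; the tag describes the encoding and does not convert anything.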

Giving Instructions to Bots

  • Spiders are one type of bot (program) that moves around the Web gathering information from web pages.
  • The robots meta tag allows you to give directions to these bots. The following code would instruct the bot to not index the page content (noindex) and to not follow links on that page (nofollow):
    <meta name="robots" content="noindex,nofollow" />
    
  • Specifying content="none" should be equivalent to the previous example (indicating both noindex and nofollow).
  • While you could specify content="index,follow" or content="all", those are the defaults and why the bots are visiting your page. Should you have to remind them why they are there? I never specify those values.
  • Not indexing and not following are typically used for sites that are in development. Should spiders from Google or Bing come across your site they would not list it in their results.
  • Note that these are just recommendations; bots could ignore your instructions (and malicious ones do). However, major search engines (e.g., Google and Bing) honor these instructions.
  • Some instructions are recognized only by particular search engines. Google, for example, also supports:
    • content="noarchive" - Google will not store a cached version of the page
    • content="nosnippet" - Google will not show a short description with your listing (and will also not show a link to the cached version)
    • content="noimageindex" - Google will not index the images in the page
  • See the full list of indexing directives for Google
  • There is also a robots.txt file that can reside on the web server and instructs bots concerning what directories they can and cannot visit.
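
The Google directives listed above can be combined in a single tag, separated by commas, just like noindex and nofollow. Google also documents name="googlebot" as a way to address its crawler alone; that name value comes from Google's documentation rather than from this page, so treat the sketch below as an illustration.

```html
<!-- Applies to all bots that honor the robots meta tag -->
<meta name="robots" content="noarchive,noimageindex" />

<!-- Applies to Google's crawler only; other bots ignore it -->
<meta name="googlebot" content="nosnippet" />
```

You would normally pick one of these two forms depending on whether the instruction is meant for every bot or for Google alone.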

Redirects

  • Meta tags can also be used to redirect the user to a new page. This ability to automatically request a new page from the server is called client-pull.
  • This is especially useful if the URL for a page has changed. Put up a dummy page at the old URL, which automatically redirects the user to the new URL.
  • Client-pull uses the http-equiv attribute, rather than the name attribute.
  • The code to redirect the user to a new URL could be:
    <meta http-equiv="refresh" content="5; URL=https://www.google.com" />
    
  • The number (in this case 5) specifies the number of seconds before the new URL (www.google.com) loads. This value can be set to 0 which loads the new page as soon as the current one has finished downloading.
  • Note that the content attribute only has a single set of quotation marks, not two sets: the delay and the URL sit together inside one quoted value, separated by a semicolon.
  • As a courtesy to your users, always inform them that they will be redirected and provide them with the new URL (just in case the redirect is not configured correctly). In general, do not just redirect them without any warning.
  • You can also have the page refresh by not including a URL in your meta tag. This code would cause the page to be refreshed every 10 seconds, because each reload reads the meta tag again and restarts the timer:
    <meta http-equiv="refresh" content="10" />
    
  • Important: This redirect approach does not send an HTTP redirect header at all. The old page is delivered normally (a 200 response) and the browser then requests the new URL, so search engines have to infer your intent and may treat a delayed refresh as a temporary move. A 301 header, which signifies a permanent redirect, is preferable for SEO (Search Engine Optimization) and will be cached by browsers (so they immediately go to the redirect destination). To create a 301 header, server-side scripting or an instruction in an .htaccess file are two viable options.
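
The courtesy notice recommended above could look something like the following sketch of a page left at the old URL. The destination address (www.example.com/new-page.html) is a placeholder, not a real location.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>This page has moved</title>
  <!-- Wait 5 seconds, then load the new (placeholder) URL -->
  <meta http-equiv="refresh" content="5; URL=https://www.example.com/new-page.html" />
</head>
<body>
  <p>This page has moved. You will be redirected in 5 seconds.</p>
  <!-- Manual link in case the automatic redirect does not work -->
  <p>If nothing happens, go to
    <a href="https://www.example.com/new-page.html">https://www.example.com/new-page.html</a>.</p>
</body>
</html>
```

The visible link gives users a way forward even if the refresh is blocked or the URL in the meta tag was mistyped.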

Caching and Expiration

  • Browsers will typically cache external files, including images and stylesheets. They will also cache your page content (your HTML).
  • Caching refers to storing a copy of the file on the user's computer, so that they are not continually requesting the same file over the Internet as they visit other pages on the website. This is done to improve performance and reduce bandwidth usage.
  • One of the major drawbacks of caching is that users could be looking at outdated information, which is even more of a concern for sites where content changes rapidly.
  • In order to accommodate the variety of devices in use on the web, as well as the variety of web server configurations, three meta tags are specified. This code would disable caching for that page:
    <meta http-equiv="cache-control" content="no-cache" />
    <meta http-equiv="pragma" content="no-cache" />
    <meta http-equiv="expires" content="-1" />
    
  • The cache-control meta tag mirrors the Cache-Control header, which was introduced with HTTP 1.1.
  • The pragma meta tag mirrors the Pragma header from HTTP 1.0, which older software may still look for.
  • Both are included to cover our bases.
  • The expires meta tag notes the date/time when the document is expired. Spiders could remove expired documents from their search listings. Expired documents would also not be cached, so each visit would cause the document to be retrieved from the server again.
  • The content of the expires meta tag must be in GMT (Greenwich Mean Time) in order to be valid. The value given in the earlier example is invalid and intentionally so; invalid values indicate expiration now.
  • A valid expiration is:
    <meta http-equiv="expires" content="Mon, 23 Aug 2021 10:42:22 GMT" />
    
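  • Only use these three tags as necessary; for many websites you would not need them. Also be aware that caching is mainly governed by the real HTTP headers sent by the web server, so browsers may ignore these meta tags, and proxy caches do not read them at all.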

Full Meta Tags Example

<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
  <meta charset="utf-8" />
  <title>Meta Tags Example</title>
  <meta name="robots" content="noindex,nofollow" />
  <meta http-equiv="cache-control" content="no-cache" />
  <meta http-equiv="pragma" content="no-cache" />
  <meta http-equiv="expires" content="-1" />
  <meta name="description" content="Code example showcasing meta tags and a redirect" />
  <meta http-equiv="refresh" content="5; URL=https://www.google.com" />
</head>
<body>

  <!-- Notice of redirect to Google -->
  <p>The page will change to Google (www.google.com) in 5 seconds.</p>

</body>
</html>