31 Jul 2010

Excluding pages from SearchNZ

You may prefer that some, or all, of the pages on your Web site not be available to people searching with SearchNZ. Perhaps you have areas of the site that are still under construction, or an area intended for internal or family use that you would rather people could not search.

To allow for this, SearchNZ offers facilities for Web site administrators and owners to limit what gets added to the SearchNZ index.

Owners of sites without their own domain name
(eg. 'http://www.yourhost.co.nz/~yoursite/') should use the 'Robots META tag'.

Owners of Web sites with their own domain name
(eg. 'http://www.yoursite.co.nz') should use the 'Robots Exclusion Standard'.

The Robots Exclusion Standard

SearchNZ's search engine obeys the Robots Exclusion Standard, which gives site administrators a standard way to tell search engine 'robots' (the programs that crawl and index pages so that others can search them) which parts of a site they may visit. You do this with a small text file called "robots.txt", placed in the top-level directory of your site, that contains the instructions for visiting search engine robots. With a correctly written robots.txt file you can exclude SearchNZ from your entire site, from particular directories, or from specific pages.
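
Under the standard, a robot requests robots.txt only from the top level of a host, which is why a site living under a path such as /~yoursite/ cannot supply its own file. The following is a minimal Python sketch of that lookup, not SearchNZ's actual code; the function name robots_txt_url and the page paths are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL a crawler would consult for page_url.

    Hypothetical helper: keeps the scheme and host, discards the rest.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A site with its own domain name controls the file that is consulted.
print(robots_txt_url("http://www.yoursite.co.nz/personal/familypage.html"))
# prints http://www.yoursite.co.nz/robots.txt

# A site under a shared host maps to the host's file, not one under /~yoursite/.
print(robots_txt_url("http://www.yourhost.co.nz/~yoursite/index.html"))
# prints http://www.yourhost.co.nz/robots.txt
```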

To exclude your entire site from all web crawlers that support the robots.txt standard, create a file named robots.txt that contains:

User-agent: *
Disallow: /

To limit the exclusion to a particular directory, put the directory's complete path in the Disallow: statement, keeping the User-agent: line above it. For instance,

Disallow: /personal/

To limit the exclusion to a particular file, put the complete path and filename in the Disallow: statement. For instance,

Disallow: /personal/familypage.html
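
Before publishing a robots.txt file it can help to check that the rules block what you expect. The sketch below uses Python's standard urllib.robotparser module as a stand-in for a crawler; it is not SearchNZ's own robot, and the helper name allowed, the agent name "ExampleBot", and the pages photos.html and other.html are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


SITE = "http://www.yoursite.co.nz"

# The three rule sets described above, each with its User-agent line.
whole_site = "User-agent: *\nDisallow: /\n"
one_directory = "User-agent: *\nDisallow: /personal/\n"
one_file = "User-agent: *\nDisallow: /personal/familypage.html\n"

print(allowed(whole_site, SITE + "/index.html"))               # False
print(allowed(one_directory, SITE + "/personal/photos.html"))  # False
print(allowed(one_directory, SITE + "/index.html"))            # True
print(allowed(one_file, SITE + "/personal/familypage.html"))   # False
print(allowed(one_file, SITE + "/personal/other.html"))        # True
```

Rules are matched as path prefixes, so the directory rule covers every page beneath /personal/ while the file rule covers only the named page.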

More information on the Robots Exclusion Standard is available from www.robotstxt.org

The Robots META tag

You can also use <META> tags to exclude crawlers from specific pages on your Web site. If you have access to the files on your Web site, you can add these tags yourself, without needing help from your Web hosting company (or ISP).

How to use the Robots META tag

The Robots META tag contains instructions that the robot can read, with each instruction separated by a comma.

Like any META tag it should be placed in the HEAD section of an HTML page:

<html>
<head>
<meta name="robots" content="noindex,nofollow">
<title>...</title>
</head>
<body>
...

The currently defined instructions are (NO)INDEX and (NO)FOLLOW. The INDEX directive tells SearchNZ whether it should index the page (ie. allow others to search it). The FOLLOW directive specifies whether a robot should follow links on the page. By default, SearchNZ will index all qualifying pages and follow all qualifying links, which is the same as specifying:

<meta name="robots" content="index,follow">

So to prevent SearchNZ from including a page in its index, and from following its links, use:

<meta name="robots" content="noindex,nofollow">

SearchNZ currently recognises:

  • NOINDEX prevents anything on the page from being included in SearchNZ.
  • NOFOLLOW prevents the crawler from following the links on the page and indexing the linked pages.
More information on the Robots META tag is available from www.robotstxt.org
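
To confirm that a page carries the tag you intended, you can read it back the way a robot might. This is a rough Python sketch using the standard html.parser module, not SearchNZ's actual crawler logic; the class RobotsMetaParser and the function robots_policy are hypothetical names, and the defaults (index and follow) are taken from the description above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # Instructions are comma separated; case is not significant here.
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_policy(html):
    """Return (may_index, may_follow); both are True unless the page says otherwise."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return ("noindex" not in parser.directives,
            "nofollow" not in parser.directives)


excluded = ('<html><head><meta name="robots" content="noindex,nofollow">'
            '<title>t</title></head><body></body></html>')
plain = "<html><head><title>t</title></head><body></body></html>"

print(robots_policy(excluded))  # (False, False)
print(robots_policy(plain))     # (True, True)
```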

 

©  2010 New Zealand City Ltd. All rights reserved