Creating A Sitemap For Search Engines


Creating an XML sitemap file is straightforward in PHP.

Structure Of The Sitemap

XML sitemaps can be submitted to all of the major search engine providers to identify the files within a website that you want to be indexed. The structure of a sitemap is defined at www.sitemaps.org and consists of a set of header information followed by a repeating set of information for each file to be indexed.

The only mandatory item of information is the URL of the file. All other information, e.g. <changefreq>, is optional. All of these elements are defined in detail at www.sitemaps.org.

XML

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
  <loc>http://www.example.com/one.html</loc>
  <lastmod>2014-03-24</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.5</priority>
 </url>
 <!-- repeated for each URL -->
 <url>
  <loc>http://www.example.com/two.php</loc>
  <lastmod>2014-03-24</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.5</priority>
 </url>
</urlset>
 

Creating The Sitemap

The sitemap is created with PHP's DOM classes: DOMImplementation creates the document, and the structure is then built from an array of URLs to be inserted. The resulting structure is formatted to look 'pretty' by setting formatOutput = true and saved to a nominated .xml file with save($outputFile).

The basic sequence of creating the structure is that an element is created within the DOM, e.g. $urlData = $dom->createElement('url'), and then appended as a child to a previously defined element, e.g. $urlSet->appendChild($urlData). The previous example creates an element with no contained data. If the element has contained data, this is specified when it is created, e.g. $dom->createElement('changefreq','daily'). Note that createElement() does not escape an ampersand in the data, so a value that may contain & (such as a URL with a query string) is safer added as a text node with createTextNode().
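
The create-then-append sequence described above can be sketched in isolation as follows. This is a minimal illustration rather than the full sitemap routine: the namespace is left out for brevity, and the example URL with a query string is an assumption chosen to show how a text node handles the ampersand.

```php
<?php
// Illustrative sketch only; element names mirror the sitemap example.
$dom = new DOMDocument('1.0', 'UTF-8');

// An element with no contained data, appended to the document itself.
$urlSet = $dom->createElement('urlset');
$dom->appendChild($urlSet);

// Another empty container element, appended to the element created above.
$urlData = $dom->createElement('url');
$urlSet->appendChild($urlData);

// An element whose contained data is supplied when it is created.
$urlData->appendChild($dom->createElement('changefreq', 'daily'));

// Assumption: the URL contains '&', which createElement() would not escape,
// so the value is attached as a text node instead.
$locData = $dom->createElement('loc');
$locData->appendChild(
    $dom->createTextNode('http://www.example.com/two.php?a=1&b=2')
);
$urlData->appendChild($locData);

// Serialize to a string so the result can be inspected.
$xml = $dom->saveXML();
echo $xml;
```

In the serialized output the ampersand in the URL appears as &amp;, which is what a well-formed XML file needs.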

PHP

public function createSitemap(
                 $urlList,
                 $outputFile)
{
 $xmlSiteMap = new DOMImplementation();
 /* create the basic document */
 $dom = $xmlSiteMap->createDocument();
 /* set the encoding */
 $dom->encoding = "UTF-8";
 /* define the namespace */
 $urlSet = $dom->createElementNS(
            'http://www.sitemaps.org/schemas/sitemap/0.9',
            'urlset');
 $dom->appendChild($urlSet);
 foreach ($urlList as $url) {
   $urlData = $dom->createElement('url');
   $urlSet->appendChild($urlData);
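   /* the url is the minimum info required; it is added as a
      text node so that any & in the URL is escaped */
   $locData = $dom->createElement('loc');
   $locData->appendChild(
                    $dom->createTextNode($url));
   $urlData->appendChild($locData);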
   /* add any optional elements */
   $urlData->appendChild(
             $dom->createElement('lastmod',
             date("Y-m-d")));
   $urlData->appendChild($dom->createElement(
                        'changefreq',
                        'daily'));
   $urlData->appendChild($dom->createElement(
                        'priority',
                        '0.5'));
   }
 /* make it look pretty */
 $dom->formatOutput = true;
 /* save() returns the number of bytes written, or false on failure */
 $saveResult = $dom->save($outputFile);
 return $saveResult !== false;
}

 

Validating The Sitemap

It is useful to validate the structure of the sitemap before submitting it to a search engine. This is done using the schemaValidate method, supplied with an XML Schema Definition file (.xsd) provided at www.sitemaps.org. Normally any errors found would be displayed on the screen, but if necessary they can be trapped and handled in whatever way is appropriate.

PHP

public function validateSitemap($inputFile)
{
  /* stop validation errors going to the display */
  libxml_use_internal_errors(true);
  $dom = new DOMDocument();
  /* load the sitemap file */
  $dom->load($inputFile);
  $validateResult = $dom->schemaValidate(
      'http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd');
  if (!$validateResult) {
    /* get the errors into an array */
    $return = libxml_get_errors();
  } else {
    $return = true;
  }
  /* empty the error buffer and direct errors back to the display */
  libxml_clear_errors();
  libxml_use_internal_errors(false);
  return $return;
}
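
One possible way of handling the trapped errors is to turn them into readable messages. The sketch below assumes the array holds the LibXMLError objects returned by libxml_get_errors(). The helper name formatSitemapErrors is hypothetical, and the demonstration uses a deliberately malformed XML string rather than a real sitemap.

```php
<?php
// Hypothetical helper: converts LibXMLError objects into readable strings.
function formatSitemapErrors(array $errors): array
{
    // Map the libxml severity constants to labels.
    $levels = [
        LIBXML_ERR_WARNING => 'Warning',
        LIBXML_ERR_ERROR   => 'Error',
        LIBXML_ERR_FATAL   => 'Fatal error',
    ];
    $messages = [];
    foreach ($errors as $error) {
        $level = $levels[$error->level] ?? 'Unknown';
        $messages[] = sprintf(
            '%s on line %d: %s',
            $level,
            $error->line,
            trim($error->message)
        );
    }
    return $messages;
}

// Demonstration: trap the errors raised by a malformed document.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadXML('<urlset><url></urlset>');
$messages = formatSitemapErrors(libxml_get_errors());
libxml_clear_errors();
libxml_use_internal_errors(false);

foreach ($messages as $message) {
    echo $message, PHP_EOL;
}
```

The same helper could be applied to the array returned by validateSitemap() when validation fails, for example to write the messages to a log instead of the screen.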
 

Submitting The Sitemap

A sitemap can be submitted manually through a search engine's website, but the easiest way is to add a Sitemap entry to the robots.txt file.

ROBOTS

User-agent: *
Disallow: /test/
  .
  .
  .
Sitemap: http://www.example.com/sitemap.xml