Customizing a Sitemap in Sitecore XM Cloud for a Multidomain Solution

Customize the sitemap.ts file in the Next.js solution to accurately reflect the corresponding domain in a Sitecore XM Cloud multidomain environment.

January 22, 2024

By Mike Payne

Multi-Domain Sitemap Strategies in Sitecore XM Cloud

This post is part of a series on supporting multiple language-based domains in Sitecore XM Cloud.

In today's blog, the last part of my series on configuring language-based multidomain solutions, we're focusing on a crucial aspect of our multidomain solution: modifying the sitemap so that it reflects the domain from which it is requested. We will modify the existing sitemap.ts file in this Next.js solution to achieve this.

The Problem

In our multidomain Sitecore solution, we only have one instance of Sitecore XM Cloud behind both Vercel instances. This means there is only one shared Site Grouping item and that drives what values we see in the Sitemap. Without any customization, we will see the same Sitemap regardless of what domain we are requesting it from.

Below is what we’d see regardless of which domain we request it from (www.trainingwebsite.ca/sitemap.xml or www.sitedeformation.ca/sitemap.xml). Note, we have language embedding for the French URL AND we have a duplicate entry for each page.

<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <loc>https://www.trainingwebsite.ca/education</loc>
    <lastmod>2023-11-22</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
    <xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="x-default" href="https://www.trainingwebsite.ca/education" />
    <xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="fr" href="https://www.trainingwebsite.ca/fr/education" />
  </url>
<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <loc>https://www.trainingwebsite.ca/fr/education</loc>
    <lastmod>2023-11-22</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
    <xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="x-default" href="https://www.trainingwebsite.ca/education" />
    <xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="fr" href="https://www.trainingwebsite.ca/fr/education" />
  </url>

This is what we actually want to see on one of our Sitemap nodes. <loc> will reflect what domain we are requesting the Sitemap from. The alternate <xhtml:link> nodes should also show the correct domains rather than a language embedded URL with the hostname we added to the Site Grouping item in Sitecore. We have also removed the duplicate entry.

<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <loc>https://www.trainingwebsite.ca/education</loc>
    <lastmod>2023-11-22</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
    <xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="x-default" href="https://www.trainingwebsite.ca/education" />
    <xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="fr" href="https://www.sitedeformation.ca/formation" />
  </url>

The Fix

Here is where the brunt of our customization is happening. We are taking the result object from the existing code and removing the entries that do not reflect the current domain, updating the <xhtml:link> node for the French pages, and updating the <loc> node for the French pages. We do not need to update the URLs where they are in English as they are already correct (this is the value coming from the Site Grouping item).

// Use the result object with existing code to update the loc property
if (lang === 'en') {
  result.urlset.url = result.urlset.url.filter(filterUrlsEN);
  result.urlset.url = result.urlset.url.map(updateFrenchXhtmlURLs);
} else if (lang === 'fr') {
  result.urlset.url = result.urlset.url.filter(filterUrlsFR);
  result.urlset.url = result.urlset.url.map(updateLoc);
  result.urlset.url = result.urlset.url.map(updateFrenchXhtmlURLs);
}
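
For readers less familiar with xml2js, it may help to see the shape of the result object this snippet works on. With the parser's default options, every child element becomes an array and attributes are collected under `$`, which is why the code reads `url.loc[0]` and `link.$.href`. The sketch below is only an illustration: the sample values come from the sitemap shown earlier, and the `ParsedSitemap` type name is made up for this example.

```typescript
// Illustrative types only; `ParsedSitemap` is not part of the original solution.
type Link = { $: { rel: string; hreflang: string; href: string } };
type Url = { loc: string[]; 'xhtml:link': Link[] };
type ParsedSitemap = { urlset: { url: Url[] } };

// Hand-written stand-in for what parseString() yields for one <url> node.
const result: ParsedSitemap = {
  urlset: {
    url: [
      {
        loc: ['https://www.trainingwebsite.ca/fr/education'],
        'xhtml:link': [
          { $: { rel: 'alternate', hreflang: 'x-default', href: 'https://www.trainingwebsite.ca/education' } },
          { $: { rel: 'alternate', hreflang: 'fr', href: 'https://www.trainingwebsite.ca/fr/education' } },
        ],
      },
    ],
  },
};

// Element text is the first array item; attributes live under `$`.
const firstLoc = result.urlset.url[0].loc[0];
const frenchHref = result.urlset.url[0]['xhtml:link'].find((link) => link.$.hreflang === 'fr')?.$.href;

console.log(firstLoc);
console.log(frenchHref);
```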

Here are the two functions we are using for filtering. The first one will return false if the <loc> URL is in the French domain, thus it will not be included in the result object. The second will return true if the <loc> is in French, thus the page will be included in the result object.

// Function to filter <url> nodes based on <loc> subnode content
const filterUrlsEN = (url: Url) => {
  const loc = url.loc[0];
  return !loc.includes(FRENCH_URL_INVALID_AUTHORITY_AND_PATH_PREFIX);
};

const filterUrlsFR = (url: Url) => {
  const loc = url.loc[0];
  return loc.includes(FRENCH_URL_INVALID_AUTHORITY_AND_PATH_PREFIX);
};
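
A quick way to see the two filters in action is to run them over a few sample entries. The sketch below hardcodes the hostname from this series in place of the environment variable so that it is self-contained. It also illustrates one edge case worth keeping in mind: `includes()` is a plain substring match, so an English path that merely starts with "fr" matches the French prefix too. The `/free-courses` page is made up for the example, and `isFrenchLoc` is a hypothetical stricter check, not part of the original solution.

```typescript
type Url = { loc: string[] };

// Hardcoded stand-in for PUBLIC_EN_HOSTNAME + '/fr' (assumption for this sketch).
const FRENCH_PREFIX = 'www.trainingwebsite.ca/fr';

const filterUrlsEN = (url: Url) => !url.loc[0].includes(FRENCH_PREFIX);
const filterUrlsFR = (url: Url) => url.loc[0].includes(FRENCH_PREFIX);

// Hypothetical stricter check: the prefix must be followed by '/' or end the URL.
const isFrenchLoc = (loc: string) =>
  loc.includes(`${FRENCH_PREFIX}/`) || loc.endsWith(FRENCH_PREFIX);

const urls: Url[] = [
  { loc: ['https://www.trainingwebsite.ca/education'] },
  { loc: ['https://www.trainingwebsite.ca/fr/education'] },
  { loc: ['https://www.trainingwebsite.ca/free-courses'] }, // made-up English page
];

const english = urls.filter(filterUrlsEN).map((u) => u.loc[0]);
const french = urls.filter(filterUrlsFR).map((u) => u.loc[0]);
const strictFrench = urls.filter((u) => isFrenchLoc(u.loc[0])).map((u) => u.loc[0]);

console.log(english);      // only /education
console.log(french);       // /fr/education and, unintentionally, /free-courses
console.log(strictFrench); // only /fr/education
```

If none of your English item names start with "fr", the simpler substring check behaves the same as the stricter one.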

This function rewrites the <loc> value of the url object passed to it, replacing the English hostname and its /fr prefix with the French hostname. In our case, we only need this functionality for French (remember, we do not need to update the English URLs).

const updateLoc = (url: Url) => {
  if (url.loc && url.loc[0]) {
    url.loc[0] = url.loc[0].replace(
      FRENCH_URL_INVALID_AUTHORITY_AND_PATH_PREFIX,
      FRENCH_URL_DESIRED_AUTHORITY
    );
  }
  return url;
};

We call this function to update the French <xhtml:link> nodes, which otherwise show the English domain with the /fr language prefix embedded.

const updateFrenchXhtmlURLs = (url: Url) => {
  if (url['xhtml:link']) {
    url['xhtml:link'].forEach((link) => {
      if (link.$.hreflang === 'fr') {
        link.$.href = link.$.href.replace(
          FRENCH_URL_INVALID_AUTHORITY_AND_PATH_PREFIX,
          FRENCH_URL_DESIRED_AUTHORITY
        ); // Update the href value
      }
    });
  }
  return url;
};
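
Putting the pieces together, the sketch below applies the French branch to the duplicated sample entry from earlier and prints the single entry that remains. It is a simplified illustration rather than the exact code from the solution: the hostnames are hardcoded in place of the environment variables, the `Url` type is trimmed to the fields used, and the update functions are written without mutation (unlike the originals) so the input stays untouched.

```typescript
type Link = { $: { hreflang: string; href: string } };
type Url = { loc: string[]; 'xhtml:link': Link[] };

// Stand-ins for the environment-driven constants (assumptions for this sketch).
const FRENCH_URL_DESIRED_AUTHORITY = 'www.sitedeformation.ca';
const FRENCH_URL_INVALID_AUTHORITY_AND_PATH_PREFIX = 'www.trainingwebsite.ca/fr';

// Replace the English host plus '/fr' with the French host.
const swap = (value: string) =>
  value.replace(FRENCH_URL_INVALID_AUTHORITY_AND_PATH_PREFIX, FRENCH_URL_DESIRED_AUTHORITY);

const filterUrlsFR = (url: Url) =>
  url.loc[0].includes(FRENCH_URL_INVALID_AUTHORITY_AND_PATH_PREFIX);

const updateLoc = (url: Url): Url => ({ ...url, loc: [swap(url.loc[0])] });

const updateFrenchXhtmlURLs = (url: Url): Url => ({
  ...url,
  'xhtml:link': url['xhtml:link'].map((link) =>
    link.$.hreflang === 'fr' ? { $: { ...link.$, href: swap(link.$.href) } } : link
  ),
});

const makeLinks = (): Link[] => [
  { $: { hreflang: 'x-default', href: 'https://www.trainingwebsite.ca/education' } },
  { $: { hreflang: 'fr', href: 'https://www.trainingwebsite.ca/fr/education' } },
];

// The same page appears twice, exactly as in the uncustomized sitemap.
const sample: Url[] = [
  { loc: ['https://www.trainingwebsite.ca/education'], 'xhtml:link': makeLinks() },
  { loc: ['https://www.trainingwebsite.ca/fr/education'], 'xhtml:link': makeLinks() },
];

// Same order as the 'fr' branch: filter, then fix <loc>, then fix the alternates.
const frenchSitemap = sample.filter(filterUrlsFR).map(updateLoc).map(updateFrenchXhtmlURLs);

console.log(JSON.stringify(frenchSitemap, null, 2));
```

Note that these functions only swap the hostname and drop the /fr prefix; the rest of the path is left exactly as Sitecore returned it.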

Here is the sitemap.xml solution in its entirety.

import type { NextApiRequest, NextApiResponse } from 'next';
import {
  AxiosDataFetcher,
  GraphQLSitemapXmlService,
  AxiosResponse,
} from '@sitecore-jss/sitecore-jss-nextjs';
import { siteResolver } from 'lib/site-resolver';
import config from 'temp/config';
import { getPublicUrl } from '../../utils/publicUrlUtil';
import { Builder, parseString } from 'xml2js';

const ABSOLUTE_URL_REGEXP = '^(?:[a-z]+:)?//';
const FRENCH_URL_DESIRED_AUTHORITY = process.env.PUBLIC_FR_HOSTNAME || '';
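// '+' binds tighter than '||', so check the variable before appending '/fr'
const FRENCH_URL_INVALID_AUTHORITY_AND_PATH_PREFIX = process.env.PUBLIC_EN_HOSTNAME
  ? `${process.env.PUBLIC_EN_HOSTNAME}/fr`
  : '';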

type Url = {
  loc: string[];
  lastmod?: string[];
  changefreq?: string[];
  priority?: string[];
  'xhtml:link': {
    $: {
      xmlns: string;
      rel: string;
      hreflang: string;
      href: string;
    };
  }[];
};

// Function to filter <url> nodes based on <loc> subnode content
const filterUrlsEN = (url: Url) => {
  const loc = url.loc[0];
  return !loc.includes(FRENCH_URL_INVALID_AUTHORITY_AND_PATH_PREFIX);
};

const filterUrlsFR = (url: Url) => {
  const loc = url.loc[0];
  return loc.includes(FRENCH_URL_INVALID_AUTHORITY_AND_PATH_PREFIX);
};

const updateLoc = (url: Url) => {
  if (url.loc && url.loc[0]) {
    url.loc[0] = url.loc[0].replace(
      FRENCH_URL_INVALID_AUTHORITY_AND_PATH_PREFIX,
      FRENCH_URL_DESIRED_AUTHORITY
    );
  }
  return url;
};

const updateFrenchXhtmlURLs = (url: Url) => {
  if (url['xhtml:link']) {
    url['xhtml:link'].forEach((link) => {
      if (link.$.hreflang === 'fr') {
        link.$.href = link.$.href.replace(
          FRENCH_URL_INVALID_AUTHORITY_AND_PATH_PREFIX,
          FRENCH_URL_DESIRED_AUTHORITY
        ); // Update the href value
      }
    });
  }
  return url;
};

const sitemapApi = async (
  req: NextApiRequest,
  res: NextApiResponse
): Promise<NextApiResponse | void> => {
  const {
    query: { id },
  } = req;

  // Resolve site based on hostname
  const hostName = req.headers['host']?.split(':')[0] || 'localhost';
  const site = siteResolver.getByHost(hostName);

  // create sitemap graphql service
  const sitemapXmlService = new GraphQLSitemapXmlService({
    endpoint: config.graphQLEndpoint,
    apiKey: config.sitecoreApiKey,
    siteName: site.name,
  });

  // if url has sitemap-{n}.xml type. The id - can be null if it's sitemap.xml request
  const sitemapPath = await sitemapXmlService.getSitemap(id as string);

  // Determine language of current site
  let lang = 'localhost';
  if (process.env.PUBLIC_FR_HOSTNAME && hostName.includes(process.env.PUBLIC_FR_HOSTNAME)) {
    lang = 'fr';
  } else if (process.env.PUBLIC_EN_HOSTNAME && hostName.includes(process.env.PUBLIC_EN_HOSTNAME)) {
    lang = 'en';
  }

  // if sitemap is match otherwise redirect to 404 page
  if (sitemapPath) {
    const isAbsoluteUrl = sitemapPath.match(ABSOLUTE_URL_REGEXP);
    const sitemapUrl = isAbsoluteUrl ? sitemapPath : `${config.sitecoreApiHost}${sitemapPath}`;
    res.setHeader('Content-Type', 'text/xml;charset=utf-8');

    return new AxiosDataFetcher()
      .get(sitemapUrl, {
        responseType: 'stream',
      })
      .then((response: AxiosResponse) => {
        if (lang === 'localhost') {
          response.data.pipe(res);
          return;
        }
        // BEGIN CUSTOMIZATION - Filter the sitemap per domain/language, and set the French domain to French URLs.

        // Need to prepare stream from sitemap url
        const dataChunks: Buffer[] = [];
        response.data.on('data', (chunk: Buffer) => {
          dataChunks.push(chunk);
        });

        response.data.on('end', () => {
          // Concatenate the data chunks to get the complete XML content
          const xmlData = Buffer.concat(dataChunks).toString();

          // Now, parse the XML data into an object using xml2js
          parseString(xmlData, (err, result) => {
            if (err) {
              console.error('Error parsing XML:', err);
              // Send a response so the request does not hang when parsing fails
              res.status(500).end();
              return;
            }
            // Use the result object with existing code to update the loc property
            if (lang === 'en') {
              result.urlset.url = result.urlset.url.filter(filterUrlsEN);
              result.urlset.url = result.urlset.url.map(updateFrenchXhtmlURLs);
            } else if (lang === 'fr') {
              result.urlset.url = result.urlset.url.filter(filterUrlsFR);
              result.urlset.url = result.urlset.url.map(updateLoc);
              result.urlset.url = result.urlset.url.map(updateFrenchXhtmlURLs);
            }

            // Convert the modified object back to XML format
            const xmlBuilder = new Builder();
            const modifiedXml = xmlBuilder.buildObject(result);

            // Send 'modifiedXml' as the response, keeping the charset set earlier
            res.setHeader('Content-Type', 'text/xml;charset=utf-8');
            res.send(modifiedXml);
            // END CUSTOMIZATION
          });
        });
      })
      .catch(() => res.redirect('/404'));
  }

  // this approach if user goes to /sitemap.xml - under it generate xml page with list of sitemaps
  const sitemaps = await sitemapXmlService.fetchSitemaps();

  if (!sitemaps.length) {
    return res.redirect('/404');
  }

  const SitemapLinks = sitemaps
    .map((item) => {
      const parseUrl = item.split('/');
      const lastSegment = parseUrl[parseUrl.length - 1];

      return `<sitemap>
        <loc>${getPublicUrl()}/${lastSegment}</loc>
      </sitemap>`;
    })
    .join('');

  res.setHeader('Content-Type', 'text/xml;charset=utf-8');

  return res.send(`
  <sitemapindex xmlns="http://sitemaps.org/schemas/sitemap/0.9" encoding="UTF-8">${SitemapLinks}</sitemapindex>
  `);
};

export default sitemapApi;
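
The language detection inside the handler could also be pulled out into a small pure function, which makes it easy to test without a running request. This is a sketch under the assumption that the two hostnames are passed in explicitly rather than read from `process.env`; `resolveLang` is a hypothetical name, and the fallback value mirrors the 'localhost' default used in the listing above.

```typescript
type Lang = 'en' | 'fr' | 'localhost';

// Hypothetical helper mirroring the handler's host-to-language checks.
const resolveLang = (hostName: string, enHost?: string, frHost?: string): Lang => {
  if (frHost && hostName.includes(frHost)) return 'fr';
  if (enHost && hostName.includes(enHost)) return 'en';
  return 'localhost'; // unknown host: the handler streams the sitemap through unchanged
};

const enHostname = 'www.trainingwebsite.ca';
const frHostname = 'www.sitedeformation.ca';

console.log(resolveLang('www.sitedeformation.ca', enHostname, frHostname));
console.log(resolveLang('www.trainingwebsite.ca', enHostname, frHostname));
console.log(resolveLang('localhost', enHostname, frHostname));
console.log(resolveLang('www.trainingwebsite.ca')); // no hostnames configured
```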

In wrapping up this blog series, I have covered the process of configuring multiple language-based domains within Vercel for Sitecore XM Cloud. The series aimed to provide a comprehensive guide on configuring language-specific domains effectively in the context of the platform. I hope the information presented throughout the series proves valuable to those navigating the intricacies of domain configuration in a multilingual setup within Vercel and Sitecore XM Cloud.



Mike Payne

Development Team Lead

Mike is a Development Team Lead who is also Sitecore 9.0 Platform Associate Developer Certified. He's a BCIS graduate from Mount Royal University and has worked with Sitecore for over seven years. He's a passionate full-stack developer who helps drive solution decisions and assists his team. Mike is big into road cycling, playing guitar, working out, and snowboarding in the winter.