Switching to Coveo Headless/Atomic from Coveo on Sitecore Part 1: Indexing

The first in our expert series on transitioning to Coveo Atomic guides you through connectors, field mappings, and more.

October 31, 2023

By Ryan Allan

There are a few reasons why you might want to move to Coveo Headless or Atomic. Dan’s article goes into good detail on using Coveo with XM Cloud, and that’s probably the biggest reason. Maybe you are also doing a Sitecore upgrade and looking to future-proof your site. You could be moving towards a more composable stack to take advantage of newer web technology and integrations. Or it’s just been a few years, it’s time to upgrade your Coveo for Sitecore, and you want the newest, best technology.

What’s the First Thing to Start With?

Getting your data into Coveo. The original Coveo for Sitecore module handles all of that for you with its installation package, built-in config files, and out-of-the-box field handling. With Coveo Headless or Atomic, you have to handle that yourself. For the rest of this article, we’ll just call the new technology the Coveo Platform since we are mostly concerned with the back-end cloud portion. There are two good ways to get your Sitecore data into the Coveo platform without the Coveo for Sitecore module. These are the Sitemap connector and the Website connector.

The sitemap connector gives you good control over what gets indexed and how, and it is definitely the way to go if you have a large site or a site that’s sparsely linked internally. If you have large collections of articles that you’ve been serving to your users through Coveo, you want the sitemap connector, too.

The website connector is simpler to set up and works very well for smaller, well-connected sites.

For indexing, though, the main issue for both is making your data visible to the Coveo platform.

Both connectors work well with metadata tags, and you can include additional data in the sitemap items, so that’s what we’ll talk about. Once you have the data included in tags or on the page, you can add field mappings.

Fields and Metadata and Changes

Which fields are gone that you might have been using before? You can check your Coveo.SearchProvider.config file for a complete list, but the likely fields are _id, full path, site (for multi-site Sitecore implementations), hasLayout, the __smallupdateddate field and the alltemplates field.

The author, clickableuri, date, language, title, concepts, and so on are still there. These are also all the fields that used to be prefixed with “sys”, and if your Coveo for Sitecore implementation is old enough to be still using that prefix, now is a good time to remove it since Coveo has deprecated the prefix.

After you’ve gone over your old Coveo result templates, query pipelines, doneBuildingQuery handlers and know which missing fields you are using, what do you do? Here are a few examples. Filtering on the ID of an item, like keeping the search page out of its own results? For that, you can just add the Sitecore GUID into a meta tag on the page yourself or in the Sitemap metadata for the item.
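As a rough illustration of the meta tag approach, here is a minimal sketch of a helper that renders field values as meta tags. The helper names, the "sitecoreid" field name, and the GUID are all made up for the example. You would still need a field mapping on the Coveo side to pick the value up.

```typescript
// Hypothetical helper: escapes a value so it is safe inside an HTML attribute.
function escapeHtmlAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: turns a map of field names to values into <meta> tags.
// The field names are whatever you choose to map in your Coveo source.
function buildMetaTags(fields: Record<string, string>): string {
  return Object.keys(fields)
    .map(
      (key) =>
        `<meta name="${escapeHtmlAttribute(key)}" content="${escapeHtmlAttribute(fields[key])}" />`
    )
    .join("\n");
}

// Example: expose the Sitecore item ID (illustrative GUID) on the page.
const tags = buildMetaTags({
  sitecoreid: "{110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}",
});
```

The same helper works for any other value you want on the page, so the search page can later be filtered out by its ID.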

If you were using alltemplates to handle which result template to show for each item, like a BaseArticle template inherited by NewsArticle, KnowledgeArticle, and so on, you can extract the code that calculates the value from Sitecore.ContentSearch.ComputedFields.AllTemplates and reuse that. Alternatively, you can use the field conditions methods of Atomic/Headless and work with the data structure instead of the item types.
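To show what "working with the data structure" can look like, here is a plain TypeScript sketch that picks a template based on which fields a result actually has. It does not use the Headless or Atomic condition APIs, which have their own syntax. The result shape, the rule names, and the field names are assumptions for illustration only.

```typescript
// Hypothetical shape of a search result: only the raw field bag matters here.
interface ResultLike {
  raw: Record<string, unknown>;
}

type Condition = (result: ResultLike) => boolean;

interface TemplateRule {
  templateName: string;
  conditions: Condition[];
}

// Builds a condition that passes when every listed field has a non-empty value.
function fieldsDefined(...fields: string[]): Condition {
  return (result) =>
    fields.every((field) => {
      const value = result.raw[field];
      return value !== undefined && value !== null && value !== "";
    });
}

// Returns the first rule whose conditions all pass, else the fallback name.
function pickTemplate(
  result: ResultLike,
  rules: TemplateRule[],
  fallback: string = "default"
): string {
  for (const rule of rules) {
    if (rule.conditions.every((condition) => condition(result))) {
      return rule.templateName;
    }
  }
  return fallback;
}

// Illustrative rules: anything with article-style fields gets the article template.
const templateRules: TemplateRule[] = [
  { templateName: "article", conditions: [fieldsDefined("publishdate", "articlesummary")] },
  { templateName: "person", conditions: [fieldsDefined("jobtitle")] },
];
```

With rules like these, a NewsArticle and a KnowledgeArticle both land on the article template because they share the same fields, without an alltemplates value.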

For the site field, that’s probably best handled just by looking at the item URL now. Since the Coveo platform connectors work on the public-facing side instead of the Sitecore database backend, you might not even need it if you prefer to have a Coveo source per site.
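If you do want a site value on the front end, one option is to derive it from the result URL. This is a minimal sketch, assuming each site has its own hostname. The hostnames and site names below are placeholders.

```typescript
// Hypothetical mapping from public hostnames to site names; replace with your own.
const SITE_BY_HOST = new Map<string, string>([
  ["www.mysite.com", "main"],
  ["support.mysite.com", "support"],
]);

// Derives a site name from a result URL, falling back when the URL is
// malformed or the hostname is not in the map.
function siteFromUrl(resultUrl: string, fallback: string = "unknown"): string {
  let host: string;
  try {
    host = new URL(resultUrl).hostname.toLowerCase();
  } catch {
    return fallback;
  }
  return SITE_BY_HOST.get(host) ?? fallback;
}
```

If your sites share a hostname and differ by path instead, the same idea applies using the pathname prefix rather than the hostname.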

Now, you might ask, what about hasLayout? Just about everybody uses that to filter out data sources, partial page items, and sometimes even media. Well, the good news is that since these connectors depend on public-facing items, they’re not going to see the six accordion data items anymore or the “notable work” section you might have on an employee biography page. For the rest, you can get a lot of use out of the filetype field and filtering on “HTML” in your query pipelines.

What About Custom Fields?

Custom fields have the best news, really. Since the code behind those is already entirely in your control, you can just add the values directly to meta tags on the page or in the sitemap. It’s one of the simplest parts of this upgrade. The only thing you really have to remember here is to add your own field mappings once you have finished moving the code.

What About Custom Code in Sitecore Pipelines?

This is where some of the bad news comes in. If you are using the coveoProcessParsedRestResponse pipeline to modify search results as they come in, there isn’t a drop-in replacement for that. You’ll have to see which changes can be moved to the indexing step, and others may have to be done in JavaScript on the client side. This step is largely the same for either Headless or Atomic.

Avoiding Potential Pain Points

The first thing is that the fancy URL rewriting feature of Coveo for Sitecore has no equivalent in Headless/Atomic. By default, the URLs will just match where you are indexing from. So, if you have your sitemap or website connector set up on your CM server, all the results will have URLs like https://cm.mysite.com/en/articles/amazing-news. Since, again, the connectors depend on public-facing data (or at least, Sitecore front-end data if you are running a private site), the simplest way to avoid this is to run the connectors against your content delivery server.

If your publishing or content workflow depends on the Coveo_master_index and a preview site, it’s probably simplest to change that since the master indexing isn’t part of the new technology. If you need it no matter what, look for a future article on how to set that up.

For the second point, you might be tempted to replace the “last updated date” in the sitemap connector with your own published date, like in company news articles. You’re better off adding a new “publishDate” field. The sitemap connector uses the last updated date to decide whether to reindex an item, and trying to figure out why your updates to a recent news item aren’t showing up is no fun. Sure, you might have to do some extra work in the ranking weights, but it’s worth it. (The machine learning model might just handle that fine anyway.)
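Here is a sketch of a sitemap entry that keeps the two dates separate. The coveo:metadata wrapper reflects my understanding of how the sitemap connector picks up extra metadata, and the namespace declaration on the urlset element is left out, so check the Coveo sitemap source documentation before relying on it. The interface and function names are made up for the example.

```typescript
// Hypothetical item shape; both dates would come from your Sitecore items.
interface SitemapItem {
  loc: string;
  lastModified: Date; // when the item actually changed (drives reindexing)
  publishDate: Date; // editorial date you want to show and rank on
}

// Escapes the characters that are not allowed in XML text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds one <url> entry. <lastmod> stays the real modification date;
// the published date travels as separate metadata (assumed element name).
function buildUrlEntry(item: SitemapItem): string {
  return [
    "<url>",
    `  <loc>${escapeXml(item.loc)}</loc>`,
    `  <lastmod>${item.lastModified.toISOString()}</lastmod>`,
    "  <coveo:metadata>",
    `    <publishDate>${item.publishDate.toISOString()}</publishDate>`,
    "  </coveo:metadata>",
    "</url>",
  ].join("\n");
}

const entry = buildUrlEntry({
  loc: "https://www.mysite.com/en/articles/amazing-news",
  lastModified: new Date("2023-10-30T12:00:00Z"),
  publishDate: new Date("2023-10-01T00:00:00Z"),
});
```

Editing the article later only changes lastModified, so the connector reindexes it while the displayed publish date stays put.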

On to the third thing, and after this, we are concerned with the front end. Both Headless and Atomic require all fields beyond the base system fields to be registered. Headless uses the registerFieldsToInclude field action, and Atomic uses a fields-to-include attribute on the atomic-search-interface element like this:

<atomic-search-interface fields-to-include='["fieldA", "fieldB"]'></atomic-search-interface>

You have to update that value for every field you want returned. If you are wondering why pieces of your result templates are blank even though the data is obviously indexed, this is why. This is a big change from Coveo for Sitecore.
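One way to keep this manageable is to hold the field list in a single place and generate the attribute value from it. This is a minimal sketch with made-up field names. The same array could be what you pass to registerFieldsToInclude on the Headless side.

```typescript
// Hypothetical single source of truth for the custom fields your templates use.
const RESULT_FIELDS: readonly string[] = ["sitecoreid", "publishdate", "articlesummary"];

// Serializes the list into a JSON array string for the fields-to-include
// attribute, dropping duplicates. Assumes field names contain no apostrophes,
// since the attribute is wrapped in single quotes below.
function fieldsToIncludeAttribute(fields: readonly string[]): string {
  const unique = fields.filter((field, index) => fields.indexOf(field) === index);
  return JSON.stringify(unique);
}

const markup =
  `<atomic-search-interface fields-to-include='${fieldsToIncludeAttribute(RESULT_FIELDS)}'>` +
  `</atomic-search-interface>`;
```

When a template shows a blank value, the first thing to check is whether the field is in this list.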

One minor issue, just to save digging around, is that the Atomic React Wrapper doesn’t work with server-side rendering as of the time of this writing. Add a ‘use client’ directive if you are deploying on Vercel or using XM Cloud to avoid that issue.

The last gotcha is that Coveo will happily index its own search pages. You might say that it doesn’t seem like a big deal; it did that before. The big difference is that both connectors poll your site and fairly frequently at that. They aren’t push-type connectors like Coveo for Sitecore is. If you have a whole bunch of different search pages, like a general page, one for articles, one for your knowledge base, and so on, this will eat into your Coveo query quota. This is especially a concern if you have a Coveo recommendations component on many pages. Turning off the Coveo searches if they see the CoveoBot indexing is fairly simple, but it’s best to get that done early in the upgrade.
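A sketch of that check might look like this. It assumes the crawler identifies itself with "Coveobot" somewhere in its user agent string, so confirm the exact value your source sends. The function names are hypothetical.

```typescript
// Assumption: the crawler's user agent contains "Coveobot" (matched case-insensitively).
function isCoveoCrawler(userAgent: string | null | undefined): boolean {
  return !!userAgent && /coveobot/i.test(userAgent);
}

// Hypothetical guard around whatever kicks off your first search or
// recommendation request. Returns true when the search was actually run.
function runSearchUnlessCrawler(
  userAgent: string | null | undefined,
  executeSearch: () => void
): boolean {
  if (isCoveoCrawler(userAgent)) {
    return false; // skip the query so indexing does not eat into the quota
  }
  executeSearch();
  return true;
}
```

In the browser you would pass navigator.userAgent and a callback that triggers your engine's first search. On the server, the same check can run against the request's User-Agent header.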

Conclusion

That covers most of what you need to know about the differences between Coveo for Sitecore and the Coveo Platform indexing. For more information about integrating Coveo with Next.js and Sitecore Headless, see this article. We’ve got a lot more about Coveo Headless and Atomic on our blog, too.




Ryan Allan

Senior Developer

Ryan is a seasoned Senior Developer at Fishtank and is Sitecore Developer Certified. In a nutshell, he describes his role as "building extremely fancy websites". Ryan has a degree in Mechanical Engineering and previously worked on power plants and compressors, and now builds websites for the same company as a client! Outside of work, his interests are science + physics and spending time with his kids.