  SEO Tools - Sitemap and Robots.txt

    Our strategy

    Our SEO tools are a combination of automation and manual fine-tuning options.

    We automate as much as possible by generating most configuration files automatically. Additionally, we allow the editor to control which URLs are indexed, with the manual controls centralised in a single user interface in Sanity.

    Glossary

    SEO has its own set of tools and terms. Let’s briefly explain those that are relevant to this article.

    Crawler / Search Engine Bot

    a program that scans the internet looking for pages to index.
    A crawler finds URLs from different sources:

    1. Pages linked from other (indexed) pages. For example, when the homepage is crawled, all the links on it are also followed and indexed, and the behaviour repeats recursively.

    2. Manual suggestions via online tools

    3. The sitemap file

    robots.txt

    a file hosted at the root of a market’s website which contains rules that disallow crawlers from accessing specific URLs or folders.

    If a page is already indexed, disallowing it in the robots.txt file will have no effect. To remove it from the index, the page itself should contain a “meta robots” tag.

    It also links to the sitemap.
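
    For illustration, a minimal robots.txt could look like the sketch below. The disallowed paths are hypothetical examples, not the actual rules of any market; only the general syntax (User-agent, Disallow, Sitemap) is standard.

    ```text
    # Rules apply to all crawlers
    User-agent: *

    # Hypothetical examples of URLs/folders a crawler should not access
    Disallow: /checkout/
    Disallow: /internal-test-page

    # Pointer to the sitemap so crawlers can discover it
    Sitemap: https://www.popeyes.es/sitemap.xml
    ```

    A page that must also be removed from the index would instead carry a tag such as <meta name="robots" content="noindex"> in its HTML head.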

    sitemap.xml

    a file hosted at the root of a market’s website which contains a list of URLs that a search engine bot should index. The purpose of this file is to suggest URLs to the crawler, nothing more.

    Sitemaps are an important part of website optimization, as they provide search engines with an avenue for discovering pages on a site. It isn't always possible to internally link every page, especially on a large website; with sitemaps, however, you can ensure that Google is able to discover important pages, even those that have been orphaned.
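
    For reference, a sitemap.xml is an XML document with one <url> entry per page to suggest. A minimal sketch, using hypothetical paths on the popeyes.es domain mentioned later in this article, could look like this:

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- One <url> entry per page we want the crawler to discover -->
      <url>
        <loc>https://www.popeyes.es/</loc>
      </url>
      <url>
        <loc>https://www.popeyes.es/menu</loc>
      </url>
    </urlset>
    ```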

    Consider robots.txt and the sitemap as the yin and yang of URL crawling.
    They are not strictly opposites, but they serve opposite purposes:
    the robots file disallows URLs and the sitemap suggests them.

    How we do it

    Instead of relying on manual creation and maintenance of the aforementioned files, we generate them (independently from the build and release process) following the approach below (a simplified sketch of the merge logic follows the steps):

    1. We retrieve all the static pages defined in Sanity. Depending on their “Include in sitemap” flag, we add them either to the sitemap or to the robots file.

    2. We fetch all the dynamic pages (menu, categories, store locator, etc.) and add them to the sitemap.

    3. We read the additional URLs and excluded URLs lists that can be found in Sanity configuration:
      Desk > Marketing Content > Features > Feature SEO ([markets-domain]/desk/marketingContent;features;featureSeo)

    [Screenshots: the additional URLs and excluded URLs fields in Sanity. In both fields, URLs should be root-relative (starting with /, no domain) and entered one per line.]
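
    The snippet below is a simplified sketch of the merge logic described in the three steps above, not the real implementation. fetchStaticPages, fetchDynamicPages and fetchSeoConfig are hypothetical helpers standing in for the actual Sanity queries, and the file rendering is reduced to the bare minimum.

    ```ts
    // Simplified sketch of the nightly generation (hypothetical helpers, not the real code).
    type StaticPage = { path: string; includeInSitemap: boolean };
    type SeoConfig = { additionalUrls: string[]; excludedUrls: string[] };

    declare function fetchStaticPages(): Promise<StaticPage[]>;
    declare function fetchDynamicPages(): Promise<string[]>;
    declare function fetchSeoConfig(): Promise<SeoConfig>;

    async function generateSeoFiles(domain: string) {
      // 1. Static pages: the "Include in sitemap" flag decides where each page goes.
      const staticPages = await fetchStaticPages();
      const sitemapUrls = staticPages.filter((p) => p.includeInSitemap).map((p) => p.path);
      const disallowedUrls = staticPages.filter((p) => !p.includeInSitemap).map((p) => p.path);

      // 2. Dynamic pages (menu, categories, store locator, etc.) always go to the sitemap.
      sitemapUrls.push(...(await fetchDynamicPages()));

      // 3. Manual lists maintained in the Feature SEO document in Sanity.
      const { additionalUrls, excludedUrls } = await fetchSeoConfig();
      sitemapUrls.push(...additionalUrls);
      disallowedUrls.push(...excludedUrls);

      // Merge the three levels: deduplicate and keep excluded URLs out of the sitemap.
      const finalUrls = [...new Set(sitemapUrls)].filter((p) => !excludedUrls.includes(p));

      const robotsTxt = [
        "User-agent: *",
        ...disallowedUrls.map((p) => `Disallow: ${p}`),
        `Sitemap: https://${domain}/sitemap.xml`,
      ].join("\n");

      const sitemapXml =
        `<?xml version="1.0" encoding="UTF-8"?>\n` +
        `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
        finalUrls.map((p) => `  <url><loc>https://${domain}${p}</loc></url>`).join("\n") +
        "\n</urlset>";

      return { robotsTxt, sitemapXml };
    }
    ```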

    Output

    The results of these three steps are merged together, and the files are generated and published nightly (and optionally on demand), independently from the release process of the app.

    They can be found at the root of any market’s website, for example:

    https://www.popeyes.es/robots.txt

    https://www.popeyes.es/sitemap.xml

    The entire process should be verified in the Google Search Console.