Robots.txt is a plain text file placed at the root of a website (e.g. yoursite.com/robots.txt) that tells search engine crawlers which URLs they may access. Following the Robots Exclusion Protocol, it can allow or disallow specific paths and point to the XML sitemap. Some crawlers also honor a non-standard Crawl-delay directive, though Google ignores it. Robots.txt does not block indexing on its own (for that you need a noindex tag), but it controls crawl behavior.
Robots.txt is the file crawlers such as Googlebot check before fetching anything else on your site. One typo and you can accidentally block your entire domain from being crawled. Yes, this happens, and yes, it has cost companies six-figure traffic drops overnight. Used correctly, it keeps crawlers away from admin pages, internal search results, faceted navigation, and other URL bloat that wastes crawl budget. Used incorrectly, it nukes your SEO. The fix is to review every change before deploying, and never confuse "disallow" with "noindex." Different jobs, very different consequences.
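To make the "one typo" risk concrete, here is a minimal sketch using Python's standard-library urllib.robotparser. The two rule sets and the yoursite.com URL are made up for illustration, and the standard-library parser only approximates how a real crawler matches rules (it is not Google's own matcher).

```python
from urllib.robotparser import RobotFileParser

# Intended rules: keep crawlers out of /admin/ only (illustrative path).
INTENDED = """\
User-agent: *
Disallow: /admin/
"""

# The same file with the path accidentally truncated to a bare slash.
TYPO = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


homepage = "https://yoursite.com/"
print(is_crawlable(INTENDED, homepage))  # True: only /admin/ is off limits
print(is_crawlable(TYPO, homepage))      # False: "Disallow: /" blocks every path
```

The two files differ by a handful of characters, which is why a line-by-line review before deploying is worth the time.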
The robots.txt file lives at yoursite.com/robots.txt. Inside, you group directives by user-agent: "User-agent: *" applies to all crawlers, while "Disallow: /admin/" tells them not to crawl that path. You can also target specific bots individually, like Googlebot or GPTBot, and include a Sitemap directive pointing to your XML sitemap. After deploying any change, check the robots.txt report in Google Search Console (it replaced the older robots.txt Tester) and confirm critical URLs aren't accidentally blocked. Important: disallowing a URL in robots.txt doesn't remove it from Google's index if it's already there. For that, you need a noindex meta tag, and the page has to stay crawlable so Google can actually see the tag. Robots.txt controls crawling, not indexing of already-known pages.
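The pieces described above (a catch-all group, a bot-specific group, and a Sitemap line) can be checked before deploying with a small script. This is a sketch under assumptions: the file contents, the /search path, the sitemap URL, and the list of "critical" URLs are all hypothetical, and Python's urllib.robotparser does simple prefix matching, so it will not reproduce every nuance of Googlebot's wildcard handling.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, keep all other
# crawlers out of /admin/ and /search, and advertise the sitemap.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /search

Sitemap: https://yoursite.com/sitemap.xml
"""

# Illustrative list of pages that must stay crawlable.
CRITICAL_URLS = [
    "https://yoursite.com/",
    "https://yoursite.com/pricing",
    "https://yoursite.com/blog/post",
]


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def blocked_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Return the subset of urls that user_agent is not allowed to fetch."""
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


rules = load_rules(ROBOTS_TXT)
print("Blocked for Googlebot:", blocked_urls(rules, "Googlebot", CRITICAL_URLS))
print("Blocked for GPTBot:", blocked_urls(rules, "GPTBot", CRITICAL_URLS))
print("Sitemaps:", rules.site_maps())  # site_maps() requires Python 3.8+
```

A script like this only answers "can this bot crawl this URL?". It says nothing about whether a page is indexed, which is the noindex side of the distinction.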
A machine-readable file that lists every important page on your site, helping search engines find and crawl your content faster and more reliably than they…
The process search engines use to store and organize web pages so they can show up in results — if your page isn't indexed, it can't rank, and most sites have…
The plumbing of SEO — making sure search engines can crawl, render, and index your site quickly and cleanly, so your content actually has a chance to rank…
An automated bot that AI companies use to read websites and feed the content into their models or live answer engines — including GPTBot, ClaudeBot,…
A plain markdown file you put at the root of your website that tells AI models which pages matter most and how to read them — like robots.txt, but for large…
A tag that tells search engines which version of a page is the original when duplicates or near-duplicates exist, so ranking signals consolidate on one URL…