GPTBot
Last reviewed: 2026-06-14
GPTBot is OpenAI's training-data crawler, distinct from OAI-SearchBot (which powers ChatGPT search citations) and ChatGPT-User (which represents user-triggered fetches).
GPTBot is the user-agent OpenAI uses to crawl public web pages for training future ChatGPT models. It is one of three distinct OpenAI user-agents, each with a different purpose: GPTBot (training), OAI-SearchBot (citation in search-grounded answers), and ChatGPT-User (user-triggered fetches initiated from inside the ChatGPT interface).
OpenAI introduced GPTBot in August 2023, making it one of the first publicly documented user-agents dedicated to AI training. Site owners can block it in robots.txt with a User-agent: GPTBot line followed by Disallow: /. Blocking GPTBot prevents your content from being used to train future ChatGPT versions but does NOT stop ChatGPT from citing your pages in current search-grounded answers; that is controlled separately through OAI-SearchBot.
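As a rough illustration of how the two user-agents are handled independently, the sketch below feeds a sample robots.txt to Python's standard-library robots parser and checks each crawler against it. The robots.txt contents, the is_allowed helper, and the example.com URL are assumptions made for illustration, not taken from OpenAI's documentation, and a real crawler's matching logic may differ in its details.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt (illustrative): block GPTBot site-wide,
# leave every other crawler, including OAI-SearchBot, unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain.
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/pricing")
searchbot_allowed = is_allowed(ROBOTS_TXT, "OAI-SearchBot", "https://example.com/pricing")
```

With this file, the GPTBot rule applies only to GPTBot, while OAI-SearchBot falls through to the wildcard group. To restrict OAI-SearchBot as well, it would need its own User-agent group.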
For most small businesses, the right policy is to allow GPTBot. Long-term presence in LLM training data is one of the most durable sources of AI-driven discovery: a model that has "learned" your business during training is more likely to recommend you confidently in future conversations, even when the user query doesn't explicitly trigger a search-grounded retrieval.
GPTBot honors robots.txt, fetches pages with a documented user-agent string, and publishes its IP ranges so site owners can verify legitimate traffic. The official documentation lives at developers.openai.com/api/docs/bots.
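One way to use the published IP ranges is to check that a request claiming to be GPTBot actually comes from one of them. The sketch below is a minimal version of that check using Python's ipaddress module. The CIDR blocks are reserved documentation ranges used as placeholders, not OpenAI's real ranges, and the is_verified_gptbot helper and the sample user-agent string are hypothetical; substitute the ranges from the official documentation before relying on it.

```python
import ipaddress

# Placeholder CIDR blocks from reserved documentation ranges.
# These are NOT OpenAI's real ranges; replace them with the published list.
PLACEHOLDER_GPTBOT_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def is_verified_gptbot(user_agent: str, remote_ip: str, cidr_ranges: list[str]) -> bool:
    """Return True only if the user-agent names GPTBot AND the IP is in a listed range."""
    if "gptbot" not in user_agent.lower():
        return False
    address = ipaddress.ip_address(remote_ip)
    # Membership is False when the address and network IP versions differ.
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


# Illustrative user-agent string; the real one may differ in format and version.
SAMPLE_UA = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"

genuine = is_verified_gptbot(SAMPLE_UA, "192.0.2.17", PLACEHOLDER_GPTBOT_RANGES)
spoofed = is_verified_gptbot(SAMPLE_UA, "203.0.113.5", PLACEHOLDER_GPTBOT_RANGES)
```

Checking both the user-agent and the source address matters because the user-agent header alone is trivial to forge.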