What is GPTBot?


BizAIReady · Accepted Answer

GPTBot is OpenAI's training-data crawler — distinct from OAI-SearchBot (which powers ChatGPT search citations) and ChatGPT-User (which represents user-triggered fetches).

GPTBot is the user-agent OpenAI uses to crawl public web pages for training future ChatGPT models. It is one of three distinct OpenAI user-agents, each with a different purpose: GPTBot (training), OAI-SearchBot (citation in search-grounded answers), and ChatGPT-User (user-triggered fetches initiated from inside the ChatGPT interface).

GPTBot was introduced in August 2023 as one of the first crawlers dedicated to AI training. Site owners can block it in robots.txt with a group consisting of "User-agent: GPTBot" followed by "Disallow: /". Blocking GPTBot stops it from collecting your pages to train future ChatGPT models. It does NOT stop ChatGPT from citing your pages in current search-grounded answers; that is governed by a separate robots.txt rule for OAI-SearchBot.
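To see that separation concretely, Python's standard-library urllib.robotparser can evaluate one robots.txt file against each user-agent token. The sketch below assumes a site that opts out of training only. The file contents, the example.com URL and the crawl_permissions helper are illustrative. The parser reports what the file permits, not how any crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block the training crawler, allow everything else.
# OAI-SearchBot has no group of its own, so it falls through to the "*" group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawl_permissions(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return whether each user-agent token may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    perms = crawl_permissions(
        ROBOTS_TXT,
        "https://example.com/services",  # hypothetical page on your site
        ["GPTBot", "OAI-SearchBot"],
    )
    for agent, allowed in perms.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running it prints "GPTBot: blocked" and "OAI-SearchBot: allowed". Only the GPTBot group matches GPTBot, and OAI-SearchBot is handled by the catch-all group.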

For most small businesses, the right policy is to allow GPTBot. Long-term presence in LLM training data is one of the most durable sources of AI-driven discovery: a model that has "learned" your business during training is more likely to recommend you confidently in future conversations, even when the user query doesn't explicitly trigger a search-grounded retrieval.

GPTBot honors robots.txt, fetches pages with a documented user-agent string, and publishes its IP ranges so site owners can verify legitimate traffic. The official documentation lives at developers.openai.com/api/docs/bots.
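Any client can send a GPTBot user-agent header, so the string alone proves nothing. The published IP ranges are what let a site owner separate real GPTBot traffic from spoofed requests. Below is a minimal sketch using Python's ipaddress module. The ranges are placeholders taken from the RFC 5737 documentation blocks, not OpenAI's real ranges. The is_verified_gptbot helper and the sample user-agent string are hypothetical. Replace the ranges with the list OpenAI publishes before relying on this check.

```python
import ipaddress

# PLACEHOLDER ranges (RFC 5737 documentation blocks), NOT OpenAI's real ranges.
# Load the GPTBot ranges OpenAI publishes and put them here instead.
GPTBOT_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_verified_gptbot(user_agent: str, remote_ip: str) -> bool:
    """True only if the request claims to be GPTBot AND comes from a listed range."""
    if "GPTBot" not in user_agent:
        return False  # the request does not claim to be GPTBot at all
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # malformed address: treat as unverified
    # Membership is False for IPv6 addresses checked against IPv4 networks.
    return any(addr in net for net in GPTBOT_RANGES)


if __name__ == "__main__":
    # Illustrative UA string; check OpenAI's documentation for the exact value.
    ua = "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"
    print(is_verified_gptbot(ua, "192.0.2.10"))   # claimed GPTBot, listed IP
    print(is_verified_gptbot(ua, "203.0.113.5"))  # claimed GPTBot, unlisted IP: spoofed
```

Requiring both conditions matters because either signal alone gives the wrong answer. A forged header from an unlisted IP is rejected. Ordinary visitors are never misclassified as GPTBot just because they share a cloud address block.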
