Towards an AI-native internet

The Model Context Protocol (MCP), designed by Anthropic, has had a meteoric rise to the heart of the AI zeitgeist. It’s a universal protocol that lets developers expose context and capabilities to any LLM.

Disclaimer: This article is fairly technical and assumes at least a cursory understanding of MCP.

Quick primer on HTTP and the world wide web

The internet, at its core, is a conversation between your computer and a remote computer in a server farm. A web browser like Chrome simply renders that conversation for you.

The most basic primitive of the internet is a client-server interaction (source)

It’s not that simple, though: the server above isn’t processing each request in isolation and generating the full response by itself. It is a thin layer that mostly routes your request to more specialized servers (called services), an orchestrated pinball machine of computers that together produce the response to your query, as shown below:

The modern internet routes requests to more specialized services, then collates service responses into a meaningful response back to the requester. (source)
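
The routing-and-collating pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular site is built: the three services, their names, and the data they return are all hypothetical, and they run in-process here, whereas real services would be separate servers reached over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for specialized services (assumed for illustration).
# In a real deployment each would be its own server behind a network call.
def user_service(user_id):
    return {"name": f"user-{user_id}"}

def cart_service(user_id):
    return {"items": ["medium pizza"]}

def pricing_service(user_id):
    return {"total": 12.5}

SERVICES = {
    "user": user_service,
    "cart": cart_service,
    "pricing": pricing_service,
}

def gateway(user_id):
    """Thin front server: fan the request out, then collate the replies."""
    with ThreadPoolExecutor() as pool:
        # Send the same request to every service concurrently.
        futures = {name: pool.submit(fn, user_id) for name, fn in SERVICES.items()}
        # Wait for each reply and merge them into one response.
        return {name: future.result() for name, future in futures.items()}

response = gateway(42)
print(response)
```

The gateway does no real work of its own: it only knows which services exist and how to merge their answers, which is the thin layer the paragraph above describes.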

Intelligence is an orchestrated system of AI services

GitHub Star History for Stagehand, the browser tool for AI agents

GitHub Star History for the Browserbase MCP, the service connecting LLMs to browser tools

As far back as 2023, the memo behind Browserbase described even everyday, off-the-shelf VLMs like GPT-4V as able to answer questions like “Has the page completely loaded yet?” or “Can you see the login button?”.

In what became known colloquially as computer use, Stagehand was among the earliest AI services. It envisioned a high-level agent that takes a task like “order me a pizza”, examines a series of screenshots, and determines atomic actions to take, such as “go to pizza.com”, “click medium pizza”, or “tell me the price”. Our job wasn’t to build the agent itself, but to give any agent tools for controlling a browser, each executable with an instruction like “click the order button.”
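
The split between agent and tool can be sketched as follows. This is an illustration under stated assumptions, not Stagehand's actual API: `FakeBrowserTool`, `plan`, and `run_agent` are hypothetical names, the planner is a hard-coded stub where a real agent would consult a model and screenshots, and the tool only records instructions instead of driving a browser.

```python
from dataclasses import dataclass, field

@dataclass
class FakeBrowserTool:
    """Hypothetical browser tool: records instructions instead of acting."""
    log: list = field(default_factory=list)

    def act(self, instruction: str) -> str:
        # A real tool would translate the instruction into browser actions.
        self.log.append(instruction)
        return f"done: {instruction}"

def plan(task: str) -> list:
    """Stub planner (assumption): maps a task to atomic actions.

    A real agent would derive these steps from a model and screenshots.
    """
    plans = {
        "order me a pizza": [
            "go to pizza.com",
            "click medium pizza",
            "tell me the price",
        ],
    }
    return plans.get(task, [])

def run_agent(task: str, browser: FakeBrowserTool) -> list:
    """The agent decides what to do; the tool decides how to do it."""
    return [browser.act(step) for step in plan(task)]

browser = FakeBrowserTool()
results = run_agent("order me a pizza", browser)
print(results)
```

The point of the sketch is the boundary: the agent only emits plain-language instructions, so any agent can use the tool without knowing how the browser is controlled.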