It's something we use every day, but how exactly does the web work?

I want to start off by making a distinction between the internet and the web.

The internet is about the connections and the transfer of information.  A good analogy would be that the internet is like the postal system.  It doesn't care what gets sent to who, it just needs to know how to get the information from one address to another.

The web is like a mail-order system that uses the internet to send stuff to people.  The main difference is that on the web it's one request, one reply.  In other words, you can't be sent something if you didn't request anything.  (There's nothing to prevent "false advertising", though: when you send a request, you are only guaranteed that your request is sent to the address you gave.  You are not guaranteed that you will get exactly what you requested.)

I've left out a lot of details, but essentially the entire web runs on sending a request and getting a response.  In fact, it's that simplicity that gives the web so much of its power, power that has only started to be fully realized in the past couple of years.
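If you want to see a single request and reply with your own eyes, here is a minimal sketch using only Python's standard library. It needs nothing beyond your own machine: it starts a throwaway server on 127.0.0.1, sends it one GET request, and returns the one reply it gets back. `DealerHandler` and `one_round_trip` are made-up names for this example, and a real web server does a great deal more.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class DealerHandler(BaseHTTPRequestHandler):
    """Answers every GET with a short plain-text reply naming the requested path."""

    def do_GET(self):
        body = f"You asked for {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def one_round_trip(path: str) -> tuple[int, str]:
    """Start a throwaway local server, send one request, and return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), DealerHandler)  # port 0 lets the OS pick a free port
    server.timeout = 5  # don't wait forever if no request arrives
    port = server.server_address[1]
    worker = threading.Thread(target=server.handle_request)  # handle exactly one request
    worker.start()
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)      # one request...
        response = conn.getresponse()  # ...one reply
        return response.status, response.read().decode("utf-8")
    finally:
        conn.close()
        worker.join()
        server.server_close()


if __name__ == "__main__":
    print(one_round_trip("/cars/1997/thunderbird/black.car"))
```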

Let's run through how requesting a web page works with an extended analogy (we're ignoring any kind of links for now, just request/response).

Let's say you want to order a '97 black Thunderbird (which, coincidentally, is what I drive!).  Now, you're just really damn lazy, so you ask your butler Alfred (who will be playing the part of your web browser) to fetch it for you.  You tell Alfred to get "http://reputable-dealer.com/cars/1997/thunderbird/black.car".

We can break down the URL (Uniform Resource Locator) above into 3 parts.  The first part, "http", tells Alfred to get there via HTTP (Hypertext Transfer Protocol), which in our analogy can mean going there by bus.  There are other ways to get places, like FTP (File Transfer Protocol), which I'll say means going by plane, but this dealership doesn't have a runway, just a bus stop, so we'll have to use HTTP.
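Python's `urllib.parse.urlsplit` does this same breakdown, which makes for a quick sketch of the three parts (reputable-dealer.com is, of course, our made-up dealer):

```python
from urllib.parse import urlsplit

url = "http://reputable-dealer.com/cars/1997/thunderbird/black.car"
parts = urlsplit(url)

print(parts.scheme)  # how to get there: http
print(parts.netloc)  # the name of the place: reputable-dealer.com
print(parts.path)    # what to ask for: /cars/1997/thunderbird/black.car
```

`urlsplit` also returns `query` and `fragment` fields; both are empty strings for this URL.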

The next part, "reputable-dealer.com", is the name of the place.  But while the name is easy to remember, it doesn't actually tell us their address.  Every computer that can be accessed by another computer has an address which looks something like "209.85.225.99".  This is called an IP address.

To find the IP address, we need to look it up with a DNS (Domain Name System) server, which works much like the yellow pages do for looking up addresses.
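Here is a rough sketch of the lookup step. The `PHONE_BOOK` dictionary is a hypothetical stand-in for DNS, since reputable-dealer.com doesn't exist; I've given it an address from 203.0.113.0/24, a block reserved for documentation examples. Any name not in the table falls through to `socket.gethostbyname`, which asks your operating system's resolver (and, through it, real DNS servers).

```python
import ipaddress
import socket

# Hypothetical "yellow pages": the dealer is made up, and 203.0.113.7 comes from
# an address block reserved for documentation examples.
PHONE_BOOK = {"reputable-dealer.com": "203.0.113.7"}


def look_up(hostname: str) -> str:
    """Return the IP address for hostname, checking the toy table before real DNS."""
    if hostname in PHONE_BOOK:
        return PHONE_BOOK[hostname]
    # Real resolution: asks the operating system's resolver, which queries DNS servers.
    return socket.gethostbyname(hostname)


address = look_up("reputable-dealer.com")
print(address, ipaddress.ip_address(address).version)  # 203.0.113.7 4
```

`ipaddress.ip_address` is only there to confirm the result is a well-formed IPv4 address, like the 209.85.225.99 example.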

Now that we have the actual IP address, we know where to go.  Once we get there, we ask the car dealer (who will be playing the role of the server) for "/cars/1997/thunderbird/black.car".  This is the extent of what the whole web has in common on the request end: we get to the server after finding its address, and we ask for a resource.
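Asking for a resource is literally a few lines of text sent to the server. This sketch builds a minimal HTTP/1.1 request by hand; `build_request` is a hypothetical helper, and real browsers send quite a few more headers. Note that the name still goes along in the `Host` header, since one IP address can host many different sites.

```python
def build_request(host: str, path: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request for `path` on `host`."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, resource, protocol version
        f"Host: {host}",         # HTTP/1.1 requires the Host header
        "Connection: close",     # ask the server to hang up after answering
        "",                      # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)    # HTTP uses CRLF line endings


print(build_request("reputable-dealer.com", "/cars/1997/thunderbird/black.car"))
```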

From this point, the car dealer will either give Alfred the car he asked for or tell him one of the following: that they couldn't find that model (the "404 Not Found" error you have probably seen from time to time), that the dealership is too busy to help right now (e.g. "503 Service Unavailable"), or that Alfred isn't allowed to see that car ("403 Forbidden").  The dealer might also tell Alfred that the car has been relocated and redirect him to the new location (e.g. "301 Moved Permanently").  Either way, Alfred will bring back the car or an error.
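Python's `http.HTTPStatus` enum knows the standard names for these codes, so here is a small sketch mapping each outcome to one. Pairing 200 (OK) with handing over the car and 301 with a relocated car is my own choice for illustration; `category` is a hypothetical helper that reads the first digit, which is how the codes are grouped.

```python
from http import HTTPStatus


def category(code: int) -> str:
    """Classify a status code by its first digit."""
    return {2: "success", 3: "redirect", 4: "client error", 5: "server error"}.get(
        code // 100, "other"
    )


# The outcomes from the dealer analogy, paired with standard HTTP status codes.
outcomes = {
    200: "here's your car",
    301: "the car has moved",
    403: "you're not allowed to see that car",
    404: "couldn't find that model",
    503: "the dealer can't help right now",
}

for code, meaning in outcomes.items():
    status = HTTPStatus(code)
    print(f"{status.value} {status.phrase} ({category(code)}): {meaning}")
```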

In the case of a redirect, Alfred will just check the new location before coming back to you.  He will also tell you the address he got the car (or error message) from, which may differ from the one you originally sent him to.
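Following redirects is a simple loop. In this sketch, `REPLIES` is a hypothetical table of what our made-up dealer says for each path (the `/used/...` location is invented), and `fetch` keeps going until it gets something other than a redirect. Real browsers also cap the number of hops; the limit of 5 here is an assumption.

```python
# Hypothetical replies from the made-up dealer: path -> (status code, new location or body).
REPLIES = {
    "/cars/1997/thunderbird/black.car": (301, "/used/1997/thunderbird/black.car"),
    "/used/1997/thunderbird/black.car": (200, "one black '97 Thunderbird"),
}
REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch(path: str, max_redirects: int = 5) -> tuple[str, int, str]:
    """Follow redirects the way Alfred does; return (final path, status, body or error)."""
    for _ in range(max_redirects + 1):
        status, payload = REPLIES.get(path, (404, "Not Found"))
        if status not in REDIRECT_CODES:
            return path, status, payload  # a car or an error: time to go home
        path = payload  # for redirects, payload plays the role of the Location header
    raise RuntimeError("too many redirects")  # give up rather than loop forever


print(fetch("/cars/1997/thunderbird/black.car"))
```

The returned path is what Alfred reports back to you, which is why the address bar sometimes shows a different URL from the one you typed.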


All of the above hasn't really changed much since the early days.  What has changed over the years is how the server gets the data (and also how the browser handles that data, but that's for part 2).

In the old days, pretty much every resource was a physical file on the server.  In our analogy, it's like a car lot: all the cars are already made, and the dealer just goes and grabs the one you asked for.  On the web, these are known as Static Pages.  They already exist and are just given to you as is.

Now many (possibly most) pages are Dynamic Pages, i.e. they can be customized before they are sent.  The URLs for these also often have what's called a query string.  Let's take a look at an example: "http://reputable-dealer.com/cars/1997/thunderbird/black.car?cdplayer=yes&hoodornament=yes".  I'm sure you've all seen URLs that look like this.  In our analogy, the dealer will go get the car, then add a CD player and a hood ornament before handing it over to Alfred.
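On the server side, reading those options is easy with Python's standard library. This is only a sketch of the parsing step, not of how any particular server builds the car:

```python
from urllib.parse import parse_qs, urlsplit

url = ("http://reputable-dealer.com/cars/1997/thunderbird/black.car"
       "?cdplayer=yes&hoodornament=yes")

query = urlsplit(url).query  # everything after the "?"
options = parse_qs(query)    # each name maps to a list of values
print(query)    # cdplayer=yes&hoodornament=yes
print(options)  # {'cdplayer': ['yes'], 'hoodornament': ['yes']}
```

`parse_qs` maps each name to a list because the same name is allowed to appear more than once in a query string.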

Beyond that, it's possible that the dealer doesn't keep any cars premade at all and every car is made to order.  It can also be any combination of the two: some popular models may be premade, while the less popular ones are made to order.
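Here is a sketch of a dealer that does both. `LOT`, `build_to_order` and `serve` are hypothetical names, and the short `time.sleep` just stands in for whatever slow work generating a page involves:

```python
import time

# Hypothetical "lot" of premade (static) pages, keyed by path.
LOT = {"/cars/1997/thunderbird/black.car": "premade black '97 Thunderbird"}


def build_to_order(path: str) -> str:
    """Stand-in for dynamic page generation; the sleep mimics slow work."""
    time.sleep(0.01)
    return f"made-to-order car for {path}"


def serve(path: str) -> str:
    """Hand over a premade page if one exists, otherwise build one on the spot."""
    if path in LOT:
        return LOT[path]
    return build_to_order(path)


print(serve("/cars/1997/thunderbird/black.car"))
print(serve("/cars/2001/mustang/red.car"))
```

Whoever calls `serve` gets back a string either way and can't tell which branch produced it, which is exactly the position the browser is in.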

The main point is that a browser doesn't know or care how the server gets a page.  All it asks for is a page (or other resource) and the server can obtain it however it wants.  This is one of the reasons some pages can take a long time to load even if you have a fast internet connection.  It's possible that the page is created on the fly and the creation can take a long time.

These days this is one of the most common reasons for pages being slow, especially when a lot of people are connecting at once: the server is trying to create pages for tons of people at the same time and just can't do it fast enough.
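A back-of-the-envelope model shows why. Assume (hypothetically) the server has a fixed number of workers, each building one page at a time; then it can finish at most workers / seconds-per-page pages every second, and anything arriving faster than that piles up. This sketch ignores everything real servers do to soften the blow, such as caching, and the helper names and numbers are made up:

```python
def max_pages_per_second(workers: int, seconds_per_page: float) -> float:
    """Upper bound on throughput when each worker builds one page at a time."""
    return workers / seconds_per_page


def backlog_after(seconds: float, arrivals_per_second: float,
                  workers: int, seconds_per_page: float) -> float:
    """Requests still waiting after `seconds`, in a simple steady-flow approximation."""
    capacity = max_pages_per_second(workers, seconds_per_page)
    return max(0.0, (arrivals_per_second - capacity) * seconds)


# Hypothetical numbers: 8 workers, a quarter second to build each page.
print(max_pages_per_second(8, 0.25))   # 32.0
print(backlog_after(60, 50, 8, 0.25))  # 1080.0
```

With a ceiling of 32 pages per second, 50 requests per second for one minute leaves about 1,080 people still waiting, no matter how fast their own connections are.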

The other possibility is that the server simply cannot send as fast as you can receive.  Despite the ridicule Senator Ted Stevens got for referring to the internet as a "series of tubes" it actually is an apt metaphor for this purpose.  You can have a wide pipe on your end (i.e. high bandwidth) which means you can receive things really fast, but if they can't send it just as fast, then you won't be able to utilize all your bandwidth.
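In the simplest model, the narrower end of the tube sets the pace. This sketch assumes nothing else slows things down (no latency, no protocol overhead, no congestion in between), and `transfer_seconds` is a hypothetical helper. Note that connection speeds are usually quoted in megabits per second, while file sizes are usually given in megabytes:

```python
def transfer_seconds(size_megabytes: float, your_mbps: float, server_mbps: float) -> float:
    """Time to move a file when the slower end of the pipe sets the pace."""
    bottleneck_mbps = min(your_mbps, server_mbps)  # megabits per second
    size_megabits = size_megabytes * 8             # 1 byte = 8 bits
    return size_megabits / bottleneck_mbps


# Hypothetical: a 5 MB page, 100 Mbit/s on your end, the server sending at 10 Mbit/s.
print(transfer_seconds(5, 100, 10))  # 4.0
```

Your 100 Mbit/s connection could fetch that page in 0.4 seconds if the server kept up, but at 10 Mbit/s on their end it takes 4 seconds.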

That's it for part 1.  In part 2 I'll cover how pages are actually displayed and what the browser does after it gets the data from the server.  Stay tuned!