How does the web work? Part 1: Getting data from the server

It's something we use every day, but how exactly does the web work?

I want to start off with making a distinction between the internet and the web.

The internet is about the connections and the transfer of information.  A good analogy would be that the internet is like the postal system.  It doesn't care what gets sent to who, it just needs to know how to get the information from one address to another.

The web is like a mail-order system that uses the internet to send stuff to people.  The main difference is that on the web it's one request, one reply; in other words, you can't be sent something if you didn't request anything.  (There's nothing to prevent "false advertising" though: when you send a request, you are only guaranteed that your request is sent to the address you gave.  You are not guaranteed that you will get exactly what you requested.)

I've left out a lot of details, but essentially the entire web runs off sending a request and getting a response.  In fact it's that simplicity that gives the web so much power, a power that has only started being fully realized in the past couple of years.

Let's run through how requesting a web page works with an extended analogy (we're ignoring any kind of links for now, just request/response).

Let's say you want to order a '97 black Thunderbird (which coincidentally is what I drive!).  Now you're just really damn lazy, so you ask your butler Alfred (who will be playing the part of your web browser) to fetch it for you.  You tell Alfred to get "http://www.cardealer.com/cars/1997/thunderbird/".

We can break down the URL (Uniform Resource Locator) above into 3 parts.  The first part tells Alfred to get there via HTTP (Hypertext Transfer Protocol), which in our analogy can mean to go there by bus.  There are other ways to get places, like FTP (File Transfer Protocol), which I'll say means to go by plane, but this dealership doesn't have a runway, just a bus stop, so we'll have to use HTTP.

The next part, "www.cardealer.com", is the name of the place.  But while the name is easy to remember, it doesn't actually tell us their address.  Every computer that can be accessed by another computer has an address which looks something like "192.0.2.44".  This is called an IP address.
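The breakdown can be sketched with Python's standard library.  The dealership URL here is a made-up stand-in for whatever address you give Alfred:

```python
from urllib.parse import urlsplit

url = "http://www.cardealer.com/cars/1997/thunderbird/"
parts = urlsplit(url)

print(parts.scheme)   # http  (how to get there)
print(parts.netloc)   # www.cardealer.com  (the name of the place)
print(parts.path)     # /cars/1997/thunderbird/  (what to ask for on arrival)
```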

To find the IP address we need to look it up in what's called a DNS (Domain Name System) server, which works much like the yellow pages do for looking up addresses.

Now that we have the actual IP address we know where to go.  Once we get there, we ask the car dealer  (who will be playing the role of the Server) for "cars/1997/thunderbird/".  This is the extent of what all of the web has in common on the request end.  We get to the server after finding their address and we ask for a resource.
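Concretely, the request Alfred hands over is just text.  Here's a sketch (with a made-up host name) of what a minimal HTTP/1.1 GET request looks like on the wire:

```python
host = "www.cardealer.com"
path = "/cars/1997/thunderbird/"

request = (
    f"GET {path} HTTP/1.1\r\n"    # the resource Alfred asks for
    f"Host: {host}\r\n"           # which dealership (one server can host many names)
    "Connection: close\r\n"       # one request, one reply
    "\r\n"                        # blank line marks the end of the request
)
print(request)
```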

From this point the car dealer will either give Alfred the car he asked for or inform him of one of the following: that they couldn't find that model (the "404 Not Found" error you have probably seen from time to time), that it's sold out ("503 Service Unavailable"), or that Alfred isn't allowed to see that car ("403 Forbidden").  The dealer might also tell Alfred that the car has been relocated and redirect him to the new location.  Either way, Alfred will bring back the car or an error.

In the case of a redirect, Alfred will just check the new location before coming back to you.  He will tell you the address he got the car or error message from, which might be different from where you asked him to go in the case of a redirect.
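That decision process can be sketched roughly as follows; the handler and its messages are invented for illustration, and real browsers handle many more codes:

```python
def handle_response(status, location=None):
    # Map the status codes from the analogy to what Alfred does next.
    if status == 200:
        return "deliver the car"
    if status in (301, 302) and location:
        return "follow the redirect to " + location
    if status == 403:
        return "report: not allowed to see that car"
    if status == 404:
        return "report: no such model"
    if status == 503:
        return "report: sold out, try again later"
    return "report: something unexpected happened"

print(handle_response(404))                             # report: no such model
print(handle_response(301, "/cars/1998/thunderbird/"))  # follow the redirect to /cars/1998/thunderbird/
```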

All of the above hasn't really changed much since the early days.  What has changed over the years is how the server gets the data.  (Also how the browser handles that data, but that's for part 2)

In the old days pretty much every resource had a physical file on the server.  In our analogy it's like a car lot: all the cars are already made and the dealer just goes and grabs one.  On the web these are known as Static Pages.  They already exist and they're just given to you as is.

Now many (possibly most) pages are Dynamic Pages, i.e. there can be some customization to the pages before they are sent.  The URLs for these also often have what's called a querystring.  Let's take a look at an example: "http://www.cardealer.com/cars/1997/thunderbird/?cdplayer=yes&hoodornament=yes".  I'm sure you've all seen pages that look like this.  In our analogy the dealer will go get the car, then add a CD player and a hood ornament before handing it over to Alfred.
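Here's a sketch of how the server side might read those extras out of the querystring, using Python's standard library (the URL and parameter names are invented for the analogy):

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.cardealer.com/cars/1997/thunderbird/?cdplayer=yes&hoodornament=yes"
query = urlsplit(url).query   # "cdplayer=yes&hoodornament=yes"
extras = parse_qs(query)      # each parameter maps to a list of values

print(extras)  # {'cdplayer': ['yes'], 'hoodornament': ['yes']}
```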

Beyond that, there may even be the possibility that the dealer doesn't carry any cars premade.  It's possible with this dealer that all cars are made to order.  It can also be any combination of the above, some popular models may be premade, while the less popular ones are made to order.

The main point is that a browser doesn't know or care how the server gets a page.  All it asks for is a page (or other resource) and the server can obtain it however it wants.  This is one of the reasons some pages can take a long time to load even if you have a fast internet connection.  It's possible that the page is created on the fly and the creation can take a long time.

These days this is the most common reason for pages being slow, especially if a lot of people are connecting at the same time.  The server is trying to create pages for tons of people at once and it just can't do it fast enough.

The other possibility is that the server simply cannot send as fast as you can receive.  Despite the ridicule Senator Ted Stevens got for referring to the internet as a "series of tubes" it actually is an apt metaphor for this purpose.  You can have a wide pipe on your end (i.e. high bandwidth) which means you can receive things really fast, but if they can't send it just as fast, then you won't be able to utilize all your bandwidth.

That's it for part 1.  In part 2 I'll cover how pages are actually displayed and what the browser does after it gets the data from the server.  Stay tuned!

What is "The Cloud"?

Cloud computing and "The Cloud"  have become popular buzz words lately.  The latest Microsoft commercials are starting to make the term more known outside of the IT world.  But what exactly is it?

You'll get different answers depending on who you ask, and all the hype would have you thinking it's the latest and greatest invention, but in the broadest sense, it is simply the **concept** of having work done and data residing somewhere other than your physical machine.

I put "concept" in bold because that's the part that I think a lot of people miss.  It's not some specific technology or product, but just an idea.  Not even a new idea.

I'll let that sink in for a bit.

The idea (not the term) of cloud computing has been around since the very early days of computing, as early as the 1960's, with one of the first manifestations being the ARPAnet, the predecessor to what is now the internet.

Various forms of what could be called cloud computing are already commonplace in the consumer space.  Any kind of web app (webmail, Google Docs, every Facebook app), any program that you don't have to download, is living "in the cloud".

Two questions came to my mind when the term started picking up as a buzz word.  "Why now?" and "What's the big deal?"   I'll tackle these questions in reverse order.

First off, what's the big deal?  In the consumer space I think it's somewhat old news; it's something most internet users are already using.  The business world is usually more cautious and a later adopter, and businesses are traditionally more protective about privacy and ownership of data.  The idea of their (intellectual) property being in the hands of someone else is less than ideal.

There are also issues of reliability and control.  If your servers go down, you have control and can make it top priority to get your customers back up and running.  If you're relying on someone else to keep your customers online they may have other priorities and that is out of your control.

Many businesses, especially smaller ones, are getting over this mentality because, honestly, a dedicated company is going to do a more reliable job, and it means you don't need to hire someone whose job is to make sure the machines serving your customers are always up and running.

So why now?  Technology is finally at a place where this has become feasible.  The internet has become ubiquitous.  You generally wouldn't ask someone if they have internet access, it's just assumed.  You don't ask if someone has an email address, you just ask what it is and assume that they have one.  Not only has internet in general become commonplace, but broadband internet is also widespread, meaning not only is the channel there, it is also fast enough to deliver the data and services efficiently.

The internet has become a valid distribution channel.  Web apps bypass a lot of the issues with distribution of traditional applications.  Cross platform comes for free.  No need to worry about creating a Mac version and Windows version (or Linux if that fits your target audience).  You can use them on public computers (with the usual precautions like remembering to log off whatever service you're using) without having to worry about installation.  You can access your stuff from anywhere.

That last sentence is important.  It is an answer to both "Why now?" and "What's the big deal?".

Mobility is shifting towards being something that's expected.  Being able to work on things from anywhere is quickly becoming an expectation from consumers, and in the near future I suspect from businesses.  For many companies right now, mobility is limited to mostly email, while most work still needs to be done in the office.

The cloud is a big deal because it allows for unprecedented mobility.  I can start writing a post at home and finish it at a friend's house.  In fact I've done so.  The cloud is important now because mobility is important now and will only become more so as it becomes an expectation.  Internet access on phones is now common and moving towards becoming standard.

The other benefit of the cloud that I haven't touched on yet is that it enables sharing and collaboration on a level that has not been possible in the past, whether it's sharing the latest news on Twitter, working together on a spreadsheet in Google Docs, or all your personal information on Facebook (j/k, hopefully).  The world has become more internationalized than ever.  I don't have any statistics, but I'm sure the number of people from other countries that the average person knows is probably twice what it was 10-20 years ago.

So what is "The Cloud"?  It is an ancient idea (on the computing timeline) of having "stuff" that you can use from anywhere, and technology has finally come to a saturation point where it has become both feasible and economical to do at a large scale.  From a conceptual standpoint it is also another layer of abstraction, like the operating system described in a previous post.

What exactly does an operating system do?

To adequately answer that question we must first look at what exactly an operating system is.

Most people know an operating system as that thing that lets you run other programs.  Wikipedia defines an OS as "software, consisting of programs and data, that runs on computers and manages the computer hardware and provides common services for efficient execution of various application software."

I like to think of an operating system as a facilitator between machine and other programs/users.

How Stuff Works has a pretty good article about How Operating Systems Work, but I'd like to focus more on the why.  Operating systems perform the very important function of acting as a layer of abstraction in between the hardware and other programs.

The concept of abstraction should be familiar to anyone working in software or cognitive science (and I'm sure other fields), but it's something that everyone uses whether they are aware of it or not.  In fact we as humans wouldn't be able to have any kind of real thought without it.

In simple terms, the idea of abstracting is the idea of hiding or ignoring details that are unimportant in a given context.  As you're reading this post, you're seeing words and sentences.  You don't think of them as individual letters even though you're aware that's what words are made of.  To take it a step further, those letters are actually made of dots of light in your screen (or if this ever makes it into print, then it would be blobs of ink).  But that's not what you are consciously aware of when you are reading because they are unimportant details that get in the way of understanding the meaning.

In the same way, an operating system hides away certain details like whether you have a USB mouse or a PS/2 mouse, or which port it's plugged into.  99% of programs don't need or want to care about that.  The OS knows where it is, and will listen to the correct port to get signals from it.  The program just asks the OS what the mouse is doing; it doesn't need to know how the OS got that information.
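Here's a toy sketch of that idea.  All the class and method names are invented; the point is that the program only talks to one common interface and never learns which port the signals came from:

```python
class UsbMouse:
    def read_position(self):
        # Pretend this decoded a USB packet from the right port.
        return (120, 45)

class Ps2Mouse:
    def read_position(self):
        # Pretend this read a signal from the PS/2 port instead.
        return (120, 45)

def draw_cursor(mouse):
    # The "program": it only knows the common interface the OS exposes.
    x, y = mouse.read_position()
    return f"cursor at ({x}, {y})"

# The same program code works regardless of the hardware behind it.
print(draw_cursor(UsbMouse()))   # cursor at (120, 45)
print(draw_cursor(Ps2Mouse()))   # cursor at (120, 45)
```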

Surprisingly I find that currency makes for a good analogy to an operating system insofar as it acts as an abstraction.  In fact they probably came about for the same reason.  With the barter system, it gets annoying to find someone with corn that wants the chair you just made.  Currency came about because it provided a common way to exchange things with one another.

For a similar reason, operating systems came into existence because they provided a common way to access a variety of hardware, after it got annoying dealing with all the different hardware configurations that differed in ways you don't care about.

If you're writing a program for a very specific piece of hardware, then an OS isn't necessary and can even get in the way, or may simply not be available; it's similar to trying to trade with a culture that doesn't use currency because it lives a self-sufficient lifestyle.

That brings us to an extension to the analogy.  If an OS is trying to keep programs from having to worry about hardware differences, what about OS differences?  It is like different countries having different currencies.  Most currencies can be exchanged for one another (for a fee).  In a similar fashion, there are now Virtual Machines that can run another OS inside of a parent OS.  It's not quite the same as currency exchange, but similar in purpose – it's generally a little less efficient (like the currency exchange fee) and allows you to use a different OS than your primary one.

What spurred this topic was actually a question from a friend: "Why do new computers frequently refuse to run old programs? What does changing 'Compatibility modes' actually do to make them work? What do I need to do or understand if I want to play older games, especially with the new Windows os?"

The short answer is that the program was either using undocumented functionality or was making assumptions about the environment that are no longer true.  "Compatibility mode" in Windows Vista and higher makes Windows pretend to be an older version.  There are many things it can do, but since it actually is different, there are things that it can't emulate.  Unfortunately while there may be things you can do for specific games/programs, there isn't really much you can do in general to make older programs work outside of compatibility mode.

Documented functions are sort of like a contract that says the OS works this way.  It's essentially a list of guarantees.  Undocumented functions have no such guarantee.  Essentially they're just things that happen to work a certain way.

Let's take a real world example.  Most traffic lights in the US have 3 lights with Red on top, Yellow in the middle and Green on the bottom.  Red means Stop and Green means Go, that's how it's "documented".  Let's say you ignore that and instead you treat the bottom light as Go.  This works fine for all the traffic lights you've seen, so you write those as the instructions for your car program (yes, I realize that's redundant as a program is a set of instructions).

Now your car program runs into one of those funky horizontal traffic lights instead of the normal vertical ones.  It sees that the bottom light is on (they're all on the bottom, but it wasn't told to care about that).  The light happened to be Red, your car program goes and promptly crashes. (double entendre!)
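We can sketch the traffic-light analogy in code (all names here are invented): one function follows the documented contract (the color), the other relies on the undocumented detail (the position):

```python
# Each light is (color, height_from_ground, is_lit).
vertical_green = [("red", 3, False), ("yellow", 2, False), ("green", 1, True)]
# A horizontal light: every bulb is at the same height, and red is lit.
horizontal_red = [("red", 1, True), ("yellow", 1, False), ("green", 1, False)]

def should_go_documented(lights):
    # The documented contract: go only when the green light is lit.
    return any(color == "green" and lit for color, _, lit in lights)

def should_go_undocumented(lights):
    # The undocumented assumption: the lowest bulb is the "go" bulb.
    # (With a tie in heights, min() just picks the first bulb it sees.)
    lowest = min(lights, key=lambda light: light[1])
    return lowest[2]

print(should_go_documented(vertical_green), should_go_undocumented(vertical_green))   # True True
# On the horizontal light the undocumented version says "go" on a red light: crash.
print(should_go_documented(horizontal_red), should_go_undocumented(horizontal_red))   # False True
```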

That's roughly the equivalent of what happens.  Although to be fair to the developer, sometimes Windows just doesn't provide a documented way to do what you want (maybe the program that was reading the traffic lights above was colorblind) and so they cheat to get the job done.

Windows actually goes to great lengths to ensure programs remain forward-compatible but it's not always possible.  Joel Spolsky talks about one such special case:

I first heard about this from one of the developers of the hit game SimCity, who told me that there was a critical bug in his application: it used memory right after freeing it, a major no-no that happened to work OK on DOS but would not work under Windows where memory that is freed is likely to be snatched up by another running application right away. The testers on the Windows team were going through various popular applications, testing them to make sure they worked OK, but SimCity kept crashing. They reported this to the Windows developers, who disassembled SimCity, stepped through it in a debugger, found the bug, and added special code that checked if SimCity was running, and if it did, ran the memory allocator in a special mode in which you could still use memory after freeing it.

Basically it said "I'm done with this piece of memory" then quickly used it after saying that.  The reason this was fine in DOS was because only 1 program would be running at a time, so even though it said it was done, no one else would take it so it could still use it.  This assumption wasn't true anymore in Windows where multiple programs are running at the same time and they had to share resources.

Raymond Chen, one of the biggest proponents for maintaining compatibility with older programs had the following to say about the topic (from 2003):

I could probably write for months solely about bad things apps do and what we had to do to get them to work again (often in spite of themselves). Which is why I get particularly furious when people accuse Microsoft of maliciously breaking applications during OS upgrades. If any application failed to run on Windows 95, I took it as a personal failure. I spent many sleepless nights fixing bugs in third-party programs just so they could keep running on Windows 95. (Games were the worst. Often the game vendor didn't even care that their program didn't run on Windows 95!)

Why do web designers/developers hate Internet Explorer?

If you know anyone that works with creating websites in any capacity you've probably been encouraged to switch away from Internet Explorer.  But why should they care what browser you use?  It's not like you're making them use it.

To be honest, most of the hate is directed towards IE6 which comes with Windows XP.  IE7 was an improvement, and IE8 is actually considered acceptable by most web designers (unless they're just Microsoft haters).

So why all the hate for IE6?  The short answer is that supporting it is a pain because of its poor standards compliance.  But as long as a browser is still used by a decent proportion of users, it will need to be supported for business reasons.  So the sooner the majority of people switch off it, the sooner web designers don't have to worry about it anymore.

But let's back up for a bit.  What exactly does "poor standards compliance" mean?  In fact, what do these standards mean?  Stepping back even more, what exactly does a web browser do?

A web browser is a program that interprets HyperText Markup Language (HTML) and displays it in some fashion to a user.  The "standard" is what defines how a browser ought to interpret HTML.  I put "standard" in quotes because it is somewhat of a misnomer.  Standards are partly determined by implementation, which is how it works, not how it should work.

There's a long (relatively speaking) history of how HTML in its current form came to be.  Dive Into HTML5 has a very nice summary of this history.  I'm not going to repeat what it says, but I do want to quote the following:

HTML has always been a conversation between browser makers, authors, standards wonks, and other people who just showed up and liked to talk about angle brackets. Most of the successful versions of HTML have been "retro-specs," catching up to the world while simultaneously trying to nudge it in the right direction. Anyone who tells you that HTML should be kept "pure" (presumably by ignoring browser makers, or ignoring authors, or both) is simply misinformed. HTML has never been pure, and all attempts to purify it have been spectacular failures, matched only by the attempts to replace it.

But if there is no "pure" standard, why is there such a push for using "standards compliant" browsers?

It's because for the first time in the history of the web, the latest versions of browsers from all the major manufacturers are actually close to implementing a unified standard.  I believe this is largely due to the fact that browser usage statistics are more diversified than ever.  According to most sources Internet Explorer (all versions combined) now has less than 50% of the browser usage for the first time since the late 90's.

While it is still the most used browser, the fact that it no longer has an overwhelming majority means that Microsoft can no longer run the show without consulting and collaborating with other browser makers.  At the same time, they've dug themselves into a hole: they don't want to break existing sites that were made to work on previous versions of IE, yet they still want to adhere to standards so that web developers can design pages that work on all browsers without resorting to ugly hacks that literally say "if it's IE, do this; otherwise do the normal thing".

Joel Spolsky has a very insightful post on how difficult a situation Microsoft was in when creating IE8.  Particularly apt I feel are the following analogies:

And if you’re a pragmatist on the Internet Explorer 8.0 team, you might have these words from Raymond Chen seared into your cortex. He was writing about how Windows XP had to emulate buggy behavior from old versions of Windows:

Look at the scenario from the customer’s standpoint. You bought programs X, Y and Z. You then upgraded to Windows XP. Your computer now crashes randomly, and program Z doesn’t work at all. You’re going to tell your friends, “Don’t upgrade to Windows XP. It crashes randomly, and it’s not compatible with program Z.” Are you going to debug your system to determine that program X is causing the crashes, and that program Z doesn’t work because it is using undocumented window messages? Of course not. You’re going to return the Windows XP box for a refund. (You bought programs X, Y, and Z some months ago. The 30-day return policy no longer applies to them. The only thing you can return is Windows XP.)

And you’re thinking, hmm, let’s update this for today:

Look at the scenario from the customer’s standpoint. You bought programs X, Y and Z. You then upgraded to Windows ~~XP~~ Vista. Your computer now crashes randomly, and program Z doesn’t work at all. You’re going to tell your friends, “Don’t upgrade to Windows ~~XP~~ Vista. It crashes randomly, and it’s not compatible with program Z.” Are you going to debug your system to determine that program X is causing the crashes, and that program Z doesn’t work because it is using ~~undocumented~~ insecure window messages? Of course not. You’re going to return the Windows ~~XP~~ Vista box for a refund. (You bought programs X, Y, and Z some months ago. The 30-day return policy no longer applies to them. The only thing you can return is Windows ~~XP~~ Vista.)

This was posted in 2008 while IE8 was still in Beta testing.  He speculated that the IE8 team would probably reverse their decision to have the default mode be standards compliant because it broke about half the pages in some way.  However Microsoft found a clever solution that I feel was the best possible compromise between the pragmatists and idealists.

During the Beta (as well as in the final release) IE8 defaulted to "standards" mode, which broke many pages; however, there is a button to view a page in "compatibility mode".  During the Beta phase Microsoft recorded popular sites where many users turned on "compatibility mode", and from that came up with a "Compatibility View list" that users can choose whether to use when they run IE8 for the first time.  The list is updated through Windows Update about every 2 months, and the owners of the sites on the list are contacted to let them know they are on it and how they can get off it by bringing their site up to standards.

So how did this whole mess get started?

Looking at a history of HTML we see that HTML was first introduced in about 1990 and there wasn't really a formal specification.

After several browsers started appearing on the scene, HTML 2 was published in 1995 as a retro-spec "roughly corresponding to the capabilities of HTML in common use prior to June 1994."

In 1996 "HTML 3.2 released which retrofitted the Netscape inventions into the HTML 'standard'" and "Internet Explorer 2 [was] released, virtually bug for bug compatible with Netscape (except for a few new bugs…)"

Finally HTML4 was "released" in 1997, and it represents more or less the current state today in 2010.  (There are technically some newer specs like XHTML, but the feature set has been more or less untouched.)

The First Browser War greatly contributed to the mess by pushing Microsoft and Netscape to furiously try to outdo each other in features (such as the classic 90's effects Blink and Marquee) with little attention being paid to the buggy consequences of rapid feature development.

After winning the browser war, Microsoft didn't really have much reason to spend further development time and money on browser development, leading to a mini-"Dark Ages" for the browser world.

If we look at the timeline we can see that IE1 was released on August 16, 1995.  IE2 was released on November 22, 1995, a mere 3 months later, trying desperately to play catchup with Netscape which had almost a year head-start being released on December 15, 1994.

IE continued with about 1 major version every 1-1.5 years until IE6 on August 27, 2001, by which point IE (all versions) had over 90% of the browser market.  When you own 90% of the market, standards be damned, you are the standard.

After that, development stagnated even as IE usage continued to rise until 2004 when Firefox 1 was released.

On February 15, 2005 plans for IE7 were announced, with the final release happening on October 18, 2006, a full 5 years after IE6.  Six days later on October 24, 2006, Firefox 2.0 was released.  Both featured tabbed browsing with enhanced security and phishing filters.

Also in 2006 the Mozilla Corporation (a commercial subsidiary of the non-profit Mozilla Foundation, formed to help fund the operations of the Foundation and get around the limitations of a non-profit entity) received a large sum of money (85% of their $66.8 million revenue in 2006) from "assigning [Google] as the browser's default search engine, and for click-throughs on ads placed on the ensuing search results pages."  With that kind of funding, in addition to community contributions to the project, Firefox has been able to successfully keep up with and even pull ahead of Microsoft's efforts.

With the ubiquity of the internet and Google fervently pushing its own browser offering with Google Chrome, browser usage is more diversified than ever, making standardization ever more important.  The benefit of standardization, somewhat counter-intuitively, is giving choice to the consumer.

Standardization encourages features that are external to defining how a page is displayed.  With designers spending less time trying to simply get a site working properly in every major browser, they now have more time to spend on things that actually matter: giving customers and users a better experience.  Browsers also become more diversified in their personality, out of necessity to differentiate themselves from each other.  Chrome favors minimalism and simplicity, Firefox favors customization and flexibility, and Internet Explorer favors… familiarity?  I'm not actually sure to be honest, but it definitely has its own personality.

The push for upgrading browsers isn't an anti-Microsoft movement (although it probably was historically) but a movement to increase the amount of choice a user has without worrying about their favorite site not working in their favorite browser.

Coming back, after a long tangent, to the main question: "Why do web designers/developers hate Internet Explorer (6/7)?"  The answer is that they are the last remaining browsers in popular use that still regularly need to be handled differently to get sites working properly.  At least this is true on the desktop; mobile browsers are a whole other can of worms, with the difference that most people don't yet expect every site to work perfectly on mobile devices.  There are still tiny quirks here and there in current browsers, but for the most part pages are write once, run anywhere.

(HTML5 is coming/partially here, but it is a more well-thought-out standard that seeks to build on top of, rather than replace, the current standard.)

Why do programs crash?

We've all seen it before: a program suddenly decides to stop working, possibly causing you to lose a lot of unsaved data.

Why does this happen?

The simplistic but probably not very satisfying answer is that there was an error in the program.

There are 3 fundamental types of computer program errors, related to when/how the error is discovered: Compile-time errors, Run-time errors, and "silent" errors.  Run-time errors are the ones that result in the crashes you see as a user.

To illustrate the difference, let's pretend you have a robot that you want to program to fill up the gas tanks on your 5 cars.  A program is essentially a set of instructions for a computer to follow.  Let's say your instructions are the following (yes the typos are intentional):

  1. Take the keys, they're color-coded so each key works on the car of the same color.
  2. Start the car.
  3. Drive 3 blcks nort
  4. Stop to the left of the pump
  5. Put the nozzle in the gas receptacle on the right side of the car
  6. Start pumping
  7. Pay for the gas
  8. Drive back
  9. Start the next car and repeat from step 2 until all the cars are filled

Compile-time errors are essentially syntax errors.  The analogue to natural languages is either spelling or grammar errors.  It's something that the computer can tell is incorrect just by looking at it and it will inform the programmer that it has no idea what they're talking about.

In simplistic terms the act of "compiling" is essentially the act of translating the code that a programmer wrote into something the computer understands.  Contrary to what some non-programmers may think, computers do not understand code as written, instead it must be translated.  When the compiler doesn't know how to translate something, it gives a compiler error, which is seen right away by the programmer and fixed.

In the example above the robot's compiler would say "There's an error on line 3, I don't know what 'blcks' and 'nort' mean."  That's simple enough to fix.  This is the most desirable type of error.  It's noticed before any damage is done.
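Python happens to make this easy to demonstrate: its compile step rejects nonsense before any of it runs, much like the robot's compiler would.  The instruction string below is just the robot's step 3 used as pretend source code:

```python
bad_instructions = "drive 3 blcks nort"   # step 3, typos and all

try:
    # compile() translates source text without running it, so the mistake
    # is caught before the robot moves an inch.
    compile(bad_instructions, "<robot>", "exec")
    print("instructions accepted")
except SyntaxError:
    print("compiler error: I don't know what this means")
```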

Let's fix that error and go on to the next type of error.

  1. Take the keys, they're color-coded so each key works on the car of the same color.
  2. Start the car.
  3. Drive 3 blocks north
  4. Stop to the left of the pump
  5. Put the nozzle in the gas receptacle on the right side of the car
  6. Start pumping
  7. Pay for the gas
  8. Drive back
  9. Start the next car and repeat from step 2 until all the cars are filled

A runtime error is one that is only discovered when trying to follow the instructions.  You of course want to test your robot's program, so you ride along for the first car.  Everything goes fine, so you leave and have the robot finish up.

The 3rd car has the fuel door on the left side of the car.  When the robot gets to this point, it fails: you told it to "Put the nozzle in the gas receptacle on the right side of the car", but there isn't one on the right side.  It will stop everything and essentially "crash".  Programs will sometimes have a way to send the programmer an error report describing what the program was trying to do when it failed.

As extreme as it sounds to completely stop everything, it's often the safest thing to do in an unexpected situation.  A program is incapable of "thinking" for itself, and can only do what it is instructed to do.  It could be instructed to ignore anything it can't do, but that leads us to the 3rd type of error.
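Here's a rough sketch of the robot's loop as code (the car data and function names are mine, made up for illustration).  The program runs fine until it hits a situation the instructions don't cover.  A real unhandled exception would halt the program; here we catch it only so we can print the crash message:

```python
cars = [
    {"color": "red",   "fuel_door": "right"},
    {"color": "blue",  "fuel_door": "right"},
    {"color": "green", "fuel_door": "left"},  # the car that causes the crash
]

def fill_tank(car):
    # The instructions only cover a receptacle on the right side.
    if car["fuel_door"] != "right":
        raise RuntimeError(f"no gas receptacle on the right side of the {car['color']} car")
    return "tank full"

for car in cars:
    try:
        print(car["color"], "-", fill_tank(car))
    except RuntimeError as err:
        print("CRASH:", err)
        break  # stop everything, like the robot does
```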

"Silent" errors are the ones where the program continues executing despite something going wrong.  It could be that the programmer haphazardly told the program to ignore all errors and skip to the next step.  As you can see, it would be bad not to know an error happened: the robot would try to "Put the nozzle in the gas receptacle on the right side of the car", fail because the tank is on the other side, and carry on as if nothing were wrong.

Now what happens?  Well, the robot starts pumping, and that goes fine; the gas just doesn't go in the car, but the robot wasn't asked to make sure it did.  Pay for the gas?  Sure, it's on the floor, but it was still pumped, so now it's paid for.  Drive back, and no one is any the wiser.
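The silent-error version of the sketch looks like this (again, the data is made up).  The error is swallowed with a bare `except`, so the program happily pays for gas that never went into the car:

```python
cars = [
    {"fuel_door": "right"},
    {"fuel_door": "left"},   # this one's gas ends up on the floor
    {"fuel_door": "right"},
]

total_paid = 0

for car in cars:
    try:
        if car["fuel_door"] != "right":
            raise RuntimeError("no receptacle on the right")
        car["tank"] = "full"
    except RuntimeError:
        pass              # haphazardly ignore the error and keep going
    total_paid += 40      # the gas was pumped (and paid for) either way

print("paid:", total_paid)                              # paid for 3 fills
print("tanks filled:", sum("tank" in c for c in cars))  # but only 2 got gas
```

Nothing crashes, nothing is reported, and the books still say all three cars were filled.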

Ultimately, program crashes are almost always due to programming error (though sometimes hardware or operating system failures are to blame), but crashes help the programmer know that there was in fact an error.  As annoying as they may be, they're often the lesser of two evils.

As for why there are errors in programs at all, it ultimately comes down to the fact that humans are fallible and programming is inherently complicated.  If you look at program error reports, you will sometimes see an error referred to as an "Unhandled Exception".  The reason is that errors are almost always "exceptional" circumstances that the programmer didn't think to account for, or something that should never happen unless something went wrong.

In our previous example, we could add a step after filling up to make sure the gas meter shows the tank is full.  If not, we have an exceptional situation that, if the programmer didn't account for it, would be "unhandled".  Usually, the earlier an error is detected, the easier it is to fix.

The difference between errors in programs and errors in many other fields is how far-reaching they are.  If a waiter gets an order wrong once, it affects one person's order (and possibly the restaurant's reputation, but often it is simply forgotten).  If a program has an error, it affects everyone who uses the program.  In a program with millions of users, each mistake is that much more costly.

Why won't websites send me my password?

On almost every site, if you lose your password, the site refuses to send it to you.  Instead they either create a new one for you or send you a link to change it.

Why is this?  Since they must know your password to check if you entered the correct one, shouldn't they be able to send it to you?

That's actually not entirely correct.  Any responsible site will not store your actual password.  Instead they store what is called a "hash" of your password.  Hashing is a one-way process of transforming some value (in this case your password) into some other value (what actually gets saved).  It's one-way in that you can calculate the hash from your password, but there's no way to calculate your password from the hash.  (Actually, that's not 100% true.  It's possible, though very difficult, to come up with a list of potential passwords, but that can't guarantee finding the exact one, and the list of potential passwords could be infinite if passwords are allowed to be infinitely long.)

So how exactly does hashing work?  How can you only go one way but not the other?  Let's make up a simple method of hashing (called a hashing algorithm).  It's not even remotely close to being as secure as ones that are used for real, but it's simple to explain and it gets the point across.

Take the password character by character.  If it's a number, leave it as is.  If it's a letter, convert it to a number, i.e. A is 1, B is 2, Z is 26 etc.  After you do that, add up all the numbers.  For example, let's say our password is "cab123".  Doing the number translation gives us "312123" and adding that up gives 12.  This number is what actually gets saved as your "password".
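That made-up algorithm is simple enough to write out in full.  Here's a sketch in Python (again, this is for illustration only and nothing like a real hashing algorithm):

```python
def toy_hash(password):
    """Letters become 1-26 (a=1 ... z=26), digits stay as-is, then sum."""
    total = 0
    for ch in password.lower():
        if ch.isdigit():
            total += int(ch)
        elif ch.isalpha():
            total += ord(ch) - ord("a") + 1
    return total

print(toy_hash("cab123"))  # 12 -- this is what the site stores
```

With this, checking a login attempt is just comparing `toy_hash(typed_password)` against the stored number.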

The site doesn't know your actual password, all they know is that when you hash your password you get the number 12.  When you try to log in, the password you typed is hashed and compared to what's in the database.

Right now you're probably thinking "Well, what if someone else guesses a password whose numbers and letters also add up to 12?"  With our simple algorithm they would indeed be able to log in as you with the password "EG" (5+7).

Clearly, real algorithms are more sophisticated than this.  Most of them produce numbers that are 128 bits (around 39 digits) long or more, which makes it extraordinarily unlikely that someone would randomly guess something that hashes to the same thing as your password.  Even that's an understatement: you're probably more likely to get hit by a falling plane within 5 minutes of winning the lottery than for someone typing randomly to produce something with the same hash as your password.

The point is that there isn't a way to calculate the original value from the hashed value.  In our simple example, given only the number 12, there's no way to figure out definitively that the password was "cab123".  It could have been "eg" or "321abc" or "93" or "ab9".  The point of hashing isn't to prevent someone else from getting into your account on that site; if they were able to get the hash, they likely already have access to your other information there.  The reason for hashing is that even if they do hack that one site, they don't learn the original password that you may be using on other sites.

Real hashing algorithms make it hard to even get a list of possible passwords.  Pretty much the only way is to brute-force it: try different passwords and see if one produces a matching hash.  That isn't as far-fetched as it sounds, however.

To combat brute force, most sites (hopefully) do something called "salting".  How salting works is actually amazingly simple.  Before hashing a password, a site will add some letters (or numbers, or any other combination of characters) to either the front or back of the password.  This is called the "salt".  So if my password was "ILoveKittens" and the salt was "ungawunga", then the hash of "ungawungaILoveKittens" would be saved instead.  I'll get into how salting helps after I describe the kinds of attacks it protects against.
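Here's a rough sketch of salted hashing in Python.  I'm using SHA-256 from the standard library just to keep the example self-contained; real sites should prefer a dedicated password-hashing scheme (bcrypt, scrypt, Argon2), and the function names here are my own:

```python
import hashlib
import os

def hash_password(password):
    salt = os.urandom(16).hex()  # a random salt, unique per user
    digest = hashlib.sha256((salt + password).encode()).hexdigest()
    return salt, digest          # both get stored; the password does not

def verify(password, salt, digest):
    # Re-apply the same salt and compare the resulting hashes.
    return hashlib.sha256((salt + password).encode()).hexdigest() == digest

salt, digest = hash_password("ILoveKittens")
print(verify("ILoveKittens", salt, digest))  # True
print(verify("ILoveDogs", salt, digest))     # False
```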

One form of brute-force attack is known as a "dictionary attack": take different combinations of dictionary words as passwords and calculate a huge list of hashes.  Now they can compare it to a list of hashed passwords in a database they have.  How did they get this database?  Maybe they started an illegitimate site whose main purpose is to harvest people's passwords.  Maybe they're a hacker who got into some other site's database.  Maybe a malicious employee at a company tried this.  There are numerous ways it could happen.

Another attack, similar to a dictionary attack, is called a "rainbow attack": hashes are calculated for every combination of characters up to some length and stored in what's called a "rainbow table".  Covering every combination of characters takes exponentially longer as password length grows, which is why longer passwords are recommended.
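Here's a sketch of the idea.  Real rainbow tables use clever hash-chain tricks to trade time for space; this is the naive version, a straight lookup table covering every 3-letter lowercase password:

```python
import hashlib
from itertools import product

# Precompute the hash of every 3-letter lowercase password
# (26^3 = 17,576 entries -- tiny, which is the whole point).
table = {}
for combo in product("abcdefghijklmnopqrstuvwxyz", repeat=3):
    pw = "".join(combo)
    table[hashlib.sha256(pw.encode()).hexdigest()] = pw

# Given a stolen, unsalted hash, recovering the password is a single lookup.
stolen = hashlib.sha256(b"cat").hexdigest()
print(table[stolen])  # cat
```

Each extra character multiplies the size of that table, which is exactly the exponential growth described above.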

For example, if a password is 2 characters long and only contains numbers, then there are 100 different combinations: 10 possible digits (0-9) with 2 characters is 10².  If we add just 1 character to get a 3 digit password, now there are 10³ (1,000) different combinations, so each character makes it 10 times as difficult to create a full list.

If we use both numbers and letters, then with 2 characters we have 10 digits plus 26 letters, which is 36² (1,296) combinations; with 3 characters it's 36³ (46,656).  See how much faster that grew once letters were in the mix?  If lower-case and upper-case letters are treated as different characters, a 2-character password gives us 62² (3,844), and 3 characters gives 62³ (238,328).  Adding a 4th character to our case-sensitive alphanumeric password gives 62⁴ (14,776,336) combinations.
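All of the numbers above come from one formula, alphabet size raised to the password length, which is easy to check:

```python
# combinations = (size of character set) ** (password length)
for name, size in [("digits only", 10),
                   ("digits + letters", 36),
                   ("digits + both cases", 62)]:
    for length in (2, 3, 4):
        print(f"{name}, {length} chars: {size ** length:,}")
```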

Even so, computers are getting very fast these days, and as of 2008 there are rainbow tables covering every alphanumeric password under 10 characters.  That sounds terribly scary, and it's why passwords should be salted in addition to being hashed.  Salting does 2 things.  First, it makes the password longer.  Remember, each additional character of length adds a ton of difficulty, and these rainbow tables can take years to create.  In our uppercase/lowercase/numeric scenario, a single extra character makes it 62 times as hard.  If a table for 10 characters took a year to calculate, one for 11 characters would take 62 years, effectively making it infeasible.

The second benefit is subtler.  Hashes and salts are usually stored together, because the salt is needed to recreate the hash.  It would seem an attacker could just take their existing password list, add the salt, and calculate new hashes.  While that's true, it isn't very viable, because a single rainbow table is only valid for one specific salt (or no salt).  If each user has a different salt (which they should), then instead of spending a year creating one table that cracks every unsalted password under 10 characters, the attacker has to create a different table for every user.

Dictionary attacks are still effective with simple passwords though, since the number of combinations that make up real words is a lot smaller than those that can be any combination of characters. This is why it's encouraged that people pick strong passwords that can't be easily guessed by a computer putting together different combinations of known words and phrases.  For example, some sites require you have a number and some combination of upper case and lowercase letters, or even symbols.

It is a social responsibility of software developers to apply this level of security, but unfortunately (as far as I'm aware) there is no legal requirement for site-owners to create secure sites (except perhaps in the financial sector or other regulated businesses). Ultimately, it comes down to "How much do I trust this site?"  I would recommend that at the very least you have 2 different passwords.  One for sites of companies you trust to have good security, and a different one for more questionable sites.

Remember, all it takes is just one unsecured site with a semi-determined hacker to find your password. If you're using the same password on other sites, you're not only risking your password on that site, but every other site where you use that password. This is especially dangerous with the common trend of using your email address as your username.  Your primary email is in a separate category and should have its own dedicated password. Your email is the key to most sites you sign up with, because that's where your (hopefully new) password will be sent when you lose it. If someone gets access to your email account, they could get your passwords to other sites in the same way.

The ideal solution is to have a different password for every site.  You may forget the lesser-used ones, but most sites provide a way to recover access by email, which once again brings up the point that you should keep your email password the most secure of all.  It's like your master key that opens everything.  You can also use different email addresses for different sites; I personally use a separate email account for sites that I suspect may spam me or that seem otherwise untrustworthy.

I hope this post has been helpful and that you feel better knowing why most sites can't send you your password.  You should be extra wary of sites that can.

Feel free to comment on anything that is unclear or just flat out wrong (or heck, even grammatical errors).

Next week: Why do programs crash?  Feel free to suggest further topics, as I don't have one in mind for the week after yet.
