Archive for October, 2010

Why do web designers/developers hate Internet Explorer?

If you know anyone who works on creating websites in any capacity, you've probably been encouraged to switch away from Internet Explorer.  But why should they care what browser you use?  It's not like you're making them use it.

To be honest, most of the hate is directed towards IE6, which comes with Windows XP.  IE7 was an improvement, and IE8 is actually considered acceptable by most web designers (unless they're just Microsoft haters).

So why all the hate for IE6?  The short answer is that supporting it is a pain because of its poor standards compliance.  But as long as a browser is still used by a decent proportion of users, it will need to be supported for business reasons.  So the sooner the majority of people switch off it, the sooner web designers don't have to worry about it anymore.

But let's back up for a bit.  What exactly does "poor standards compliance" mean?  In fact, what do these standards mean?  Stepping back even more, what exactly does a web browser do?

A web browser is a program that interprets HyperText Markup Language (HTML) and displays it in some fashion to a user.  The "standard" is what defines how a browser ought to interpret HTML.  I put "standard" in quotes because it is somewhat of a misnomer.  Standards are partly determined by implementation: how things actually work, not just how they should work.

There's a long (relatively speaking) history of how HTML in its current form came to be.  Dive Into HTML5 has a very nice summary of that history.  I'm not going to repeat what it says, but I do want to quote the following:

HTML has always been a conversation between browser makers, authors, standards wonks, and other people who just showed up and liked to talk about angle brackets. Most of the successful versions of HTML have been "retro-specs," catching up to the world while simultaneously trying to nudge it in the right direction. Anyone who tells you that HTML should be kept "pure" (presumably by ignoring browser makers, or ignoring authors, or both) is simply misinformed. HTML has never been pure, and all attempts to purify it have been spectacular failures, matched only by the attempts to replace it.

But if there is no "pure" standard, why is there such a push for using "standards compliant" browsers?

It's because for the first time in the history of the web, the latest versions of browsers from all the major manufacturers are actually close to implementing a unified standard.  I believe this is largely due to the fact that browser usage statistics are more diversified than ever.  According to most sources Internet Explorer (all versions combined) now has less than 50% of the browser usage for the first time since the late 90's.

While IE is still the most used browser, the fact that it no longer has an overwhelming majority means that Microsoft can no longer run the show without consulting and collaborating with other browser makers.  At the same time, they've dug themselves into a hole: they don't want to break existing sites that were made to work on previous versions of IE, yet they still want to adhere to standards so that web developers can design pages that work in all browsers without resorting to ugly hacks that literally say, "if it's IE, do this; otherwise, do the normal thing."

Joel Spolsky has a very insightful post on how difficult a situation Microsoft was in when creating IE8.  Particularly apt I feel are the following analogies:

And if you’re a pragmatist on the Internet Explorer 8.0 team, you might have these words from Raymond Chen seared into your cortex. He was writing about how Windows XP had to emulate buggy behavior from old versions of Windows:

Look at the scenario from the customer’s standpoint. You bought programs X, Y and Z. You then upgraded to Windows XP. Your computer now crashes randomly, and program Z doesn’t work at all. You’re going to tell your friends, “Don’t upgrade to Windows XP. It crashes randomly, and it’s not compatible with program Z.” Are you going to debug your system to determine that program X is causing the crashes, and that program Z doesn’t work because it is using undocumented window messages? Of course not. You’re going to return the Windows XP box for a refund. (You bought programs X, Y, and Z some months ago. The 30-day return policy no longer applies to them. The only thing you can return is Windows XP.)

And you’re thinking, hmm, let’s update this for today:

Look at the scenario from the customer’s standpoint. You bought programs X, Y and Z. You then upgraded to Windows Vista. Your computer now crashes randomly, and program Z doesn’t work at all. You’re going to tell your friends, “Don’t upgrade to Windows Vista. It crashes randomly, and it’s not compatible with program Z.” Are you going to debug your system to determine that program X is causing the crashes, and that program Z doesn’t work because it is using insecure window messages? Of course not. You’re going to return the Windows Vista box for a refund. (You bought programs X, Y, and Z some months ago. The 30-day return policy no longer applies to them. The only thing you can return is Windows Vista.)

This was posted in 2008 while IE8 was still in Beta testing.  He speculated that the IE8 team would probably reverse their decision to have the default mode be standards compliant because it broke about half the pages in some way.  However Microsoft found a clever solution that I feel was the best possible compromise between the pragmatists and idealists.

During the Beta (as well as in the final release) they defaulted to "standards" mode, which broke many pages; however, there is a button to view a page in "compatibility mode".  During the Beta phase they recorded popular sites where many users turned on "compatibility mode", and from that came up with a "Compatibility View list" that users can choose whether to use when they run IE8 for the first time.  The list is updated through Windows Update about every 2 months, and the owners of the sites on it are contacted to let them know they're on the list and how they can get off it by bringing their site up to standards.

So how did this whole mess get started?

Looking at a history of HTML we see that HTML was first introduced in about 1990 and there wasn't really a formal specification.

After several browsers started appearing on the scene, HTML2 was published in 1995 as a retro-spec "roughly corresponding to the capabilities of HTML in common use prior to June 1994."

In 1996 "HTML 3.2 released which retrofitted the Netscape inventions into the HTML 'standard'" and "Internet Explorer 2 [was] released, virtually bug for bug compatible with Netscape (except for a few new bugs…)"

Finally, HTML4 was "released" in 1997, and it represents more or less the current state of things today in 2010.  (There are technically some newer specs like XHTML, but the feature set has been largely untouched.)

The First Browser War greatly contributed to the mess by pushing Microsoft and Netscape to furiously try to outdo each other with features (such as the classic 90's effects Blink and Marquee), with little attention paid to the buggy consequences of rapid feature development.

After winning the browser war, Microsoft didn't really have much reason to spend further development time and money on browser development, leading to a mini-"Dark Ages" for the browser world.

If we look at the timeline, we can see that IE1 was released on August 16, 1995.  IE2 was released on November 22, 1995, a mere 3 months later, desperately trying to play catch-up with Netscape, which had almost a year's head start, having been released on December 15, 1994.

IE continued with about 1 major version every 1-1.5 years until IE6 on August 27, 2001, by which point IE (all versions) had over 90% of the browser market.  When you own 90% of the market, standards be damned, you are the standard.

After that, development stagnated even as IE usage continued to rise until 2004 when Firefox 1 was released.

On February 15, 2005, plans for IE7 were announced, with the final release happening on October 18, 2006, a full 5 years after IE6.  Six days later, on October 24, 2006, Firefox 2.0 was released.  Both featured tabbed browsing with enhanced security and phishing filters.

Also in 2006, the Mozilla Corporation (a commercial subsidiary of the non-profit Mozilla Foundation, formed to help fund the operations of the Foundation and get around the limitations of a non-profit entity) received a large sum of money (85% of their $66.8 million revenue in 2006) from "assigning [Google] as the browser's default search engine, and for click-throughs on ads placed on the ensuing search results pages."  With that kind of funding, in addition to community contributions to the project, Firefox has been able to keep up with and even pull ahead of Microsoft's efforts.

With the ubiquity of the internet and Google fervently pushing its own browser offering with Google Chrome, browser usage is more diversified than ever, making standardization ever more important.  The benefit of standardization, somewhat counter-intuitively, is giving choice to the consumer.

Standardization encourages browsers to compete on features beyond simply how a page is displayed.  With designers spending less time just getting a site working properly in every major browser, they have more time to spend on things that actually matter: giving customers and users a better experience.  Browsers also become more diversified in their personality, out of the necessity to differentiate themselves from each other.  Chrome favors minimalism and simplicity, Firefox favors customization and flexibility, and Internet Explorer favors… familiarity?  I'm not actually sure, to be honest, but it definitely has its own personality.

The push for upgrading browsers isn't an anti-Microsoft movement (although it probably was historically) but a movement to increase the amount of choice a user has without worrying about their favorite site not working in their favorite browser.

Coming back, after a long tangent, to the main question of "Why do web designers/developers hate Internet Explorer (6/7)?"  The answer is that they are the last remaining browsers in popular use that still regularly need to be handled differently to get sites working properly.  At least this is true on the desktop.  Mobile browsers are a whole other can of worms, the difference being that most people don't yet expect every site to work perfectly on mobile devices.  There are still tiny quirks here and there in current browsers, but for the most part pages are write once, run anywhere.

(HTML5 is coming/partially here, but it is a better-thought-out standard that seeks to build on top of the current one rather than replace it.)

Why do programs crash?

We've all seen it before: a program suddenly decides to stop working, possibly causing you to lose a lot of unsaved data.

Why does this happen?

The simplistic but probably not very satisfying answer is that there was an error in the program.

There are 3 fundamental types of computer program errors, distinguished by when/how the error is discovered: compile-time errors, runtime errors, and "silent" errors.  Runtime errors are the ones that result in the crashes you see as a user.

To illustrate the difference, let's pretend you have a robot that you want to program to fill up the gas tanks on your 5 cars.  A program is essentially a set of instructions for a computer to follow.  Let's say your instructions are the following (yes the typos are intentional):

  1. Take the keys, they're color-coded so each key works on the car of the same color.
  2. Start the car.
  3. Drive 3 blcks nort
  4. Stop to the left of the pump
  5. Put the nozzle in the gas receptacle on the right side of the car
  6. Start pumping
  7. Pay for the gas
  8. Drive back
  9. Start the next car and repeat from step 2 until all the cars are filled

Compile-time errors are essentially syntax errors.  The natural-language analogue is spelling or grammar errors.  It's something the computer can tell is incorrect just by looking at it, and it will inform the programmer that it has no idea what they're talking about.

In simplistic terms, the act of "compiling" is essentially the act of translating the code that a programmer wrote into something the computer understands.  Contrary to what some non-programmers may think, computers do not understand code as written; instead, it must be translated.  When the compiler doesn't know how to translate something, it gives a compiler error, which is seen right away by the programmer and fixed.

In the example above, the robot's compiler would say "There's an error on line 3; I don't know what 'blcks' and 'nort' mean."  That's simple enough to fix.  This is the most desirable type of error: it's noticed before any damage is done.
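To make this concrete, here's a minimal sketch in Python (my own illustration, not anything the robot would literally run).  Python reports a SyntaxError when it parses code, before any of it executes, which is the closest analogue to a compile-time error in an interpreted language:

    bad_instruction = "drive(3 blcks nort)"   # not valid Python syntax

    try:
        # Parsing/compiling happens before anything runs, just like the
        # robot's compiler reading the instructions before driving anywhere.
        compile(bad_instruction, "<robot-program>", "exec")
    except SyntaxError as err:
        print(f"Compile-time error: {err}")

The error surfaces before the "robot" has done anything at all, which is exactly why this is the most desirable kind of error.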

Let's fix that error and go on to the next type of error.

  1. Take the keys, they're color-coded so each key works on the car of the same color.
  2. Start the car.
  3. Drive 3 blocks north
  4. Stop to the left of the pump
  5. Put the nozzle in the gas receptacle on the right side of the car
  6. Start pumping
  7. Pay for the gas
  8. Drive back
  9. Start the next car and repeat from step 2 until all the cars are filled

A runtime error is one that is only discovered when trying to follow the instructions.  You of course want to test your robot's program, so you ride along for the first car.  Everything goes fine, so you leave and have the robot finish up.

The 3rd car has the fuel door on the left side.  When the robot gets to this point, it's stuck: it was told to "Put the nozzle in the gas receptacle on the right side of the car", but there isn't one on the right side.  It will stop everything and essentially "crash".  Programs will sometimes have a way to send an error report to the programmer describing what the program was trying to do when it failed.
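Here's a rough Python sketch of the same idea (the car data and function names are made up for illustration).  The instructions are perfectly valid, but following them fails partway through, and the unhandled error stops the program:

    cars = [
        {"color": "red",   "fuel_door": "right"},
        {"color": "blue",  "fuel_door": "right"},
        {"color": "green", "fuel_door": "left"},   # the 3rd car
    ]

    def insert_nozzle(car):
        if car["fuel_door"] != "right":
            # The program can't "think" its way around this, so it gives up.
            raise RuntimeError(f"No gas receptacle on the right side of the {car['color']} car")

    for car in cars:
        insert_nozzle(car)   # crashes with an unhandled exception on the green car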

As extreme as it sounds to completely stop everything, it's often the safest thing to do in an unexpected situation.  A program is incapable of "thinking" for itself, and can only do what it is instructed to do.  It could be instructed to ignore anything it can't do, but that leads us to the 3rd type of error.

"Silent" errors are the ones were the program continues executing despite something going wrong.  It could be the case that the programmer haphazardly tells the program to ignore all errors and skip to the next step.  In this situation as you can see it would be bad to not know an error happened.  The robot would try to "Put the nozzle in the gas receptacle on the right side of the car" but fail because the tank is on the other side.

Now what happens?  Well, the robot starts pumping, and that goes fine; the gas just doesn't go into the car, but the robot was never asked to make sure it did.  It pays for the gas; sure, the gas is all over the ground, but it was still pumped, so it still gets paid for.  It drives back, and now no one is the wiser.
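In Python terms, a silent error looks something like this sketch (again, purely illustrative names): the failure is swallowed, so the program cheerfully reports success either way:

    def insert_nozzle(fuel_door_side):
        if fuel_door_side != "right":
            raise RuntimeError("No gas receptacle on the right side")

    for side in ["right", "right", "left"]:       # the 3rd car's door is on the left
        try:
            insert_nozzle(side)
        except Exception:
            pass                                  # ignore the problem: nobody ever finds out
        print("Pumped and paid for gas")          # printed even when the gas hit the ground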

Ultimately, program crashes are almost always due to programming errors (though sometimes they're due to hardware or operating system errors), but crashes help the programmer know that there was, in fact, an error.  As annoying as that may be, it is often the lesser of two evils.

As for why there are errors in programs, it ultimately comes down to the fact that humans are fallible and programming is inherently complicated.  If you actually read the program error reports, you will sometimes see an error referred to as an "Unhandled Exception".  The name comes from the fact that these errors are almost always "exceptional" circumstances the programmer didn't think to account for, or something that should never happen unless something else went wrong.

In our previous example, we could add a step after filling up to make sure the gas meter shows the tank is full.  If it doesn't, we have an exceptional situation that, if the programmer didn't account for it, would be "unhandled".  Usually, the earlier an error is detected, the easier it is to fix.
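As a hedged sketch of that gas-meter check (the exception name and the threshold are made up for the example): the program raises an error the moment the check fails; if the programmer catches it, it's handled, and if nobody catches it, it surfaces as an "unhandled exception":

    class TankNotFullError(Exception):
        """Raised when the gas meter doesn't read full after pumping."""

    def check_gas_meter(level):
        if level < 1.0:
            raise TankNotFullError(f"Tank is only {level:.0%} full after pumping")

    try:
        check_gas_meter(0.1)          # a suspiciously low reading
    except TankNotFullError as err:
        print(f"Handled: {err}")      # caught close to where it happened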

The difference between errors in programs and errors in many other fields is how far-reaching they are.  If a waiter gets an order wrong once, it affects that one person's order (and possibly the restaurant's reputation, but often it is simply forgotten).  If a program has an error, it affects everyone that uses the program.  In a program with millions of users, each mistake is that much more costly.

Why won't websites send me my password?

At almost every site you go to, if you lose your password, the site refuses to send it to you.  Instead they either create a new one for you or send you a link to change it.

Why is this?  Since they must know your password to check if you entered the correct one, shouldn't they be able to send it to you?

That's actually not entirely correct.  Any responsible site will not store your actual password.  Instead they store what is called a "hash" of your password.  Hashing is a one-way process that turns some value (in this case your password) into some other value (what actually gets saved).  It's one-way in that you can calculate the hash from your password, but there's no way to calculate your password from the hash.  (Actually, that's not 100% true.  It's possible, though very difficult, to come up with a list of potential passwords that produce the same hash, but there's no guarantee of finding your exact one, and the list of candidates is effectively endless if passwords are allowed to be arbitrarily long.)

So how exactly does hashing work?  How can you only go one way but not the other?  Let's make up a simple method of hashing (called a hashing algorithm).  It's not even remotely close to being as secure as ones that are used for real, but it's simple to explain and it gets the point across.

Take the password character by character.  If it's a number, leave it as is.  If it's a letter, convert it to a number: A is 1, B is 2, and so on up to Z is 26.  After you do that, add up all the numbers.  For example, let's say our password is "cab123".  Doing the number translation gives us "312123", and adding that up gives 12.  This number is what actually gets saved as your "password".
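Here's that toy algorithm written out in Python, just to make the steps concrete (and to stress again: nothing remotely like this is used for real):

    def toy_hash(password):
        total = 0
        for ch in password.lower():
            if ch.isdigit():
                total += int(ch)                  # numbers stay as they are
            elif ch.isalpha():
                total += ord(ch) - ord("a") + 1   # a=1, b=2, ..., z=26
        return total

    print(toy_hash("cab123"))   # 3 + 1 + 2 + 1 + 2 + 3 = 12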

The site doesn't know your actual password, all they know is that when you hash your password you get the number 12.  When you try to log in, the password you typed is hashed and compared to what's in the database.

Right now you're probably thinking, "Well, what if someone else guesses a password whose numbers and letters also add up to 12?"  With our simple algorithm they would indeed be able to log in as you with the password "EG" (5+7).
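Putting both points together, here's a small continuation of the sketch above (the same toy hash, written compactly): the site only keeps the number 12, a login attempt is hashed and compared against it, and "EG" sneaks in because it happens to hash to 12 too:

    def toy_hash(password):
        return sum(int(c) if c.isdigit() else ord(c) - ord("a") + 1
                   for c in password.lower() if c.isalnum())

    stored_hash = 12             # all the site ever saved for this account

    def login(attempt):
        return toy_hash(attempt) == stored_hash

    print(login("cab123"))   # True  -- the real password
    print(login("EG"))       # True  -- a collision: 5 + 7 is also 12
    print(login("dog99"))    # False -- hashes to 44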

Clearly, real algorithms are more sophisticated than this.  Most of them create numbers that are 38 digits long (128-bit) or more, which makes it extraordinarily unlikely that someone would randomly guess something that hashes to the same thing as your password.  Even that's an understatement: you're probably more likely to get hit by a falling plane within 5 minutes of winning the lottery than for someone just randomly typing to stumble on something with the same hash as your password.
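For comparison, here's what a real hash looks like using Python's standard library (SHA-256 in this case, a 256-bit hash, so even bigger than the 128-bit figure above):

    import hashlib

    digest = hashlib.sha256("cab123".encode("utf-8")).hexdigest()
    print(digest)        # a 64-character hex string; change one character of
                         # the password and the whole digest changes
    print(len(digest))   # 64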

The point is that there isn't a way to calculate the original value from the hashed value.  In our simple example, given only the number 12, there's no way to figure out definitively that the password was "cab123".  It could have been "eg" or "321abc" or "93" or "ab9".  The point of hashing isn't to prevent someone else from getting into your account on that site; if they were able to get the hash, it's likely they already have access to your other information there.  The reason for hashing is so that even if they do hack that one site, they don't learn the original password that you may be using on other sites.

Real hashing algorithms make it hard to even come up with a list of possible passwords.  Pretty much the only way is to brute-force it by trying different passwords and seeing if one produces a match.  That isn't actually as far-fetched as it sounds, however.

To combat brute force attacks, most sites (hopefully) do something called "salting".  How salting works is actually amazingly simple.  Before hashing a password, a site adds some letters (or numbers, or any other combination of characters) to either the front or back of the password.  This added text is called the "salt".  So if my password were "ILoveKittens" and the salt were "ungawunga", then the hash of "ungawungaILoveKittens" would be saved instead.  I'll get into how salting helps after I describe the kinds of attacks it protects against.
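A quick sketch of salting in Python (the helper names and the record format are mine, not any particular site's scheme):

    import hashlib
    import secrets

    def make_salt():
        return secrets.token_hex(8)                     # 16 random hex characters

    def salted_hash(salt, password):
        return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()

    salt = "ungawunga"                                  # the example salt from above
    record = (salt, salted_hash(salt, "ILoveKittens"))  # what the site actually stores
    print(record)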

One form of brute force attack is what is known as a "dictionary attack", which takes different combinations of dictionary words as passwords and calculates a huge list of hashes.  The attacker can then compare that list to a database of hashed passwords they have.  How did they get this database?  Maybe they started an illegitimate site whose main purpose is to harvest people's passwords.  Maybe they're a hacker who got into some other site's database.  Maybe a malicious employee at a company copied it.  There are numerous ways this could happen.

Another attack, similar to a dictionary attack, is called a "rainbow attack", where hashes are calculated for every combination of characters up to some length and stored in what's called a "rainbow table".  Doing every combination of characters takes exponentially longer as the password length grows, which is why longer passwords are recommended.

For example, if a password is 2 characters long and only contains numbers, then there are 100 different combinations: 10 possible digits (0-9) with 2 characters is 10².  If we add just 1 character to get a 3 digit password, now there are 10³ (1,000) different combinations, so each character makes it 10 times as difficult to create a full list.

If we use both numbers and letters, then with 2 characters we have 10 numbers and 26 letters, which gives 36² (1,296) combinations; with 3 characters it's 36³ (46,656).  See how much faster that grows with letters in the mix?  If we allow lower-case and upper-case letters to be treated differently, a 2 character password gives us 62² (3,844), and 3 characters gives 62³ (238,328).  Adding a 4th character to our case-sensitive alphanumeric password brings it to 62⁴ (14,776,336) different combinations.
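The arithmetic above is easy to check for yourself; the number of possible passwords is just the size of the character set raised to the power of the password length:

    for charset, size in [("digits only", 10),
                          ("digits + letters", 36),
                          ("digits + both cases", 62)]:
        for length in (2, 3, 4):
            print(f"{charset}, {length} characters: {size ** length:,} combinations")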

Even still, computers are getting very fast these days, and as of 2008 there were rainbow tables for every alphanumeric password under 10 characters.  That sounds terribly scary, and it's why passwords should be salted in addition to being hashed.  Salting does 2 things.  For one, it makes the password longer.  Remember, each additional character of length adds a ton of difficulty, and these rainbow tables can take years to create.  In our uppercase/lowercase/numeric password scenario, adding a single character makes it 62 times as hard.  If a table for 10 characters took a year to calculate, one for 11 characters would take 62 years, effectively making it infeasible.

Hashes and salts are usually stored together, because the salt is needed to recreate the hash.  It would seem like an attacker could just take their existing list, add the salt to each entry, and calculate new hashes.  While that's true, it isn't very viable, because a rainbow table is only valid for one specific salt (or no salt).  If each user has a different salt (which they should), then instead of spending a year creating one table that covers every unsalted password under 10 characters, the attacker has to create a separate table for every user.
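Here's a sketch of how a login check works when the salt is stored next to the hash (illustrative names again): each user gets their own salt, so even identical passwords produce different stored hashes, and a precomputed table built for one salt is useless against another:

    import hashlib

    def salted_hash(salt, password):
        return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()

    # One (salt, hash) record per user, as described above.
    users = {
        "alice": ("ungawunga", salted_hash("ungawunga", "ILoveKittens")),
        "bob":   ("k3rm1t",    salted_hash("k3rm1t",    "ILoveKittens")),
    }

    def check_login(username, attempt):
        salt, stored = users[username]
        return salted_hash(salt, attempt) == stored

    print(check_login("alice", "ILoveKittens"))   # True
    print(users["alice"][1] == users["bob"][1])   # False: same password, different hashes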

Dictionary attacks are still effective against simple passwords, though, since the number of combinations that make up real words is a lot smaller than the number that can be any combination of characters.  This is why people are encouraged to pick strong passwords that can't be easily guessed by a computer putting together different combinations of known words and phrases.  For example, some sites require you to have a number and some combination of upper-case and lower-case letters, or even symbols.

It is a social responsibility of software developers to apply this level of security, but unfortunately (as far as I'm aware) there is no legal requirement for site-owners to create secure sites (except perhaps in the financial sector or other regulated businesses). Ultimately, it comes down to "How much do I trust this site?"  I would recommend that at the very least you have 2 different passwords.  One for sites of companies you trust to have good security, and a different one for more questionable sites.

Remember, all it takes is just one unsecured site with a semi-determined hacker to find your password. If you're using the same password on other sites, you're not only risking your password on that site, but every other site where you use that password. This is especially dangerous with the common trend of using your email address as your username.  Your primary email is in a separate category and should have its own dedicated password. Your email is the key to most sites you sign up with, because that's where your (hopefully new) password will be sent when you lose it. If someone gets access to your email account, they could get your passwords to other sites in the same way.

The ideal solution is to have a different password for every site.  You may forget the lesser-used ones, but most sites provide a way to recover them by email, which once again brings up the point that you should keep your email password the most secure of all.  It's like a master key that opens everything.  You can also use different email addresses for different sites; I personally use a separate email account for sites that I suspect may spam me or seem otherwise untrustworthy.

I hope this post has been helpful and that you feel better knowing why most sites can't send you your passwords.  You should be extra wary of sites that can.

Feel free to comment on anything that is unclear or just flat out wrong (or heck, even grammatical errors).

Next week: Why do programs crash?  Feel free to suggest further topics, as I don't have one in mind for the week after yet.
