Monday, March 17, 2008

Does Your Website Measure Up? Part 3: Technology Under the Hood


Does Your Website Measure Up?
This blog topic is being published as a serial. This is the third of four parts.


Part 3: Technology Under the Hood

In referring to underlying technology, I do not generally mean the hardware and software choices. Microsoft, Sun, IBM, and Open Source solutions are all up to the task of delivering web content. Well, OK, we all have our opinions about which vendor’s technology works best in certain situations, but let’s leave that aside. The critical issues are the installation, the setup, and the environment your systems live in. These must be done right for your site to perform its required tasks reliably, with speed and near-100% uptime.


Is Your Pipe Big Enough?

Perhaps the biggest issue is bandwidth. Bandwidth in the networking sense refers to the data rate or channel capacity of the local area network (LAN) or Internet connection. In our discussion, we are more concerned with the Internet connection capacity or “pipe” size than LAN capacity. If you connect the best Internet server in the world to an insufficient pipe, the user experience will be degraded by the speed limit of the pipe. As an analogy, assume we have five gallons of fuel to dump into our race car in the pit. If we pour that fuel through a soda straw, it will take a while. Your Internet Service Provider establishes bandwidth for your Internet connection. However, there are other ways to expose your web site server(s) to the Internet.
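Before looking at those options, a back-of-the-envelope calculation shows why pipe size matters. The sketch below (in Python; the page size and connection speeds are illustrative numbers I picked, not measurements) computes how long one page takes to travel through pipes of different sizes:

```python
# Rough time to push one web page through "pipes" of various sizes.
# All figures are illustrative assumptions, not measurements.

PAGE_SIZE_BYTES = 500_000  # assume a 500 KB page: HTML plus a few images

# Typical upstream capacities, in bits per second
pipes = {
    "office DSL uplink (768 Kbps)": 768_000,
    "T1 line (1.544 Mbps)": 1_544_000,
    "data-center connection (100 Mbps)": 100_000_000,
}

for name, bits_per_second in pipes.items():
    seconds = (PAGE_SIZE_BYTES * 8) / bits_per_second
    print(f"{name}: {seconds:.2f} seconds per visitor")
```

Remember that simultaneous visitors share the pipe, so multiply accordingly; an office connection that looks adequate for one visitor chokes quickly under real traffic.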

Besides putting a web server for your public site on your own business network, you can use third party web server “hosting” in several ways. All third party hosting solutions involve sharing the host’s facility and bandwidth to some degree. The first, and least expensive, option is virtual hosting, in which you share a server with other small accounts. You can also have one or more dedicated servers in a host’s data center, either owning your own hardware (called co-locating) or using servers provided by the host. Third party hosting usually brings big bandwidth capacity, secure buildings, redundant cooling and electrical supplies, and redundant Internet backbone connections. The Internet backbone refers to the main “trunk” lines of the Internet owned and operated by the major communications companies and the government. If you host in your own shop and the router connecting you to the Internet fails, your site is offline until that router is fixed. At a commercial data center, that router is one of several connections to various “trunks,” and requests for your server would be automatically re-routed to a working connection. Nice!

A disadvantage of co-located servers is their relative inaccessibility for maintenance or changes that require physical proximity. However, most hosts will perform disk replacements or other minor hardware maintenance for a reasonable fee. The biggest inconveniences are during the initial installation and when any major upgrades are required.

The primary message concerning bandwidth: do not put a public server on your local office’s DSL or cable connection if you expect any significant traffic. The pipe is too small, and doing so may even violate your ISP’s terms of service.


Are You Being Served?

The next factor to consider is the equipment used for your web server(s). Surprisingly, a server that functions quite admirably does not need the speed and storage capacity of a desktop used for Microsoft Office applications. It does, however, need some fail-safe provisions your favorite desktop can live without. Why? Because that server works 24/7, and no human connects to it daily to discover problems. While monitoring helps (see below), redundancy is very important.

The first area of redundancy is the power supply. The 115-120 volt AC power from the wall socket is transformed into 12 and 5 volt DC power for the computer’s internal components. The input source and the associated electronics and transformers are a hardware unit collectively called the “power supply”. Computers with two power supplies that can fail over automatically when one dies are preferred. These redundant power supplies are inexpensive, so be sure you have them. Next is disk redundancy: drives that can be replaced while the machine is still operating (“hot-swapped”). Some form of RAID (Redundant Array of Independent Disks) is appropriate. See the RAID article on Wikipedia for more information on RAID options.
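To see why duplicated components pay off, here is a quick back-of-the-envelope sketch in Python (the failure probability is a made-up illustrative figure, not a vendor spec):

```python
# Back-of-the-envelope availability math for redundant power supplies.
# The annual failure probability below is an illustrative assumption.

p_fail = 0.02  # assume a 2% chance a given supply fails in a year

# One supply: the server goes down whenever that supply fails.
single = p_fail

# Two independent supplies with automatic failover:
# the server goes down only if BOTH fail.
redundant = p_fail * p_fail

print(f"Single supply:      {single:.2%} chance of an outage per year")
print(f"Redundant supplies: {redundant:.4%} chance of an outage per year")
```

The same reasoning applies to mirrored disks: doubling a component turns a small failure probability into a tiny one, provided the failures are independent.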

Optimization of your server configuration is important for performance and ease of administration. Each web server platform (Apache, Microsoft IIS, Sun’s web servers, and so on) has its own best practices and performance tradeoffs. Adding Java with Tomcat or another Java application server, or ASP, PHP, or .Net technologies, can complicate optimization substantially. Look for documents on the ‘Net that offer guidance on server setup and optimization. For example, a white paper on Apache optimization can be found on Serverwatch.com. Make sure you understand enough to ask your technology providers questions and confirm they have considered performance optimization.

Monitoring your server (or web site) just to be sure it is available and performing acceptably is important. You don’t want customers or your boss calling to tell you the site is not operational! Monitoring services that alert you by cell phone or pager give you early warning of any performance issue or outage. Costs range from free (or up to $50 annually) for simple checks and reporting on a single URL, to tens of thousands of dollars per year for complex monitoring and reporting across many URLs.
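To give you a feel for what these services do, here is a minimal availability check sketched in Python using only the standard library (the URL and check interval are placeholders; a real service also retries, checks from multiple locations, and pages or e-mails you):

```python
import time
import urllib.request
import urllib.error

SITE_URL = "http://www.example.com/"  # placeholder: your site's address
CHECK_INTERVAL_SECONDS = 300          # check every five minutes
TIMEOUT_SECONDS = 10                  # treat very slow responses as failures

def site_is_up(url):
    """Return True if the URL answers with an HTTP success status."""
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False

while True:
    if not site_is_up(SITE_URL):
        # A real monitor would page or e-mail you here; we just print.
        print(f"ALERT: {SITE_URL} did not respond at {time.ctime()}")
    time.sleep(CHECK_INTERVAL_SECONDS)
```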

The message: take steps to keep your site available and performing at its best while monitoring 24/7 for any potential problems.


Is Your Load Too Heavy?

How do you know whether a server (or group of servers) is handling its load comfortably or is overloaded? The most obvious answer is “response time”. What is response time? Simply stated, it is the time between a visitor’s request for a page and that page displaying in his or her browser. The industry standard goal is 1-3 seconds for a visitor with a broadband (cable or DSL) connection to the Internet, depending upon whom you ask. I say 2-3 seconds is very good, with peak-traffic responses of up to 5-10 seconds, as long as that performance level is delivered consistently from hour to hour and day to day. Most monitoring services have some way to sample response time for a URL (a specific page).
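You can take a rough sample yourself with a few timed requests. This Python sketch (standard library only; the URL is a placeholder) reports the average and worst-case times:

```python
import time
import urllib.request

URL = "http://www.example.com/"  # placeholder: the page to sample
SAMPLES = 10

timings = []
for _ in range(SAMPLES):
    start = time.perf_counter()
    with urllib.request.urlopen(URL, timeout=30) as response:
        response.read()  # include the time to download the whole page
    timings.append(time.perf_counter() - start)
    time.sleep(1)  # be polite: pause between samples

average = sum(timings) / len(timings)
print(f"average: {average:.2f} s, worst: {max(timings):.2f} s")
```

Keep in mind this times only the page itself, not the images and rendering a browser adds, so treat the numbers as a lower bound on what visitors actually experience.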

The infrastructure for multiple servers responding to requests for one site’s URLs can get complex. The individual servers can sit behind a single load balancer that directs requests to the least busy server, to each server in turn, or to different server groups in different locations using some logic for distributing requests. While beyond the scope of this article, sites receiving hundreds of thousands of visitors per week need this kind of architecture. However, whenever your requirements for a single site exceed the capacity of a single server, some kind of “traffic cop” (hardware or software) is needed.
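The “traffic cop” logic itself can be surprisingly simple. Here is a toy sketch in Python of two common distribution strategies, round-robin and least-busy (the server names are hypothetical, and a real balancer would forward network traffic rather than print):

```python
import itertools

SERVERS = ["web1", "web2", "web3"]  # hypothetical back-end servers

# Strategy 1: round-robin -- each server takes the next request in turn.
_rotation = itertools.cycle(SERVERS)

def pick_round_robin():
    return next(_rotation)

# Strategy 2: least-busy -- pick the server with the fewest active
# connections. (A real balancer updates these counts as requests
# start and finish; here they simply start at zero.)
active_connections = {server: 0 for server in SERVERS}

def pick_least_busy():
    return min(active_connections, key=active_connections.get)

# Example: route five incoming requests round-robin style.
for request_number in range(1, 6):
    print(f"request {request_number} -> {pick_round_robin()}")
```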

A final note on performance relates to images. Images are the most important selling tool on the Internet, yet they are the most resource-intensive components of any site. An image is typically very large compared to a text area of the same size. We have all had the experience of visiting a web page and waiting, and waiting, and waiting for some image to display. This is usually because the image has not been optimized for the web: it is stored on the server at the same resolution as the camera that took the original photo, which these days can be a million bytes or more. Re-sizing a picture close to the size of the screen real estate it will use and saving it as a JPG or GIF file will usually mitigate the issue, cutting the size to under 10,000 bytes. The quality will still be good for the web at the expected display size, but the photo will not scale up to a larger image very well. This may create an issue when a prospective buyer wants to zoom in for a closer look.

That’s why most web pages show small images that can be clicked to view a bigger version. That bigger version is in fact a separate image file that is larger in both dimensions and “byte count”. However, since it loads alone, the larger size is less bothersome than when the browser is trying to display a page with several pictures and text at once.
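If you or your web developer want to automate that optimization, here is a minimal sketch using Python with the Pillow imaging library (my tool choice, not the only one; the filenames and target sizes are placeholder assumptions):

```python
from PIL import Image  # the Pillow library: pip install Pillow

# Placeholders: the original camera file and the web-sized outputs.
ORIGINAL  = "product_photo.jpg"
WEB_COPY  = "product_photo_web.jpg"
THUMBNAIL = "product_photo_thumb.jpg"

# Shrink the photo to roughly the screen real estate it will occupy;
# thumbnail() preserves the aspect ratio and never enlarges.
img = Image.open(ORIGINAL)
img.thumbnail((800, 600))
img.save(WEB_COPY, "JPEG", quality=85)

# A much smaller copy for the clickable thumbnail on the page.
thumb = Image.open(ORIGINAL)
thumb.thumbnail((150, 150))
thumb.save(THUMBNAIL, "JPEG", quality=80)
```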

Again, a simple message: optimize your photos, and make sure you have enough server horsepower to keep load times for your most-viewed pages in the 2-3 second range.


Technical Excellence is in the Details

As this article shows, the quality of service your visitors experience depends on many technical details under the hood. If I had to offer the novice a single piece of advice, it would be to use a third party resource for hosting. That all but eliminates the bandwidth issue and helps with many of the other technology problems that could push your downtime past the industry standard goal of 0.02%, which works out to about 9 minutes per month.