Cygnus - CSK collaboration: Evaluation of Web browsers

Cygnus Support, February 28, 1996
(Gordon Irlam, gordoni@cygnus.com)


As part of the Cygnus - CSK collaborative project, Cygnus agreed to evaluate a number of web browsers and select one that can be used for demonstration purposes.

We are interested in both the features different web browsers provide, and the suitability of different browsers for our own purposes.

Overview of Available Web Browsers

Many different web browsers exist. Netscape is by far the most popular. The following table offers rough estimates of the market share of the different web browsers.

The Netscape browser currently has a 70-80% market share. It supports both standard HTML and a number of proprietary extensions. The latest version, Netscape 2.0, is in beta and includes support for the Java programming language. The deployment of Java-enabled browsers is likely to take some time, and currently perhaps only 5-10% of the browsers in use are Java capable. Netscape provides support for SSL and S-HTTP based security. Netscape has a "plug-in" mechanism that allows other software applications to interface to it. Netscape also intends to provide support for a scripting language originally called LiveScript and now called JavaScript. Netscape makes its executable software available to customers for free trial use.

Licensing rights to the Mosaic browser produced by NCSA were sold to Spyglass, Spry, GNN, Quarterdeck, and a number of other companies, which now produce commercial versions of Mosaic. Chimera is another Mosaic derivative. NCSA is continuing to develop Mosaic, and the Windows version of Mosaic is quite a bit more sophisticated than the Unix version. Mosaic binaries are freely available, but access to the Mosaic sources is limited.

AOL, CompuServe, and Prodigy all provide web browsers to their customers. It is difficult to know how heavily used these browsers are because these organizations typically cache accesses to pages on the Internet internally. Most of these browsers are reported to be of fairly low quality.

Microsoft's Internet Explorer is a high-quality browser that, like Netscape, includes a number of proprietary HTML extensions. Microsoft recently licensed Java from Sun, and future versions of Internet Explorer are likely to include Java support. In terms of functionality, Internet Explorer is probably the closest competitor to Netscape. Microsoft does not give away its source code.

I don't know very much about IBM's WebExplorer. The statistics gathered show it has perhaps a 1% market share.

The Lynx line mode browser is used only by people connected to the Internet via character-based terminals, and it lacks graphics capability.

Sun's HotJava browser is written in Java. It is reasonably good, but it lacks the features of Netscape. Source code to HotJava is available, but the HotJava licensing terms limit its redistribution. HotJava supports Java applets.

OmniWeb is supposedly a reasonable browser, but it is only available on the NeXT platform.

Arena was developed by the World Wide Web Consortium as a test bed for the development of HTML 3.0. As might be expected it provides very good HTML support. Unfortunately, it currently scores poorly as far as features intended to enhance usability are concerned. It could be jazzed up into something very nice. The source code to Arena was recently released.

Grail is a browser developed by CNRI and written in the Python programming language. Grail is quite good, and it includes support for Grail applets. Dancer is a second Python-based browser.

Emacs w3-mode is written in Emacs Lisp and is intended for people who use the Emacs text editor. It appears to run slowly and it has a weak GUI.

SurfIt!, tkWWW, Hippo, and Phoenix are all web browsers written in Tcl/Tk. Of these, SurfIt! appears by far the most sophisticated, and it supports Tcl-based applets.

MMM is a CAML/Tk based web browser that supports CAML applets.

ViolaWWW was a powerful Web browser with a scripting language. It was poorly written, but it pioneered the way for browsers that came after it. It is no longer widely used.

Two other free browsers are Cello and UdiWWW. Neither of these includes source, so only a little time has been spent looking at them. Other browsers we know about but have not evaluated include WWWeasel, written in Lisp(?), and CandleWeb, a browser that supports applets in the Awe language.

Anatomy of a Browser

Most browsers have a similar structure. They comprise code for retrieving documents based on URLs, parsing those documents, displaying them (including any images), and providing a user interface. Typically this adds up to somewhere between 30k and 70k lines of code. Sun's HotJava browser is an exception. It only requires a few thousand lines of code because support for retrieving documents based on URLs, parsing documents, and displaying them is all more or less included as part of the standard Java classes.
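The retrieve, parse, and display stages mentioned above can be illustrated with a minimal sketch. This is not taken from any of the browsers reviewed here; it uses Python's standard library, and the names retrieve, parse, display, and PageParser are hypothetical. The display stage is a line-mode stand-in for a real graphical user interface.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

HEADINGS = ("h1", "h2", "h3")


def retrieve(url):
    """Stage 1: fetch a document by URL and return it as text."""
    with urlopen(url) as response:
        # Fall back to latin-1 when the server does not name a charset.
        charset = response.headers.get_content_charset() or "latin-1"
        return response.read().decode(charset, errors="replace")


class PageParser(HTMLParser):
    """Stage 2: reduce markup to a flat list of (kind, text) items."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._kind = "text"

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._kind = "heading"
        elif tag == "a":
            self._kind = "link"
        elif tag == "img":
            # Record the image by its alt text; no pixels are loaded here.
            self.items.append(("image", dict(attrs).get("alt") or "image"))

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "a":
            self._kind = "text"

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.items.append((self._kind, text))


def parse(markup):
    parser = PageParser()
    parser.feed(markup)
    parser.close()
    return parser.items


def display(items):
    """Stage 3: render items as plain lines of text."""
    lines = []
    for kind, text in items:
        if kind == "heading":
            lines.append(text.upper())
        elif kind == "link":
            lines.append("[" + text + "]")
        elif kind == "image":
            lines.append("(image: " + text + ")")
        else:
            lines.append(text)
    return lines
```

A caller would chain the stages as display(parse(retrieve(url))). Even in this toy form the display stage is where a real browser's size would grow, since layout, fonts, and event handling all live there.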

By far the most complex task in writing a Web browser is the user interface.

A lot of browsers use the same library of code for some of the above components.

The W3 common code library is widely used for retrieving documents based on URLs. It was developed by W3C. It contains a lot of seemingly bogus code, but is extremely portable. It was previously known as the CERN library.

For displaying images a lot of browsers use xloadimage, which is X11 contrib software, or XLI, which is a derivative of xloadimage. Unisys holds a patent on data compression that they allege is infringed by the GIF file format. As a result there has been a move away from the use of GIF. The Independent JPEG Group has code for loading JPEG images (exclusive of an optional encoding format patented by IBM). There is also a library of code for the new PNG graphics standard, which CompuServe is in the process of endorsing as the successor to GIF.
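Because a browser has to hand each image to the right loading library, it typically needs to tell the three formats apart. A minimal sketch, assuming the format is identified from the leading signature bytes of the file; the function names sniff_image_format and choose_loader are hypothetical, and no actual decoding is done.

```python
def sniff_image_format(data):
    """Return 'gif', 'jpeg', 'png', or None from the leading signature bytes."""
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    return None


def choose_loader(data, loaders):
    """Pick a loader from a {format: callable} table, or None if unknown."""
    return loaders.get(sniff_image_format(data))
```

Keeping the loaders in a table means that support for one format can be added or removed without touching the rest of the display code.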

Analysis

We tracked down and attempted to review 15 of the browsers previously mentioned to see which would be most suitable for use as a demo browser in the Cygnus - CSK project. All had some problems, and no one browser is ideal.

A document containing notes from the detailed analysis of each of these browsers, along with pointers to further information on each browser, is available online at:

http://webhackers.cygnus.com/webhackers/web/software/browsers.html

For reasons of brevity the contents of the above document will not be repeated here.

From this detailed analysis, the browsers that appeared most suitable were Grail, Mosaic, HotJava, Arena, and SurfIt!.

Grail is a fairly nice browser. It currently only runs on Unix, but because it is written in Python it should be possible to port it to Windows quite easily. Grail supports HTML 2.0, and allows the execution of Python applets. The Grail browser can be extended via downloadable modules. Unfortunately, Grail sources are not readily available, and it seems unlikely they will ever become very widely available. While it might be possible for us to obtain the Grail sources for this project, doing so is unlikely to yield any long term benefits. It also isn't clear how easy it would be to support multibyte characters in Grail.

Only the Unix sources for Mosaic are available. The Windows sources are not widely available. The X sources would be sufficient for a demo browser, but they are quite a bit out of date with respect to the latest Windows sources. The Mosaic sources are also fairly large and cumbersome, and are not freely available. Using Mosaic would require us to use Motif, which could also potentially complicate things. We do have the option of using Mosaic as a demo browser, but it seems unlikely we would gain anything useful from the process.

HotJava is quite good. Unfortunately the code base currently appears rather fragile, and so using it is a slightly risky proposition. Sun released a version of HotJava that works with the Alpha version of the Java Developers Kit. Since then a new version of the developers kit has been released. The new JDK is compatible with the version of Java in Netscape, but incompatible with the version used by HotJava. Internally Sun has a version of HotJava running under the Beta JDK, but it has not yet been released. What might be required to support multibyte strings in HotJava hasn't been examined.

Arena offers a solid implementation of the HTML 3.0 standard. Arena is written in C and makes use of Xlib to do graphics. Unfortunately the Arena sources are not of a very high quality. Arena is not portable to platforms other than Unix. It should be quite easy to modify Arena to render multibyte strings provided suitable fonts are available.

The SurfIt! sources are currently fairly fragile. SurfIt! will only work with a particular version of Tcl and Tk, and it requires changes to Tcl and 3 additional software packages. In time SurfIt! might evolve into a really good browser, but right now using SurfIt! is not without some risk. SurfIt! has the advantage of already supporting Tcl applets. There exists a set of patches to Tcl/Tk for dealing with Kanji text. One of the main advantages SurfIt! has to offer is that it would make it almost trivial to port the resulting system to Windows. The Tk windowing system is also well tried, tested, and robust. Modifying SurfIt! might be a useful experience since we will then be in a better position to assess its suitability for other projects. Some of the other options would limit what we could do with the resulting code.

Conclusion

For the above reasons SurfIt! is recommended as the browser to be used for demonstration purposes in the CSK project.