Vulnerabilities in extension systems
The default authentication for web pages is something called basic authentication. This is what's happening when you ask for a page, and instead of bringing up the page, your web browser brings up a standard dialog box that asks you for your username and password. There's no encryption protecting that username and password; it's just sent to the server in cleartext. Furthermore, if you ask for another page from the same server, the username and password will be sent again, without any warning, still unencrypted.
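To see how little protection this gives, here is a minimal Python sketch of what travels over the wire. It assumes the standard encoding for basic authentication, in which the username and password are joined by a colon and base64-encoded; the function names and sample credentials are made up for illustration.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value sent for basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def recover_credentials(header_value: str) -> tuple:
    """Recover the username and password from a captured header value.

    No key or secret is needed: base64 is an encoding, not encryption.
    """
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a basic authentication header")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


header = basic_auth_header("alice", "s3cret")
print(header)
print(recover_credentials(header))
```

Anyone who can see the request can run the second function on it, which is why basic authentication should be used only over an encrypted connection.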
If you're going to send a web site important data, you should be sure that the site has made a legally binding commitment to protect that data appropriately. You should also be sure that you have an encrypted connection to the site.
A cookie is a fairly simple object; it's a small amount of information to be stored that is associated with an identifying string, an expiration date, and a URL pattern that indicates when the cookie should be sent with HTTP requests. Whenever you visit a web site, the browser checks to see if any unexpired cookies match the URL pattern, and if so, the browser sends them along with your request.
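The matching step can be sketched roughly as follows. This is a simplified model, not the exact algorithm any browser uses: the `Cookie` fields, the suffix match on the hostname, and the prefix match on the path are assumptions chosen to mirror the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import urlsplit


@dataclass
class Cookie:
    name: str          # the identifying string
    value: str         # the stored information
    expires: datetime  # the expiration date
    domain: str        # host part of the URL pattern
    path: str          # path prefix of the URL pattern


def cookies_to_send(jar, url, now):
    """Return the unexpired cookies whose URL pattern matches the request."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    selected = []
    for cookie in jar:
        if cookie.expires <= now:
            continue  # expired cookies are never sent
        host_matches = host == cookie.domain or host.endswith("." + cookie.domain)
        path_matches = path.startswith(cookie.path)
        if host_matches and path_matches:
            selected.append(cookie)
    return selected


utc = timezone.utc
jar = [
    Cookie("customer", "c-1017", datetime(2025, 1, 1, tzinfo=utc), "shop.example", "/"),
    Cookie("stale", "x", datetime(2023, 1, 1, tzinfo=utc), "shop.example", "/"),
    Cookie("ads", "u-9", datetime(2025, 1, 1, tzinfo=utc), "ads.example", "/"),
]
now = datetime(2024, 1, 1, tzinfo=utc)
sent = cookies_to_send(jar, "http://www.shop.example/cart", now)
print([cookie.name for cookie in sent])
```

Only the `customer` cookie qualifies here: `stale` has expired and `ads` belongs to a different host.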
The information that's in a cookie isn't usually of any great interest by itself. Cookies tend to contain customer identifiers or coded lists of preferences -- things that make sense only to the site that set the cookie. This is important because cookies are passed across the network unencrypted, and you wouldn't want them to have anything dangerous in them.
On the other hand, once a web site gets a cookie, it may give back information that you do care about. For instance, it might use the cookie to look up the credit card information you used on your last order (in which case somebody with that cookie could order things on your credit card). For that matter, it might just look up your last order and display it along with your name. Since cookies are passed unencrypted, and can be intercepted at any point, it's not good practice to use them for anything critical, but some web sites do.
In addition, many people are worried about situations in which cookies can be used to track patterns of usage. When you use a link on a web page, the site you go to gets information about the page the link was on (this is called the referrer). If the site you go to also has a cookie that will identify you, it can build up a history of the places you have come from. This wouldn't be very interesting for most sites, but sites that put up banner advertisements have links on thousands of web pages and can build up a fairly accurate picture of where you come from. Since referrer information includes the entire URL, this will include information about specific search requests and, in the worst cases, may contain password and username information.
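As a rough illustration of how much a referrer can give away, the following Python sketch pulls apart a referrer URL the way a receiving site might. The URLs and parameter names are invented for the example.

```python
from urllib.parse import parse_qs, urlsplit


def what_referrer_reveals(referrer: str) -> dict:
    """Split a referrer URL into the pieces a receiving site can log."""
    parts = urlsplit(referrer)
    return {
        "site": parts.hostname,          # where the visitor came from
        "page": parts.path,              # which page held the link
        "query": parse_qs(parts.query),  # anything typed into a form on that page
    }


search = what_referrer_reveals("http://search.example/find?q=divorce+lawyer&lang=en")
login = what_referrer_reveals("http://members.example/login?user=bob&password=hunter2")
print(search["query"])
print(login["query"])
```

A site that pairs this with an identifying cookie can attach each of these records to the same visitor.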
There are some controls on the use of cookies. Some browsers don't support cookies at all; others will allow you to control the situations where you give out a cookie, by choosing to reject all cookies, asking you before accepting cookies, or accepting only certain cookies. For instance, cookies are intended to be returned only to the site that set them, but the definition of "site" is unclear. The cookie specifies what hostnames it should be returned to and may give a specific hostname or a broad range of hostnames. Some browsers can be configured so that they will accept only cookies that will be returned to the host that originally set the cookie.
For example, web browsers confronted with a PDF file will ordinarily invoke Adobe Acrobat Exchange or Acrobat Reader, and web browsers confronted with a compressed file will ordinarily invoke a decompression program. The user controls (generally via a configuration file) what data types the HTTP client knows about, which programs to invoke for which data types, and what arguments to pass to those programs. If the user hasn't provided a configuration file, the HTTP client generally uses a built-in default or a systemwide default.
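The lookup described above might be modeled like this. It is a sketch under stated assumptions: the table format, the program names, and the `%s` placeholder for the downloaded file are illustrative and do not reproduce any particular client's configuration syntax.

```python
import shlex

# Illustrative built-in defaults: data type -> command template.
# The program names are assumptions; "%s" stands for the downloaded file.
BUILTIN_HANDLERS = {
    "application/pdf": "acroread %s",
    "application/postscript": "gs -dSAFER %s",
    "application/x-gzip": "gunzip %s",
}


def command_for(content_type, filename, user_handlers=None):
    """Return the argument list the client would run, or None if no handler.

    If the user has supplied a configuration, it is used instead of the
    built-in defaults, which is exactly what makes it security sensitive.
    """
    handlers = user_handlers if user_handlers is not None else BUILTIN_HANDLERS
    template = handlers.get(content_type)
    if template is None:
        return None
    return [filename if arg == "%s" else arg for arg in shlex.split(template)]


print(command_for("application/postscript", "paper.ps"))
print(command_for("application/postscript", "paper.ps",
                  {"application/postscript": "gs %s"}))
```

The second call shows a user configuration silently replacing the default arguments for the same data type.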
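All of these external programs present two security concerns: what the programs themselves can be made to do by the data they are handed, and what a user can be persuaded to change in the configuration that invokes them.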
Suppose that a user uses Internet Explorer to pull down a PostScript document. Internet Explorer invokes GhostScript, and it turns out that the document has PostScript commands in it that say "delete all files in the current directory". If GhostScript executes the commands, who's to blame? You can't really expect Internet Explorer to scan the PostScript on the way through to see if it's dangerous; that's an impossible task. You can't really expect GhostScript not to do what it's told in valid PostScript code. You can't really expect your users not to download PostScript code or to scan it themselves.
Current versions of GhostScript have a safer mode they run in by default. This mode disables "dangerous" operators such as those for file input/output. But what about all the other PostScript interpreters or previewers? And what about the applications to handle all the other data types? How safe are they? Who knows?
Even if you have safe versions of these auxiliary applications, how do you keep your users from changing their configuration files to add new applications, run different applications, or pass different arguments (for example, to disable the safer mode of GhostScript) to the existing applications?
Why would a user do this? Suppose that the user found something on the web that claimed to be something really neat -- a game demo, a graphics file, a copy of the hottest new song, whatever. And suppose that this desirable something came with a note that said "Hey, before you can access this Really Cool Thing, you need to modify your browser configuration, because the standard configuration doesn't know how to deal with this thing; here's what you do. . . ." And suppose that the instructions were something like "remove the `-dSAFER' flag from the configuration for files of type PostScript"?
Would your users recognize that they were being instructed to do something dangerous? Even if they recognized it, would they do it anyway (nice, trusting people that they are)?
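One partial defense is to audit configurations for the specific change described above. The following sketch assumes a configuration already loaded into a dictionary of data types and command templates (a hypothetical format) and flags any entry that runs GhostScript without `-dSAFER`. It catches only this one known case and does nothing about other interpreters or other data types.

```python
import shlex

# Assumed names under which GhostScript might be installed.
GHOSTSCRIPT_NAMES = {"gs", "ghostscript"}


def handlers_missing_safer(handlers: dict) -> list:
    """Return the data types whose command runs GhostScript without -dSAFER."""
    flagged = []
    for content_type, template in handlers.items():
        args = shlex.split(template)
        if not args:
            continue
        program = args[0].rsplit("/", 1)[-1]  # drop any directory prefix
        if program in GHOSTSCRIPT_NAMES and "-dSAFER" not in args[1:]:
            flagged.append(content_type)
    return flagged


user_config = {
    "application/postscript": "/usr/bin/gs %s",      # flag was removed
    "application/pdf": "gs -dSAFER -dBATCH %s",      # flag still present
    "application/x-gzip": "gunzip %s",               # not GhostScript
}
print(handlers_missing_safer(user_config))
```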
These extension systems are very convenient; it is often much more efficient to have the browser do some calculations itself than to have to send data to an HTTP server, have it do some calculations, and get the answer back. In addition, extension languages allow for a much more powerful and flexible interface between the browser and the full capabilities of the computer than you can get by using external viewers.
For instance, if you are filling out a form, it's annoying to have to submit the form to the server and wait for it to come back and tell you that you've omitted a required piece of information. It's preferable for your browser to be able to tell you that immediately. Similarly, if your happiness depends on having penguins dance across the screen, the most efficient way to get that effect is going to be to tell your browser how to draw a dancing penguin and where to move it.
On the other hand, filling out forms and drawing dancing penguins are not all that interesting. In order for extension languages to actually do interesting and useful tasks, they have to have more capabilities, but the more capabilities that are available, the more dangerous a language is.
Of course, normal programming languages have lots of capabilities and therefore lots of dangers, but people don't usually find this worrisome. This is because when you get a program written in a normal programming language, you generally decide that you want the program, you go out looking for it, you have some information about where it comes from, and you explicitly choose to run it. When you get a program as part of a web page, it just shows up and runs; you may be happily watching the dancing penguins and not knowing that anything else is happening.
We discuss the different approaches taken by extension languages in the following sections, as we discuss the specific languages. All of them do attempt to provide security, but none of them is problem free.
Content-aware firewalls, whether they are packet filters or proxies, can be of considerable help in reducing client vulnerability. A firewall that pays attention to content can control which extension languages and which types of files are passed through; it is even possible for it to do virus scanning on executables. Unfortunately, it's not possible to do a truly satisfactory job of protection even with a content-aware firewall.
Using content-based filtering, you have two options; you can filter out everything that might be dangerous, or you can filter out only those things you know for certain are dangerous. In the first case, you simply filter out all scripting languages; in the second case, you filter out known attacks. Be cautious of products that claim to filter out all hostile code and only hostile code. Accurately determining what code is hostile by simply looking at the code is impossible in the most specific, logical, and mathematical sense of the term. For useful scripting languages, it is equivalent to solving the Turing halting problem (determining whether an arbitrary piece of code ever stops executing), and the proof that it is impossible is one of the most famous and fundamental results in theoretical computer science.
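The two options can be contrasted in a few lines of Python. This is a deliberately crude sketch: the regular expression is not a robust HTML parser, the element list is incomplete, and the attack signatures are invented placeholders.

```python
import re

# Elements that can carry executable content. The list is an assumption and
# is not complete; a real filter would need a proper HTML parser.
ACTIVE_ELEMENTS = re.compile(
    r"<(script|applet|object)\b.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical signatures of attacks that have already been seen.
KNOWN_ATTACKS = ["steal_cookies_v1(", "format_disk_now("]


def strip_everything_active(html: str) -> str:
    """Option 1: remove anything that might be dangerous."""
    return ACTIVE_ELEMENTS.sub("", html)


def is_known_attack(html: str, signatures=KNOWN_ATTACKS) -> bool:
    """Option 2: block only what is known for certain to be dangerous."""
    return any(signature in html for signature in signatures)


harmless = "<p>Order form</p><script>checkRequiredFields()</script>"
old_attack = "<script>steal_cookies_v1(document)</script>"
new_attack = "<script>steal_cookies_v2(document)</script>"

print(strip_everything_active(harmless))
print(is_known_attack(old_attack), is_known_attack(new_attack))
```

The first option throws away the harmless form check along with everything else; the second lets the renamed attack through because nobody has seen it yet.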
As we mention later, content filtering is impossible on some web pages; connections made with HTTPS instead of with HTTP are encrypted, and the firewall cannot tell what is in them to do content filtering.
Starting with Internet Explorer 4.0, Microsoft introduced the concept of security zones to allow you to configure your browser to do this. Explorer defines multiple security zones and sets different default security policies for them. For instance, there is a security zone for the intranet, which by default accepts all signed ActiveX controls and asks you if you want to allow each unsigned control, and one for the Internet, which by default asks you if you want to accept each signed control and rejects all unsigned controls. (ActiveX controls and signatures are discussed later in this chapter.) There is also a security zone that applies only to data originating on the local machine (this is not supposed to include cached data that was originally loaded from the Internet). The local machine zone is the most trusted zone.
In most cases, Internet Explorer uses the host portion of the URL to determine what zone a page is in. Because the different zones have different security policies, it's important that Internet Explorer get this right. However, there have been several problems with the way that Internet Explorer does this, some of which Microsoft has fixed and some of which are not fixable. In particular, any hostname that does not contain a period is assumed to be in the intranet zone. Originally, there were various ways of referring to Internet hosts by IP address that could force any Internet host to be treated as an intranet host. These problems have been removed, and there is now no known way to write a link that will force Internet Explorer to consider it part of the intranet zone.
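The "no period means intranet" rule is easy to state in code, which also makes its fragility easy to see. The sketch below is our own simplification and not Internet Explorer's actual logic; the function name and hostnames are hypothetical. An IP address written as a single number is used as one example of a period-free way to name an Internet host.

```python
import ipaddress
from urllib.parse import urlsplit


def naive_zone(url: str) -> str:
    """Classify a URL using only the rule that a dotless host is intranet."""
    host = urlsplit(url).hostname or ""
    if host and "." not in host:
        return "intranet"
    return "internet"


# The same address in dotted form and as a single number with no periods.
dotted = "203.0.113.5"
dotless = str(int(ipaddress.IPv4Address(dotted)))

print(naive_zone("http://payroll/timesheet"))
print(naive_zone("http://www.example.com/"))
print(naive_zone("http://" + dotted + "/"))
print(naive_zone("http://" + dotless + "/"))
```

Under this naive rule the last URL is classified as intranet even though it names the same Internet address as the one before it, so a correct implementation has to check for such forms explicitly.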
However, there are numerous ways for people to set themselves up so that external hosts are considered intranet hosts, and the security implications are unlikely to be clear to them. For instance, adding a domain name to the Domain Suffix Search Order in DNS properties will make all hosts in that domain part of the intranet zone; for a less sweeping effect, any host that's present in LMHOSTS or HOSTS with a short name is also part of the intranet zone. An internal web server that acts as an intermediary and retrieves external pages will make all those pages part of the intranet zone. The most notable class of programs that do this sort of thing is translators, like AltaVista's Babelfish (http://babelfish.altavista.com), which will translate English to French, among other options, or RinkWorks' Dialectizer (http://www.rinkworks.com/dialect), which will show you the page as if it were spoken by the cartoon character Elmer Fudd, among other options.