Security Concerns for WWW Annotation Mechanisms

Abstract
Web browsers have supported private annotations in the past, and even group and public annotations, though scalability concerns forced reconsideration. Now, few browsers support even private annotations, so how can we reintroduce annotation support in existing browsers? One of the main concerns in deploying an annotation mechanism is security and privacy. Proxies, Java, and JavaScript offer some hope, though each has deficiencies. I'll discuss how I plan to support annotations in the HyperNews software.

A document fetched via a web browser may have additional information associated with it in the form of annotations. The annotations might be metadata, ratings, votes, messages, editing markup, or generally anything that might be associated with the document. One of the more troublesome aspects of implementing annotations is deploying some mechanism for retrieving and applying annotations on the WWW. This assumes, of course, that we want to deploy annotations for the WWW. The problem of retrieving annotations is the focus of this paper. There are also many problems with how to apply the annotations, but these are not addressed here.

A significant aspect of annotations is that they are managed independently from the document they are associated with. That is, the annotations may come from servers that are independent from the document server. Before or after retrieving a document from the document server, the browser should also retrieve associated annotations from other servers. The annotation servers must be given the identity of the document, as a URL, and therein lies the main challenge for retrieving annotations. A mechanism that tells the annotation servers which documents you are visiting could be considered a privacy violation unless it is done with your permission. This same mechanism could be abused by a snooper who watches what you read on the web.

In addition to merely accessing annotations associated with a document, and applying the annotations to the document, we would ideally want to manage some persistent state about the annotations and the annotation servers. For example, it may be useful to remember which annotations have already been viewed so we can simply request the new annotations not yet viewed. Also, for each annotation server, we may want to store information about the set of documents for which it has annotations so we can avoid asking for annotations that do not exist. These persistent storage capabilities are another area of security concern since they require that the annotation mechanism be able to read and write this data, probably to local disk.
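The persistent state described above can be illustrated with a minimal sketch. It assumes the state is kept in a single JSON file on local disk. The class name `AnnotationState`, its methods, and the file layout are hypothetical and are not part of any existing annotation mechanism.

```python
import json
from pathlib import Path


class AnnotationState:
    """Hypothetical per-user record of viewed annotations and server coverage."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            data = json.loads(self.path.read_text())
        else:
            data = {"viewed": [], "coverage": {}}
        # Identifiers of annotations the user has already seen.
        self.viewed = set(data["viewed"])
        # Maps an annotation server to the document URLs it is known to annotate.
        self.coverage = {s: set(urls) for s, urls in data["coverage"].items()}

    def mark_viewed(self, annotation_id):
        self.viewed.add(annotation_id)

    def unviewed(self, annotation_ids):
        """Return only the annotations that have not been viewed yet."""
        return [a for a in annotation_ids if a not in self.viewed]

    def record_coverage(self, server, document_urls):
        self.coverage[server] = set(document_urls)

    def should_query(self, server, document_url):
        """Skip servers that are known to hold nothing for this document."""
        known = self.coverage.get(server)
        return known is None or document_url in known

    def save(self):
        data = {
            "viewed": sorted(self.viewed),
            "coverage": {s: sorted(u) for s, u in self.coverage.items()},
        }
        self.path.write_text(json.dumps(data))
```

Even this small store needs read and write access to a local file, which is exactly the capability that raises the security concern.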

Other security issues are the same as many current security issues for the web. Specifically, authentication and access control of annotation services can be done the same way as for existing services. Annotations are merely another kind of document from this perspective. [Client-Security]

Related Work

Similar privacy concerns are raised about HTTP Cookies [Client-Security], which in some cases may be used by a server to track your path through its documents. In that case, however, the knowledge of where you have been is limited to the server itself. Document providers would also like to transfer this knowledge between cooperating servers, and this raises more privacy concerns.

The PICS [PICS] system is intended to support ratings of documents on the web. This is a specific kind of annotation with the semantics of filtering the appearance or availability of documents based on the rating values. Browser vendors are building in support for PICS [PICS clients], but due to the limited semantics, the same mechanism will not be available for use with other kinds of annotations without further extensions. One extension to PICS, called PICS-SE [PICS-SE], will support more general types of annotations and associated semantics.

The XML [XML] effort aims to define a cleaned-up HTML-like markup language that can be extended in a more uniform manner. In particular, the semantics of each tag can be defined in an extensible manner, such as with Java code. This will be useful to define the data transfer format for arbitrary types of annotations along with their associated semantics. What we need is the combination of the ability to fetch associated annotations, as supported by the PICS-capable browsers, with the extensible semantics of XML.

Workarounds

Java applets are downloaded from remote servers, and are generally only allowed to access services on the same machine that the applet came from [Applet-Security]. Java library code that is used by an applet may, however, access any server because it is assumed that if you installed the library code, then it is safe. This provides a possible workaround for the privacy problem. However, it requires that users install a modified library which they must trust to do what it says.

Plugins are in a similar situation to Java library code. They can do anything, but it is difficult to get users to install and trust the code. Furthermore, plugins tend to be browser specific rather than platform independent. ActiveX code is similar [Client-Security].

Signing the code, whether Java applets or ActiveX modules, will help with knowing who has violated your privacy once it has been violated, if you find out that it has been violated. This is little consolation and thus I expect it will not be satisfactory protection.

JavaScript code that is contained in one document may access aspects of the contents of another document viewed in another window, including the location, embedded links, form fields, etc. There are several restrictions on what the JavaScript program can do, however. First and foremost, the JavaScript contained in document A from server-A cannot access any of the contents of document B from server-B unless the user has enabled tainting. In that case, all the data retrieved from B is "tainted" and the JavaScript execution environment will not let any such tainted data leave the browser without the user's permission. In other words, the URL of document B cannot be sent to an annotation server to retrieve associated annotations unless that server is the same as server-B. Document B could explicitly *allow* such exporting of data extracted from B by untainting its data, but this is not sufficient to support annotations of any document regardless of its origin or degree of cooperation.
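The export rule described above can be modeled with a short sketch. This is a simplified illustration of the policy as stated in this paper, not the browser's actual implementation. The function names, the `untainted` and `user_permits` flags, and the example host names are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def origin(url):
    """Return (scheme, host, port), the unit used here to compare servers."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def may_export(source_url, destination_url, untainted=False, user_permits=False):
    """Decide whether data taken from one document may be sent to a server.

    source_url      -- URL of the document the data was extracted from
    destination_url -- URL of the server the script wants to send it to
    untainted       -- True if the source document untainted its own data
    user_permits    -- True if the user approved sending the tainted data
    """
    if origin(source_url) == origin(destination_url):
        return True  # data goes back to the server it came from
    return untainted or user_permits


# Document B's URL may go to server-B but not to an unrelated annotation server.
same = may_export("http://server-b.example/doc.html",
                  "http://server-b.example/annotations")
other = may_export("http://server-b.example/doc.html",
                   "http://annotations.example/lookup")
```

In this model `same` is true and `other` is false, which is why an independent annotation server cannot be consulted without either the document's cooperation or the user's explicit permission.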

One possible future workaround for JavaScript is that if the JavaScript code is in a document retrieved from the filesystem local to the user, then this could be considered to be safe, like the case of local Java libraries. Saving a document with embedded JavaScript to the local file system is considerably easier than installing a Java library, but perhaps it is too easy. If the JavaScript were found in a specially designated directory on the user's filesystem, then this might be sufficiently safe.

A proxy server can be used to fetch documents for you, fetch the associated annotations, and modify the returned document to append annotations or alter the content (depending on the nature of the annotation). Proxies are therefore an excellent solution to the problem, except for some difficulties. First, users may not have the choice of installing a proxy, and even if they do, actually doing the installation and maintenance is extra work that seems to stop most people. Second, if there is a proxy at all, it is probably used to get through a firewall or to do caching, and multiple proxies do not play well together yet. A chain of proxies is possible, but it slows down the process and compounds the complexity of dealing with different versions of the HTTP protocol, etc.
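The document-modification step a proxy would perform can be sketched as a pure function, leaving out the network fetches. The sketch assumes annotations arrive as plain text strings and are appended as an HTML list just before the closing body tag. The function name and the markup it emits are hypothetical choices.

```python
import html
import re


def append_annotations(document_html, annotations):
    """Return the document with annotations inserted before the last </body>."""
    if not annotations:
        return document_html
    # Escape annotation text so it cannot inject markup into the document.
    items = "".join("<li>%s</li>" % html.escape(text) for text in annotations)
    block = "<hr><h2>Annotations</h2><ul>%s</ul>" % items
    matches = list(re.finditer(r"</body\s*>", document_html, re.IGNORECASE))
    if not matches:
        return document_html + block  # no closing body tag: append at the end
    index = matches[-1].start()
    return document_html[:index] + block + document_html[index:]
```

A real proxy would also have to fetch the document and the annotations, and adjust headers such as the content length, none of which is shown here.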

HyperNews

For HyperNews, we are currently building JavaScript code that will be used to implement client-side rendering of HyperNews forums and messages. The JavaScript code lives in one frame of a window that also provides controls for manipulating messages. The messages that are displayed in the other frame are retrieved from a HyperNews server just as they are now: each message has its own URL, and the server, implemented as a set of CGI programs, generates HTML to represent the message. But for the new client-side rendering mechanism, the HyperNews server will return an HTML document containing JavaScript code that first constructs a message object and then calls the rendering code contained in the other frame. The rendering code can be changed dynamically, and the message frame will be re-rendered with the new code. The rendering code can come from any server, not just the HyperNews server, because in this case the message document is cooperating by calling the display code and providing it with all the data it needs.
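The division of labor between the two frames can be modeled abstractly. The sketch below is written in Python rather than JavaScript and is not the HyperNews code. The class `RenderingFrame`, the two renderer functions, and the message fields `title` and `body` are all hypothetical names.

```python
class RenderingFrame:
    """Stands in for the frame that holds the current rendering code."""

    def __init__(self, renderer):
        self.renderer = renderer
        self.message = None
        self.output = ""

    def show(self, message):
        # Called by the message document after it has built its message object.
        self.message = message
        self.output = self.renderer(message)

    def set_renderer(self, renderer):
        # Swap the rendering code and re-render the current message, if any.
        self.renderer = renderer
        if self.message is not None:
            self.output = self.renderer(self.message)


def plain_renderer(message):
    return "%s: %s" % (message["title"], message["body"])


def html_renderer(message):
    return "<h1>%s</h1><p>%s</p>" % (message["title"], message["body"])


frame = RenderingFrame(plain_renderer)
frame.show({"title": "Re: proxies", "body": "Agreed."})
frame.set_renderer(html_renderer)  # the displayed message is re-rendered
```

The point of the model is that the message document hands over its own data voluntarily, so the renderer never has to reach into a document from another server.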

HyperNews is currently usable as an annotation server. A special script is provided to which a browser may send requests over ordinary HTTP, following a particular convention [Annotation-Protocol]. The request can take the form of either a GET request or a POST request. In either case, parameters must be supplied that identify the URL of the document of interest, the date range of annotations, the desired format, etc.
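A request of this kind might be assembled as in the following sketch. The parameter names (`url`, `since`, `until`, `format`) and the script address used in the usage note are placeholders invented for illustration. The actual names are defined by the protocol cited above and are not reproduced here.

```python
from urllib.parse import urlencode


def annotation_query(document_url, since, until, fmt):
    """Encode the request parameters; the parameter names are hypothetical."""
    params = {"url": document_url, "since": since, "until": until, "format": fmt}
    return urlencode(params)


def get_request_url(script_url, document_url, since, until, fmt):
    """Build the URL for the GET form of the request."""
    return script_url + "?" + annotation_query(document_url, since, until, fmt)


def post_request_body(document_url, since, until, fmt):
    """Build the form-encoded body for the POST form of the request."""
    return annotation_query(document_url, since, until, fmt).encode("ascii")
```

For example, `get_request_url("http://hypernews.example/get-annotations", "http://doc.example/paper.html", "1997-01-01", "1997-06-05", "html")` yields a single URL that carries the document's address as an encoded parameter. That is precisely the information whose disclosure raises the privacy concern discussed earlier.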

The use of HyperNews as an annotation server depends on support in a browser for accessing the annotations. Such support was explicitly added to released versions of NCSA Mosaic for X and also to a non-public version of the HotJava browser.

Conclusions

What we desire is either an annotation mechanism that is built into popular browsers (unlikely at this stage, except for PICS ratings) or an extensible mechanism for adding annotation support to existing browsers. Such an extensible mechanism might be usable for many things besides just annotations. In fact, the ability to support annotations in a general enough way seems to require features that would be usable for many other things. Consider that the annotation mechanism must be able to watch every document you see, fetch any type of associated information from any server while telling it what you have seen, control and modify the display of the documents, and store data on and read it from local disk. An extensible mechanism to do all that is a tall order. We will be doing well simply to fetch associated annotations that are displayed independently from documents.

References


Daniel LaLiberte (liberte@ncsa.uiuc.edu)
Last modified: Thu Jun 5 12:24:23 CDT 1997