Collaborative Filtering

We're moving from a mass-market age to a made-to-individual-taste age: witness the explosion in specialty magazines over the last decade. Inexpensive desktop publishing programs lowered the barrier to entry so that production houses can be supported by much smaller readerships. As with so many trends, the web has greatly accelerated this process. The barriers to entry are practically nonexistent; personal web pages are the ultimate in specialty magazines.

But so far, the web has only accelerated an existing process. Personal magazines on the web are just more of the same. We're in for a much larger change, a true paradigm shift, due to the basic difference between the web and a magazine: the web is interactive.

The web's interactivity allows it to adjust to an individual over time. There are simple examples such as systems with which one registers preferences, but I anticipate many more sophisticated examples. The best example so far is a prototype collaborative filterer called Firefly, used for music and movie selection.

This system asks one for the names of some preferred albums, then does a sparse match against the entries for all other individuals in order to come up with further selections that fit one's tastes. In effect, it's using every person who's interacted with it as a music critic, in order to give advice tuned more precisely to one's personal tastes than could any noninteractive database.
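The matching step described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not a description of how Firefly itself works: the function name recommend, the overlap weighting (shared albums divided by all albums either person named), and the sample users are all assumptions made for the example.

```python
from collections import defaultdict


def recommend(liked_by_user, my_likes, top_n=3):
    """Suggest items liked by users whose tastes overlap with my_likes.

    Hypothetical helper: the weighting scheme is an assumption, not
    something taken from any real system.
    """
    my_likes = set(my_likes)
    scores = defaultdict(float)
    for other_likes in liked_by_user.values():
        other_likes = set(other_likes)
        shared = my_likes & other_likes
        if not shared:
            continue  # sparse match: ignore users with nothing in common
        # Overlap weight: shared items over all items either of us named.
        weight = len(shared) / len(my_likes | other_likes)
        # Every item the other user liked that I have not named gets a vote.
        for item in other_likes - my_likes:
            scores[item] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Made-up sample data: three users and the albums each one named.
ratings = {
    "ann": {"Kind of Blue", "Blue Train", "Giant Steps"},
    "bob": {"Kind of Blue", "Giant Steps", "A Love Supreme"},
    "cat": {"Nevermind", "In Utero"},
}
suggestions = recommend(ratings, {"Kind of Blue", "Giant Steps"})
print(suggestions)
```

Here the two users who share albums with the query act as the "critics"; the third, who shares nothing, contributes no advice at all.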

This same technology can be used for any sort of search service. It could help one find interesting web pages, books, movies, paint colors, paintings, whatever. The oft-mentioned personal newspaper could well be based on this sort of item-identification technology.

This parallels a genetic algorithm idea, namely that the computer presents choices (species in GA), the user selects the preferred ones, and the computer iterates using that further information. In other words, design by multiple selection. Collaborative technology uses other people's inputs to help refine the system's selections at each stage.

One can imagine software tools (for houses, papers, whatever) that get better and better at guessing what the user wants to do next, even when the user doesn't yet know! The tool knows because many other people have been this way before and done this same action, so there's no need for each new user to repeat all that. An extension of the idea of a single-user tool that detects and assists with repetitive acts could be one that detects and assists with interuser repetitive acts, acts that this particular user has never repeated. It's possible that something like 90% of one's actions could be predicted by the machine, rather than the meager 10% hit rate that a single-user repetitive-action detector can achieve.
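One very simple way to detect interuser repetition is to count, across everyone's sessions, which action most often follows which. The sketch below assumes that approach; the function names build_model and predict_next and the action names are hypothetical, and a real tool would presumably look at more context than just the last action.

```python
from collections import Counter, defaultdict


def build_model(sessions):
    """Count, across all users' sessions, which action followed which."""
    followers = defaultdict(Counter)
    for actions in sessions:
        # Pair each action with the one that came immediately after it.
        for current, following in zip(actions, actions[1:]):
            followers[current][following] += 1
    return followers


def predict_next(followers, last_action):
    """Return the most common follow-up to last_action, or None if unseen."""
    counts = followers.get(last_action)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up sessions recorded from other users of the same tool.
other_users_sessions = [
    ["open", "select_all", "set_font", "save"],
    ["open", "select_all", "set_font", "print"],
    ["open", "select_all", "copy"],
]
model = build_model(other_users_sessions)
guess = predict_next(model, "select_all")
print(guess)
```

A new user who has never selected everything before still gets a sensible guess, because the counts come from other people's sessions rather than from that user's own history.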

The premier example is the web browser, used for searching through huge volumes of information, looking for tidbits of interest. Suppose all browsers shared what they learned from user sessions (ignoring the privacy issues). It is conceivable that after a few minutes of browsing the system would be able to take you directly to the end results that other people have found interesting (judged by time spent at a particular site) when doing similar searches.
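As a rough sketch of the "judged by time at a particular site" idea, shared session records could be ranked by how long people lingered. Everything here is an assumption made for illustration: the record layout (search, site, seconds), the function name best_destinations, the placeholder site names, and the simplification that "similar searches" means searches with exactly the same text.

```python
from collections import defaultdict


def best_destinations(visits, query, top_n=2):
    """Rank sites reached from `query` by mean time spent on them."""
    total = defaultdict(float)
    count = defaultdict(int)
    for visit_query, site, seconds in visits:
        # Simplification: only exact matches count as "similar" searches.
        if visit_query == query:
            total[site] += seconds
            count[site] += 1
    mean = {site: total[site] / count[site] for site in total}
    # Longest average stay first; ties broken by site name.
    return sorted(mean, key=lambda s: (-mean[s], s))[:top_n]


# Made-up shared records: (search, site visited, seconds spent there).
visits = [
    ("jazz reviews", "site-a.example", 300),
    ("jazz reviews", "site-b.example", 20),
    ("jazz reviews", "site-a.example", 240),
    ("jazz reviews", "site-c.example", 90),
    ("paint colors", "site-d.example", 500),
]
top = best_destinations(visits, "jazz reviews")
print(top)
```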

 
 
Substantive changes:
    March 21, 1996: created.
Copyright © 1996, Steve Colwell, All Rights Reserved