Article - Issue 8, May 2001
Professor Nigel Shadbolt
In order to manage knowledge effectively we need to develop new technologies: tools to capture knowledge and organise it, reliable storage systems that allow the knowledge to be kept up to date, and effective ways of disseminating knowledge to those who need it. Nigel Shadbolt gives an overview of the technologies currently on offer and those that are being developed for the future.
In a recent article for Ingenia, Khurshid Ahmad reviewed the increasingly important role that the management of knowledge is assuming. In what follows I will describe some of the technologies that are going to be used to organise and exploit these knowledge assets.
We will define knowledge as useable information. Useable information can be matched with, and brought to bear upon, the particular problems your business or organisation is addressing. To turn your information into knowledge, it is necessary to understand the connections between it and your business processes. It is necessary to understand what information to use when, how to find it, and how to present it to the relevant people.
We live in a world in which information overload is a common experience. John Naisbitt memorably observed: ‘we are drowning in information and starving for knowledge’.
Whatever brand of knowledge management a company adopts, there are six basic challenges that arise if we want to turn the torrent of information into useful knowledge.
The first challenge is to capture knowledge content. Knowledge acquisition includes making tacit knowledge explicit, identifying gaps in the knowledge already held, acquiring and integrating knowledge from multiple sources, and condensing and distilling knowledge out of the gigabytes of data and information available.
The second challenge is to model, organise and structure knowledge. Knowledge structures must be sufficiently expressive to enable people and machines to problem-solve with them.
The third challenge is retrieval. Once knowledge has been acquired and modelled, it will be stored in a repository; such repositories will become very large and in many cases will be distributed, so retrieving a particular piece of knowledge becomes problematic.
A fourth challenge is to maximise the opportunity for reuse of knowledge. Too often we acquire problem-solving experience or domain content and then promptly forget about it or mislay it. This leads to cycles of organisational learning followed by bouts of corporate amnesia.
The fifth challenge is dissemination. Once knowledge has been acquired and stored in a well-indexed repository, the issue arises of how to get that knowledge to the right people in the right form at the right time. Different users will require knowledge to be presented and visualised in different ways.
Finally, once knowledge has been acquired, retrieved and disseminated appropriately, the last challenge is to maintain it. Some content has considerable longevity, while other knowledge dates quickly and becomes obsolete.
Capturing knowledge and modelling it in computer systems has been the goal of expert systems research for some 25 years. A number of the tools, techniques and methods developed for expert systems are highly appropriate for use in knowledge management. For example, commercial tools are available to facilitate and support the elicitation of knowledge from human experts. One such technique, the repertory grid, helps the human expert make tacit knowledge explicit.
Expert systems research has also produced methodologies to guide the developer through the process of specifying the knowledge models that need to be built. Unilever has been using one such methodology, CommonKADS, to inform the way they build and document knowledge models. Since the development of knowledge-intensive systems is a costly and lengthy process it is important that we are able to reuse knowledge content. CommonKADS is based around the idea of building libraries of problem-solving elements and domain descriptions that can be reused.
The challenge of knowledge retrieval is being tackled using an array of techniques drawn from the fields of statistics and artificial intelligence. The explosive growth of material on the web, which currently consists of well over a billion pages, requires impressive search capabilities. In 1999 NEC research scientists published results in Nature indicating that no search engine indexed more than 16% of the total pages on the web. Search engines constantly have to play catch-up, crawling over vast numbers of pages and indexing them.
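To give a flavour of what indexing involves, the following is a minimal sketch of an inverted index: a map from each word to the pages that contain it. The page names and their text are invented for illustration, and a real search engine adds crawling, ranking, stemming and much else on top of this idea.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lower-cased word to the set of page names containing it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages, invented for this example.
pages = {
    "a.html": "knowledge management tools",
    "b.html": "search engines index web pages",
    "c.html": "knowledge engineering and search",
}
index = build_index(pages)
```

A query such as search(index, "knowledge search") is answered by intersecting the page sets for each word, which is why the index has to be rebuilt or updated whenever new pages are crawled.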
Recent developments that power leading-edge search engines include making them task specific – these engines search just for specific types of content. Another development is the ability to capture information about what a user finds relevant and to tune the search engine to reflect this information. The search engine Direct Hit observes the pages that are chosen from a search and monitors how long you spend reading the page. Direct Hit claims to have compiled hundreds of millions of these ‘relevancy ratings’. Companies such as Autonomy have used techniques from artificial intelligence to build tools that recommend ‘relevant’ content from the web by classifying the content you have visited and making predictions as to what content you might be interested in. The wealth of approaches available offers companies and organisations numerous opportunities to exploit search on intranets and the Internet.
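The idea of tuning a search engine from observed behaviour can be sketched in a few lines. This is only an illustration of the general principle of re-ranking by accumulated reading time; it is not Direct Hit's or Autonomy's actual method, and the class name, method names and figures are all assumptions made for the example.

```python
from collections import defaultdict

class FeedbackRanker:
    """Re-rank results using observed clicks and reading time (illustrative only)."""

    def __init__(self):
        # (query, page) -> total seconds readers have spent on that page
        self.dwell_seconds = defaultdict(float)

    def record(self, query, page, seconds):
        """Record that a user chose `page` for `query` and read it for `seconds`."""
        self.dwell_seconds[(query, page)] += seconds

    def rerank(self, query, pages):
        """Order pages by accumulated reading time; ties keep their original order."""
        return sorted(pages, key=lambda p: -self.dwell_seconds.get((query, p), 0.0))

# Hypothetical observations for a single query.
ranker = FeedbackRanker()
ranker.record("ontology", "page2", 120)
ranker.record("ontology", "page3", 15)
ranked = ranker.rerank("ontology", ["page1", "page2", "page3"])
```

Here the page that readers lingered over longest moves to the top, while a page nobody chose stays at the bottom.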
In the area of knowledge publishing we are beginning to see web-based applications that construct documents dynamically. Newsletters are constructed by e-writers who submit their copy by email; a software program then lays out the content, associates appropriate images, establishes links to previous stories, and so on. However, little work is being done on the problems of maintaining and curating large knowledge repositories.
The semantic web
There is one development that will fundamentally change our ability to construct technologies that manage knowledge on the web. This will be the emergence of what Tim Berners-Lee calls the semantic web. The web’s content increasingly contains annotations comprising so-called tags or meta-information. The user does not normally see these annotations. However, they provide web content with context and meaning. These annotations form mark-up languages that indicate what content is about, what it relates to, how it can be used and so on. Languages such as XML and RDF allow you to define your own mark-up languages.
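A small example may make the notion of mark-up more concrete. The sketch below reads a fragment of XML using Python's standard library and pulls out the annotations a reader would not normally see. The vocabulary (the article, author and published tags and their attributes) is invented for this example and is not a published standard.

```python
import xml.etree.ElementTree as ET

# A made-up mark-up vocabulary; the tag and attribute names are assumptions.
document = """
<article topic="knowledge management">
  <author>Nigel Shadbolt</author>
  <published year="2001" month="May"/>
  <body>Knowledge is useable information.</body>
</article>
""".strip()

root = ET.fromstring(document)

# Collect the meta-information that describes the content.
metadata = {
    "topic": root.get("topic"),
    "author": root.findtext("author"),
    "year": int(root.find("published").get("year")),
}
```

Because the annotations are explicit, a program can answer questions such as "who wrote this?" or "when was it published?" without having to interpret the body text.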
The advantages conferred by content that is richly marked up include more powerful search engines, better navigation of the content, and the ability for computers to process web-accessible content more flexibly. A whole raft of initiatives is now underway to exploit these developments, ranging from automatic production of web site maps to machines that can transform and exchange knowledge derived from scientific experiments.
A second important development will involve using languages like XML to encode agreed specifications that describe the important concepts, attributes, relations and rules of a domain. Such specifications are called ontologies. They act as placeholders and organising structures for knowledge content. Ontologies are finding increasing use in e-science and e-commerce applications.
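As a rough sketch of how an ontology can act as an organising structure, the fragment below declares two concepts for a hypothetical product catalogue, lets one inherit attributes from the other, and checks a record against the specification. The concept names, attribute names and helper functions are assumptions made for illustration; real ontology languages are considerably richer.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Concept:
    """A concept with its own attributes and an optional parent concept."""
    name: str
    attributes: frozenset = frozenset()
    parent: Optional[str] = None

# Hypothetical product-catalogue ontology.
ONTOLOGY = {
    "Product": Concept("Product", frozenset({"name", "price"})),
    "Book": Concept("Book", frozenset({"author", "isbn"}), parent="Product"),
}

def allowed_attributes(concept_name, ontology=ONTOLOGY):
    """Collect attributes declared on a concept and inherited from its ancestors."""
    attrs = set()
    current = concept_name
    while current is not None:
        concept = ontology[current]
        attrs |= concept.attributes
        current = concept.parent
    return attrs

def missing_attributes(concept_name, record, ontology=ONTOLOGY):
    """Return the attributes the ontology expects but the record lacks."""
    return allowed_attributes(concept_name, ontology) - set(record)
```

Two organisations that agree on such a specification can exchange catalogue entries and each can check mechanically whether an incoming record is complete.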
The Gene Ontology Consortium (http://www.geneontology.org/) uses agreed ontologies to exchange knowledge and information relating to the study of the structure, function and biology of genes. Engineering has been developing what are to all intents and purposes practical ontologies for some time. Examples include the Process Interchange Format (PIF), the STEP initiative that attempts to specify standards for the description of products, and MIT's Process Handbook. Large-scale ontologies have been developed by Cycorp in the US as part of the effort to build commonsense knowledge bases and are to be made publicly available so as to support the semantic web initiative. Firms such as Hubwoo.com and ContentEurope.com use ontologies to manage B2B exchanges and to develop and organise electronic product catalogues.
Natural Language Processing
Another area that will contribute to knowledge technologies is Natural Language Processing (NLP). Khurshid Ahmad in his Ingenia article described System Quirk, which uses text analysis and term-extraction methods to extract domain dictionaries automatically and to classify and index large numbers of documents. Computer systems are becoming more expert at summarising a wide variety of text types: technical documentation, laboratory reports, newspaper stories.
Using a combination of statistically based voice-recognition techniques and dialogue grammars the German company Interprice.com (a.k.a. SemanticEdge.com) is using NLP techniques to enable customers to talk to computer systems. A popular application area is the automated travel agent. These sorts of applications work because the range of topics to be discussed and vocabulary to be used is somewhat predictable.
The ability to index and organise multimedia content will facilitate many aspects of knowledge management. NLP is not far from being able to process an audio stream automatically, identify key words and phrases and tag the extract accordingly. The indexing and tagging of images, however, remains problematic. The difficulty is that any image can contain a huge variety of potential objects and relations of interest. Moreover, it is a hard computational problem to determine whether two images are the same or are examples of the same concept.
As the semantic web becomes a reality, more and more of the knowledge assets of organisations will be held in their intranets. Knowledge retrieval from the semantic web will harness not only the power of statistical search engines but also that of ‘reasoning’ services. These hybrid services will be able to infer implied properties and relationships between web content. A great deal of this sort of exploration of the web will be undertaken by software agents that will be primed to achieve specific tasks.
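What it means to infer implied relationships can be shown with a very small reasoner. The sketch below holds facts as (subject, relation, object) triples and applies two rules repeatedly until nothing new can be derived. The relation names subclass_of and is_a, and the example facts, are assumptions made for this illustration rather than part of any particular semantic web standard.

```python
def infer(facts):
    """Forward-chain two rules until no new facts appear.

    Rule 1: (a subclass_of b) and (b subclass_of c) => (a subclass_of c)
    Rule 2: (x is_a a) and (a subclass_of b)        => (x is_a b)
    """
    facts = set(facts)
    while True:
        new = set()
        for (s, p, o) in facts:
            if p not in ("subclass_of", "is_a"):
                continue
            for (s2, p2, o2) in facts:
                if p2 == "subclass_of" and s2 == o:
                    new.add((s, p, o2))
        if new <= facts:
            return facts
        facts |= new

# Hypothetical facts about a document held on an intranet.
stated = {
    ("report42", "is_a", "LabReport"),
    ("LabReport", "subclass_of", "TechnicalDocument"),
    ("TechnicalDocument", "subclass_of", "Document"),
}
derived = infer(stated) - stated
```

Nobody stated that report42 is a Document, yet a search for documents can now find it, which is the kind of result a purely statistical engine would miss.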
We can already see simple versions of these sorts of software agents. eBay allows you to set up proxy agents – programs that bid on your behalf in online auctions. They currently follow simple rules about preferences when increasing bids and limits. Books.com uses a price-comparison agent that dynamically undercuts the price offered by its chief competitors such as Amazon. Shopbots are agents developed by companies such as dealtime.com that travel the Internet looking for the best prices and offers on various goods and services.
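The kind of simple rule a proxy bidding agent follows can be sketched as below: raise the current price by a fixed increment, but never beyond the limit its owner has set. This is a deliberately simplified illustration and not the rules of any real auction site; the function names, the bidders and the amounts are all hypothetical.

```python
def proxy_bid(current_price, increment, limit):
    """Return the next bid to place, or None if it would exceed the owner's limit."""
    next_bid = current_price + increment
    return next_bid if next_bid <= limit else None

def run_auction(start_price, increment, limits):
    """Let proxy agents bid in turn until no agent other than the leader can bid."""
    price, leader = start_price, None
    active = True
    while active:
        active = False
        for bidder, limit in limits.items():
            if bidder == leader:
                continue  # the current leader has no need to outbid itself
            bid = proxy_bid(price, increment, limit)
            if bid is not None:
                price, leader, active = bid, bidder, True
    return leader, price

# Two hypothetical bidders with different spending limits.
winner, final_price = run_auction(100, 5, {"alice": 120, "bob": 110})
```

The agent with the higher limit wins, but pays only as much as was needed to see off its rival rather than its full limit.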
The use of the web and of collaborative work tools will transform the way information is distributed and disseminated. Examples are the integration of authoring and reviewing processes in on-line documents. Such environments allow structured discussions of the evolution and development of an idea, paper or design concept. The structured discussion is another annotation that can be held in perpetuity. This means that the reason for a position in a paper or a design choice is linked to the object of discussion itself.
Companies often need to revisit the choices they make. Currently this usually involves retrieving large numbers of design drawings and documents. Although these documents may together detail the choice made, the alternatives are often not discussed and the arguments for and against particular choices are not routinely recorded. The web offers a means of enriching documents with just this sort of additional information. The benefits to an organisation will include a richer corporate memory and better records of the provenance of decisions: what was decided, by whom, when and for what reasons.
Personalisation of content will be an increasing feature of knowledge technologies. This requires that a model of a user’s interests and viewpoints be constructed. Currently these are somewhat simple minded – histories of the web sites a person has visited, the documents they have downloaded, the amount of time spent reading them. What these approaches to user modelling require are more detailed analyses of the tasks, goals and work structure of the user. Psychology is a discipline that attempts to understand these issues: results from this area will need to be incorporated into software if it is to become capable of personalisation.
Finally, the issue of maintenance must be confronted. Some content endures – the melting points of elements and other physical constants are long lasting – but other knowledge is much more ephemeral. We need to reflect this longevity of knowledge and our confidence in it. Some knowledge, if it changes, has only local effects. The modification of other knowledge content has far-reaching implications.
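One way of reflecting longevity and confidence is to record them alongside the content itself. The sketch below is an assumption about how that might look, not an established maintenance methodology; the field names, the shelf-life figures and the confidence scale are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class KnowledgeItem:
    title: str
    last_reviewed: date
    shelf_life_days: Optional[int]  # None means the content is treated as enduring
    confidence: float               # subjective rating between 0.0 and 1.0

def needs_review(item, today):
    """Report whether an item has outlived its expected shelf life."""
    if item.shelf_life_days is None:
        return False
    return today - item.last_reviewed > timedelta(days=item.shelf_life_days)

# Hypothetical repository entries.
constant = KnowledgeItem("Melting point of iron", date(2001, 1, 1), None, 0.99)
price_list = KnowledgeItem("Supplier price list", date(2001, 1, 1), 90, 0.7)
stale = [i.title for i in (constant, price_list) if needs_review(i, date(2001, 5, 1))]
```

Enduring content is never flagged, while ephemeral content surfaces for review once its shelf life has passed.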
The computational approach encourages us to hoard data. We are nervous about the delete option. The potential to retrieve information is so predominant that we forget that sometimes the smart thing to do is to decommission knowledge. We need to understand when it is legitimate to purge content. These issues are crucial in the pursuit of knowledge maintenance methodologies.
Large-scale research efforts are underway to develop the next generation of knowledge technology tools and methods. The UK EPSRC has recently funded a six-year Interdisciplinary Research Collaboration (IRC) between five UK universities. The Advanced Knowledge Technologies IRC is setting up a user forum to which it is hoped industrialists will contribute their requirements and also provide proving grounds for some of the technologies that will emerge.
The six challenges described at the outset of this article implicitly define a seventh. Anyone working on or researching these themes will need to read papers in knowledge management, knowledge engineering, computer science, sociology, artificial intelligence, multimedia, linguistics, psychology, software agents, databases, and so on. The seventh ‘challenge’, then, is to encourage an appreciation of the multidisciplinary nature of the problems. In order to understand how to build effective knowledge technologies we must, in the best tradition of engineering, draw on many disciplines.
Professor Nigel Shadbolt
Department of Electronics and Computer Science, University of Southampton
Nigel Shadbolt is Professor of Artificial Intelligence in the Department of Electronics and Computer Science at the University of Southampton. He is Director of the EPSRC’s Advanced Knowledge Technologies IRC – a six-year, multimillion-pound programme of research into technologies to support knowledge management. He is Chairman of Epistemics Ltd, a company specialising in knowledge engineering products and services. Email: firstname.lastname@example.org