This article describes the concept for the Pangaea Project. The goal of that project is to define (and build a reference implementation for) a human-mediated knowledge base that provides “just in time” information — information when you need it.
The goal is to combine users (who generate queries and receive responses), technical experts (who respond to queries), writers (who are expert at organizing information and explaining things), and a knowledge base (possibly with a “librarian”) into a “one world” information system that allows people to share knowledge and solve complex problems. One name for such a project is:
- Pangaea (pan-JEE-uh) noun
- A hypothetical supercontinent that existed when all the major landmasses of the earth were joined.
(Grant Bowman came up with this name.)
This section summarizes the project and gives its background. It also describes the motivation that gives the project its compelling significance.
Summary and Overview
- The ultimate goal of the project is to define standards for, and build a reference implementation of, a human-mediated, knowledge-based, distributed collaboration system. The purpose of the system is to “tame the information explosion” and make it possible for interested parties to collaboratively solve the complex problems that confront mankind.
Another good name for the system would be Global Ontology-based Discussion system (G.O.D.). There are probably better acronyms, but “ask G.O.D.” has a nice ring to it…
The project expects to go forward in several distinct phases:
- Phase I
- Abstract the requirements of such systems, and define interchange standards for them. Create specifications for a layered architecture, and define the APIs between the layers.
- Phase II
- As Lee Iverson has proposed, build the core content-management layer that can serve as the foundation for such systems, and begin promoting the standards it uses in order to make it the next “file system” that virtually *all* applications build on.
- Phase III
- Begin building a suite of tools on that foundation (and of course, refining the APIs in the process).
The project was originally begun as a part of Doug Engelbart’s “Unfinished Revolution” colloquium held at Stanford in the Winter of 2000. A series of weekly discussions carried on subsequently led to the identification of a core group of developers who were interested in reifying some of the key concepts in concrete ways.
That group is no longer under Doug’s direction, nor is there any formal affiliation with Doug’s “Bootstrap” organization. But it is worth pointing out the deep debt that we and society owe him for motivating the project and for gathering together the bright minds necessary to bring a solution to fruition.
More than 50 years ago, Doug consciously devoted his career to “improving mankind’s ability to solve complex problems”. That motivation remains as significant today as it was then. In today’s world, however, such improvement has become an urgent necessity.
The world today is beset by a variety of major issues, any one of which has the potential to destabilize our civilization. Global-scale problems that arose late in the last century include the potential for nuclear war, limitations on oil and energy supplies, global hunger, and overpopulation. In addition, the rise of new low-cost technologies has created entirely new issues in the areas of nanotechnology, bio-engineering, and robotics.
In short, the number of problems we face, combined with their potentially devastating consequences, leads to the conclusion that civilization as we know it will be fortunate to survive until the middle of the century. (If it gets that far, the odds begin improving, because the solutions we have adopted by then may well begin leading the way out of the hole we have dug ourselves into. But our chances of reaching that point may, in fact, be rather slim.)
To solve those problems, we need better mechanisms for collaborating. Email has shown us the advantage of being able to carry on remote conversations without both parties having to be connected at the same time. But while email is good for short exchanges, it is unsuited for wide-ranging discussions in which the goal is to first identify the salient issues, and then find solutions for them.
In short then, we need better mechanisms for collaborating, better methods for sharing our knowledge and finding solutions. But the system must do even more.
The information explosion that typifies society today has increased the chances that, even if a solution is identified, it will be overlooked. Even if the better mousetrap is invented, the world may simply never learn of it, however willing it might be to beat a path to its door.
To repeat, the situation is critical. If we had a robust collaboration system — one that captured not just information, but knowledge, and allowed the knowledge to be retrieved effectively — it still would not be clear that we would be able to solve all of the problems that beset us, in a timely enough fashion to ensure survival.
And that is if we had such a system today. In reality, we still need to build such a system.
Again, to refer to Doug Engelbart’s realization of some 50 years ago — the problem of building such a system is itself a complex problem, and that problem is therefore the *most* important challenge we face.
The odds are long. Although humanity will most likely survive, in one form or another, the chance that our civilization will survive is, in fact, remote. But when your only chance for survival is to win the lottery, it makes sense to buy a ticket.
The project we propose is that lottery ticket.
This section identifies the project’s components, specifies their goals, and relates them to the issues and ideas that motivate them.
In complex problems, issues are neither clearly identified nor understood. The first stage in resolving such situations, therefore, is always about coming to enough understanding and agreement that a solution is possible. After that, the longer (but typically easier) process is to identify and implement solutions.
In both cases, collaboration is required. As Eugene Kim wrote:
…the way Eric framed certain points in his “Frames” paper (Darwin Solves the Frame Problem) was an excellent way of describing the importance of collaboration. In essence, we as individuals all have our own context, knowledge, set of values, and algorithms for tackling problems, but the problem is likely to blow up in our face before we can solve the problem individually. However, if several people attack the problem in parallel, and share their knowledge with each other, we have a chance of solving these problems. What we need are the tools and the methodology for collaborating and sharing.
One of the critical dimensions that distinguishes “knowledge” from “information” is the existence of *abstraction*. A wide variety of information exists and is available today. But because it is not *abstracted* in useful ways, it cannot be found when appropriate.
Information becomes potentially useful knowledge when abstractions are layered over it. Take an article on building the Aswan dam, for instance. Adding the categories Egypt, Earth Moving, and Dam Construction to that article makes it possible to retrieve it later. Or perhaps the article includes tips on working with foreign employees, exchange rates, or construction practices. The article would reasonably fall under those categories, as well.
Granularity is an issue here, too. Not *all* of the article would fall under the category “earth moving equipment”. The system must therefore make it possible to properly categorize the *parts* of the article that fall under that heading.
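To make the idea concrete, here is a minimal sketch of paragraph-level categorization. It assumes a simple in-memory model; the names `Paragraph`, `Article`, and `find_by_category` are illustrative, not part of any defined specification:

```python
from dataclasses import dataclass, field

@dataclass
class Paragraph:
    text: str
    categories: set = field(default_factory=set)  # abstractions over this part only

@dataclass
class Article:
    title: str
    paragraphs: list = field(default_factory=list)

def find_by_category(articles, category):
    """Return (article title, paragraph text) pairs filed under `category`.

    Because categories attach to paragraphs, only the *parts* of an
    article that fall under a heading are retrieved.
    """
    return [(a.title, p.text)
            for a in articles
            for p in a.paragraphs
            if category in p.categories]

aswan = Article("Building the Aswan Dam", [
    Paragraph("Moving earth on this scale required...",
              {"Earth Moving", "Dam Construction"}),
    Paragraph("Working with foreign employees meant...",
              {"Foreign Employees"}),
])

hits = find_by_category([aswan], "Earth Moving")
```

A query for “Earth Moving” returns only the first paragraph, not the whole article, which is the granularity the system requires.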
Improving the granularity of information accessibility, coupled with the ability to add an abstraction layer to it (as envisioned by projects like topic maps and other areas that Jack Park has pointed us to) provides the capability for defining a truly knowledge-based collaboration system.
Finally, the lattice of categories (abstractions) and the relationships between them is known as an “ontology”. The existence of an ontology makes it possible to find relevant information that is not immediately categorized as such.
For example, the fact that “Egyptian food” is a topic that is related to “Egypt” makes it possible to find out more on the subject from the author of the Aswan Dam article. Without an ontology, such relationships could never be discovered.
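The Egyptian-food example can be sketched as a small graph walk. The relation table and the helper `related_articles` are assumptions made for illustration; a real ontology would be far richer:

```python
# Topic relationships in a toy ontology: each topic maps to related topics.
relations = {
    "Egypt": {"Egyptian food", "Aswan Dam"},
    "Egyptian food": {"Egypt"},
    "Aswan Dam": {"Egypt", "Dam Construction"},
}

# Which articles are filed directly under which topic.
articles_by_topic = {
    "Egypt": ["Building the Aswan Dam"],
}

def related_articles(topic):
    """Find articles filed under topics the ontology relates to `topic`,
    even though nothing is categorized under `topic` itself."""
    found = []
    for neighbor in relations.get(topic, ()):
        found.extend(articles_by_topic.get(neighbor, []))
    return found

# Nothing is filed under "Egyptian food", yet the relation to "Egypt"
# surfaces the Aswan Dam article (and, through it, its author).
results = related_articles("Egyptian food")
```

Without the `relations` table, the query for “Egyptian food” would come back empty; with it, the connection is discoverable.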
Now, a knowledge-based system is only as useful as it is possible to get things *out* of it. The rule is this: If you can’t find it, it was never there.
However, how can you ask a system to tell you about something you need to know when you don’t know the right question to ask? How can you determine the right question to ask without knowing what is in the system and how it is categorized?
Those are the problems that have bedeviled knowledge-management systems. And in real, practical terms, they are computationally unsolvable.
In other words, we can’t get the computer to figure out what we need based on what we ask for. Computer-based natural language processing systems simply are not up to it.
At the same time, information that is present in the system may not have been categorized in a way that makes it accessible. We can’t solve that problem computationally, either.
What we can do is provide for human-mediated interactions. The interaction could go something like this:
- Human queries the system.
- System sends back response(s).
- Human sends back “ok. thanks”. (end here)
- Human sends back “huh? didn’t help”
- The question and generated response(s) go out to a list of folks who have registered interest in that subject area (for example, technical support personnel).
- Those folks can:
- Add synonyms to the system that let the system generate the appropriate responses from the question, as given.
- Translate the question into more rigorous form, thereby educating the user as to query language.
- Recognize on the basis of their familiarity with the content that some document already contains useful information, and categorize the information so that it is found.
- Perform some combination of the first three options, or…
- Generate new content.
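The loop above can be sketched in a few lines of code. This is a rough illustration only; the `KnowledgeBase` class and its methods are hypothetical, not a proposed API:

```python
class KnowledgeBase:
    def __init__(self):
        self.answers = {}    # canonical question -> response
        self.synonyms = {}   # alternate phrasing -> canonical question
        self.experts = []    # callbacks for people registered in this subject area

    def query(self, question):
        """Resolve a question through known synonyms; None means 'no response'."""
        canonical = self.synonyms.get(question, question)
        return self.answers.get(canonical)

    def escalate(self, question):
        """The 'huh? didn't help' path: route the question to registered experts."""
        for expert in self.experts:
            expert(question)

    def add_synonym(self, phrasing, canonical):
        """Expert option (a): teach the system a phrasing that should have worked."""
        self.synonyms[phrasing] = canonical

kb = KnowledgeBase()
kb.answers["how do I reset my password?"] = "Use the reset link on the login page."
kb.add_synonym("forgot my password", "how do I reset my password?")
```

Once an expert adds the synonym, the originally failing phrasing starts producing the right response, which is exactly how the system “gets smarter” through use.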
The initial draft of generated content should go to the user, but it should also flow to a writer who maintains information in the knowledge base. The point is not only to wordsmith the existing information, but to recognize when similar material exists and refactor things for maximum educational value — and do the wordsmithing to express thoughts clearly, with pointers to background information, where needed. These are processes that writers become proficient at. They’re skills that are shared by a relatively small subset of designers and technical support people.
Pointers to the revised, expanded, condensed, and/or refactored material should then be automatically sent to the interested support people and to users who have registered interest in this subject, including the person who made the original query.
As people interact with a system that operates in this manner, the system gets smarter. The users get smarter, as well — in a natural, organic fashion. This strategy allows the system ontology to grow over time, rather than attempting to be correct (and complete!) at the outset.
In any collaborative system, attribution is essential. All items of information in the system must be attributed to their authors, for several reasons:
- To encourage contribution.
- To find the people with the expertise you need.
- To initiate a deeper dialogue when appropriate.
- To distinguish the different “voices” in a collaborative conversation
And since the addressable information must be fine grained, down to the paragraph level, the attribution must be equally fine-grained.
Today, we experience a huge “information glut” that overwhelms our capabilities. We can’t even read all the information we’d like to, much less digest it and come to any decisions concerning it.
That “information explosion” is one of the significant factors that impedes our ability to solve the complex problems before us in the critical time-frames that must be met.
As Jim Hurd has suggested, taming the information explosion is therefore a major goal of the proposed system. Abstracting information into knowledge is one of the important mechanisms for doing that. The other important mechanism is ratings.
Like books at Amazon.com, the existence of ratings will make it possible to find the really good things more quickly, by putting them at the head of the list. Of course, ratings need categories, too, so it becomes possible to find the “most readable” material, as distinct from “most complete” or “most authoritative” resources.
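A minimal sketch of per-category ratings follows, so that “most readable” can be ranked independently of “most complete”. The data layout and function names are assumptions:

```python
from collections import defaultdict

# document -> rating category -> list of scores
ratings = defaultdict(lambda: defaultdict(list))

def rate(doc, category, score):
    ratings[doc][category].append(score)

def top(category, docs):
    """Rank documents by average score within a single rating category,
    putting the really good things at the head of the list."""
    def avg(d):
        scores = ratings[d][category]
        return sum(scores) / len(scores) if scores else 0.0
    return sorted(docs, key=avg, reverse=True)

rate("intro-guide", "readability", 5)
rate("spec", "readability", 2)
rate("spec", "completeness", 5)
```

Ranking by “readability” puts the intro guide first; ranking the same two documents by “completeness” would put the spec first, which is the point of categorized ratings.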
While versioning is desirable in a single-user setting, it is an absolute necessity in a collaborative system. Not only must prior versions be available, they must be readily accessible. Only in that way can information be reorganized and summarized, while retaining the ability to reverse the changes, or identify improvements by comparing them with the originals.
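One simple way to keep every prior version readily accessible is an append-only history, sketched below. The class is hypothetical; it exists only to illustrate that reversal never discards history:

```python
class VersionedText:
    def __init__(self, text):
        self.versions = [text]  # append-only: nothing is ever deleted

    def revise(self, text):
        self.versions.append(text)

    def current(self):
        return self.versions[-1]

    def revert(self):
        """Reverse the latest change by re-appending the prior version,
        so the reverted state is itself recorded in the history."""
        if len(self.versions) > 1:
            self.versions.append(self.versions[-2])
        return self.current()

v = VersionedText("original summary")
v.revise("reorganized summary")
```

After the revision, both texts remain in `v.versions`, so the change can be reversed or the two compared at any time.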
No single, global information base will ever suffice. The world is simply too large, and there are too many ways to abstract information.
To effectively solve problems, then, a group of concerned individuals will need to share a knowledge repository that is essentially distinct from other such repositories.
Within that repository, a group will therefore be free to define the ontology that makes the most sense for the problem they are solving — and to change it as they go along.
The existence of different repositories makes possible a “separation of concerns”. For example, a marketing group defines a “program” in one way, while developers define it in a completely different way. The fact that systems are discrete makes it possible to define an ontology that uses the word “program” in the most natural possible way, without having to construct a global ontology that is good for all people everywhere (and which, as a consequence, has extremely long names for things!)
The existence of multiple repositories will also allow individuals to be a member of multiple groups. For example, as a member of a professional peer group, it would be possible to access information about best practices in that profession. But as the member of an organizational group (a department, for example) it would be possible to access information about the standards and practices used in that organization. The answers could well be different, and the organization’s practice could constitute a competitive advantage, so discrete information domains are a necessity.
Discrete repositories also allow individuals to achieve some sense of mastery, by limiting the material in a particular domain.
Finally, discrete systems provide security by virtue of their redundancy. For example, copies of an email message can exist in so many locations that total deletion is next to impossible. In the same way, discrete repositories provide security as useful information objects are imported into various and sundry systems.
When information is imported into the system, it must be categorized according to the ontology used by the group that is using the system.
Having information categorized in different ways obviously has a balkanizing effect. Since each group uses a somewhat different ontology, one group cannot access another group’s resources.
The answer is automated ontology translation. The ability to map concepts (which is no mean feat) makes it possible to de-babelize the queries and de-balkanize the repositories so the knowledge can be shared.
One interesting corollary of this observation is that notification of ontology-changes must occur. In other words, every system that has defined an ontology translation must be informed when an ontology changes. To the degree that it is possible, the translation mechanisms that have been defined must be automatically modified, as well.
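The translation-plus-notification corollary can be sketched as follows. The `OntologyMap` registry and its callback scheme are assumptions made for illustration:

```python
class OntologyMap:
    """A one-directional term mapping from a local ontology to a remote one."""

    def __init__(self):
        self.mapping = {}      # local term -> remote term
        self.subscribers = []  # systems to inform when a mapping changes

    def translate(self, term):
        # De-babelize a query: unmapped terms pass through unchanged.
        return self.mapping.get(term, term)

    def update(self, local, remote):
        self.mapping[local] = remote
        # The corollary: every system relying on this translation
        # must be informed when the mapping changes.
        for notify in self.subscribers:
            notify(local, remote)

marketing_to_dev = OntologyMap()
marketing_to_dev.update("program", "product plan")
```

A marketing query for “program” is translated to the developers’ “product plan” before it crosses repositories, and any system subscribed to the map learns of the change automatically.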
The resulting interchange standards will make it possible for divergent groups to share and reuse information effectively.
The goal is to define the standard and create a reference implementation for a human-mediated, knowledge-based distributed collaboration system.