Speculations on what a knowledge repository might consist of, and how it might be used in the context of software design — especially in an online design discussion.
Originally published 2000
Keywords (partial ordering)
[abstraction sequences, complex abstractions, simple abstractions],
[CKC, content, context, design discussions, granularity, IBIS, knowledge, knowledge nuggets, knowledge products, nuggets],
[concrete ratings, ratings, situational ratings]
Introduction
This piece considers knowledge repositories: how they might actually work, and how they might interact with a mechanism for carrying on an online design discussion. To keep things manageable, let’s focus on a concrete issue in system development: implementing a linked list. (The same kind of thinking undoubtedly applies in other areas, but focusing on software development has the greatest “bootstrap” effect.)
Note:
Lee Iverson has made a point of distinguishing the “content” (documents), “knowledge”, and “context” aspects of a repository-based system. This paper represents insights gained from investigating what really resides in each of those bubbles (or, possibly, layers), especially in the “knowledge” arena.
In the knowledge layer, the subject “Linked List” has several useful subcomponents:
- Linked List
–what is a… (verbal description)
–how does one look… (diagrams, animations)
–how does one use… (examples, explanations)
–how good is… (situational ratings)
–how does one implement… (recipe or template)
Notes:
- Of course, there are actually multiple different kinds of linked lists: singly-linked, doubly-linked, straight lists, circular lists, null-pointer terminated, and null-value or special-end-node terminated. But for now, let’s keep things simple and assume that “linked list” means a singly-linked list…
- In a recent conversation, Art Freidman reminded me that there are many possible choices for the API to such a list, as well. I’m just going to pretend I didn’t hear that…
- In the Knowledge Technologies conference I recently attended, it became clear that Topic Maps constitute an ideal vehicle for linking to and identifying the many useful components of a “topic”. (It is not sufficient to merely link to a thing. It is also necessary to know in advance what kind of thing the link points to, so you can select the links you want to follow. Topic Maps fill that role admirably.)
The first three are the kind of things you would expect in a good tutorial. They represent the first thing you would go to if you got the answer, “What you need is a linked list” in response to a question you asked on some interactive forum. In fact, they might well simply point to a section in one of Donald Knuth’s books, for the clearest possible illustrations and explanations.
The last two subcomponents, situational ratings and templates, have some interesting characteristics and implications that are worth exploring.
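As a rough illustration, the subcomponents of the “Linked List” topic could be modelled as typed links, so that a reader can pick the kind of link to follow before following it. This is only a sketch: the facet names, the `Nugget` and `Link` classes, and the `content:placeholder-…` targets are all hypothetical and are not part of any real repository or Topic Maps API.

```python
from dataclasses import dataclass, field

# Hypothetical facet names, mirroring the five subcomponents of the topic.
FACET_KINDS = ("what_is", "how_looks", "how_used", "how_good", "how_implemented")


@dataclass
class Link:
    kind: str    # which question the link answers, known before following it
    media: str   # e.g. "verbal description", "diagram", "recipe"
    target: str  # placeholder identifier for an item in the content layer


@dataclass
class Nugget:
    topic: str
    links: list = field(default_factory=list)

    def add(self, kind, media, target):
        """Attach a typed link, rejecting kinds the topic does not define."""
        if kind not in FACET_KINDS:
            raise ValueError(f"unknown facet kind: {kind}")
        self.links.append(Link(kind, media, target))

    def select(self, kind):
        """Return only the links of the requested kind."""
        return [link for link in self.links if link.kind == kind]


linked_list = Nugget("Linked List")
linked_list.add("what_is", "verbal description", "content:placeholder-1")
linked_list.add("how_looks", "diagram", "content:placeholder-2")
linked_list.add("how_good", "situational ratings", "content:placeholder-3")
linked_list.add("how_implemented", "recipe", "content:placeholder-4")
```

Calling `linked_list.select("how_good")` returns just the ratings link, which is the selection-by-kind behaviour the Topic Maps note describes.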
Timeout: What is Knowledge?
“Knowledge” in such a system can take several forms. The following list probably is not comprehensive, but it’s enough to get started:
Returning now to your regularly scheduled programming, we’ll get back to the topic of situational ratings and take a deeper look at templates…
Situational Ratings
At the knowledge layer, ratings are necessarily provisional. Thus, a linked list is good if you can afford the extra space, don’t mind a little extra overhead when you’re looping, and you need to do a lot of inserts and deletes. On the other hand, for a fixed list, an array is typically going to carry less overhead. (But that evaluation, too, is situational: in Lisp, a list is the *only* way to go.) But if ratings are provisional at the knowledge layer, for any specific project they are fairly concrete. One might argue, based on the characteristics of the project one is working on, that a linked list is appropriate. To give that argument some weight, one would reference the “knowledge nugget” that was stored on the subject of Linked Lists.
In other words, the design discussion (presumably an IBIS-style discussion carried on within the scope of the repository) defines the *context* within which the knowledge is used. (The knowledge, meanwhile, rests on a foundation of content. More on that in a bit.) So, in a design discussion, it would be possible to say, “I think we should use a linked list” and cite as rationale the user specs that say items will constantly be added and deleted, along with the “knowledge nugget” that gives Linked Lists a high rating for such purposes. The recipient of your wisdom (who may never have heard the term) can then get a tutorial on the subject from the knowledge base.
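One way to picture the difference between provisional and concrete ratings is to store each rating as a rule over a situation, and to evaluate it only when a project supplies the situation. The sketch below assumes a made-up scoring scale and made-up names (`Situation`, `Position`, `propose`); it is an illustration of the idea, not a proposed rating scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    """Project characteristics; the three fields are assumptions for this sketch."""
    frequent_inserts_and_deletes: bool
    fixed_size: bool
    space_is_tight: bool


# Provisional ratings: rules over a situation rather than fixed scores.
# The point values are arbitrary and only serve the illustration.
def rate_linked_list(s):
    score = 0
    if s.frequent_inserts_and_deletes:
        score += 2
    if s.fixed_size:
        score -= 1
    if s.space_is_tight:
        score -= 1
    return score


def rate_array(s):
    score = 0
    if s.fixed_size:
        score += 2
    if s.frequent_inserts_and_deletes:
        score -= 2
    return score


RATINGS = {"Linked List": rate_linked_list, "Array": rate_array}


@dataclass
class Position:
    """An IBIS-style position together with the rationale it cites."""
    claim: str
    cites_spec: str
    cites_nugget: str
    concrete_rating: int


def propose(structure, situation, spec_reference):
    """Turn a provisional rating into a concrete one for this project."""
    rating = RATINGS[structure](situation)
    claim = f"I think we should use a {structure.lower()}"
    return Position(claim, spec_reference, structure, rating)


project = Situation(frequent_inserts_and_deletes=True, fixed_size=False, space_is_tight=False)
position = propose("Linked List", project, "user specs: items are constantly added and deleted")
```

Here the position carries both citations (the user specs and the nugget), and the rating only becomes a number once the project's situation is known.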
Recipes & Templates
Now consider a “recipe” as a collection of steps, or an ordered set. The recipe, or template, for a linked list algorithm will (to qualify as knowledge) be very abstract. If an actual implementation is available, for example, in legacy assembler code, then the recipe might well contain a link to that implementation. But the recipe itself would look something like the implementation comments extracted from the source code.
- Note:
Initial stabs at the recipe are liable to be very language-specific. So a step might read “use the address stored in the Next variable to access the next item”. However, better and more useful abstractions will be more abstract and less specific, e.g. “visit the next item”. That kind of generality is hard to get right on the first try. So the system should make it possible to refine the generalizations as more implementations are “covered” by the template.
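A recipe of this kind might be stored as an ordered list of steps whose wording can be refined over time. In the sketch below, the class names and the particular traversal steps are assumptions made for illustration; only the two quoted step wordings come from the note above.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    wording: str
    history: list = field(default_factory=list)  # earlier, less general wordings

    def refine(self, new_wording):
        """Replace the wording with a more general one, keeping the old version."""
        self.history.append(self.wording)
        self.wording = new_wording


@dataclass
class Recipe:
    name: str
    steps: list                                   # ordered: position matters
    covers: list = field(default_factory=list)    # implementations the recipe describes


# Hypothetical first draft of a traversal recipe, still tied to one implementation.
traverse = Recipe(
    name="Traverse a singly-linked list",
    steps=[
        Step("start at the head item"),
        Step("stop if there is no current item"),
        Step("process the current item"),
        Step("use the address stored in the Next variable to access the next item"),
        Step("repeat from the stop check"),
    ],
    covers=["legacy assembler implementation"],
)

# A second implementation turns up, so the language-specific step is generalized.
traverse.steps[3].refine("visit the next item")
traverse.covers.append("a second implementation in another language")
```

Keeping the history of each step is one possible way to support the gradual refinement the note asks for; other designs (versioned recipes, for example) would serve as well.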
Now, a full implementation of a concept like Linked List is great, if one exists. The question, “How do I implement a linked list in language X?” can be answered with a pointer to the implementation. But what if you are working on a project in a new language? (As we’ll see, idioms hold the key.)
Idioms and Granularity
It now makes sense to introduce the concept of an “idiom”. An *idiom* is the syntactic mechanism for achieving a task in a specific language. For example, the “loop idiom” breaks down into for-loops, while-loops, and until-loops, each with situational ratings. For each loop, the specific syntax used in the C language constitutes the idiom for that concept in that language.
- Note:
A primitive idiom is a simple abstraction that has a one-to-one mapping with the language. For example, the idiom for assigning a value to a variable is a single-step process in most any procedural language. A complex idiom may have multiple steps, like a recipe — but the steps are language specific. (The abstract template for those steps may well exist as a knowledge nugget, but the language-specific steps constitute an idiom.)
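As a minimal sketch, idioms could be stored in a table keyed by concept and language, with each entry holding the language-specific steps. The table layout, the concept names, and the choice of a three-statement swap as the “complex idiom” are assumptions for illustration; the C fragments themselves are ordinary C syntax with placeholder names.

```python
# Each entry maps (concept, language) to the language-specific steps.
IDIOMS = {
    ("for-loop", "C"): ["for (init; condition; step) { body }"],
    ("while-loop", "C"): ["while (condition) { body }"],
    ("assign", "C"): ["variable = value;"],
    ("swap", "C"): ["tmp = a;", "a = b;", "b = tmp;"],
}


def idiom(concept, language):
    """Look up the idiom for a concept in a language; raises KeyError if absent."""
    return IDIOMS[(concept, language)]


def is_primitive(concept, language):
    """A primitive idiom maps one-to-one onto the language: a single step."""
    return len(idiom(concept, language)) == 1
```

With this layout, `is_primitive("assign", "C")` is true, while `is_primitive("swap", "C")` is false because the swap needs several language-specific steps.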
Now, given a template T, consisting of an ordered set {s1, s2, …} of steps, and I, a collection of idioms {i1, i2, …} that can be expressed in the language, the expression
T x I == {s1, s2, …} x {i1, i2, …}
produces a *knowledge product*: literally, the product of two different kinds of knowledge stored in the repository. In this case, the knowledge product
KP = T x I
can define the implementation for a linked list in a brand new language, as long as the template and the specific idioms exist in the repository. It is here that the need for “highly granular” documents becomes apparent. There are dozens of papers and books that tell how you do things in a given language. But until the idioms and simple abstractions contained in those documents can be pinpointed (i.e. individually referenced), there is no hope of automatically generating a knowledge product like the example above.
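The product KP = T x I can be sketched in code under one possible reading: each step of the template is paired with the idiom that realises it in the target language, and the product fails if any idiom is missing. The function name, the step wordings, and the C fragments (which assume a node with `value` and `next` fields and a `process` function) are all hypothetical.

```python
def knowledge_product(template, idioms):
    """Pair each template step with its idiom, preserving the template's order.

    Raises LookupError when the repository lacks an idiom for some step,
    which is the case where no implementation can be generated.
    """
    missing = [step for step in template if step not in idioms]
    if missing:
        raise LookupError("no idiom for: " + ", ".join(missing))
    return [(step, idioms[step]) for step in template]


# T: an abstract, ordered template (hypothetical step wordings).
T = [
    "start at the head item",
    "test for the end of the list",
    "process the current item",
    "visit the next item",
]

# I: idioms for one target language, here C fragments used as a stand-in.
I = {
    "start at the head item": "node = head;",
    "test for the end of the list": "while (node != NULL)",
    "process the current item": "process(node->value);",
    "visit the next item": "node = node->next;",
}

KP = knowledge_product(T, I)
```

The `LookupError` branch is where granularity matters: if an idiom is buried in a document that cannot be referenced individually, it is effectively missing from `I`.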
To return again to the Content-Knowledge-Context (CKC) picture, nuggets in the knowledge layer must (or should) have fine-grained links to items in the content layer.
Related Knowledge is Important
Recall that the template for a linked list is only one nugget of knowledge stored for that concept. Related kinds of information are frequently important to producing an answer.
For example, should I ask “How do I implement a linked list in Java?”, the Linked List topic *header* should link directly to the Java idiom:
new java.util.LinkedList()
In this case, no template-instantiation is needed! For a language like C, on the other hand, in which many implementations exist with different APIs and performance characteristics, multiple responses could be returned, with situational ratings for each. As another example, should you ask “How do I implement a linked list in Cobol?”, a human developer might very well respond with the questions, “Are you sure you want to do that?”, “What is it you’re trying to do?”, “Is Cobol the right language for the job?”, and “Do you absolutely have to use Cobol and, if so, could you consider using another structure that would be more suitable for that language?”
These are questions that a knowledge repository will probably not be smart enough to ask any time soon. If it *were* able to do that, so much the better! On the other hand, the knowledge repository *should* make it possible to deduce the implementation in Cobol, and put it forward in an online design discussion. Other members of the discussion should then have access to reference material in the repository to support arguments for why the performance would suck, why another data structure should be used, why a different language should be used, etc. In this sense, the knowledge base is a partner to and support for the discussion, which is carried on at the higher level, in the context of the discussion.
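The three cases discussed in this section (a direct idiom, several rated implementations, and a deduced implementation) could be handled by a lookup along the following lines. Everything here is a sketch: the `REPO` layout, the `answer` function, the placeholder implementation names and ratings, and the Cobol placeholders are invented for illustration. Only the Java expression comes from the text.

```python
# Hypothetical repository contents; only the Java idiom comes from the text.
REPO = {
    "header_idioms": {("Linked List", "Java"): "new java.util.LinkedList()"},
    "implementations": {
        ("Linked List", "C"): [
            {"name": "implementation A (placeholder)", "rating": 1},
            {"name": "implementation B (placeholder)", "rating": 2},
        ],
    },
    "templates": {"Linked List": ["start at the head item", "visit the next item"]},
    "idioms": {
        "Cobol": {
            "start at the head item": "<Cobol idiom for: start at the head item>",
            "visit the next item": "<Cobol idiom for: visit the next item>",
        },
    },
}


def answer(topic, language, repo):
    """Answer 'How do I implement <topic> in <language>?' from the repository."""
    # Case 1: the topic header links directly to an idiom; no template needed.
    direct = repo["header_idioms"].get((topic, language))
    if direct:
        return {"kind": "idiom", "results": [direct]}
    # Case 2: several implementations exist; return them best-rated first.
    impls = repo["implementations"].get((topic, language), [])
    if impls:
        ranked = sorted(impls, key=lambda impl: impl["rating"], reverse=True)
        return {"kind": "implementations", "results": ranked}
    # Case 3: deduce an implementation from the template and the idioms.
    template = repo["templates"].get(topic)
    idioms = repo["idioms"].get(language)
    if template and idioms and all(step in idioms for step in template):
        return {"kind": "deduced", "results": [idioms[step] for step in template]}
    return {"kind": "unknown", "results": []}
```

A deduced answer is only a starting point: it would be put forward in the design discussion, where other participants can argue against it using the same repository.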
Note:
It turns out that Topic Maps do indeed carry the capacity for at least some forms of query classification. Because topics can have multiple names, and because the names can be attached to other topics to define the scope of that name (as well as the scope of the topic), a query system can at least be smart enough to find out what you mean by a given term. So for example if you ask for a “program guide”, the system can ask whether you want information pertaining to software development, marketing strategies, or the latest theatre release.
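The “program guide” example can be mimicked with a toy model of scoped names. This sketch uses plain dictionaries rather than any real Topic Maps API, and the topic identifiers and the `classify` function are hypothetical.

```python
# Toy stand-in for scoped topic names; not a Topic Maps implementation.
TOPICS = [
    {"id": "t1", "names": {"program guide"}, "scope": "software development"},
    {"id": "t2", "names": {"program guide"}, "scope": "marketing strategies"},
    {"id": "t3", "names": {"program guide"}, "scope": "theatre releases"},
    {"id": "t4", "names": {"linked list"}, "scope": "software development"},
]


def classify(term):
    """Resolve a term to a topic, or report the scopes the user must choose from."""
    matches = [topic for topic in TOPICS if term.lower() in topic["names"]]
    if len(matches) > 1:
        # Ambiguous: the system should ask which scope was meant.
        return ("ask", sorted(topic["scope"] for topic in matches))
    if len(matches) == 1:
        return ("found", matches[0]["id"])
    return ("unknown", None)
```

An ambiguous term yields the list of scopes to offer the user, while an unambiguous one resolves straight to its topic.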
Copyright © 2000-2017, TreeLight PenWorks