URFC 002						K NEWS


		  Keyword News System Proposal
			       By
			 Brad Templeton
		 Looking Glass Software Limited
		      (brad@looking.uucp)

		 USENET Request For Comment 002


       NOTE:  This is essentially a document from around  1983
  with	a  few	minor updates.	 Since then I have given up on
  the idea of keywords as the only type	 of  news  classifica-
  tion,	 and  instead  believe in a multi-level hierarchy with
  newsgroups, topics and keywords.  Nonetheless, most of  what
  was said in this document 5 years ago still applies.


       For some time people have  been	using  a  news	system
  based on newsgroups.	This is a short outline of my proposal
  for a news system based on a classification system I	called
  keywords.  The only essential difference between a newsgroup
  and a keyword is that the Keyword news system (or K news) is
  designed  so	that  there  is a very small overhead for each
  keyword.   It	 is  thus  possible  to	 have  thousands   and
  thousands of active keywords with little overhead.

       It is my feeling that  several  problems	 have  emerged
  with	the  old  newsgroup  style  system.   Many of them are
  solved by K news.

  (1)  Due to the limited number of groups, there is  a	 great
       deal  of	 traffic  concerning  what  articles belong in
       which groups  and  whether  certain  groups  should  be
       created	or  destroyed.	Under K news, there is no such
       discussion.  If you want a new keyword, you create  it.
       If you want to use a name that is long and descriptive,
       you can.	 If discussions go under several keywords,  it
       is easy to add them to your list.

  (2)  The limited number of groups also creates  groups  like
       "misc.misc".  K news eliminates the need  for  net.misc
       and allows easy renaming of net.general.

  (3)  Current systems only allow an "or"ing  of  groups  when
       dealing with multiple groups.  In K news, it is possible
       to request articles that deal with a set  of  keywords;
       i.e., one can ask to be shown only articles that
       contain both the	 "science  fiction"  keyword  and  the
       "movie" keyword.


  (4)  Current systems do not allow grouping all followups  to
       a given article together, or sorting articles according
       to posting date.	 K news provides this because it  uses
       sort(1) on the complete list of articles to be seen.

  (5)  Current news systems are slower  than  they  should  be
       because they must scan each newsgroup a user subscribes
       to in order to see if there is news.  K news  does  not
       have  this problem.  The K news design can be slower to
       start up, but will be instantaneous in operation.

  (6)  Current systems just don't allow users to be  selective
       enough in filtering news efficiently.  There's just too
       much volume, and secretary programs, kill files and the
       "n"  key	 aren't enough.	 By providing keywords, we get
       an extra level of selectivity in reading news.

  (7)  Current news systems have difficulty  in	 showing  each
       article	only once to a given user, particularly if two
       different news reading sessions are  involved.	The  K
       news implementation scheme I suggest does not encounter
       this problem.

  1.	The Keyword Environment

       K news can solve the B news  problems  by  promoting  a
  different environment with keywords.	First of all, the dis-
  tribution of an article is taken out of  the	keyword	 name.
  This	means  all  keywords are valid over all distributions.
  The fact that there is an "auto" keyword means you can  post
  an  auto article to netwide, statewide or even local distri-
  bution.  This should	cut  down  on  the  number  of	people
  advertising  their  cars to "rec.auto" because the only auto
  group has netwide distribution.

       An article will have several keywords.  The K news system
  will probably insist that members of certain sets of keywords
  be present.  For example, there should be a distribution
  keyword with any article that is not local.  There
  might be a "followup"	 keyword  on  any  followup,  although
  these	 can be detected from their "References" string.  Key-
  words like "spoiler" and "flame" can be put with articles so
  that people can request not to see them.  (Ridiculous groups
  like alt.flame go away.)

       It seems that all articles fall  into  a  certain
  set of classes.  These classes are "query", "original infor-
  mation", "reprint", "opinion"	 and  "followup".   There  are
  some	sub-classes,  such  as "flame" (a type of opinion) and
  "source code"/"binary" (types of original information).   It
  might	 be a good idea to insist that all posters provide one
  of  these  keywords,	with  the   followup   keyword	 being
  automatic.   Thus  a  reader can shut off all queries or all
  opinion articles, or both.

       Groups like "misc.misc" will no longer be needed.   Any
  new discussion can easily rate a new keyword, from "big mac"
  to "socks in hyperspace".  The group "net.general" is	 still
  a  bit  of  a problem, but it can now be replaced with some-
  thing like "announcement for all users", and there  will  be
  very	little	implicit cry to put the article in the netwide
  distribution.	 There will still be problems, but  they  will
  be reduced.

       It's also possible that we will still get a lot of  the
  "You	posted	that  to the wrong keyword" type stuff.	 It is
  hoped that since adding and deleting keywords in a subscrip-
  tion	list  will be quite easy, people will not complain too
  much about this.  Even so, it is still  possible  that  some
  utility  to  help  users  select  keywords will be required.
  Each site will keep all known keywords in a DBM  type	 file.
  (This	 will be the total overhead for each keyword.) The DBM
  file entry might contain who first used the keyword,	a  one
  line	entry  describing it, and its newsgroup mapping on a B
  to K interface system.  A simple utility might scan a user's
  article  for	any  of the keywords that occur in the text of
  the article and suggest them as possible entries.  In	 addi-
  tion,	 if  the  user	suggests a new keyword when posting an
  article, a search for keywords that the new one could be  an
  incorrect spelling of would be in order.
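
       The two helper utilities described above might be sketched
  in Python as follows.  This is only an illustration under stated
  assumptions: the keyword list, the function names and the 0.8
  similarity cutoff are all hypothetical, and a real site would
  read its keywords from the DBM type file rather than a list.

```python
import difflib

# Hypothetical stand-in for the site's keyword database (a DBM type file
# in the proposal); the entries are taken from examples in this document.
KNOWN_KEYWORDS = ["space shuttle", "science fiction", "microcomputer", "frank zappa"]

def keywords_in_text(body, known=KNOWN_KEYWORDS):
    """Return the known keywords that occur in the text of an article."""
    text = " ".join(body.lower().split())
    return [keyword for keyword in known if keyword in text]

def possible_misspellings(new_keyword, known=KNOWN_KEYWORDS):
    """Return known keywords that a new keyword may be a misspelling of."""
    # The cutoff of 0.8 is an arbitrary choice for this sketch.
    return difflib.get_close_matches(new_keyword.lower(), known, n=3, cutoff=0.8)
```

  A posting program could show the first list as suggestions, and
  ask for confirmation whenever the second list is not empty.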

       Since the keywords are the important  thing  that  gets
  copied  over in a followup, the subject line will not remain
  the same.  One current problem under B news is that you  get
  discussions  that  wander under the same subject line.  This
  subject soon becomes meaningless.   Any  followup  generated
  with	K  news	 will have an entirely new subject line, since
  both the keywords and the References string will provide  an
  indication of what is a followup to what.

  1.1.  Types of Keywords

       Most keywords  will  be	user  generated.   Stuff  like
  "microcomputer",  "trs-80",  "space  shuttle", "frank zappa"
  and  "homosexuality".	  Others  will	be  system  generated.
  These	 are  keywords	that  apply to distribution, sites and
  such things.	These keywords will all have an	 "="  sign  in
  them	for matching purposes.	"distribution=usa" would match
  articles with usa in the distribution field.	"site=looking"
  would catch articles from Looking Glass Software.

       All keywords when processed by the system will  be  set
  into	lower  case,  and  all sections of white space will be
  mapped to a single space.  An "s" on the end	of  a  keyword
  will not be important in comparison so we don't have worries
  about  pluralization.   Keywords   will   be   sorted   into
  alphabetical  order  inside the article so that the same set
  of keywords is always identical when compared.
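
       A minimal Python sketch of this normalization follows.  The
  function names are assumptions, and stripping a single trailing
  "s" is just one simple way to make plurals compare equal; the
  proposal does not say how the comparison should be done.

```python
def normalize_keyword(raw):
    """Lower-case, collapse runs of white space, and drop one trailing "s"."""
    word = " ".join(raw.lower().split())
    # Assumption: ignoring a final "s" is done by removing it before comparing.
    if len(word) > 1 and word.endswith("s"):
        word = word[:-1]
    return word

def canonical_keywords(raw_keywords):
    """Return the normalized keywords, de-duplicated and in alphabetical order."""
    return sorted({normalize_keyword(keyword) for keyword in raw_keywords})
```

  With this, ["Movies", "science  fiction", "movie"] and
  ["movie", "Science Fiction"] reduce to the same list.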


  2.	The K news implementation

       To develop a keyword based system, we need a  different
  implementation scheme than the one used for B news.  In
  particular, keywords must have minimal overhead associated with
  them.	   Things like an entire directory and a line in every
  .newsrc file for each keyword can't be used.

       One of the facts that can be used in a new  implementa-
  tion is that the average news reader normally reads only the
  news that has arrived	 since	news  was  last	 read.	 Thus,
  instead  of  scanning	 directories and keeping track of what
  has been read, K news scans a history file and  keeps	 track
  of  what has NOT been read.  In a given session, the history
  file is scanned from the point in time when  news  was  last
  read.	  In  addition,	 a  file of articles not read from the
  previous session is scanned.	The user may  request  to  see
  the  old  articles first, or to have them merged in with the
  newer ones.  Finding out what to read is a simple matter  of
  scanning a few files and should be quite fast.

       I have set out some ideas for implementing the  K  news
  system.   The idea breaks the news software into a series of
  simple,  efficient  modules.	 This  scheme  could  also  be
  applied  to other news systems.  I will present a brief sum-
  mary of the modules with more details further on.

  (1)  The "inews" program  takes  articles,  stores  them  in
       files  and  writes  out history records describing each
       article.	 One history file is kept per  day,  or	 other
       appropriate time interval.

  (2)  The subscription filter program grabs a list  of	 arti-
       cles  not  yet seen from the history files, and matches
       them against the user's subscription file.   It	writes
       out a file containing a list of articles the user wants
       to see according to the subscriptions.  With a  typical
       700 articles per day, this means processing and pattern
       matching a file with 700 long lines.   It  should  nor-
       mally  not  take	 longer	 than  a  grep on such a file.
       (about 2 seconds on a vax-like machine.)

  (3)  Any standard sort  program  sorts  the  output  of  the
       filter  program	to provide a list of articles the user
       wants to see, in the order the user wants to see	 them.
       (Estimated time for Unix sort: 5 seconds.)


  (4)  A variety of user interface programs read in  the  list
       of  articles  to  see,  and present these articles in a
       way the user likes.   Most of the work is already done,
       so there can be several of these.

  (5)  Various utilities for use by  user  interface  programs
       will exist, including joke decryption, following up and
       subscription file management.  A special utility	 would
       exist  to take all the articles to be seen and put them
       into a "batch" for  sending  to	other  systems.	  Thus
       other systems become just like users, with subscription
       files.

       Here follows more detail.

  2.1.  Receiving Posted and Transmitted News

       The "inews" equivalent of K news should be quite simple
  to  implement.  When an article comes in, all it need do is
  place the article in a file somewhere (it could even	let  B
  news	do this for it) with possible header processing.  Once
  the article has been placed, a header record must be written
  out  to  the	K  news history file for that date, describing
  various header attributes of the article and	what  file  it
  was  put  in.	 It is not necessary that there be a transmis-
  sion mechanism if batching of news is	 intended.   It	 would
  still be possible to include one, however.

       As noted above, the article can be put in a file	 by  a
  special  program  that  returns  the name of the file.  This
  puts the operating system related things in  a  simple  pro-
  gram, and makes the system more portable.  Whatever the pro-
  gram is that places the article in a file, the  filename  is
  passed to the K news pickup program.	This program will take
  the article, and examine the header.	Important  information
  about the article will be written to a special history file.
  This will include the keywords associated with the  article,
  the "References:" string of the article plus its message-id,
  the date of posting, the pathname of the file containing the
  article  with	 optional seek address and length, and finally
  the subject line.  Note, by the way, that in the case	 of  a
  followup,  any  extra keywords that were not in the original
  article will have to be placed in an extra field so they are
  not  involved in the sorting that groups articles with their
  followups.

       History files will be  maintained  on  a	 one  per  day
  basis,  in a special history directory.  Each history file's
  name will be formed from the date  for  that	history	 file.
  (Perhaps  in	days since the birthday of the net, or perhaps
  in the form yymmdd.)	There may be a new history  file  each
  day,	each week, or even every hour as the site requires.  K
  inews will query the date  and  time  from  the  system  and
  decide which history file to append to.
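
       The history step could look roughly like the Python sketch
  below.  The tab-separated record layout, the field order and the
  function names are assumptions made for illustration; only the
  list of fields and the yymmdd file name come from the text above.

```python
import datetime
import os

def history_file_name(when):
    """Name the history file after the date, in the yymmdd form."""
    return when.strftime("%y%m%d")

def history_record(keywords, references, message_id, posted, path, subject):
    """Build one history line: keywords, id chain, date, pathname, subject."""
    # The References string plus the article's own message-id form the chain.
    chain = " ".join(list(references) + [message_id])
    fields = [",".join(sorted(keywords)), chain,
              posted.strftime("%Y%m%d%H%M"), path, subject]
    return "\t".join(fields)

def append_history(history_dir, record, now):
    """Append the record to the history file for the current date."""
    target = os.path.join(history_dir, history_file_name(now))
    with open(target, "a") as handle:
        handle.write(record + "\n")
    return target
```

  A site that wants hourly or weekly files would only need to
  change history_file_name.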

  2.2.  News reading stage one - history filter

       The first stage of any  news  reading  is  the  history
  filter  that	is  common  to all news reading and collecting
  programs. This program first notes the last  time  the  user
  read	news  and  finds  the  appropriate spot in the history
  files.  This list of articles in the history	file  is  com-
  bined (if the user has requested it) with a special per-user
  list of articles that have already been processed, but which
  the  user  has decided to read later.	 (As this user file is
  already in the proper order, the merging may	actually  take
  place later to be more efficient.)

       Now the system has a list of possible articles to read.
  It  must  decide  which  ones	 the user wants to see.	 To do
  this, we use a user created "subscription file".  This  file
  contains  a  list  of keyword patterns describing the user's
  taste in articles.  The subscription file  is	 read  in  and
  parsed into a tree.  As will be described, the  subscription
  file contains keywords and keyword  patterns	that  will  be
  matched  against  articles.	Each  pattern is given a "sort
  value" that indicates how important the associated  keywords
  are.  This sort value may either be explicit, or derived
  implicitly from the order of the subscription file.  Articles
  will	be  shown  in  the order dictated by the sort value of
  their keywords, so users can direct  the  order  that	 their
  news will be seen in.

       Lines from the history file are read  in,  and  matched
  against the subscription list.  If they match, the appropri-
  ate line is written out onto a temporary file.  Matching can
  be  done  on	keywords  or  other  information,  such as the
  article-ids in followup  chains,  the	 poster,  the  posting
  site,	 the  distribution  and	 anything  else that is imple-
  mented.  It is important to note that the ability  to	 match
  on  article-ids  allows users to request or shut out discus-
  sion chains based on followups.  Instead of writing out  the
  keywords  to	this  file,  we	 write instead the sort values
  given to each keyword.  These	 sort  values  are  themselves
  sorted  on  the line before being written.  The old keywords
  are also output, but not for the purposes of sorting.

       Once the new  file  is  prepared	 it  is	 sent  off  to
  sort(1),  possibly with the file of previously skipped arti-
  cles appended to it.	The first sort key is the keyword sort
  values.  Since  followups  all  have the same base keywords,
  they will match as equal in the first sort  key.   Since  we
  are sorting by the keyword sort values, the output file will
  have the articles sorted by  keywords	 in  the  presentation
  order  the user requested.  The next sort key is the "Refer-
  ences" chain, which includes the message-id  of  the  actual
  article tacked on the end.  Original articles will have just
  their own ID present.	 Thus all followups to a given article
  are sorted in a nice tree.
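
       The effect of the two sort keys can be seen in a small
  Python sketch.  The proposal hands this job to sort(1); Python's
  sorted() is used here only to show the ordering, and the tuple
  layout, message-ids and pathnames are made-up examples.

```python
def sort_key(entry):
    """First key: the keyword sort values; second key: the References chain."""
    sort_values, chain, _path = entry
    return (sort_values, chain)

# Each entry: (sort values, References chain ending in own id, pathname).
entries = [
    (["zz"], ["<9@b>"], "/news/9"),
    (["A"], ["<1@a>", "<5@c>"], "/news/5"),
    (["A"], ["<1@a>"], "/news/1"),
    (["A"], ["<1@a>", "<5@c>", "<7@d>"], "/news/7"),
    (["A"], ["<1@a>", "<6@c>"], "/news/6"),
]

ordered = sorted(entries, key=sort_key)
paths = [path for _values, _chain, path in ordered]
```

  Because every followup's chain begins with the chain of the
  article it answers, each followup lands directly beneath its
  parent: here the order is /news/1, /news/5, /news/7, /news/6 and
  finally the lower priority /news/9.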

       Other information output includes the date of  posting.
  While	 we want to sort on this date for articles at the same
  "level" (on a followup basis), it  is	 impossible  for  most
  sort	programs  to  do this.	This sorting must be done in a
  second pass (it's fairly simple) or right  within  the  user
  interface program.

       Any amount of additional information can be  output  to
  this	file.  In theory, most of the header information could
  be written (lines would start getting pretty long)  so  that
  the  user  interface	program need not even open the article
  file for articles a user says "no" to.  This is a  trade-off
  to  be worked out.  One important item that has to be there,
  of course, is the name of the file where the	article	 actu-
  ally resides.

       Once sort is called we will have a file which  has,  in
  addition  to a lot of extra information, a list of pathnames
  for the articles the user wishes to read.  The  keywords  on
  these articles may also be present.  This is passed to phase
  two.

  2.3.  Date & Discussion Sorting

       A special pass may be used to sort by the date within a
  discussion,  since  many  will  want this.  This is a simple
  task that can be left to the user interface  phase,  but  it
  could	 also  be  done	 in general for anybody to use it.  It
  would be slower this way, since a whole extra pass would  be
  required.


  2.4.  User Interface

       User interface programs will vary from  being  dumb  to
  quite fancy.	Since it gets passed a readymade list of arti-
  cles, there is not much work to do.  All a simple  one  need
  do  is  go through the list, doing  what  msgs  or  readnews
  currently does to each file.	 These	programs  will	handle
  replies,  followups etc.  Special utilities will be provided
  for cancelling etc.

       When a user skips an article for later review, the pro-
  gram	can  write  the appropriate line to the unread article
  file noted above.  It is hoped the average user will not let
  this	file  get  too	big.  More sophisticated programs will
  keep track of a list of seek addresses in  the  sort	output
  file	that mark articles that have not been read, and output
  this at the end of a session.  This allows programs to allow
  users  to  skip  back and forth among the articles since the
  information is not written out until the end.	 In  fact,  it
  might	 be  a	useful	utility to provide for writers of user
  interfaces.

       User interfaces can get quite fancy, with  screen  sys-
  tems	like notesfiles and rn.	 It would be nice to provide a
  feature so that unrecognized	commands  are  passed  to  the
  shell	 with a search path list including a special directory
  for news commands.  (Perhaps an environment variable so  the
  user	can  specify.)	 In the news command directory you put
  simple commands like "decrypt" and "undigest" with appropri-
  ate  short  names.   It is expected that several user inter-
  faces will be written, including one just like  RN  and  one
  just	like  notesfiles.   All interfaces to the subscription
  file by the user interface program should be through  other
  programs that are part of phase one if possible.  This keeps
  things apart.

  2.5.  B and K news Interface and Transition

       In the design of K news, we can plan for three  schemes
  of  usage.  One is to design K news without paying attention
  to any other news systems.  This would require creation of a
  totally  new	net  that  won't talk to newsgroup based nets.
  This would be slow, but has the appeal that it would	create
  a  net  that	wasn't bogged down the way the current one is.
  This "let them stew in their mess" attitude is a bit	snobby
  though, and could create a lot of problems in getting K news
  accepted.  Another thing to consider is that there is a high
  probability  somebody	 will put together some kind of inter-
  face between systems that is jury-rigged and	far  from  the
  best.	  This	happened  with the Notes-B news interface, and
  created a royal mess that was worse than the	problems  that
  would have resulted from working together on things.

       Another scheme is to make a system that	can  interface
  to  B	 news,	but  doesn't plan to do so for long.  The idea
  would be that if K news were good  enough,  everybody	 would
  eventually  switch  and we would have a new pure system.  In
  the meantime they could co-exist.  Aside from the  technical
  problems  involved, there is the question of when the switch
  would occur, and if the idea of newsgroups  would  ever  get
  out of the system.

       The  compromise	solution  is  to  plan	for  permanent
  cooperation by incorporating the newsgroup idea into K news.
  A newsgroup becomes a special, high overhead keyword.	 In  K
  news,	 it  is used as a directory name for storing articles,
  and as the interface to B news.  In this system,  we	demand
  that	K news users provide newsgroups as well as keywords on
  their articles.  Although this has some problems in  educat-
  ing  the  users,  I  think it is no worse than sticking with
  newsgroups.

       If newsgroups exist, and B news sites exist, a  mechan-
  ism  is  required  that  maps newsgroups to appropriate key-
  words.   One	simple	mechanism  is  just   to   include   a
  Newsgroup=xxx	 keyword for each newsgroup an article belongs
  in.  K news users can select that keyword in their subscrip-
  tion file.  Slightly more sophisticated would be to create a
  mapping table at B to K interface sites so that articles  in
  a  group  like  "net.columbia"  get  keywords	 of  the  form
  "Newsgroup=net.columbia" and "space shuttle".

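       A mapping table of this kind might be sketched in Python as
  below.  The table contents and the function name are assumptions;
  only the "net.columbia" example comes from the text above.

```python
# Hypothetical mapping table kept at a B to K interface site.
GROUP_KEYWORDS = {
    "net.columbia": ["space shuttle"],
}

def keywords_for_groups(newsgroups, table=GROUP_KEYWORDS):
    """Return the keywords a B news article gains when it enters K news."""
    keywords = []
    for group in newsgroups:
        # Every group yields its Newsgroup=xxx keyword, mapped or not.
        keywords.append("Newsgroup=" + group)
        keywords.extend(table.get(group, []))
    return keywords
```

  Groups missing from the table still receive their Newsgroup=xxx
  keyword, so the simple mechanism keeps working for them.
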
  2.6.  Shipping to other sites

       With  new  modifications  to  uux  possible,   it   is
  envisioned  that each site receiving news from a K news site
  would essentially have a .newsrc like file on the forwarding
  site.	  This	is  to say that each site would be in the same
  position as a user, with a keyword subscription list	and  a
  list of unread articles.  Forwarding could either be done by
  using the same process a user	 does  to  read	 news  when  a
  transfer is made, or by having the K inews check each article
  against the subscription files for known sites.  The first
  way,	of course, is much more efficient.  With batching, the
  first stage readnews process could be	 run  to  collect  the
  chosen files in a batch.

  2.6.1.	Distribution

       In order to keep a  site's  subscription	 file  simple,
  distribution	keywords  (required  on	 all articles) will be
  matched by  "distribution=xxx",  where  xxx  is  stuff  like
  "local", "canada", "usa", and the dreaded "worldwide" (equal
  to "net").  The default  distribution	 for  posted  articles
  will	be  set	 locally, but it should be encouraged to be as
  small as reasonable, such as the local state or province.

       One problem with this sort of distribution scheme  (and
  the  current	B system) is that sometimes a user really does
  want an article distributed netwide in the "auto"  newsgroup
  but  only locally in the "general" newsgroup.	 Consideration
  must thus be given to explicit distribution bindings on key-
  words.  My suggestion is to have the "distribution" keywords
  (as we think of them now)  apply  to	all  keywords,	except
  those with an explicit distribution.  Consider an article with:

      Subject: Toronto Space museum opens
      Distribution: local
      Keywords: events, space/north america

  Such an article would go to "events"  readers  locally,  and
  "space" readers both locally and all over the continent.

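       The binding rule can be illustrated with a short Python
  sketch.  It assumes that "/" separates a keyword from its explicit
  distribution, as in the example header above; the function name
  is hypothetical.

```python
def keyword_distributions(keywords_header, default_distribution):
    """Map each keyword to its distribution, honoring "keyword/distribution"."""
    result = {}
    for item in keywords_header.split(","):
        keyword, _slash, explicit = item.strip().partition("/")
        # A keyword without an explicit binding takes the article's default.
        result[keyword.strip()] = explicit.strip() or default_distribution
    return result
```

  For the header shown above, "events" maps to "local" and "space"
  maps to "north america".
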

  3.	Subscription List

       One of the most important facets of the K  news	imple-
  mentation  I propose is the use of a sophisticated subscrip-
  tion list.  This list would be used by both users and	 sites
  to  decide  what  articles  are to be seen during a session.
  Fundamental to this scheme is the ability to define  keyword
  patterns,  so that selections can be done on not just single
  keywords (as B news works) but on arbitrary combinations.

       The first reading program will maintain two files.  The
  first	 of  these is the subscription list.  This tells which
  keywords and discussions the user is	interested  in.	  This
  will be a list of keywords subscribed to and boolean expres-
  sions built from them.  Keywords are actually text  strings,
  but  they  may not contain a special set of characters which
  are used to delimit them.  These characters are  "=",  ":",
  ",",	"!",  "[",  "]",  "&",	"|", "*", "/", "(", and ")" to
  start with.  Some, like "=", are used	 within	 meta-keywords
  to  match  special  conditions  known	 to  the software like
  sites, article-ids and the  like.   No  doubt	 more  special
  characters  should  be  reserved  for future use, while some
  should be allowed within keywords.  Each line in a subscrip-
  tion	file  consists	of  a  keyword pattern to describe the
  user's interests.  In addition, some special	lines  in  the
  subscription	file  will  tell what the user wants done with
  articles from the previous  session,	and  possibly  special
  options.

       A typical subscription line lists  a  keyword  pattern.
  For example, the line:

      science fiction

  Asks for all articles with the  keyword  "science  fiction".
  Quotes  may be required, but this is a matter to be decided.
  It also makes sense that any blank fields in	a  keyword  be
  compressed to one space so that typos do not cause problems.
  The line "!star wars defence" would  ask  that  no  articles
  with	the keyword "star wars defence" be shown.  We can also
  ask for "Ronald Reagan & taxation" to be shown all  articles
  with both of those keywords.  Similarly "Ronald Reagan &
  !taxation" shows us all articles about old Ron that  do  not
  contain the taxation keyword.	 Or we could go for

      Ronald Reagan & !( taxation | star wars defence )

  Which shows us articles about Ron that have  nothing	to  do
  with taxation or the star wars defence scheme.
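
       One way to evaluate such patterns is a small recursive
  descent parser, sketched here in Python.  Several things are
  assumptions: "&" is taken to bind tighter than "|", keywords are
  compared in lower case, and meta-keywords and sort attributes are
  not handled.  The function names are illustrative.

```python
import re

def tokenize(pattern):
    """Split a pattern into operators and keywords (keywords may hold spaces)."""
    parts = re.split(r"([&|!()])", pattern)
    return [" ".join(part.lower().split()) for part in parts if part.strip()]

def matches(pattern, keywords):
    """Evaluate a keyword pattern against the set of keywords on an article."""
    tokens = tokenize(pattern)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = take()
        if token == "!":
            return not factor()
        if token == "(":
            value = expr()
            take()  # consume the closing ")"
            return value
        return token in keywords

    def term():
        value = factor()
        while peek() == "&":
            take()
            # Parse the right side first so its tokens are always consumed.
            value = factor() and value
        return value

    def expr():
        value = term()
        while peek() == "|":
            take()
            value = term() or value
        return value

    return expr()
```

  The keyword set passed in is expected to be already lower-cased,
  for example {"ronald reagan", "taxation"}.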

       The order in the file is	 important.   When  phase  one
  tries	 to  figure  out if a user wants to see an article, it
  scans through the information in the subscription  list,  in
  order.   It  stops as soon as it finds some form of definite
  information.	This  means  either  positive  information  or
  negative  information.   If the first line in your subscrip-
  tion file is "Ronald Reagan", you will see  all  such	 arti-
  cles,	 even  if  they	 contain other keywords that you hate.
  Likewise, if the first line in the file is "!Ronald Reagan",
  you will never see an article about him, even if it contains
  a keyword you subscribe to later on.	(There is an alternate
  system described below to change this.)

       The character "*" will match any keyword.  It would  be
  placed  on  the last line of a subscription file to indicate
  that any keyword not marked with an "!"  is  subscribed  to.
  It  is  doubtful  anybody would use this after the number of
  keywords grows.
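
       The first-match rule, including "*", can be sketched in
  Python.  To stay short, this sketch assumes every line is a single
  keyword with an optional leading "!", and that an article matching
  no line is not shown; neither assumption is fixed by the proposal.

```python
def wants_article(subscription_lines, keywords):
    """Scan the lines in order and stop at the first definite answer."""
    for line in subscription_lines:
        negated = line.startswith("!")
        keyword = line.lstrip("!").strip().lower()
        if keyword == "*" or keyword in keywords:
            # Positive or negative, the first match decides.
            return not negated
    return False  # assumption: no match means the article is not shown
```

  With the lines ["!flame", "*"], an article carrying "flame" is
  rejected by the first line and everything else is accepted by the
  second.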

       Keywords may have "sort attributes" on them to indicate
  which	 keywords  you	would  like to see first in a session.
  These are essentially ascii strings which will be passed  to
  sort(1).   If	 you  want to see articles about "system shut-
  down" first, you give it a low value like "A".   If you want
  to  see articles about "big mac" last you give a priority of
  the form "zzzzzzzz".	The nice thing about this is that when
  you  have  a	new keyword, you can easily give it a priority
  between any two that exist, unless you have given  something
  a  priority  like  "^@", in which case it would be first for
  all time.  We now see lines like:

      system shutdown [AAA]
      space [bb] & challenger [cc]
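
       Pulling the sort attributes off such lines is a simple
  matter, as this Python sketch suggests.  The regular expression
  and the function name are assumptions based only on the two
  example lines above.

```python
import re

def sort_attributes(line):
    """Return (keyword, sort attribute) pairs found on a subscription line."""
    pairs = re.findall(r"([^\[\]&|!()]+)\[([^\]]+)\]", line)
    return [(keyword.strip(), attribute) for keyword, attribute in pairs]

# Plain string comparison orders the attributes by character code.
priorities = sorted(["zzzzzzzz", "bb", "A", "AAA"])
```

  Since upper case letters sort before lower case ones in ASCII,
  "A" and "AAA" come ahead of "bb", and "zzzzzzzz" comes last.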


  3.1.  Sample file

       Here are some sample subscription lines that you	 might
  have.	  The  comments	 actually  would  not  be in the file,
  although that could be a possible feature.

      OPTIONS: +newkeywords +oldnews  ; show me new keywords that have come in,
				      ; and mix in my old news from before
      !flame			      ; show me no flame articles
      !query			      ; show me no "does anybody have" articles
      system news
      microcomputer & !trs-80	      ;anything on micros that isn't on trs-80s
      unix & !(4bsd | version 7)
      sex & drugs		      ; anything about both
      rock & roll		      ; 8-)
      site=looking & poster=brad      ; anything from me - the default ;-)
      movies & distribution=ontario   ; movie articles from my own province only
      distribution=local	      ; anything posted on my own machine
      art=123@looking		      ; that article and any followups
      !art=124@looking		      ; none of that article or any followups
      !(!source code & #size>7K)      ; a possible feature, no file bigger than 7k
				      ; that isn't a source file


  4.	Typical Session

       The typical user interface program will first check  to
  see  what  new keywords have come in since the last session.
  These will be recorded in a separate history file  in	 which
  the last position read must be recorded.  The user, if it is
  requested by appropriate options, will then be given a  list
  of  new keywords that have appeared since the last time news
  was read.  Some systems will query the user and allow him or
  her to place these new keywords in the subscription files.

       The user interface must now call the phase one program
  with	appropriate  options, and the name of a temporary file
  to put the sort output in.  It may  also  request  the  sort
  output  on  a	 pipe if that is all it needs.	(Most programs
  will want to be able to seek back in the output file.) Arti-
  cles	will  then  be	shown  in the order requested, grouped
  perhaps according to followup discussions or major keywords.
  At  the  end, a list of unread articles will be written out.
  Articles will probably be grouped by discussions and	higher
  priority  keywords.	Followups  will	 insist on a change of
  subject and allow an addition of keywords and	 a  change  of
  the distribution.

  5.	Alternate Subscription Idea

       It is possible users will require more control on which
  subscription	lines get priority than the order in the file.
  Thus it is proposed that keywords get points	based  on  how
  much	a  user	 wants to see a keyword.  Keywords you want to
  see would get positive points and keywords you don't want to
  see  would get negative points.  For example: "Ronald Reagan
  : 5" would assign 5 points to any  article  containing  that
  keyword.   On	 the  other  hand "star wars defence : -4" and
  "taxation : -6" would assign negative points to  those  key-
  words.  In this case, you would see articles with Reagan and
  star wars defence, but would not see	articles  with	Reagan
  and taxation.	 Scores would apply to whole lines.  For exam-
  ple:

      (Ronald Reagan [abc] & taxation [cde]) : 20

  Would give 20 points to any article with both keywords.

       In this system, any article must scan the  whole	 list.
  For  every match we get, we add the points assigned for that
  match to our sum.  If, at the end, the  sum  is  >=  0,  the
  user sees the article.  If negative, it is  not  seen.   It
  should also be possible to assign scores of "oo"  and	 "-oo"
  which	 would	represent  infinite  scores  and stop the scan
  right away.
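
       A minimal sketch of this scan follows, in Python.  It is
  an illustration only, not part of any existing K news code.  The
  names score_article and is_shown are hypothetical, and each
  subscription line is simplified to a set of keywords that must
  all be present plus a point value; the real line syntax would be
  richer.  Infinite scores are represented with math.inf.

```python
import math

# Each rule is (required_keywords, points).  A rule matches when every
# keyword in it appears on the article.  This simplified representation
# is an assumption made for illustration.
RULES = [
    ({"ronald reagan"}, 5),
    ({"star wars defence"}, -4),
    ({"taxation"}, -6),
]


def score_article(keywords, rules):
    """Sum the points of all matching rules, stopping on an infinite score."""
    total = 0
    for required, points in rules:
        if required <= keywords:      # all required keywords are present
            if math.isinf(points):    # "oo" or "-oo": stop the scan right away
                return points
            total += points
    return total


def is_shown(keywords, rules):
    """The user sees the article when the final sum is >= 0."""
    return score_article(keywords, rules) >= 0
```

       With these rules, an article carrying "ronald reagan" and
  "star wars defence" sums to 1 and is shown, while one carrying
  "ronald reagan" and "taxation" sums to -1 and is not.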

       In any system, by the way, the whole subscription  file
  must be read into RAM.  Since the phase one program has lit-
  tle to do but read this file, however,  the  K  news	system
  should  be  able  to handle large subscription files.	 Since
  followup message-ids will also be placed  in	this  file,  a
  utility that deletes very old ones would be a good idea.

  5.1.  More Random Ideas

       We can add subscription features as we like.   It  will
  have	to  be worked out what users want.  Some ideas include
  the scheme above, plus:

  (1)  The ability to match a keyword only if it is  alone  on
       the  line.  For example, you might want to see articles
       about "microcomputers" but not if they  are  associated
       with  other  topics.  Same with "abortion".  This would
       be done with a numeric "keyword count" variable, so you
       might say "abortion & #keycount == 1".

  (2)  Real pattern matching on keywords, regular expression
       style.  This might be too slow: if you don't allow it, the
       keyword programs can map the keywords seen to integers for
       easy matching.  But it might be worth it.

  (3)  Pattern matching on the	subject.   This	 is  something
       various	news  secretaries  do.	In theory, this should
       not be necessary as any important word you might search
       for would probably be a keyword.

  (4)  Pattern matching on the body.  This could  be  done  by
       means of those special hash formulae (such as csh uses)
       that tell if a given string is NOT within  an  article,
       with  some  reliability.	  Body	pattern matching would
       only be applied on articles that	 need  it,  or	things
       would get too slow.

  (5)  Timestamps on patterns added by programs	 to  the  sub-
       scription files.	 When you decide to shut off a discus-
       sion, the software will add a  "!123@looking"  to  your
       file.  You don't want these to build up, so it might be
       good to have timestamps on them so  that	 they  can  be
       removed later on once a discussion is dead.

  (6)  Piles more in the way of special keywords in the
       required group, so people can be more specific, for
       example different types of classified ads.

  (7)  Facilities for moderators.  Ability to pattern match on
       the moderator of choice.
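
       As an illustration of idea (4), here is a rough Python
  sketch of a hash signature that can say a word is definitely NOT
  in an article body.  Whether this matches the hash formulae
  referred to above is an assumption.  The function names, the
  signature size and the use of crc32 as the hash are all
  hypothetical choices, and the sketch handles whole words only,
  not general patterns.

```python
import re
import zlib

SIGNATURE_BITS = 1024  # arbitrary size chosen for illustration


def _bit(word):
    # crc32 is used only as a convenient, stable hash function.
    return zlib.crc32(word.lower().encode("utf-8")) % SIGNATURE_BITS


def make_signature(body):
    """Build an integer bit signature with one bit set per word in body."""
    sig = 0
    for word in re.findall(r"[A-Za-z0-9]+", body):
        sig |= 1 << _bit(word)
    return sig


def definitely_absent(word, sig):
    """True means the word cannot be in the body.

    False only means "maybe present": two different words can share a
    bit, so a full search of the body is still needed in that case.
    """
    return ((sig >> _bit(word)) & 1) == 0
```

       The signature would be computed once per article, so that
  the slow body search runs only on articles the signature cannot
  rule out.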


  6.	Criticism and Answers

       Of course no system is perfect and  some	 have  pointed
  out  a few problems that may arise with K news.  For most of
  these, I feel that the problem  is  even  worse  with	 news-
  groups, or at least little better.

       The main point is that some people feel that there  are
  too many newsgroups now as it is.  This is to say that there
  are too many to remember them all.  Some feel that with  the
  proliferation	 of  keywords, users will be less certain what
  keyword to use, and post to the wrong	 keyword  more	often.
  Thus	some  important	 information  that you might have seen
  could be lost.

       It's my opinion that far more important information is
  lost today because of the noise that results from newsgroups
  being too general in scope.  I, and many others, have
  unsubscribed from groups we are interested in because we can't
  handle all the garbage in the group to sort out the gems.  I
  also use the "n" key a great deal - on over 70% of the articles
  in groups I do read.  If the subject is too short, or "Orphaned
  response" or that sort of thing, I say "n" right away.

       To keep wrong-keyword postings down, the answer is more
  software.  As the need arises, we might see fancy programs to
  help people find
  the right keywords.  Whenever somebody creates a keyword, it
  will	be  their  duty	 to  make  a  short description of it,
  including  related  words.   Thus  the  creator  of  "ronald
  reagan" would add a line saying:

      president, arms race, abortion, economy, usa, government, politics

  and an appropriate utility could take words  from  the  user
  (perhaps  even  text	of an article) and "grep" for words in
  the keyword list.  This would be a special utility called
  by  the  news	 posting  utility,  so it could be written and
  maintained at yet another location.	This tool  could  also
  use  standard spelling correction algorithms to suggest key-
  words.  Naturally, news administrators  could	 update	 these
  keyword  descriptions	 if  the creator of the keyword didn't
  come up with a good one.  A control message could even  keep
  the file up to date.
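
       A minimal Python sketch of such a utility follows.  It is
  hypothetical throughout: the DESCRIPTIONS table stands in for
  the keyword description file (only the "ronald reagan" entry
  comes from the example above), suggest_keywords is an invented
  name, and the standard difflib module is used as a simple
  stand-in for a spelling correction algorithm.

```python
import difflib
import re

# Stand-in for the keyword description file: keyword -> related words.
DESCRIPTIONS = {
    "ronald reagan": ["president", "arms race", "abortion", "economy",
                      "usa", "government", "politics"],
    "microcomputers": ["computer", "hardware", "pc"],
}


def suggest_keywords(text, descriptions, cutoff=0.8):
    """Return keywords whose name or description resembles words in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    suggestions = []
    for keyword, related in descriptions.items():
        # Collect the single words of the keyword and of its description.
        vocabulary = set()
        for phrase in [keyword] + related:
            vocabulary.update(phrase.split())
        # Approximate matching lets a misspelled word still find its keyword.
        for word in words:
            if difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff):
                suggestions.append(keyword)
                break
    return suggestions
```

       The posting program could pass the subject line, or the
  whole article text, to suggest_keywords and offer the result to
  the user before the article is sent.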

       Of course, keywords can be organized in hierarchies  to
  make them easier to find.

  7.	Comments

       This is just a  draft  proposal,	 and  lots  of	little
  details are missing.  Comments are welcome.  Also welcome is
  somebody to implement the thing since many  people  are  too
  busy	to  do so.  The implementation could be done in spots,
  and much of the code can be taken from the existing  B  news
  since the same header formats etc.  would be used.  I can be
  reached at watmath!looking!brad.  Watmath is called by ihnp4,
  decvax, utzoo, uunet and many others.

