ClariNet stories and Slugwords
ClariNet wire service news stories almost always come with a special keyword, known in the trade as a "slugword." The slugword quickly identifies the story and usually contains one or two words that describe its topic. Slugwords serve a number of purposes, and you can make ClariCGI queries that match on slugwords in several useful ways. Generally you do this with a query of the form g/pattern, which matches a regular expression against the slug. You can also use g=pattern to match a slugword exactly.

Note: At this time, ClariCGI extracts the slug from the message-id, which is where ClariNet puts it. The Slugword header is really the better place to extract the slug, but most systems are not configured to include this non-standard header in the News overview database, and ClariCGI only matches things found in the overview database. At a later time, ClariCGI may be modified to use the real header.

Slugwords found in message-ids consist of one or more upper case characters that indicate the wire source of the story, followed by the wire's own slugword. Thus Uclinton (UPI Clinton story) is distinct from Rclinton (Reuters Clinton story) and even Oclinton (Reuters Online Clinton story). No matter how the wire provides it, we put the real slugword in lower case.
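The naming scheme and the two query forms described above can be sketched in Python. This is only an illustration, not ClariCGI's implementation: the function names are hypothetical, the regular expression dialect is Python's rather than whatever ClariCGI uses, and the sketch assumes that queries are compared against the whole token, source prefix included.

```python
import re

# Assumed shape of a slug token, per the description above: one or more
# upper case letters naming the wire, then the wire's slugword in lower case.
SLUG_TOKEN = re.compile(r"^([A-Z]+)([^A-Z].*)$")

def split_slug(token):
    """Split 'Uclinton' into ('U', 'clinton'); return None if it does not fit."""
    m = SLUG_TOKEN.match(token)
    return (m.group(1), m.group(2)) if m else None

def slug_matches(query, token):
    """Hypothetical evaluation of the two query forms.

    'g/pattern' is treated as an unanchored regular expression search and
    'g=pattern' as an exact comparison; both are assumptions.
    """
    if query.startswith("g/"):
        return re.search(query[2:], token) is not None
    if query.startswith("g="):
        return token == query[2:]
    raise ValueError("not a slug query: " + query)
```

Under these assumptions, a query such as g/clinton$ would match the UPI, Reuters and Reuters Online tokens alike, while g=Uclinton would match only the UPI one.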
Story Updates
One very common use of slugwords is to send several versions of the same story as updates during the day. Each version has the same slugword, so our software knows to replace earlier versions with the newer version. The slugword lets you track the latest version of a given story. In fact, newcgi uses this so that if a story is replaced between the time a directory of stories in a newsgroup is produced and the time the story is actually read by being clicked on, it can find the new version by its slug. (Another way to do this would be to track Supersedes chains, but this doesn't work because the Supersedes header names only one message-id and is not usually found in the overview database.)
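The replacement behaviour can be pictured as a table keyed by slug in which each new arrival overwrites the previous entry. The sketch below is a minimal illustration under that assumption; the function names and the (slug, message-id) pairs are hypothetical and do not describe ClariNet's actual software.

```python
def latest_by_slug(stories):
    """Keep only the newest message-id for each slug.

    `stories` is an iterable of (slug, message_id) pairs in arrival order;
    this input shape is illustrative only.
    """
    latest = {}
    for slug, message_id in stories:
        latest[slug] = message_id  # a later arrival replaces the earlier one
    return latest

def resolve(slug, latest):
    """Return the current message-id for a slug, or None if there is none."""
    return latest.get(slug)
```

A story listing built earlier can then be resolved by slug at click time, so a reader gets the current version even if the message-id shown in the listing has since been replaced.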
Standing Stories
For the ordinary administrator or user wishing to refer to stories, slugwords identify regular or "standing" stories. These are stories that come every day or several times a day. They are sometimes updates but sometimes not. For example, a daily column might arrive every day with the same slug. The slug lets you know it is the column or feature in question. If you know in advance what the slug will be, you can find the story with a g/pattern query as described above. ClariNet's web pages attempt to document some of the more important standing stories we publish. But the list changes frequently as we add sources and as sources themselves change, so the best way to find useful standing stories is simply to read the newsgroups and note the regular features you find useful. Look at their Slugword: header field.
Classification
With some of ClariNet's sources, you can actually figure out a bit about a story's topic from the slugword. When this is possible, ClariNet's software will usually have already done it for you and put the story into the right newsgroup for its topics. But in some areas, such as sports, the tuning is very fine, so several subtopics (such as a particular team) appear in one general newsgroup. This fine-grained classification is too detailed for us to document or support. There are many thousands of possible patterns and classifications, particularly in the sports area. However, our sales staff will give assistance to site administrators attempting to work out specific slug patterns for their particular interests or geographic needs.
No Slug!
A very few sources, such as Newsbytes and some syndicated features, do not come with slugwords. In such cases, you must rely on patterns that match the From: line, which at least always identifies the source, or the subject itself. (For example, you can spot something like Newsbytes' daily internet update from the subject line.)
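For sources without slugs, the fallback amounts to matching patterns against other header fields. The following is a minimal sketch of that idea, assuming the headers are available as a dictionary. The function name, the dictionary layout and the case-insensitive matching are all assumptions, and it does not show how ClariCGI itself expresses such queries.

```python
import re

def matches_headers(headers, patterns):
    """Return True if every (header name -> regex) pair in `patterns` matches.

    `headers` maps header names such as "From" and "Subject" to their values.
    A header that is absent is treated as an empty string.
    """
    return all(
        re.search(rx, headers.get(name, ""), re.IGNORECASE) is not None
        for name, rx in patterns.items()
    )
```

A pattern on From alone selects everything from one source. Adding a Subject pattern narrows the match to a particular regular feature.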