ClariNet ClariNews Article Format

ClariNet's ClariNews (formatted e.News) articles have our full USENET headers and a body of HTML. One extra feature in the headers is the addition of an HTML comment surrounding most of the header data. These headers look like this:

X-HTML: <HTML><!--
[...normal headers...]
X-HTML2: -->

This causes most of the header data to be ignored by programs that only understand HTML data and not USENET articles. The processing of the headers by various news systems may move one or more headers outside these comment lines, but this typically affects only one line out of a couple dozen.

The body of each article is HTML. It contains an <HTML> tag, a <HEAD> section (including a <TITLE>) and a <BODY> section. In the BODY are embedded tags to allow various bits of data to be parsed and/or modified. These are useful to have in addition to the article headers because the headers can be stripped off to turn the articles into pure HTML. These extra tags are simply ignored by a web browser without any ill effects.
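As an illustration of stripping the headers to leave pure HTML, here is a minimal Python sketch. It assumes the usual USENET convention that the headers end at the first blank line; the function name `split_article` is hypothetical and not part of any ClariNet software.

```python
def split_article(raw):
    """Split a raw news article into (headers, body) at the first blank line.

    Assumption: headers and body are separated by one empty line, as in
    ordinary USENET articles.
    """
    # Normalize line endings so CRLF-terminated articles split the same way.
    text = raw.replace("\r\n", "\n")
    headers, separator, body = text.partition("\n\n")
    if not separator:
        # No blank line found: treat the whole input as body.
        return "", text
    return headers, body


raw = (
    "X-HTML: <HTML><!--\n"
    "Subject: Mars Teeming With Life\n"
    "X-HTML2: -->\n"
    "\n"
    "<HTML><HEAD>\n"
    "<TITLE>Mars Teeming With Life</TITLE>\n"
    "</HEAD>\n"
)
headers, body = split_article(raw)
print(body)
```

The returned body starts at the article's own <HTML> tag, so it can be saved as an HTML file without the commented-out header block.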

All the ClariNet data tags have the form <CLARI-ITEM TYPE> along with the corresponding closing tag of </CLARI-ITEM>. The TYPE varies depending on what we're marking (see below for a full list).

This tag is used in two different ways: a single-line form and a multi-line form. The single-line form looks like this:

<H1><CLARI-ITEM HEADLINE>Mars Teeming With Life</CLARI-ITEM></H1>

There is never more than one CLARI-ITEM per line, but other HTML tags and non-tagged data may be on the same line. The single-line form never contains other CLARI-ITEMs and never includes the TYPE in the closing tag.
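The single-line rules make these items easy to pick out with a regular expression. The following Python sketch is one possible approach, not ClariNet's own parser; the names `SINGLE_LINE` and `single_line_items` are hypothetical, and the pattern assumes TYPE names consist only of capital letters and hyphens, as in the list of defined types.

```python
import re

# Opening tag with a TYPE, the data, then a closing tag WITHOUT a TYPE.
# A multi-line closing tag such as </CLARI-ITEM CAPTION> does not match.
SINGLE_LINE = re.compile(r"<CLARI-ITEM ([A-Z-]+)>(.*?)</CLARI-ITEM>")


def single_line_items(html):
    """Return (type, data) pairs for every single-line CLARI-ITEM."""
    items = []
    for line in html.splitlines():
        # At most one CLARI-ITEM appears per line, so one search is enough.
        match = SINGLE_LINE.search(line)
        if match:
            items.append((match.group(1), match.group(2)))
    return items


line = "<H1><CLARI-ITEM HEADLINE>Mars Teeming With Life</CLARI-ITEM></H1>"
print(single_line_items(line))  # [('HEADLINE', 'Mars Teeming With Life')]
```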

The multi-line form looks like this:

<I><CLARI-ITEM CAPTION>
Little Blue Men arrive from Mars and demand that we give them females to revive their dying race.
</CLARI-ITEM CAPTION></I>

In the multi-line form the CLARI-ITEM tag is the last thing on the line prior to the data, though not necessarily the only thing. The closing tag is always the first thing on the line following the data, and is always marked with its TYPE. Other CLARI-ITEMs may be contained within a multi-line item (such as a CAPTION being contained by a PHOTO-TABLE).
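Because the opening tag ends its line and the typed closing tag starts a line, multi-line items (including nested ones) can be collected with a small stack. This Python sketch is only an illustration under those stated rules; `OPEN_ML`, `CLOSE_ML` and `multi_line_items` are hypothetical names, and TYPE names are assumed to be capital letters and hyphens.

```python
import re

OPEN_ML = re.compile(r"<CLARI-ITEM ([A-Z-]+)>\s*$")    # opening tag ends the line
CLOSE_ML = re.compile(r"^\s*</CLARI-ITEM ([A-Z-]+)>")  # typed closing tag starts the line


def multi_line_items(html):
    """Return (type, data) pairs for multi-line items, innermost items first."""
    stack = []   # currently open items, each as [type, list of data lines]
    found = []
    for line in html.splitlines():
        close = CLOSE_ML.match(line)
        if close and stack and stack[-1][0] == close.group(1):
            item_type, data = stack.pop()
            found.append((item_type, "\n".join(data)))
        # Every item that is still open contains this line.
        for entry in stack:
            entry[1].append(line)
        opened = OPEN_ML.search(line)
        if opened:
            stack.append([opened.group(1), []])
    return found


sample = "\n".join([
    "<CLARI-ITEM PHOTO-TABLE>",
    "<TABLE><TR><TD><CLARI-ITEM CAPTION>",
    "Little Blue Men arrive from Mars.",
    "</CLARI-ITEM CAPTION></TABLE>",
    "</CLARI-ITEM PHOTO-TABLE>",
])
for item_type, data in multi_line_items(sample):
    print(item_type, repr(data))
```

The data for an outer item keeps the lines of any item nested inside it, so a PHOTO-TABLE's data still includes its CAPTION markup.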

CLARI-ITEM tags respect the nested structure of HTML -- you'll never see a block element that is opened inside an item but not closed there, or closed inside an item without having been opened there.

The following CLARI-ITEM types are currently defined:

ANCHOR
Used in some specialty stories to mark anchor items of significance.
BACK-LINK
(NewsDrill only) The HTML for the back reference inside a LINKS-INDIRECT block.
BOUNDLINKS
(NewsDrill only) Links that are bound to specific words within the story. (Multi-line)
CAPTION
The photo's caption text (part of the PHOTO-TABLE). (Multi-line)
COPYRIGHT
The story's Copyright.
DATE
Taken directly from the Date header.
FOOTNOTE
Foot-note text. (Multi-line)
FROM
Just the comment portion of the From header.
HEADER
A block of header information for the story. (Multi-line)
HEADLINE
Typically the same as the <TITLE>, but may have some data stripped (such as a trailing date).
IMAGESTORY
We now pack our comics and cartoons in this tag (these stories don't have a STORY tag). (Multi-line)
LINK
(NewsDrill only) Each link of a link/description pair will soon be marked with this tag. (Multi-line)
LINK-DESC
(NewsDrill only) Each description of a link/description pair will soon be marked with this tag. (Multi-line)
LINK-STATUS
(NewsDrill only) If a particular link has special status (such as "[Reviewed]"), it will be noted with this tagged item.
LINKS-DIVIDER
(NewsDrill only) Bits of HTML that separate links or trail the last link in the BOUNDLINKS and STORYLINKS sections. (Multi-line)
LINKS-INDIRECT
(NewsDrill only) A tagged block of link information that one of the links in the article references. (Multi-line)
NEWSGROUPS
Newsgroup information for this story in tabular form. (Multi-line)
KEYWORDS
A list of keywords for this story separated by commas.
PHOTO-TABLE
A table used to display an in-line photo. (Multi-line)
SLUGWORD
The story's slug word.
STORY
The text of the story. (Multi-line)
STORYLINKS
(NewsDrill only) Links that relate to the story, but are not bound to specific words within it. (Multi-line)
TAGLINE
(Coming soon) The bottom-of-the-story links.
TRAILER
The bottom-most HTML, usually used to display some handy links. (Multi-line)
TYPE
The story's type (currently "story" or "feature").
XINFO
A block of "extra info" about the article, such as the NEWSGROUPS clari-info. Since this info is typically shown by a newsreader, this data is commented out of the news articles, but might get uncommented by the web extraction software. (Multi-line)

Not all of these items are present in every story. Just ignore any unknown tags (the normal thing to do when parsing HTML). Also remember that it is easy to tell programmatically if you're dealing with a multi-line tag because the opening tag will be the last item prior to a newline.
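The two points above (skip unknown types, and use the newline test to recognize the multi-line form) can be sketched in Python as follows. This is an illustrative sketch only: `KNOWN_TYPES`, `OPEN_TAG` and `classify_items` are hypothetical names, and the set simply copies the types listed in this document.

```python
import re

# Types copied from the list in this document; anything else is ignored.
KNOWN_TYPES = {
    "ANCHOR", "BACK-LINK", "BOUNDLINKS", "CAPTION", "COPYRIGHT", "DATE",
    "FOOTNOTE", "FROM", "HEADER", "HEADLINE", "IMAGESTORY", "LINK",
    "LINK-DESC", "LINK-STATUS", "LINKS-DIVIDER", "LINKS-INDIRECT",
    "NEWSGROUPS", "KEYWORDS", "PHOTO-TABLE", "SLUGWORD", "STORY",
    "STORYLINKS", "TAGLINE", "TRAILER", "TYPE", "XINFO",
}

# Capture the TYPE and whatever follows the opening tag on the same line.
OPEN_TAG = re.compile(r"<CLARI-ITEM ([A-Z-]+)>(.*)$")


def classify_items(html, known=KNOWN_TYPES):
    """Return (type, form) pairs, where form is 'single-line' or 'multi-line'."""
    result = []
    for line in html.splitlines():
        match = OPEN_TAG.search(line)
        if not match or match.group(1) not in known:
            continue  # no opening tag on this line, or an unknown type
        # Multi-line form: nothing but whitespace follows the opening tag.
        form = "multi-line" if match.group(2).strip() == "" else "single-line"
        result.append((match.group(1), form))
    return result


sample = "\n".join([
    "<H1><CLARI-ITEM HEADLINE>Mars Teeming With Life</CLARI-ITEM></H1>",
    "<I><CLARI-ITEM CAPTION>",
    "Little Blue Men arrive from Mars.",
    "</CLARI-ITEM CAPTION></I>",
])
print(classify_items(sample))  # [('HEADLINE', 'single-line'), ('CAPTION', 'multi-line')]
```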

There are also some special sub-tags for certain tags. Some IMG tags have CLARI-XFN after the "SRC", some Anchor tags have CLARI-XGN after the "HREF", and various tags (such as HTML and BODY) can have a CLARI-STYLE tag. The purpose of these tags is to allow the news-hosted version to be easily converted into a web-hosted version. For instance, all external references to newsgroups and other articles (including images) are in news-relative links. If the articles are translated into a web-based format using our extraction technique, the conversion is done using these tags.

Note that if you are parsing extracted files, these tags will have already been processed, and in the case of the CLARI-XFN and CLARI-XGN tags, these values will be swapped with the value of the preceding SRC or HREF sub-tag. When this happens the previous news reference is placed in the sub-tag CLARI-MID (for message IDs) and CLARI-GID (for group references).
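To make the swap concrete, here is a rough Python sketch of the rewrite described above. It is not the actual extraction software: the names `ATTR_PAIR` and `to_web_links` are hypothetical, the pattern assumes single-quoted values with the CLARI sub-tag immediately following SRC or HREF (as in the example article), and the pairing of CLARI-XFN with CLARI-MID and of CLARI-XGN with CLARI-GID is an inference from the description, not a documented rule.

```python
import re

# SRC or HREF followed directly by a CLARI-XFN or CLARI-XGN sub-tag.
ATTR_PAIR = re.compile(r"\b(SRC|HREF)='([^']*)'\s+CLARI-(XFN|XGN)='([^']*)'")


def to_web_links(html):
    """Swap each news reference with its web value, keeping the old value."""
    def swap(match):
        attr, news_ref, kind, web_ref = match.groups()
        # Assumed mapping: XFN replaces a message ID, XGN a group reference.
        saved = "CLARI-MID" if kind == "XFN" else "CLARI-GID"
        return f"{attr}='{web_ref}' {saved}='{news_ref}'"
    return ATTR_PAIR.sub(swap, html)


tag = "<IMG SRC='0819top.gifI01@web.clari.net?part=2' CLARI-XFN='xws/ah/0819top.gif' ALIGN=right>"
print(to_web_links(tag))
```

The real conversion may do more than this (for example, turning group names into full URLs); the sketch only shows the direction of the swap.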

Here's an example article. It contains one in-line photo (there can be 0, 1 at the top, or 1 at the top and 1 at the bottom) and 3 icon images.

<HTML><HEAD>
<TITLE>UPS, Teamsters reach tentative settlement described as "historic"</TITLE>
<META HTTP-EQUIV=Refresh CONTENT=3600>
</HEAD>
<BODY>
<CLARI-ITEM HEADER>
<!-- <CLARI-ITEM XINFO>
<TABLE CELLSPACING=0 CELLPADDING=0>
<TR><TD VALIGN=top ALIGN=right><B>Newsgroups:</B>
<TD><CLARI-ITEM NEWSGROUPS>
<A HREF='clari.web.news.labor.strike' CLARI-XGN='clari.web.news.labor.strike'>clari.web.news.labor.strike</A>,
<A HREF='clari.web.biz.industry.transportation' CLARI-XGN='clari.web.biz.industry.transportation'>clari.web.biz.industry.transportation</A>,
<A HREF='clari.web.biz.top' CLARI-XGN='clari.web.biz.top'>clari.web.biz.top</A>,
<A HREF='clari.web.usa.top' CLARI-XGN='clari.web.usa.top'>clari.web.usa.top</A>,
<A HREF='clari.web.news.photos' CLARI-XGN='clari.web.news.photos'>clari.web.news.photos</A>
</CLARI-ITEM NEWSGROUPS>
</TABLE><P>
</CLARI-ITEM XINFO> -->
<IMG SRC='0819top.gifI01@web.clari.net?part=2' CLARI-XFN='xws/ah/0819top.gif' ALIGN=right WIDTH=35 HEIGHT=29 ALT="Logo [Aug 19]">
<I><A HREF='http://www.clari.net/'>ClariNet</A> <CLARI-ITEM TYPE>story</CLARI-ITEM>
<B><CLARI-ITEM SLUGWORD>US-UPS-SETTLE</CLARI-ITEM></B>
from <CLARI-ITEM FROM>AFP / Nathaniel Harrison</CLARI-ITEM></I><BR>
<H1><CLARI-ITEM HEADLINE>UPS, Teamsters reach tentative settlement described as "historic"</CLARI-ITEM></H1>
<I><B><CLARI-ITEM COPYRIGHT>Copyright 1997 by Agence France-Presse</CLARI-ITEM></B></I> /
<I><CLARI-ITEM DATE>Tue, 19 Aug 1997 1:42:06 PDT</CLARI-ITEM></I><P>
</CLARI-ITEM HEADER>
<CLARI-ITEM PHOTO-TABLE>
<TABLE BORDER=1 HSPACE=7 VSPACE=3 ALIGN=right CELLSPACING=0 CELLPADDING=10 BGCOLOR='teal' WIDTH=240>
<TR><TD ALIGN=left>
<A HREF='cn19657.htmlH01@web.clari.net' CLARI-XFN='xws/am/cn19657.html'>
<IMG SRC='photo-cnt19657.jpgI01@web.clari.net?part=2' CLARI-XFN='xws/am/photo-cnt19657.jpg' width=227 height=180 ALT="Photo [Tue, Aug 19]"></A>
<BR><P><FONT COLOR=maroon><CLARI-ITEM CAPTION>
LOS ANGELES, CALIFORNIA, 18-AUG-97: Striking United Parcel Service workers hold their picket signs at a UPS facility in downtown Los Angeles August 18 as they celebrate news of a tentative agreement between the Teamsters' union and UPS in the late evening hours. The strike has gone on for fifteen days. [Photo by Fred Prouser, Reuters]
</CLARI-ITEM CAPTION></FONT></TABLE>
</CLARI-ITEM PHOTO-TABLE>
<CLARI-ITEM STORY>
<P> WASHINGTON, Aug 19 (AFP) - The United Parcel Service and the Teamsters union have struck a tentative deal to end a two-week-old strike against the package shipper that had snarled commerce nationwide, the parties announced early Tuesday.</P>
<P> The agreement could mean a resumption in UPS deliveries as early as Wednesday, said Teamster president Ron Carey, who appeared at a news conference here with US Labor Secretary Alexis Herman.</P>
<P> "I congratulate both sides for their victory," Herman told reporters at a downtown hotel, where officials from both sides -- along with a federal mediator -- had been locked in near-nonstop talks since early Thursday.</P>
<P> Herman hailed both parties, saying the company and the union held "shared values" for their workers.</P>
<P> "They also have a real commitment (to the workers) that I believe is demonstrated in the historic settlement they have reached," she said.</P>
<P> An estimated 185,000 Teamster drivers and handlers walked off the job August 4 in a dispute over union demands for more full-time jobs and rejection of a company plan to replace the Teamster pension scheme with one of its own.</P>
<P> It was Herman who had coaxed the two sides back to the bargaining table last week and who had spent the last several days exhorting them to narrow their differences.</P>
<P> The accord could also be viewed as a victory for President Bill Clinton, who held firm against anxious appeals from the company and retailers for presidential intervention to end the walkout.</P>
<P> Clinton had all along insisted a settlement had to come from the parties themselves rather than the government.</P>
<P> "Today's agreement represents their hard work and determination to reconcile their differences for the good of the company, its employees and the customers they serve," the president said in a statement issued from the Massachusetts island of Martha's Vineyard, where he is vacationing and -- on Tuesday -- celebrating his 51st birthday.</P>
<P> UPS chief negotiator David Murray, who also appeared with Herman, declined to comment on details of the accord until it had been formally presented to the workers.</P>
<P> But he said the company would resume its services "very soon."</P>
<P> "We believe it is an agreement that we'll be able to remain competitive with," he added, while acknowledging that "no one comes away from the bargaining table with everything he wanted."</P>
<P> Carey by contrast was willing and eager to talk about the accord and called his own news conference at Teamster headquarters here.</P>
<P> He said the agreement would meet the union demand for 10,000 new full-time jobs over the life of a five-year contract, created largely by combining part-time positions. The company had offered to create 1,000 new full-time slots.</P>
<P> The deal also secured "very large pension increases," as much as 50 percent in some cases, Carey said.</P>
<P> "These plans are now under the Teamster pension plan just as they've always been -- not a company controlled pension plan."</P>
<P> In addition, the accord would boost salaries, with the earnings of a part-time worker increasing from 11 to 15 dollars an hour by the end of the contract, according to the Teamster boss.</P>
<P> The package in additon replaces subcontractors with UPS workers and establishes new safety guarantees for workers handling heavy packages, Carey said.</P>
<P> With approval from several union committees, workers could soon begin returning to their jobs, he added.</P>
<P> The rank and file would presumably then have to vote on the package, probably by mail.</P>
<P> Earlier, Teamster officials had taken heart from opinion polls showing public sympathy for the strikers.</P>
<P> UPS had warned that bowing to union demand would cripple the company, noting that the strike had cost it between 200 and 300 million dollars a week as well as valuable regular customers who were switching their business to competitors.</P>
<P> While no dollar value was ever put forward as to the effect of the stoppage on the US economy, retailers and small businesses -- unable to afford prices charged by UPS competitors -- were particularly hard hit.</P>
</CLARI-ITEM STORY>
<BR CLEAR=all>
<CLARI-ITEM FOOTNOTE>
<HR ALT='-=-=-'>
Want to tell us what you think about the ClariNews? Please feel free to <A HREF="mailto:comments@clari.net">email us your comments</A>.
</CLARI-ITEM FOOTNOTE>
<CLARI-ITEM TRAILER>
<HR>
<A HREF='0819bottom.htmlH01@web.clari.net' CLARI-XFN='xws/aj/0819bottom.html'>
<IMG SRC='0819bottom.gifI01@web.clari.net?part=2' CLARI-XFN='xws/aj/0819bottom.gif' WIDTH=137 HEIGHT=47 BORDER=0 ALT="bottom [Aug 19]"></A>
<A HREF='0819extra3.htmlH01@web.clari.net' CLARI-XFN='xws/au/0819extra3.html'>
<IMG SRC='0819extra3.gifI01@web.clari.net?part=2' CLARI-XFN='xws/au/0819extra3.gif' BORDER=0 ALT="Icon [Aug 19]"></A>
</CLARI-ITEM TRAILER>

You can view how it looks when it is formatted, if you like.