ClariCGI Configuration & Compile


The ClariCGI program is a CGI which should be installed in a suitable directory in your web-server/news-server. Often it is installed in the file cgi-bin/claricgi in your web server's master directory.

To get the source code, you can FTP it from ftp://ftp.clari.net/pub/sources/claricgi.tar.Z

(Note that the system must have both a web server and a news server. If you don't have both, some tips are given below.)

However, it also supports (if compiled for it) the Open Market FastCGI (FCGI) operation mode, which makes it run as a daemon, handling requests as they come rather than starting and stopping for each request. This is very much the recommended mode if you have a server, such as Apache 1.1 or others, that supports FastCGI.

The ClariCGI has to run on a system that has both a web server and a live USENET news spool. It accesses news files directly. For it to use NNTP, opening an NNTP server connection and talking to it for each web hit, would be far too inefficient.

If your news server does not have a web server, you could either install a simple one just for this CGI, or perhaps consider NFS mounting your news spool and news library directories onto a system that does have a web server and can call this CGI program. We have more info on this available.

The program needs a configuration file, normally kept in /usr/local/etc/httpd/conf/claricgi.conf. If you don't want to keep it there, you can change the define in claricgi.h before compiling. If you have the binary, and are using FCGI, you can define the environment variable CLARICGI_CONF with the location of the config file.

While you can change the other definitions in claricgi.h, you can usually make all other changes in the config file. In the recommended FCGI mode, configuration file load time is not a concern.

Compiling

The program is meant to be highly portable, as it performs only basic Unix operations. Unpack the distribution and consider the following small number of configuration choices. You'll want to copy the Makefile.dist to Makefile and edit the Makefile to match your configuration.

FCGI

If you are compiling for FCGI, you must fetch the FCGI toolkit and build it. Then edit the Makefile to provide the locations of the FCGI library and include directory. Uncomment or remove the blank definitions.

Note that a program compiled for FCGI will work as an ordinary CGI, so the same binary does both. So even if you plan to test as a CGI and then install as FCGI, you should build as an FCGI.

In fact, we recommend you make a link to the program in your cgi-bin directory so you can run it as a CGI. Since you need to kill and restart the FCGI in order to get it to reread the configuration file and re-open the DBM database, testing and prototyping of new configurations and templates can be hard with it. For that, use it in CGI mode.

Ordinary CGI

If you don't have the FCGI developer's kit or don't plan to run a web server that supports this protocol, you can build the program as a plain CGI. The Makefile.dist comes configured this way.

If you don't have a server that supports FCGI, you can also get their special cgi-fcgi program. This program is a very tiny CGI program that will connect to ClariCGI in FCGI mode and exchange data with it. This way ClariCGI runs as a daemon, and the system only starts and stops the tiny cgi-fcgi program. However, as all ClariCGI does to start up is open the config file and optionally the DBM database, this may not be worth it. ClariCGI is only about 110K of code without FCGI, though it does use more RAM.

DBM and group descriptions

The program includes support for NDBM databases. The configuration file can be read from a DBM database, but generally you will only do that if you are not running FCGI and want to really minimize your file opens.

The DBM database is used to provide efficient access to newsgroup descriptions for menus of groups, and to provide efficient access to template programs. You don't strictly need it for pure USENET operation or if you define no templates and use none of ours. We have some perl scripts that will turn your newsgroups file into a DBM database, and we also have scripts that will turn ClariNet description information, available from our site, into the DBM database. This is a great way to easily provide a custom look for each newsgroup. These programs are in the claridb subdirectory of your ClariCGI distribution.

(Note that the DBM data for newsgroup descriptions will not be used unless you define a Group_Listing template that uses DBM.)
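To give an idea of the kind of records involved, here is a minimal Python sketch that turns lines of a newsgroups file into key/value pairs and stores them with Python's standard dbm module. It is not the claridb build script. The "Short:" key prefix is borrowed from the Group Listing example later in this document; the "group, whitespace, description" line layout, the function names and the database path are assumptions made for illustration.

```python
import dbm.dumb


def description_records(lines):
    """Map 'Short:<group>' keys to descriptions taken from newsgroups-file lines."""
    records = {}
    for line in lines:
        # Assumed layout: group name, whitespace, free-text description.
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            group, description = parts
            records["Short:" + group] = description
    return records


def write_database(path, records):
    """Store the records in a simple DBM-style database (path is hypothetical)."""
    with dbm.dumb.open(path, "n") as db:
        for key, value in records.items():
            db[key.encode("utf-8")] = value.encode("utf-8")
```

Lines without a description are skipped, so a group with no entry simply has no record to look up.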

However, if you don't want to use DBM, you can define the symbol NO_DBM in the Makefile or in claricgi.h.

You can also use GDBM with its ndbm interface. In fact, since NDBM is limited to 1KB records, if you want to store template files in the DBM database, GDBM is recommended because it does not have the 1KB limit. You will need to modify the Makefile to use gdbm, and the build.perl script in claridb takes a "+g" option to build with GDBM.

Destination

You can specify where you want to install ClariCGI in the Makefile with the DEST or FCDEST variables, or just with the WEBSERV variable in most cases. As noted, you can compile the binary for FCGI and it will work as either an FCGI or CGI.

Compile the program by typing "make". After -- or perhaps even before -- you set up the config file, you can test it with some command line options, such as "q=query" to simulate a query. In this mode it does not output a Content-Type header but it does what it would as a CGI.

After making, you can install in the appropriate binary directory for CGI programs. This is usually /usr/local/etc/httpd/cgi-bin but you can put it anywhere that a CGI program can be run from. Some web servers are configured to allow any file in the web tree that ends in .cgi to be used as a CGI program. A "make cgi_install" will do this for you.

If you are installing as FCGI (recommended) the configuration is alas somewhat more complex. See the documentation for your web server on how to configure an FCGI. See instructions for Apache 1.1 if that is your web server. We recommend that you simply put ClariCGI in the root of your web server as clari.fcgi or clari.fcg. A "make fcgi_install" does this. In your web server's config, you should also include a line to define the environment variable CLARI_FCGI so that the program knows it is operating in FCGI mode. It will also do this if it is called with a command name with "fcg" in it.
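The mode decision described above comes down to two checks. The Python sketch below shows that logic only; the real program is compiled code, and the function name is invented for illustration. It assumes that merely defining CLARI_FCGI (with any value) is enough, and that only the last component of the command name is examined.

```python
import os


def runs_as_fcgi(command_name, environ):
    """Return True if the program should behave as an FCGI daemon."""
    # Rule 1: the web server's configuration defined CLARI_FCGI.
    if "CLARI_FCGI" in environ:
        return True
    # Rule 2: the command name contains "fcg", e.g. clari.fcgi or clari.fcg.
    # (Looking only at the base name is an assumption of this sketch.)
    return "fcg" in os.path.basename(command_name)
```

For example, runs_as_fcgi("/www/clari.fcg", {}) is true, while a copy linked as cgi-bin/claricgi with no CLARI_FCGI defined is treated as an ordinary CGI.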

Other-User installation

CGI programs tend to run as a special dummy user, often the same user the web server runs as, called "nobody." You may want to create a special user account for ClariCGI for security reasons, and make the cgi program setuid to that account. In that case, any directory ClariCGI writes into, namely the image cache directory, needs to be writable by only that user and readable by the web server.

Do you allow general users on your system to run CGI programs as the "nobody" user? If so, claricgi will have access to files writable by that user and ordinary users will have access to the claricgi spool directory. This is not particularly dangerous, but it is a mild hole you can fix by running the program setuid as another user.

In addition, if you want ClariCGI to be installed or maintained by some user other than the webmaster, you should compile it to use a configuration file other than the default (which is usually in a directory only the webmaster has access to). All the webmaster then has to do is create a file tree controlled by that user for the image cache and provide a way to execute ClariCGI as a setuid CGI.

Do not have ClariCGI run setuid to a system user like news or root. There is no need, and while we believe the program to be reasonably secure, no program is perfect. An ordinary user with no special permissions is fine. NOTE: Ordinary users will be able to execute the binary of ClariCGI on their own input files, but as the owning user. The only files opened for write by claricgi are image cache files, and their pathnames are strongly checked. The config file should make all read access secure.

Building the claricgi.conf file

Auto-Configuration

For most sites, our auto-configuration web page for claricgi.conf will handle all the configuration needs you have. Just fill in the form on the ClariCGI Auto-Config Web Page and save out the config file it generates.

Alternately you can take the sample configuration file provided in claricgi.conf.dist and copy it to claricgi.conf and edit it to your needs. If you plan to do complex configuration and write templates, you will want to use this method. You will also want to look in the samples directory for examples of templates and definitions to put in your configuration file.

Edit the file and copy it to the location you have set for the configuration file.

The syntax of the configuration file is:

Parameter Name:[WHITESPACE]Value

There are comments in the default configuration file. They should explain most parameters you may wish to tune. Here are some notes.
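If you want to check a configuration file with a script of your own, the syntax is simple to read. The Python sketch below is not ClariCGI's parser. It assumes, judging from the examples in this document, that lines starting with "#" are comments, that a line starting with whitespace continues the previous value, and that a parameter may be repeated (as with Newsgroup and Allow). The function name is hypothetical.

```python
def parse_config(text):
    """Return a list of (name, value) pairs in file order."""
    entries = []
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue  # blank line or comment
        if raw[0] in " \t" and entries:
            # Indented line: assumed to continue the previous value.
            name, value = entries[-1]
            entries[-1] = (name, value + "\n" + raw.strip())
            continue
        # Split on the first colon only, since values such as URLs contain colons.
        name, sep, value = raw.partition(":")
        if sep:
            entries.append((name.strip(), value.strip()))
    return entries
```

A list of pairs is used rather than a dictionary so that repeated parameters keep their order.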

Legal Newsgroups

Define "Newsgroup" lines to list the newsgroups the ClariCGI is to provide access to. Include no entries if you wish to make all groups legal. More typically you will only allow access to certain groups.

Examples:

Newsgroup:	clari 
Newsgroup:	biz.clarinet

Permitted Clients

While you can control access to CGI programs and any part of the web tree with the configuration files of your web server -- and we suggest you do -- you can also control ClariCGI on its own in the configuration file through the use of "Allow" and "Deny" records. These define regular expressions that can match incoming domains, or network masks for incoming IP addresses.

Deny patterns are tried first, since the default is deny, and the only purpose of a "deny" record is to disable a later "Allow" record that would match.

You must be aware of a major security risk in using domain names in this case. Some systems simply do the gethostbyaddr call which can't be trusted. Defeating this call is fairly easy, and a user can make his site seem like it has a name internal to your system. Some systems and web servers counter-verify any names returned by gethostbyaddr with a call to gethostbyname which maps the name back to (hopefully) the same number as a check.

If your site does not do this, then do not trust the domain name and do not use it here. That means, alas, you must make ugly patterns to match number octets.
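The counter-verification described above can be sketched in Python as follows. This only illustrates the check; it is not what any particular web server does internally. socket.gethostbyaddr and socket.gethostbyname_ex are the standard library's wrappers for the two lookups; the function name and the idea of passing the lookups in as parameters (to make the logic testable) are assumptions of the sketch.

```python
import socket


def verified_hostname(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return the client's host name only if it maps back to the same address."""
    try:
        name = reverse(ip)[0]         # address -> claimed name
        addresses = forward(name)[2]  # claimed name -> list of addresses
    except OSError:
        return None                   # lookup failed: treat the name as unknown
    return name if ip in addresses else None
```

A name that fails the check is discarded, so a forged reverse entry cannot make an outside host match one of your domain patterns.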

To use the IP address, which we recommend, use the string ip: followed by the IP network number written as decimal octets, a space, and then the number of bits in the host part of the address, that is, the low-order bits left free by the subnet mask. That's usually 8 for a class C network, 16 for a class B network, and so on.

If no pattern matches, the default is to deny. (Since the main purpose of this CGI is better access to ClariNet news, you understand that it has been configured by default to not allow leaks.)

Allow: ip:192.54.253 8 
Allow: .*\.yourdomain\.com$
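The matching rules above can be sketched in Python. This is an illustration under stated assumptions, not the program's code: the bit count is read as the number of low-order bits ignored in the comparison (which is what the examples imply: 8 for a class C network, 32 to match everything), a short network number is padded with zero octets, regular expressions are applied with an unanchored search, and when no trusted host name is available the pattern is tried against the dotted address instead. All function names are hypothetical.

```python
import re


def ip_to_int(dotted):
    """Convert up to four decimal octets to a 32-bit integer, zero-padded on the right."""
    octets = [int(part) for part in dotted.split(".")]
    octets += [0] * (4 - len(octets))
    value = 0
    for octet in octets:
        value = (value << 8) | octet
    return value


def pattern_matches(pattern, client_ip, client_host):
    if pattern.startswith("ip:"):
        network, bits = pattern[3:].split()
        shift = int(bits)  # low-order bits to ignore
        return ip_to_int(client_ip) >> shift == ip_to_int(network) >> shift
    # Assumption: an unknown host name falls back to the dotted address.
    subject = client_host if client_host is not None else client_ip
    return re.search(pattern, subject) is not None


def access_allowed(rules, client_ip, client_host=None):
    """rules is a list of ("Allow" or "Deny", pattern) pairs."""
    for kind in ("Deny", "Allow"):  # Deny patterns are tried first
        for rule_kind, pattern in rules:
            if rule_kind == kind and pattern_matches(pattern, client_ip, client_host):
                return kind == "Allow"
    return False  # no pattern matched: the default is to deny
```

With the two Allow lines shown above, a client at 192.54.253.7 is admitted by the ip: rule, and a verified host such as news.yourdomain.com is admitted by the domain pattern.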

There is actually a more complex syntax for the Allow and Newsgroup headers. Contact us if you need to grant different access to different groups for different domains. This is not supported.

(If you want to use your web server's access controls to allow access to ClariCGI, for example to provide password-based access, just say

Allow: ip:0 32

which matches any IP address. You can also use "Allow: ." to match all known and unknown domain names. If you do this, be sure to test that your web server is blocking access by non-passworded outsiders.)

System URL

You should define this to be your own home page. Error messages about denied permission and a few others will be directed there.

System URL:	http://www.yourdomain.com

News Parameters

If you keep your USENET news spool in other than the default locations of /usr/spool/news and /usr/lib/news, you can define those locations with the "Spool", "Active", and "History" parameters.

In general, for good operation of USENET software, even if you put your news in other places, symlinks from these traditional locations are a good idea. It helps people find things on your system. So, unless you have a good reason not to, define symlinks from the traditional places to your locations.

Overview File

ClariCGI requires a news overview database (NOV). If you are not building overview files in your news system, add this feature immediately. It is almost a necessity for modern newsreaders.

Some sites keep the overview files in their own directory tree, apart from the news articles, for faster access. If you do this, name this directory tree here. Otherwise we look in the ordinary spool directory.

Image caches

If you have ClariNet news, or any other stream that contains lots of MIME-encoded image files, it is efficient to cache these images rather than decode them from their MIME with a CGI every time they are accessed.

ClariCGI does this. For ClariNet picture stories, which come as a pair of articles, one with a small photo and one with a large, you can control the handling of each as well.

To have an image cache, you must set aside a directory which can be written by the same userid that will be running the ClariCGI program. This directory must be inside your web server's tree so that files in it can be accessed directly with URLs.

When a user requests an article with a MIME graphic included, ClariCGI decodes the graphic and stores it in the cache directory, then points to this file with the right URL. If a future user also wishes to see a story with this photo, a direct link is generated to the cached photo.
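The decode-once behaviour can be pictured with the following Python sketch, which uses the standard email module to pull an image out of an article. It is only an illustration of the idea. How ClariCGI actually names its cache files is not described here, so the naming scheme (a hash of the Message-ID plus the MIME subtype) and the function name are invented for the example.

```python
import email
import hashlib
import os


def cached_image_url(article_text, cache_dir, cache_url):
    """Return a URL for the first image in the article, decoding it only once."""
    message = email.message_from_string(article_text)
    for part in message.walk():
        if part.get_content_maintype() != "image":
            continue
        # Hypothetical naming scheme: hash of the Message-ID plus the subtype.
        digest = hashlib.sha256(message.get("Message-ID", "").encode()).hexdigest()
        name = digest[:16] + "." + part.get_content_subtype()
        path = os.path.join(cache_dir, name)
        if not os.path.exists(path):  # cache miss: decode the MIME body and store it
            with open(path, "wb") as out:
                out.write(part.get_payload(decode=True))
        return cache_url.rstrip("/") + "/" + name
    return None  # the article carries no image
```

The second request for the same article finds the file already present and returns the same URL without decoding anything.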

If you set up such a directory, you must specify both its full path and its path relative to the web server root (usually /usr/local/etc/httpd/htdocs) for use in URLs.

ClariCGI, since it is able to write the cache directory, will once a day (as long as it is invoked once a day) clean out files more than 4 days old from the cache directory. It does this at the end of its run. Define the symbol "Expire Days" to set your own expiry limit. If you would prefer to do the cleanup yourself, define the configuration symbol "No Cleanup" and put in your own cache cleaner. For example, a line in the crontab of the form:

find /whatever/photo-cache -type f -mtime +5 -print | xargs rm

is an invocation of the find command that removes all files in the named directory that are over 5 days old. You can clean the cache more frequently than that if you wish -- if a file is not in the cache, it is re-generated by ClariCGI the next time it is accessed.

Cache:	/usr/local/etc/httpd/htdocs/images 
Cache URL:		/images
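If you disable the built-in cleanup and would rather not use find, a cleaner can be written in a few lines. The Python sketch below is one possible replacement, not the routine ClariCGI runs. The function name is hypothetical, the default of 4 days mirrors the limit mentioned above, and unlike find it looks only at files directly inside the cache directory.

```python
import os
import time


def clean_cache(cache_dir, expire_days=4, now=None):
    """Delete regular files in cache_dir last modified more than expire_days ago."""
    now = time.time() if now is None else now
    cutoff = now - expire_days * 86400
    removed = []
    for entry in os.scandir(cache_dir):
        # Skip directories and symlinks; only plain cache files are removed.
        if entry.is_file(follow_symlinks=False) and \
                entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Calling clean_cache("/usr/local/etc/httpd/htdocs/images") from a daily cron job would have roughly the same effect as the built-in cleanup.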

Large Images

Even if you have a cache, you can elect to not cache large ClariNet photos and have them be extracted every time. This is often OK because large photos are not viewed as often, and they take a while to load and display, so the extra overhead of using a CGI to do MIME decoding each time is not a big burden. Thus even if you have defined the "Cache" parameter you will not cache large photos unless you also define the parameter "Large Cacheing" to be some string starting with "y".

Large Cacheing:	yes

CGI security

Many people are concerned, with good reason, about the security of CGI programs provided by others (or even written by their own staff.) A hole in a CGI program can allow any user who is able to run it to possibly gain access to your system. Usually they only get access as the "web" or "nobody" user but that can be dangerous in some cases.

Of course, you may allow ClariCGI to only be run by your local users, but even they might prove malicious, so security is still an issue.

No program of any complexity can be assured to be secure, but of course each author tries his or her best to avoid security holes in programs. ClariCGI is pretty diligent about checking string lengths before copying into buffers, and never uses gets, strcpy or unbounded sprintf unless it is known to be safe. It also checks all outside inputs for funny characters or overlong strings.

In order to increase the confidence you may have in ClariCGI, every time it opens a file (other than its configuration file) it does so through a wrapper routine called safe_fopen. This wrapper routine checks the filename to make sure it matches certain parameters. If it doesn't, the file is not opened.

With this mechanism, you can define attributes for files opened by the CGI. This means that if somebody finds a bug in the CGI that lets them have it try to open files on your system outside the files it is supposed to access, these opens will fail.

Of course, they might still find a bug in the CGI that lets them execute random code and totally take it over. Such bugs sometimes happen from buffers that overflow, etc. Nothing can protect you from those. But the safe_fopen scheme protects you from more mundane bugs. (Of course, if there is a bug in the safe_fopen system, you are not protected from that!)

The ClariCGI usually runs with the current directory set to the USENET news spool (/usr/spool/news). As configured by default, safe_fopen refuses to allow any filenames that start with a slash unless they start with particular "allowed" prefixes such as the news spool and the image cache. You may wish to add other directories, such as the personal newspapers directory if you support it. (In fact, you must add such directories if the system is to work.)

The safe_fopen also refuses to open any file that contains a ".." and may be attempting to go up into a parent directory.

Generally the provided configuration for safe_fopen is all you need once you add "File May Start" parameters with your news spool, template directory and personal papers directory (if any). The image cache directory is automatically enabled, and the open of the news history and active files is not checked since the filename provided is static.
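The checks described in this section can be summarized in a short Python sketch. It mirrors the rules as stated here and is not the source of safe_fopen itself; the function names and the example prefixes are assumptions.

```python
def may_open(filename, allowed_prefixes):
    """Return True if the filename passes the checks described above."""
    # Refuse anything that could climb into a parent directory.
    if ".." in filename:
        return False
    # Relative names stay under the current directory (the news spool).
    if not filename.startswith("/"):
        return True
    # Absolute names must begin with one of the "File May Start" prefixes.
    return any(filename.startswith(prefix) for prefix in allowed_prefixes)


def safe_open(filename, allowed_prefixes, mode="r"):
    """Open the file only if it passes the checks; otherwise return None."""
    if not may_open(filename, allowed_prefixes):
        return None
    return open(filename, mode)
```

With prefixes of /usr/spool/news/ and the image cache directory, a request for /etc/passwd or for /usr/spool/news/../../etc/passwd is refused before any open is attempted.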

Plugs & Customization

You can define the "Provider" string to be some HTML that puts in a plug for your site. A typical value for this is:

Provider: 	[Courtesy <a href=your_url>Your_name_here</a>]

The following configuration variables allow you to insert HTML at various places in the output of ClariCGI. This can also be done on a per-URL basis with the Head and Tail query tags. These tags can be a single line or multiple lines long. Some of the tags are actually programs in the ClariCGI template language.

art head

This string will be printed at the head of any article, after the header and title. Used only in the default article header template.

art tail

This string is output at the tail of any article. You can insert special notes and links to help or graphics as you desire here. Used only in the default article header template.

Article Template

A program in the ClariCGI template language that is the default display template for articles. Explicit templates specified in a query or for a newsgroup or hierarchy will supersede this. See the samples directory for examples of some good article templates.

Inline Divider

A ClariCGI template program that controls what comes between two articles that are included inline from the same query. ClariCGI is able to handle multiple queries and return multiple articles or groups, in sequence, in one response.

List Template

A ClariCGI template program for any newsgroup menu that involves a list of article headlines. (Per group templates have to check if they are in list or inline mode.)

Inline Template

A ClariCGI template program for any newsgroup menu that involves a series of articles included inline in the page.

Group List

The template for a listing of a menu of newsgroups. Usually contains a call to the list function to actually list the matching groups. That function in turn uses the group_listing template.

Here we see an example that will, in combination with the other template, make the listing be a table instead of the default "UL" list style.

# Use database lookup in group menus 
Group List:	{Title( "Newsgroup Menu: ", query_newsgroup )} 
		<H1 align=center>Newsgroup Menu: {query_newsgroup}</H1> 
		<CENTER><TABLE> 
		{list( query_newsgroup );} 
		</TABLE></center>

group_listing

The template for each element in a menu of newsgroups. The default is simply to list groups in an HTML <UL> list, with a link to a ClariCGI call to read the newsgroup.

Below we see a companion template to the one above. This one fills out table rows and fetches group descriptions from the DBM database.

Group Listing:	<TR><TD align=right>{grouplist(List_Newsgroup)} 
		<TD>{data("Short:" . List_Newsgroup);}</TR>

hier_listing

This is like the above template, but called when the item listed is a hierarchy instead of a group.

leftarrow

The HTML for the anchor of the "earlier story" link put at the top of stories. Defaults to an IMG tag of a left arrow provided with ClariCGI.

rightarrow

The HTML for the anchor of the "later story" link put at the top of stories. Defaults to an IMG tag of a right arrow provided with ClariCGI.

general_toolbar

Template program to output HTML to put in toolbars. In articles, this goes after the arrows. In newsgroups and group menus it goes at the top. A good place to put the HTML to link to your home page, or the newsreading home page, or a help page etc. Of course this toolbar is not used if you have a custom template for one of these pages.

usage

The "usage" template is run when ClariCGI is called with no query. It lets you define the program's usage or welcome message and menu.

Call ClariCGI with no arguments to see the default usage message.

local_page

The "local_page" variable is a string that will be inserted as a menu item in the default usage message, if you don't define one of your own. Handy to link to a local ClariNet page you may have.

Misc

DBM Templates

Define this if ClariCGI is to attempt to find templates for articles in specific groups by checking variables defined in the config file or records in the DBM database. The records will be defined with the pathname of "art.template" files in the news spool directory, relative to its root. For example, a template for all clari.web groups is defined with:

DBM Templates:	yes 
clari/web/art.template:	<HTML><BODY bgcolor=white> 
	{if( H_newsgroups ) 
		"<STRONG>Newsgroups:  </STRONG>"; 
		grouplist( H_newsgroups ); 
		fi } 
	<HR> 
	{art_head} 
	{body(0);}
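One way to picture how a single clari/web/art.template record can serve every clari.web group is the lookup sketched below in Python. The search order (most specific path first, then each parent hierarchy) is an assumption made for this illustration and is not taken from the program; the function name is hypothetical, and a plain dictionary stands in for the config variables or DBM records.

```python
def find_template(newsgroup, records):
    """Return the template for a group, trying the group and then its parents."""
    parts = newsgroup.split(".")
    while parts:
        key = "/".join(parts) + "/art.template"
        if key in records:
            return records[key]
        parts.pop()  # fall back to the enclosing hierarchy
    return None      # no template found: use the default Article Template
```

Under this assumption, an article in clari.web.commerce would pick up the clari/web/art.template record, while an article in another hierarchy would not.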

The DBM database builder, claridb/build.perl will extract all templates from your news spool if given the "+t" option or from the "clari" spool only with "t=clari". However be warned that templates are limited to 1KB in DBM. GDBM does not have that limit. Scanning your entire news spool or overview dir can be quite time consuming.

If you create a private hierarchy for the templates you can put them in the DBM or GDBM database and avoid having to have write permission into the news spool to store the templates. With this feature, ClariCGI can be set up by an ordinary user who has the power to run a CGI and access to the news spool.

No Cleanup

Define this to disable the cleanup of old files from the cache directory by ClariCGI. You must clean them out yourself or the directory will get very large.

Templates

Define to be the name of a directory where template programs are stored. If such a directory is defined, then queries can request that any program from that directory be run as the template for any query. (This can allow users to screw up their pages by running a newsgroup template on an article but this is not a security risk if the programs themselves are OK.) The template language itself is pretty simple, without looping or arrays or pointers, so it's not very dangerous, but promises that any programming language of reasonable ability is safe should be scrutinized with care.

By default there is no templates directory.

The Template Language

The real way to customize pages generated with ClariCGI is via the ClariCGI template language. This lets you lay out complete templates for newsgroup menus, articles and the headline menu items in newsgroup menus.

Note that some of the above variables such as Provider, are simply part of our own predefined templates and they may not be output if you define your own templates.

Summary

Define config file as needed in claricgi.h or use predefined name.
If running FCGI, build the FCGI developer's kit, and modify the Makefile to point to the library and include files. Otherwise, modify the Makefile not to use FCGI and type "make nofcgi".
Type make -- install ClariCGI in a place for CGI files on your web server, such as /usr/local/etc/httpd/cgi-bin, or install as FCGI.
Prepare claricgi.conf as per this file, or run ClariCGI Auto-config to build it. Several steps below refer to config decisions.
Decide if you want to cache images. If so, make a directory inside the web tree, make it writable by the userid that will run ClariCGI, and provide a Cache entry in the claricgi.conf.
Add directories you need to access to the "File May Start" parameters in the config file. You only need to do this if you created and defined template or personal paper directories.
Define legal newsgroups and valid IP addresses for access.
Make other modifications to config file and copy it to decided location. This is usually /usr/local/etc/httpd/conf/claricgi.conf
Test some URLs.
Add URLs to your web pages and go!

Group Templates

Define this if ClariCGI is to attempt to find "art.template" files with templates for articles in the spool directories of newsgroups.

