Matter Realisations

Welcome to the Matter Realisations' Software Page!

Mostly as a means to an end, a fair amount of software gets written here. Much of it is written in Perl and may require modules from CPAN (the Comprehensive Perl Archive Network). For the most part, this code is independent of the operating system. It should run on Windows, although it may run better under the Cygwin environment than natively (ActivePerl).

Support

I am not terribly interested in supporting any of this software; there is certainly a limit to what I will do for free. I make no guarantees and provide no warranty of fitness for any purpose. If the programs help you, that's great. If not, you are welcome to find something else or edit the code.

Questions about Perl programs should go to perl@materialisations.com. The first line of the message and the Subject field should both name the program or problem, and the entire email should be about Perl.

Questions about C programs should go to c.src@materialisations.com. The first line of the message and the Subject field should both name the program or problem, and the entire email should be about C or C++.

And so on for questions concerning programs written in other programming languages.

Debugging Perl

Debugging Perl? I use Emacs as my text editor, and perldb inside Emacs for debugging. Most of you probably run your Perl scripts with -w and 'use strict;'. 'use diagnostics;' is also sometimes useful, as is 'use Data::Dumper;' when you need to print data structures. Be warned, though: I have run across data structures generated by Perl modules that are VERY large!
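A minimal sketch of such a script header follows. `$Data::Dumper::Maxdepth`, `Sortkeys` and `Indent` are standard Data::Dumper configuration variables; capping the depth is one way to keep those very large module-generated structures readable. The `%config` hash is made-up sample data.

```perl
#!/usr/bin/perl -w
use strict;
# use diagnostics;   # optional: expands warnings into longer explanations
use Data::Dumper;

# Descend at most two levels; anything deeper is printed only as a
# reference string such as 'HASH(0x...)'.
$Data::Dumper::Maxdepth = 2;
$Data::Dumper::Sortkeys = 1;    # sort hash keys so the output is stable
$Data::Dumper::Indent   = 1;    # fixed, compact indentation

my %config = (
    name  => 'example',
    ports => [ 21, 22, 23 ],
    deep  => { level2 => { level3 => 'hidden by Maxdepth' } },
);

my $dump = Dumper( \%config );
print $dump;
```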

The Perl I write tends to avoid tricks, because tricky code can be hard to read days, weeks or months after it was written. It also tends to look like C and usually has lots of comments. For instance, to extract the program name from the command line (for usage messages), I like:

  # $0 is the path the script was invoked with; keep only its last component.
  my @path     = split( m|/|, $0 );
  my $progname = $path[-1];
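For comparison, the core File::Basename module does the same job. A minimal sketch; the `usage` text and its `-v` option are made up for illustration:

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# basename() strips the directory part of $0 using the path rules of
# the platform perl is running on.
my $progname = basename($0);

sub usage {
    return "usage: $progname [-v] file ...\n";
}

print usage();
```

Unlike a split on '/', `basename` follows the path conventions of the platform Perl runs on, which may matter under native Windows.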

If you are using BEGIN or END blocks in your program, you might want to insert the statement

  $DB::single = 1;

at the beginning of those blocks. Under perldb, this makes the debugger stop at that point, even in a BEGIN block that runs while the rest of the program is still being compiled, so you can step through the statements in those BEGIN and END blocks.
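A minimal sketch of where the statement goes; `%settings` is a made-up example variable. According to the perldebug documentation, the assignment is harmless when the debugger is not running, so it can stay in the code while you investigate.

```perl
#!/usr/bin/perl -w
use strict;

our %settings;

BEGIN {
    # Under `perl -d` the debugger stops at the next statement, even though
    # BEGIN runs while the rest of the file is still being compiled.
    # Without the debugger this assignment is harmless.
    $DB::single = 1;
    %settings = ( verbose => 0, retries => 3 );
}

print "verbose=$settings{verbose}\n";

END {
    # The same trick works for END blocks, which run after the main program.
    $DB::single = 1;
    print "finished\n";
}
```

Run it with `perl -d script.pl`; the debugger should stop inside the BEGIN block before the main program starts.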

Software

On to the software, in no particular order.

xconv2.pl
Inetd is a super-server for networking services. It listens on a long list of port numbers on an interface, and when it notices a connection on a particular port, it "spawns" the appropriate daemon to service the request. Xinetd is almost a replacement for inetd; it has some nice features and may be more secure. Its support and development are curious. On Debian Linux, debconf does not modify xinetd.conf. The only approach seems to be to keep inetd installed (in fact, it is preferred over xinetd for RPC services), let debconf update inetd.conf, and then manually convert the inetd.conf file into an xinetd.conf. Somewhere along the way my inetd.conf file got corrupted, and the conversion tool I used (xconv.pl) produces broken xinetd.conf files from broken inetd.conf files. So I wrote a conversion tool that cleans up the problems I had seen. It is an update to xconv.pl, but quite a bit more involved. Since I wrote this program, Debian has moved from the ordinary inetd to the OpenBSD one. It probably isn't worth moving to xinetd, since it seems about half finished and not much active development is taking place.
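The core of such a conversion can be sketched as follows. `inetd_to_xinetd` is a hypothetical helper written for illustration, not the xconv2.pl code: it skips comments, RPC entries and inetd's `internal` services, rejects lines with too few fields (one symptom of a corrupted file), strips `nowait.N` rate suffixes, splits `user.group`, and unwraps `/usr/sbin/tcpd` entries on the assumption that xinetd does its own access control.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one inetd.conf line into an xinetd.conf
# "service" block.  Returns nothing (undef in scalar context) for lines
# it will not convert.
sub inetd_to_xinetd {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(?:#|\z)/;        # comment or blank line

    # service socket_type protocol wait[.max] user[.group] server argv0 args...
    my ( $service, $socket_type, $protocol, $wait, $user, $server, @argv ) =
      split ' ', $line;
    return unless defined $server;            # too few fields: broken line
    return if $service =~ m|/|;               # RPC entries look like name/version
    return if $server eq 'internal';          # handled inside inetd itself

    $wait =~ s/\..*$//;                       # "nowait.400" -> "nowait"
    my ( $uname, $group ) = split /[.:]/, $user, 2;

    # tcpd finds the real daemon through argv[0]; run that daemon directly.
    $server = $argv[0] if $server =~ m{(?:^|/)tcpd$} && @argv;
    shift @argv;    # inetd lists argv[0]; xinetd's server_args does not

    my @attrs = (
        [ socket_type => $socket_type ],
        [ protocol    => $protocol ],
        [ wait        => ( $wait eq 'wait' ? 'yes' : 'no' ) ],
        [ user        => $uname ],
    );
    push @attrs, [ group       => $group ] if defined $group;
    push @attrs, [ server      => $server ];
    push @attrs, [ server_args => "@argv" ] if @argv;

    my $block = "service $service\n{\n";
    $block .= sprintf( "\t%-11s = %s\n", @$_ ) for @attrs;
    return $block . "}\n";
}

print inetd_to_xinetd(
    "ftp stream tcp nowait root /usr/sbin/tcpd /usr/sbin/in.ftpd -l\n");
```

The example call prints a `service ftp` block with `server = /usr/sbin/in.ftpd`, `wait = no` and `server_args = -l`.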
xinetd_admin.pl
I guess one of the reasons Debian doesn't administer xinetd.conf from debconf is that there are no command line tools for making changes to xinetd.conf. This is such a tool. It is somewhat inconsistent: while it remembers the order in which comments are read, it doesn't remember exactly where the comments were, so comments can get moved around and formatting may change. From the command line, services can be commented out or re-activated, individual attributes of a service can be dropped, added or changed, and new services can be added.
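The attribute-editing part of such a tool can be sketched as a line-oriented edit. `set_attribute` is a hypothetical helper, not the xinetd_admin.pl code; it assumes the opening brace sits on its own line after `service NAME`, ignores `+=`/`-=` attributes, and leaves every other line (comments included) in place. Setting xinetd's `disable = yes` attribute is one way to switch a service off without commenting it out.

```perl
use strict;
use warnings;

# Hypothetical helper: set (or add) one attribute inside the named
# "service NAME { ... }" block of an xinetd configuration held in a string.
sub set_attribute {
    my ( $conf, $service, $attr, $value ) = @_;
    my @out;
    my ( $in_block, $seen_brace, $done ) = ( 0, 0, 0 );

    for my $line ( split /^/, $conf ) {
        if ( !$in_block && $line =~ /^\s*service\s+\Q$service\E\s*$/ ) {
            $in_block = 1;
        }
        elsif ( $in_block && !$seen_brace && $line =~ /^\s*\{/ ) {
            $seen_brace = 1;
        }
        elsif ( $in_block && $seen_brace && $line =~ /^\s*\}/ ) {
            # Attribute was not present: add it just before the closing brace.
            push @out, "\t$attr = $value\n" unless $done;
            ( $in_block, $seen_brace, $done ) = ( 0, 0, 1 );
        }
        elsif ( $in_block && $seen_brace && $line =~ /^(\s*)\Q$attr\E\s*=/ ) {
            $line = "$1$attr = $value\n";    # replace the existing value
            $done = 1;
        }
        push @out, $line;
    }
    return join '', @out;
}

my $example = "service telnet\n{\n\tsocket_type = stream\n}\n";
print set_attribute( $example, 'telnet', 'disable', 'yes' );
```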
Periodic
Quick and dirty program to generate an SVG version of the periodic table. It modifies an image originally created in Sodipodi.
XML Catalogs
XML processing on "leaf" nodes seems to work best when a good set of "catalogs" is installed for locating DTDs, schemas, etc. when processing XML-based languages. I run Debian Linux, and as of 2003-10 it still has no policy on how catalogs should be built and maintained. So I started to build my own policy, and this program is the result. It isn't perfect, and it assumes that most of the applications using the catalog involve DocBook-XML. However, if your needs are broader than that, I don't think you'll have too many problems working with what I have generated. It is still a work in progress.
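For readers unfamiliar with them, an OASIS XML catalog is itself a small XML file that maps public and system identifiers to local copies. A minimal Perl sketch that writes one, assuming DocBook XML 4.2; `make_catalog` is a hypothetical helper, and the local `file:///usr/share/...` path is hypothetical too and should point wherever your DTD is installed:

```perl
use strict;
use warnings;

# Minimal XML escaping for attribute values.
sub xml_attr {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build an OASIS XML catalog from [publicId, uri] and [systemId, uri] pairs.
sub make_catalog {
    my (%args) = @_;
    my $xml = qq{<?xml version="1.0"?>\n}
      . qq{<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"\n}
      . qq{  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">\n}
      . qq{<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">\n};
    for my $pub ( @{ $args{public} || [] } ) {
        $xml .= sprintf qq{  <public publicId="%s" uri="%s"/>\n},
          map { xml_attr($_) } @$pub;
    }
    for my $sys ( @{ $args{system} || [] } ) {
        $xml .= sprintf qq{  <system systemId="%s" uri="%s"/>\n},
          map { xml_attr($_) } @$sys;
    }
    return $xml . "</catalog>\n";
}

# Hypothetical local path; adjust to where the DocBook DTD really lives.
my $dtd = 'file:///usr/share/xml/docbook/schema/dtd/4.2/docbookx.dtd';
print make_catalog(
    public => [ [ '-//OASIS//DTD DocBook XML V4.2//EN', $dtd ] ],
    system => [ [ 'http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd', $dtd ] ],
);
```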
clean_mbox
The mbox format for email is old, and it has problems. This program uses Mail::Folder and Mail::Internet from CPAN to help clean an mbox. Mostly it just tries to reformat individual messages. The various Perl/CPAN modules I've played with can't enforce the line-length limits (78 characters recommended, 998 characters maximum). Even so, the result is cleaner than what most email clients seem to send.
mbox_anal
The mbox format, and RFC822 email in general, is complex enough that if you wait long enough you are sure to receive email that confuses or breaks some program that works with email. This program counts the lines longer than 78 characters and those longer than 998 characters, and looks for non-standard end-of-line behavior. As I read RFC822, all lines should end in CRLF. Some UNIX clients may send bare LF, some Macs may send bare CR, QNX may send bare RS, and then there is the EBCDIC NEL (next line) character, from some mainframes I guess. If a line has none of these, it is just strange.
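The end-of-line classification can be sketched as below. This is an illustration, not the mbox_anal code: `analyse` is a hypothetical name, the file is read as raw bytes, CRLF is tested before bare CR or LF, and NEL is assumed to appear as byte 0x85 (its ISO 8859-1/Unicode value); in untranslated EBCDIC it would be 0x15. A final line with no terminator is counted as `none`.

```perl
use strict;
use warnings;

my %EOL_NAME = (
    "\r\n" => 'CRLF',
    "\n"   => 'LF',
    "\r"   => 'CR',
    "\x1E" => 'RS',
    "\x85" => 'NEL',
);

# Count line terminators by type, plus lines over 78 and 998 characters.
sub analyse {
    my ($data) = @_;
    my %count = ( over_78 => 0, over_998 => 0 );
    # Capture the terminators too; CRLF must be tried before CR and LF.
    my @parts = split /(\r\n|\n|\r|\x1E|\x85)/, $data;
    while (@parts) {
        my $text = shift @parts;
        my $eol  = shift @parts;    # undef for a final unterminated line
        $count{over_78}++  if length $text > 78;
        $count{over_998}++ if length $text > 998;
        $count{ defined $eol ? $EOL_NAME{$eol} : 'none' }++;
    }
    return \%count;
}

# Usage: perl mbox_eol.pl mbox-file
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "$ARGV[0]: $!\n";
    my $data = do { local $/; <$fh> };
    my $c = analyse($data);
    printf "%-8s %d\n", $_, $c->{$_} for sort keys %$c;
}
```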
mergebox.pl
I have email in multiple places, usually in mbox format (although I know I should move to maildir). Inevitably, I end up with duplicate copies of messages. This program works best if the original message has a Message-ID header. It reads one or more mboxes (at least the destination mail folder must be mbox) one message at a time, keeping a hash of the Message-IDs. If a message has no Message-ID, an MD5 checksum is used instead. If the message is new, it is appended to the destination mbox. If there is a Message-ID collision, the program assumes the larger message is the original and marks the shorter one for deletion (or doesn't copy it, if it is not already in the destination mbox). I really should strip off the "From " envelope line before calculating the MD5.
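The duplicate detection described above can be sketched with the core Digest::MD5 module. `message_key` and `dedupe` are hypothetical names, and unlike mergebox.pl this sketch already strips the `From ` envelope line before hashing; messages are assumed to be plain byte strings.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Each message is one string: the mbox "From " line, headers, a blank
# line, then the body.
sub message_key {
    my ($msg) = @_;
    my ($headers) = split /\r?\n\r?\n/, $msg, 2;
    return $1 if $headers =~ /^Message-ID:\s*(<[^>]*>)/mi;

    # No Message-ID: hash the message without its "From " envelope line,
    # so copies saved at different times still match.
    ( my $copy = $msg ) =~ s/\AFrom [^\n]*\n//;
    return 'md5:' . md5_hex($copy);
}

# Keep one message per key, in first-seen order.  On a collision keep
# the larger copy, on the assumption that it is the original.
sub dedupe {
    my ( %best, @order );
    for my $msg (@_) {
        my $key = message_key($msg);
        if ( !exists $best{$key} ) {
            push @order, $key;
            $best{$key} = $msg;
        }
        elsif ( length $msg > length $best{$key} ) {
            $best{$key} = $msg;
        }
    }
    return @best{@order};
}
```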
msort.pl
Dates in RFC822 email are another source of fun. Alphabetic time zone names are not unique (for example, there is more than one CST). Some email clients don't put time zone information in the Date header at all. There should be a Received header inserted by our own SMTP/POP/IMAP server, which records the time the message was received (unless that server is poorly configured), and its time zone should at least stay constant, since it always comes from the same place. By default this program believes the Date header, but it can also use the earliest or latest Received header date. I suppose a person could add code to recognise the header your own email server inserts and use just that one. Output goes to multiple mbox files whose names contain dates, so you can split a collection of mboxes into years, or years and months.
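A sketch of the date handling using only core modules (msort.pl itself uses Date::Manip). `parse_date`, `received_date` and `folder_name` are hypothetical, as is the `mbox-YYYY-MM` naming; the parser accepts only numeric offsets and GMT/UT/UTC, precisely because alphabetic zones such as CST are ambiguous.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

my %MON;
@MON{qw(jan feb mar apr may jun jul aug sep oct nov dec)} = ( 0 .. 11 );

# Parse a date such as "Tue, 14 Oct 2003 10:22:01 -0500" into epoch
# seconds (UTC).  Returns nothing if the date or zone is not understood.
sub parse_date {
    my ($text) = @_;
    my ( $d, $mon, $y, $h, $mi, $s, $zone ) = $text =~
      /(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})\s+(\d{1,2}):(\d\d)(?::(\d\d))?\s+([+-]\d{4}|GMT|UTC|UT)\b/i
      or return;
    my $m = $MON{ lc $mon };
    return unless defined $m;
    my $epoch = eval { timegm( $s // 0, $mi, $h, $d, $m, $y ) };
    return unless defined $epoch;
    if ( $zone =~ /^([+-])(\d\d)(\d\d)$/ ) {
        my $offset = ( $2 * 3600 + $3 * 60 ) * ( $1 eq '-' ? -1 : 1 );
        $epoch -= $offset;    # local time minus its UTC offset gives UTC
    }
    return $epoch;
}

# Earliest or latest date found in the Received headers of a header block.
sub received_date {
    my ( $headers, $which ) = @_;    # $which: 'earliest' or 'latest'
    ( my $unfolded = $headers ) =~ s/\r?\n[ \t]+/ /g;    # join folded lines
    my @dates;
    for my $line ( split /\n/, $unfolded ) {
        next unless $line =~ /^Received:.*;\s*(.+)$/i;   # date follows last ';'
        my $t = parse_date($1);
        push @dates, $t if defined $t;
    }
    return unless @dates;
    my @sorted = sort { $a <=> $b } @dates;
    return $which eq 'latest' ? $sorted[-1] : $sorted[0];
}

# Destination mbox name, split by year or by year and month.
sub folder_name {
    my ( $epoch, $per_month ) = @_;
    return strftime( $per_month ? 'mbox-%Y-%m' : 'mbox-%Y', gmtime $epoch );
}
```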
slow_down.pl
Ever received a file of ASCII graphics that ran too fast to follow? I got one like that about cooking a turkey, a long time ago. These days a person should probably use Perl's high-resolution timer (the Time::HiRes module), but back then I just repeated an assignment in a loop some number of times (thankfully it wasn't optimised away) before printing the next character. It worked.
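With Time::HiRes the delay loop becomes a fractional `sleep`. A minimal sketch; the `slow_print` name and the 0.02-second delay are arbitrary choices:

```perl
use strict;
use warnings;
use IO::Handle;
use Time::HiRes qw(sleep);

# Print text one character at a time, pausing $delay seconds between
# characters.  Returns the number of characters printed.
sub slow_print {
    my ( $text, $delay, $fh ) = @_;
    $fh ||= \*STDOUT;
    $fh->autoflush(1);    # show each character as soon as it is printed
    for my $char ( split //, $text ) {
        print {$fh} $char;
        sleep $delay if $delay > 0;    # Time::HiRes::sleep accepts fractions
    }
    return length $text;
}

my $frames = join '', map { "frame $_\n" } 1 .. 3;
slow_print( $frames, 0.02 );
```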
download.pl
This one needs work before anyone else can use it. I was getting 100-200 spams per day and was about to go on a trip (a soccer tournament). I would have been gone long enough that I would likely have gone over the allocation on my email account. For the duration of the trip it ran in a mode where it didn't try to shuffle messages among folders holding messages of various ages. I was on a dialup account, so cron ran this program at 3am to start my PPP connection, download email to a set of folders, and then shut down the PPP connection. It relies on another program (rmmail.pl) to do the actual downloading.
rmmail.pl
Companion program to download.pl, and a big program: tons of bells and whistles, and probably bugs too. This program works with remote mail via a POP server, an IMAP server or FTP. [Note: POP servers can be configured so that any reading of a message triggers its deletion from the queue/folder on the server. So you had better be sure you have a local copy of any message (or be sure it is spam).] The POP routines from CPAN are pretty nice, the IMAP ones are clunky, and the FTP code probably only works if the remote site uses mbox format. Messages are appended to local mbox files, so if you run this periodically you'll probably end up with multiple copies in the mbox. SpamAssassin is used to try to classify email as spam or not. Because Mail::Audit assumes each message has only a single destination, I "clone" the objects before handing them to Mail::Audit. The program tries to keep a small cache of previously seen messages so that it doesn't download the same things over and over again; this probably isn't working as well as it could. The first analysis step uses clamav to scan incoming messages for viruses and puts any infected messages in a special folder. To date I haven't seen any infected messages, so I don't know whether this works. Next, some mail I receive comes from remote computers with a known address, subject line, etc.; it is directed to a special folder before spam analysis. Mail from mailing lists and from special people (Hi Mom!) is similarly redirected before the spam check. Mailing list detection is done with the CPAN Mail::ListDetector module. After that, I check whether the originating address is one of my own other remote addresses, and if so put the message in my sent-mail folder. Some people (spammers, etc.) send email from the same address over and over, so that stuff just goes to the trash. This could be an entire domain (say goodbye to .kr) if you like. Finally, the message is tested for spam with SpamAssassin.
This step (and probably the virus scan) is the slowest part of the processing. By comparison, using the Swiss Army chainsaw (Date::Manip) for date processing hardly registers. And then we have a bunch of processing for other mailing lists, people, strange email (most likely an Apparently-To header or an emptied BCC header?), etc.
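The order of the checks amounts to a first-match-wins rule table. A hypothetical sketch: the header tests, addresses and folder names are made up, `X-Virus-Found` merely stands in for the clamav result, and the spam rule only looks for the `X-Spam-Flag: YES` header that SpamAssassin adds to messages it tags, rather than calling SpamAssassin itself. Headers are passed as a hash with lower-cased names.

```perl
use strict;
use warnings;

# Each rule: [ description, predicate, destination folder ], in the
# order the checks are described above.
my @rules = (
    [ 'virus scan',   sub { defined $_[0]{'x-virus-found'} },                          'virus' ],
    [ 'known robots', sub { ( $_[0]{subject} // '' ) =~ /^\[cron\]/ },                 'machines' ],
    [ 'mailing list', sub { defined $_[0]{'list-id'} },                                'lists' ],
    [ 'family',       sub { ( $_[0]{from} // '' ) =~ /mom\@example\.com/i },           'family' ],
    [ 'my own addrs', sub { ( $_[0]{from} // '' ) =~ /me\@example\.(?:net|org)/i },    'sent-mail' ],
    [ 'blocked',      sub { ( $_[0]{from} // '' ) =~ /\.kr>?\s*$/i },                  'trash' ],
    [ 'spam',         sub { ( $_[0]{'x-spam-flag'} // '' ) eq 'YES' },                 'spam' ],
);

# First matching rule wins; anything left over goes to the inbox.
sub classify {
    my ($headers) = @_;
    for my $rule (@rules) {
        my ( undef, $test, $folder ) = @$rule;
        return $folder if $test->($headers);
    }
    return 'inbox';
}

print classify( { from => 'Mom <mom@example.com>' } ), "\n";
```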
TXDB - try3.pl
Sorry, no link for this one; you'll have to ask for it specifically if you want it. This is an attempt at writing a nice UI that lets people build a database of information for use with the XMLResume Library. It does not follow XMLResume completely, partly because it hasn't been worked on in a while and partly because I disagree with parts of the XMLResume philosophy. The UI uses the Curses Development Kit and is a bit clunky/sensitive. Because curses is used, this is a Text User Interface (TUI).
xmlresume4.pl
More or less a Perl/Tk version of TXDB. It also needs a lot more work, but I think it's a better start than TXDB.
biblio-tk3.pl
If we get into a "technical" occupation, we really should keep track of everything (of consequence) we read, probably starting from the beginning of university. It sounds easy: just pick up some kind of bibliography database program and start using it. Most of these programs are based either on the records kept by an important library (for example, the U.S. Library of Congress) or on the requirements for including references in publications. There always seems to be something missing for "my application". If it is a book I own, I probably want to keep track of when I bought it and for how much, and possibly where. Is it autographed? Am I keeping any important information inside the book (root password?)? On what page? If I have this data in a database, I want to be able to easily search for the best sources to reference later. So not only do I want the title and the authors/editors/translators, I want the titles of Parts, Appendices of Parts, Chapters, Appendices of Chapters, subchapters, paragraphs, etc. Anyway, this program is my take on the idea, written in Perl/Tk. If anyone knows how to speed up the Tk display of the various popup menus, I would be interested.
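One way to hold such records is a nested Perl data structure with a recursive search over section titles. A sketch under that assumption; `find_sections`, the field names and the sample book are made up and are not biblio-tk3.pl's schema:

```perl
use strict;
use warnings;

# Hypothetical record: ownership details plus a tree of titled sections,
# so parts, chapters, appendices and subsections can all be searched.
my %book = (
    title       => 'Example Book',
    authors     => ['A. Author'],
    editors     => [],
    purchased   => { date => '2003-10-01', where => 'local shop' },
    autographed => 0,
    sections    => [
        {   kind     => 'part',
            title    => 'Foundations',
            sections => [
                {   kind     => 'chapter',
                    title    => 'Regular Expressions',
                    sections => [ { kind => 'appendix', title => 'Backtracking Notes' } ],
                },
            ],
        },
    ],
);

# Return "Part > Chapter > ..." paths for every section whose title matches.
sub find_sections {
    my ( $node, $pattern, @trail ) = @_;
    my @hits;
    for my $sec ( @{ $node->{sections} || [] } ) {
        my @path = ( @trail, $sec->{title} );
        push @hits, join( ' > ', @path ) if $sec->{title} =~ $pattern;
        push @hits, find_sections( $sec, $pattern, @path );
    }
    return @hits;
}

print "$_\n" for find_sections( \%book, qr/backtrack/i );
```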
html_it4.pl
It may be useful to generate various views of the above bibliography data. The approach taken here is to generate HTML pages from the data and present them along with various statistics. The CPAN module CGI.pm is used to generate most of the XHTML.
biblio-helper.pl
There are places on the Internet where you can query bibliographic databases. The U.S. Library of Congress and Amazon.com are two popular examples. This program is mostly concerned with the USLOC and its MARC data, and uses ANSI Z39.50 to retrieve the data. I haven't used it much, so I don't know what its limitations are.
