Welcome to the Matter Realisations' Software Page!
Mostly as a means towards an end, a fair amount of software
gets written. A great deal of this is written in perl, and
may require modules from CPAN (the Comprehensive Perl Archive
Network). For the most part, this code is independent of the
operating system. It should run on Windows, although it may
run better under the Cygwin environment than under a fully
native Perl (ActivePerl).
Support
I am not terribly interested in supporting any of this
software, and there is certainly a limit to what I will do for
free. I make no guarantees and provide no warranties as to
fitness for service. If the programs help you, that's great.
If not, you are welcome to find something else or edit the
code.
Questions about perl source programs should go to
perl@materialisations.com.
The first line of the file and the Subject field should both
mention the program/problem, and the topic of the entire
email should be perl related.
Questions about C source programs should go to
c.src@materialisations.com.
The first line of the file and the Subject field should both
mention the program/problem, and the topic of the entire
email should be C or C++ language related.
And so on for questions concerning programs written in other
programming languages.
Debugging Perl
Debugging perl? I use emacs as my text editor, and perldb in
emacs for debugging. Most of you probably invoke your perl
scripts with -w and 'use strict;'. 'use diagnostics;' is also
sometimes useful. Another is 'use Data::Dumper;', in case you
need to print data structures. Be warned, though: I have run
across data structures generated by perl modules that are
VERY large!
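When a structure is too large to print in full, Data::Dumper can be told to stop descending after a few levels. A minimal sketch, assuming a made-up nested hash (%config and its keys are hypothetical); Maxdepth, Sortkeys and Indent are the module's own configuration variables:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical nested structure standing in for something a module returns.
my %config = (
    name    => 'example',
    servers => {
        alpha => { ports => [ 25, 110, 143 ] },
    },
);

# Descend only two levels; anything deeper is printed as a reference
# stub such as 'HASH(0x...)' instead of being expanded.
local $Data::Dumper::Maxdepth = 2;
local $Data::Dumper::Sortkeys = 1;    # stable key order between runs
local $Data::Dumper::Indent   = 1;    # fixed, compact indentation

my $dump = Dumper( \%config );
print $dump;
```

Raising Maxdepth one level at a time is a cheap way to find the part of the structure you actually care about.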
The perl I write tends not to use any tricks, as tricky code
can be hard to read many days/weeks/months after it was
written. It also tends to look like C, and usually has lots
of comments.
For instance, in order to extract the program name from the
command line (for usage messages), I like:
my @path = split( m|/|, $0 );
my $progname = $path[-1];
If you are employing either BEGIN or END blocks in your
program, you might want to insert a:
$DB::single = 1;
statement at the beginning of the blocks. This will cause
perldb to stop at that point and allow you to step through
statement execution within those BEGIN and END blocks.
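As a small illustration of where the statement goes, here is a sketch with one BEGIN and one END block. The @phases array and the messages are hypothetical and only show the order of events. Outside the debugger the assignments to $DB::single do nothing:

```perl
use strict;
use warnings;

our @phases;    # hypothetical log of which blocks have run

BEGIN {
    # Under "perl -d" the debugger takes control at the next statement.
    $DB::single = 1;
    push @phases, 'BEGIN';
}

END {
    $DB::single = 1;
    print "END block running\n";
}

push @phases, 'main';
print "phases so far: @phases\n";
```

Run it as "perl -d script.pl" (or through perldb in emacs) and the first prompt should appear inside the BEGIN block rather than at the first statement of the main program.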
Software
On to the software, in no particular order.
-
xconv2.pl
-
Inetd is a networking services super server. It
listens to a long list of port numbers on an interface for
activity. If it notices a connection has been made on a
particular port number, it "spawns" the appropriate daemon
to service the request being made. Xinetd is
almost a replacement for inetd that has some nice
features and may be more secure. Support and
development of xinetd is curious. On Debian Linux, debconf
does not modify xinetd.conf. The only approach seems to be
to keep inetd installed (in fact, it's preferred over xinetd
for RPC services), let debconf update inetd.conf, and then
manually convert the inetd.conf file into an xinetd.conf.
Somewhere along the way, my inetd.conf
file got corrupted and the conversion tool I used
(xconv.pl) produces broken xinetd.conf files from broken
inetd.conf files. So, I wrote a conversion tool which
cleans up the problems I had seen. It is an update to
xconv.pl, but is quite a bit more involved. Since I wrote
this program, Debian has moved from the ordinary inetd to
the OpenBSD one. It probably isn't worth moving to xinetd,
as it seems to be about half finished and not a lot of
active development is taking place.
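To give an idea of what such a conversion involves, here is a rough sketch that turns one inetd.conf line into an xinetd service stanza. It is not the code in xconv2.pl. The function name inetd_to_xinetd is made up for this example, and it assumes the usual field order (service, socket type, protocol, wait flag, user, server, arguments). It ignores RPC entries, user.group syntax and tcpd wrappers, all of which a real converter has to deal with:

```perl
use strict;
use warnings;

# Convert one inetd.conf line into an xinetd stanza.
# Returns nothing for comments, blank lines and short lines.
sub inetd_to_xinetd {
    my ($line) = @_;
    $line =~ s/^\s+//;
    $line =~ s/\s+$//;
    return if $line eq '' || $line =~ /^#/;

    my ( $service, $socket, $proto, $wait, $user, $server, @argv )
        = split /\s+/, $line;
    return unless defined $server;

    $wait =~ s/\..*$//;    # drop a ".max" suffix such as nowait.400
    my @attrs = (
        [ 'socket_type', $socket ],
        [ 'protocol',    $proto ],
        [ 'wait',        $wait eq 'wait' ? 'yes' : 'no' ],
        [ 'user',        $user ],
    );
    if ( $server eq 'internal' ) {
        push @attrs, [ 'type', 'INTERNAL' ];
    }
    else {
        push @attrs, [ 'server', $server ];
        shift @argv;       # inetd's first argument is the program name
        push @attrs, [ 'server_args', join( ' ', @argv ) ] if @argv;
    }

    my $body = join '', map { sprintf "    %-11s = %s\n", @$_ } @attrs;
    return "service $service\n{\n$body}\n";
}

print inetd_to_xinetd(
    'ftp stream tcp nowait root /usr/sbin/in.ftpd in.ftpd -l');
```

A corrupted inetd.conf line usually shows up here as too few fields, which is why the sketch refuses to emit a stanza when the server field is missing.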
-
xinetd_admin.pl
-
I guess one of the reasons why Debian isn't administering
xinetd.conf from debconf is that there are no command line
tools available for making changes to xinetd.conf. This is
such a tool. It is somewhat schizophrenic in that while it
remembers the order in which comments are read, it doesn't
exactly remember where the comments were. So, comments can
get moved around, and formatting may get changed. Services
can be commented out or re-activated on the command line,
individual capabilities of a service can be dropped, added
or changed, and new services can be added.
-
Periodic
-
Quick and dirty program to generate an SVG version of the
periodic table. It modifies an image first generated in
Sodipodi.
-
XML Catalogs
-
XML on "leaf" nodes seems to function best if there is a
good set of "catalogs" online to be used for finding DTDs,
Schemas, etc. when processing XML based languages. I run
Debian Linux, and as of 2003-10, they have still not come
up with any policy for how catalogs should be built,
maintained, etc. So, I started to build my own policy. This
program is the result. It isn't perfect, and it makes the
assumption that most of the applications using the catalog
are involved with DocBook-XML. However, if your needs are
larger than that, I don't think you'll have too many
problems working with what I have generated. It is still a
work in progress.
-
clean_mbox
-
The mbox format for email is an old format. And it has
problems. This program uses Mail::Folder and Mail::Internet
from CPAN to help clean an mbox. Mostly it just tries to
reformat individual messages. The various Perl/CPAN modules
I've played with can't enforce 78 character per line
headers and 998 character per line bodies. Still, the
result is cleaner than what most email clients seem to send.
-
mbox_anal
-
The mbox format, and RFC822 email in general, seems
sufficiently complex that if you wait long enough you will
be sure to receive email that confuses or breaks some
program that works with email. This program counts the
number of lines longer than 78 characters, the number
longer than 998 characters, and looks for non-standard
End-Of-Line behavior. As I read RFC822, all lines should
end with CRLF. Some UNIX clients may send bare LF, some Macs
may send bare CR, QNX may send bare RS, and then there is
this EBCDIC NEL thing, I guess from some mainframes. If a
line has none of these, it is just strange.
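The counting described above can be sketched in a few lines. This is an illustration, not the code in mbox_anal: line_stats and the key names are made up, and NEL is left out on purpose because its byte value depends on the character set (the same byte can be part of an ordinary character in other encodings).

```perl
use strict;
use warnings;

# Names for the line terminators this sketch recognises.
my %EOL_NAME = (
    "\r\n" => 'CRLF',
    "\n"   => 'LF',
    "\r"   => 'CR',
    "\x1E" => 'RS',
);

# Count long lines and line-ending styles in raw message text.
sub line_stats {
    my ($raw) = @_;
    my %stats = map { $_ => 0 } qw(over_78 over_998 CRLF LF CR RS none);

    # The capture group makes split return line, terminator, line, ...
    # CRLF is listed first so it is not seen as a CR followed by an LF.
    my @pieces = split /(\r\n|\n|\r|\x1E)/, $raw;
    while (@pieces) {
        my $line = shift @pieces;
        my $eol  = shift @pieces;    # undef when the text just stops
        $stats{over_78}++  if length($line) > 78;
        $stats{over_998}++ if length($line) > 998;
        $stats{ defined $eol ? $EOL_NAME{$eol} : 'none' }++;
    }
    return \%stats;
}

my $stats = line_stats("Subject: test\r\n\r\nbody line\n");
print "$_: $stats->{$_}\n" for sort keys %$stats;
```

The lengths are measured without the terminator, which matches how the 78 and 998 character limits are stated.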
-
mergebox.pl
-
I have email in multiple places, and it is usually in mbox
format (although I know I should move to maildir).
Inevitably, I'll end up with duplicate copies of messages.
This program works best if the program that originated each
message added a Message-ID header. It reads one or more mbox
files (at least the destination mail folder must be mbox), a
message at a time, keeping a hash of the Message-IDs. If a
message has no Message-ID, an MD5 checksum is used as one.
If the message
is new, it is appended to the destination mbox. If there is
a Message-ID collision, it assumes the larger message is
the original and marks the shorter for deletion (or doesn't
copy it, if not already in the destination box). I really
should strip off the "From " envelope header before
calculating an MD5, where one is needed.
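The bookkeeping can be sketched as follows. This is a simplified illustration rather than the code in mergebox.pl: message_key and merge_messages are made-up names, messages are assumed to be already split into strings, and the sketch strips the "From " envelope line before hashing.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Key for a message: its Message-ID if it has one, otherwise an MD5
# digest of the text with the "From " envelope line removed.
sub message_key {
    my ($msg) = @_;
    my ($header) = split /\r?\n\r?\n/, $msg, 2;
    $header = '' unless defined $header;
    $header =~ s/\r?\n[ \t]+/ /g;    # unfold continuation lines
    return $1 if $header =~ /^Message-ID:\s*(\S+)/mi;

    ( my $text = $msg ) =~ s/\AFrom [^\n]*\n//;
    return 'md5:' . md5_hex($text);
}

# Keep one copy per key. On a collision the larger message wins.
sub merge_messages {
    my (@messages) = @_;
    my ( %kept, @order );
    for my $msg (@messages) {
        my $key = message_key($msg);
        if ( !exists $kept{$key} ) {
            push @order, $key;
            $kept{$key} = $msg;
        }
        elsif ( length($msg) > length( $kept{$key} ) ) {
            $kept{$key} = $msg;
        }
    }
    return map { $kept{$_} } @order;
}
```

Keeping the keys in arrival order means the merged folder preserves the order of the first mbox read, which makes the result easier to compare by eye.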
-
msort.pl
-
Dates in RFC822 email are another source of fun. The
alphabetical timezone format is not unique (for example,
there is more than 1 CST). Some email clients don't put
timezone information in a header. There should be a
Received header inserted by our SMTP/POP/IMAP server, which
will have the time it was received (unless this server is
poorly configured). The timezone should at least remain
constant for dates taken from that header. This program by
default believes the Date header, but can also find the
earliest or latest Received header date. I suppose a person
could add code to try and recognize the header that your
email server is inserting, and just grab that one. The
output is to (multiple) mbox with names that have dates in
them. So, you can split a collection of mbox into years, or
years and months.
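The splitting step comes down to turning a date string into a folder name. A minimal sketch, assuming the date has the usual "day month year" shape: folder_for_date and the "mail-" naming scheme are made up for this example. The timezone is ignored, so a message sent near midnight at the end of a month may land in the neighbouring folder. Short years are widened the way RFC 2822 suggests for obsolete dates.

```perl
use strict;
use warnings;

my %MONTH = (
    Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
    Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Build a folder name from a date string such as
# "Tue, 14 Oct 2003 09:15:00 -0600". Returns nothing if no date is found.
sub folder_for_date {
    my ( $date, $by_month ) = @_;
    return unless $date =~ /(\d{1,2})\s+([A-Za-z]{3})\s+(\d{2,4})\b/;
    my ( $mon, $year ) = ( $MONTH{ ucfirst lc $2 }, $3 );
    return unless defined $mon;

    # Two and three digit years: 00-49 means 20xx, the rest 19xx.
    if    ( $year < 50 )   { $year += 2000 }
    elsif ( $year < 1000 ) { $year += 1900 }

    return $by_month
        ? sprintf( 'mail-%04d-%02d', $year, $mon )
        : sprintf( 'mail-%04d',      $year );
}

print folder_for_date( 'Tue, 14 Oct 2003 09:15:00 -0600', 1 ), "\n";
```

The same function works for a date pulled out of a Received header, since only the day, month and year are looked at.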
-
slow_down.pl
-
Ever receive a file of ASCII graphics which just ran too
fast to catch? I got one about cooking a turkey which was
like that, a long time ago. These days, a person should
probably use the high-resolution timer in perl, but back
then I just repeated an assignment in a loop (thankfully it
wasn't optimised away) some number of times before printing
the next character. It worked.
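For comparison, here is roughly what the high-resolution timer version might look like. This is a sketch, not the original slow_down.pl: slow_print is a made-up name and the delay is an arbitrary choice. Time::HiRes ships with perl and its sleep accepts fractions of a second.

```perl
use strict;
use warnings;
use IO::Handle;
use Time::HiRes qw(sleep);

# Print text one character at a time, pausing between characters.
sub slow_print {
    my ( $fh, $text, $delay ) = @_;
    $fh->autoflush(1);    # otherwise buffering hides the delay
    for my $char ( split //, $text ) {
        print {$fh} $char;
        sleep($delay);    # fractional seconds via Time::HiRes
    }
    return length $text;
}

slow_print( \*STDOUT, "Hello, turkey!\n", 0.05 );
```

Unlike a busy loop, the pause here does not depend on how fast the machine is.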
-
download.pl
-
This one needs work to be used by anyone else. I was
getting 100-200 spams per day, and was about to go on a
trip (soccer tournament). I would have been gone long
enough that my email account would likely have gone over its
allocation. The program ran for the duration of the
trip, in a mode where it didn't try to shuffle messages
amongst folders holding various ages of messages. I was on
a dialup account, so cron would run this program at 3am to
start up my PPP connection, download email to a set of
folders, and then shutdown the PPP connection. It requires
another program (rmmail.pl) to actually do the downloading.
-
rmmail.pl
-
Companion program to download.pl, and a big one. Tons of
bells and whistles, and probably bugs too. This program
works with remote mail via a POP server, an IMAP
server or FTP. [Note: POP servers can be configured such
that any reading of a message triggers its deletion from
the queue/folder on the server. So, you better be sure you
have a local copy of any message (or be sure it is spam).]
The POP routines from CPAN are pretty nice, IMAP is clunky,
and the FTP code probably only works in the event that the
remote site is using mbox format. Messages are appended to
mbox locally, so if you run this periodically you'll
probably end up with multiple copies in the mbox.
Spamassassin is used to try and characterize email as spam
or not. Because Mail::Audit assumes only a single
destination event for any message, I "clone" the objects
before giving them to Mail::Audit. The program attempts to
keep a small cache of previously seen messages, so that it
doesn't download things over and over again. This probably
isn't working as well as it could be. This program also
uses clamav to scan for viruses in incoming messages, and
puts any such messages in a special folder. This is the
first analysis done. To date, I haven't seen any
contaminated messages, so I don't know if this is working.
Next, I receive some mail from remote computers of a known
address, subject line, etc. These are directed to a special
folder before spam analysis. Mail from mailing lists, and
from special people (Hi Mom!), is similarly redirected
before analysing for spam. The mailing list processing is
via the CPAN Mail::ListDetector module. After that, I look
to see if the originating address happens to be one of my
own other remote addresses; if so, the message goes in my
sent-mail folder. Some people (spammers, etc.) send email
from the same address over and over, so that stuff just gets
trashed. This could
be an entire domain (say goodbye to .kr) if you like.
Finally we test for it being spam with spamassassin. This
(and probably the virus scan) are the slowest parts of the
processing. Using the swiss army chainsaw (Date::Manip) for
date processing is hardly even a blip on processing speed
in comparison. And then we have a bunch of processing for
other mailing lists, people, strange email (Apparently-To
or an emptied BCC header most likely?), etc.
-
TXDB - try3.pl
-
Sorry, no link for this one. You'll have to ask for it
specifically if you want it. This is an attempt at writing a
nice UI to allow people to generate a database of
information for use with the XMLResume Library. It does not
follow XMLResume completely, partly because this hasn't been
worked on in a while, and partly because I disagree with
parts of the XMLResume philosophy. The UI used is the Curses
Development Kit, and is a bit clunky/sensitive. Because
curses is used, this is a Text User Interface (TUI).
-
xmlresume4.pl
-
More or less, a Perl/Tk version of TXDB. It also needs a
lot more work, but I think it's a better start than TXDB.
-
biblio-tk3.pl
-
If we get into a "technical" occupation, we really should
keep track of everything (of consequence) we read. Probably
starting from the beginning of university. It sounds easy,
just pick up some kind of bibliography database program,
and start using it. Most of these things are either based
on the records kept by an important library (for example,
the U.S. Library of Congress) or on the requirements for
including references in publications. There always seems to
be something missing for "my application". If it is a book
I own, I probably want to keep track of when I bought it
and for how much. Possibly where I bought it. Is it
autographed? Am I keeping any important information inside
the book (root password?)? What page? If I have this data
in a database, I want to be able to easily search for the
best sources to reference things in later. So, not only do
I want the title and authors/editors/translators, I want
the titles of Parts, titles of Appendices of Parts, titles
of Chapters, titles of Appendices of Chapters, subchapters,
paragraphs, etc. Anyway, this program is my take on this
idea written in Perl/Tk. If anyone knows how to speed up
the Tk part of displaying the various popup menus, I would
be interested.
-
html_it4.pl
-
It may be useful to generate various views of the above
bibliography data. The approach taken here is to calculate
various HTML pages based on the data, and present that
along with various statistics. The CPAN module CGI.pm is
used to generate most of the XHTML.
-
biblio-helper.pl
-
There are places on the Internet where you can query
bibliographic databases for various things. The U.S.
Library of Congress and Amazon.com are 2 popular examples.
This program is mostly concerned with USLOC and its MARC
data, and uses ANSI Z39.50 to get the data. I haven't used
this much, so I don't know what its limitations are.