the elj project
Open Source Eiffel Libraries and Applications (SmartEiffel and ISE Eiffel)
The Perlish Library
In the spring of 2001 I was given the assignment of conducting a book study. The focus of the study was to teach Perl using the book Learning Perl (we were using the second edition; the recent third edition has a much gentler introduction to Perl in chapter one).
As I read through chapter one, I saw that many of the examples could be easily written in Eiffel. I started doing this using the SmallEiffel compiler. It didn't take long for me to realize that with clever use of Eiffel infix and prefix operators, I could replicate some of Perl's operators. I began with the diamond operator (<>). After that I layered a little code over the PCRE library from the ELJ project to get some of Perl's pattern matching capabilities. And so began my Perlish library.
Before long I hit a wall. That wall was on page 20, where the example program creates a child process, redirects the child's standard input, and writes text to the child. (The child is a mail program.) Another wall stood a couple of pages later: Perl's simple formatted output.
Disgruntled, I set aside Perlish and got on with the book study. Then Dominique announced agent support in a new Beta release of SmallEiffel. With agents, I could implement something that worked much like Perl's format directives. So I picked up where I left off. This time I was determined to see if I could implement an Eiffel version of every major feature demonstrated in chapter one. That chapter culminates in three example programs that take advantage of DBMs, hashes, child processes with redirected I/O, formatted output, and text manipulation. The Perlish library lets me implement all three. The three programs are say_hello, last and lister.
I added a fourth program, fake_mail, which replaces the mail program expected by the say_hello example, since I work mostly on NT and I don't have a command-line mail program lying around.
Source Code Size
Depending on how one counts these things, the Eiffel versions are only a little larger than their Perl counterparts. When writing Eiffel I have a tendency to throw in all kinds of assertions. This makes the code larger, yet aids enormously in debugging and has no effect on the optimized versions of the programs. Also, part of the Eiffel code is taken up with variable declarations that the Perl examples avoid. Personally I always use Perl's "strict" feature. If I had written the original Perl examples, they'd include variable declarations and the difference in code size would be even smaller.
Looking at just executable lines of code, the Eiffel versions are about 15% larger than the Perl versions, which isn't bad at all, given Perl's reputation for terseness and Eiffel's reputation for verbosity. Some of this increase is accounted for by Eiffel's notorious 'minimalist' loop construct. It would be possible to cut the Eiffel code down further by using agents to iterate over lists of data, but this would involve noticeable restructuring of the programs. (As a note, I did implement a version of say_hello that uses agents and removes all of the assertions. This version is about 25% smaller than the original and very close to the size of its Perl counterpart.)
There are four classes in the Perlish library that you'll be interested in using. These are:
Inherit from class PERLISH to gain access to the Perlish operators.
A pin-headed version of Perl hashes. This is a subclass of DICTIONARY[STRING, STRING] with some convenience functions for loading simple data to/from a text file. This is meant to emulate Perl's DBM functionality. Here is the short/flat version of HASH, which tends to drown out the new features with all the features of DICTIONARY.
A mechanism for providing formatted output. It makes use of Eiffel's agent feature.
Here's a cool thing: would you like to be able to spawn a child process and read from or write to it? This class allows you to do just that. One of its constructors supports the Perl convention of specifying the command line with "|" characters to denote whether input or output (or both) is being redirected to the parent.
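The "|" convention can be illustrated with a small sketch. The class below only classifies a command string. It does not spawn a process. Its names (PIPE_SPEC_SKETCH, writes_to_child, reads_from_child, command_of) are hypothetical and are not part of the Perlish child-process class. It is written in current ISE Eiffel syntax. Treating a string with "|" at both ends as bidirectional follows the description above; Perl's own two-argument open does not allow that case.

```eiffel
class
	PIPE_SPEC_SKETCH
	-- Hypothetical sketch: classify a Perl-style command string.
	-- Leading "|": the parent writes to the child's standard input.
	-- Trailing "|": the parent reads the child's standard output.

create
	make

feature {NONE} -- Initialization

	make
			-- Classify a few sample command strings and print the results.
		do
			show ("|mail bob")
			show ("dir|")
			show ("|sort|")
		end

feature -- Queries

	writes_to_child (spec: STRING): BOOLEAN
			-- Does `spec` start with "|"?
		do
			Result := not spec.is_empty and then spec.item (1) = '|'
		end

	reads_from_child (spec: STRING): BOOLEAN
			-- Does `spec` end with "|"?
		do
			Result := not spec.is_empty and then spec.item (spec.count) = '|'
		end

	command_of (spec: STRING): STRING
			-- `spec` with the pipe markers and surrounding blanks removed.
		local
			first, last: INTEGER
		do
			first := 1
			last := spec.count
			if writes_to_child (spec) then
				first := 2
			end
			if reads_from_child (spec) and last >= first then
				last := last - 1
			end
			Result := spec.substring (first, last)
			Result.left_adjust
			Result.right_adjust
		end

feature {NONE} -- Output

	show (spec: STRING)
			-- Print the command in `spec` and the direction(s) it redirects.
		do
			io.put_string (command_of (spec))
			if writes_to_child (spec) then
				io.put_string (" [parent writes]")
			end
			if reads_from_child (spec) then
				io.put_string (" [parent reads]")
			end
			io.put_new_line
		end

end
```

Run as a root class, this prints one line per sample, e.g. `sort [parent writes] [parent reads]` for "|sort|".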
SmallEiffel Release -0.74 Beta 10 or later. (Note that with SmallEiffel's numbering scheme, release -0.75 is not a later version.)
The ELJ-Win32 distribution.
This version of Perlish is specific to Microsoft Windows. The elj-win32 SmallEiffel distribution can be downloaded from the elj project site. I did put in some hooks for portability, and if circumstances dictate, I could create a Unix version.
I'm assuming that anyone interested in Perlish is already a Perl user who has Perl 5.6 or greater installed. If not, then you'll have to compile and test the example programs manually (see below).
I've written several example programs that exercise the main classes in the library. To test these programs, I've put together Perl scripts that will compile and run tests for each example. The scripts are all named "rt.pl" and are very simple.
The most interesting and complex examples are the programs that implement my version of the Perl programs cited above. Since it's difficult or impossible to automate tests for these examples, the rt.pl script for this project only compiles the programs.
To run all the tests, go to Perlish\examples and run the rt.pl script there. This calls all the rt.pl scripts in each subdirectory and will write out errors for any compilations or tests that fail.
Abusing Infix Operators
Achieving operators similar to Perl's requires a little cheating. For example, in Perl (and many other languages) it's perfectly acceptable to call a function and throw away the result. Not so in Eiffel. Eiffel assumes that if you called a function, then you must be interested in the result. It also assumes that all infix and prefix operators act as functions, returning a result. It's OK to call a function, assign its result to a variable and then ignore that variable (at least, it is as far as the compiler is concerned).
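Here is a small illustration of the restriction, as a sketch in current ISE Eiffel syntax. The commented-out line would be rejected by the compiler. Assigning the result to a local and then ignoring it is accepted. The class and feature names are hypothetical.

```eiffel
class
	DISCARD_SKETCH
	-- Eiffel does not accept a function call as an instruction; the usual
	-- workaround is to assign the result to a local and ignore it.

create
	make

feature {NONE} -- Initialization

	make
			-- Issue two tickets whose numbers we don't care about, then print the third.
		local
			ignored: INTEGER
		do
			-- next_ticket          -- rejected: a function call is not an instruction
			ignored := next_ticket  -- accepted: result assigned, then ignored
			ignored := next_ticket
			io.put_integer (next_ticket)
			io.put_new_line
		end

feature -- Access

	tickets_issued: INTEGER
			-- Number of tickets handed out so far.

	next_ticket: INTEGER
			-- Issue a new ticket and return its number.
			-- (A function with a side effect: legal, but not CQS style.)
		do
			tickets_issued := tickets_issued + 1
			Result := tickets_issued
		end

end
```

This prints 3. The first two ticket numbers are computed and then thrown away, which is the Eiffel equivalent of calling a Perl function in void context.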
These restrictions encourage Command Query Separation (CQS). CQS is generally a foreign concept in Perl programming. Perl is at heart a procedural language, and CQS is hard to apply in procedural code, since there are no objects that maintain state between commands and queries.
CQS also assumes that, as a rule, functions or features that return values don't have side effects. The notion of a function without side effects is completely counter to the Perl (or C or C++) way of doing things. Some of the operators I've implemented merely act as a facade that combines separate command and query features. Thus a CQS purist should be able to use Perlish without having to compromise their principles -- too much. (This is an implementation goal I've not strictly adhered to, but should. I hope to do so in a later version of Perlish, assuming there is such a thing.)
As an aside, the phrase "CQS" is a little misleading. It gives the impression that code should be structured as a series of commands followed by queries -- the "do this. Did it work?" approach. However, it's perfectly acceptable to have queries precede commands, leading to the form "Can I do this? Then let's do it." It's also possible to write neurotic code -- "Can I do this? Then do it. Did it work? Are you sure?..." If neurotic code is written to take advantage of Eiffel's rich set of assertions, then you may get something that looks more like Design by Contract than hysteria.
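The facade idea and the "Can I do this? Then do it." form can be sketched with a hypothetical line source. CQS_SKETCH, has_more, advance, last_line and next_line are illustrative names, not Perlish features, and the syntax is current ISE Eiffel. The command `advance` and the queries `has_more` and `last_line` are kept separate, and preconditions express the "Can I do this?" step. `next_line` is a facade that calls the command and then the query. That is convenient, but it gives up strict CQS.

```eiffel
class
	CQS_SKETCH
	-- Hypothetical line source written in command/query style.

create
	make

feature {NONE} -- Initialization

	make
			-- Read all lines: once through the facade, then query/command style.
		do
			lines := <<"alpha", "beta", "gamma">>
			-- Facade: one call that runs the command and then the query.
			io.put_string (next_line)
			io.put_new_line
			-- "Can I do this? Then do it.": query, command, then query.
			from
			until
				not has_more
			loop
				advance
				io.put_string (last_line)
				io.put_new_line
			end
		end

feature -- Queries

	has_more: BOOLEAN
			-- Is there a line after the current one?
		do
			Result := position < lines.count
		end

	last_line: STRING
			-- Line most recently reached by `advance`.
		require
			started: position >= 1
		do
			Result := lines.item (position)
		end

feature -- Commands

	advance
			-- Move to the next line.
		require
			can_advance: has_more
		do
			position := position + 1
		ensure
			moved: position = old position + 1
		end

feature -- Facade

	next_line: STRING
			-- Advance, then return the new line (command plus query in one call).
		require
			can_advance: has_more
		do
			advance
			Result := last_line
		end

feature {NONE} -- Implementation

	lines: ARRAY [STRING]
	position: INTEGER

invariant
	position_in_range: 0 <= position and position <= lines.count

end
```

A purist can ignore `next_line` and use only `has_more`, `advance` and `last_line`. The preconditions and postcondition then do the "Are you sure?" checking during development.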
Another problem is that infix and prefix operators must operate on "their" object. For example an operator that works solely on STRINGs must be declared in the STRING class, and can only be invoked on a STRING object. There's no analog to the C++ concept of a free-standing operator declaration.
Since the operators are declared as features of class PERLISH, they need a PERLISH object as one of their operands. Since PERLISH clients inherit the class to get its functionality, you could use Current, but I added a feature named "p" since it seemed more -- umm -- Perlish.
Operators that don't take parameters (such as the diamond operator) are prefixes to "p". This means they are of the form "op p". The diamond operator example would be
#<> p
Or, more appropriately, in a context where the return value is used (and a compilation error avoided):
if #<> p /= void then
Operators that take parameters are infix, with the form "p op parameter". Infix operators are restricted to taking only one parameter. To get around that restriction, the parameter can be a manifest array or a tuple. I prefer to use tuples (the syntax is a little cleaner, which on second thought seems un-Perlish), but I've had trouble with tuples, so I often use manifest arrays.
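The `p` convention can be sketched as follows. This is a hypothetical, cut-down stand-in for PERLISH, not the real class. It is written in current ISE Eiffel syntax, where operators are declared with `alias "..."` rather than the `prefix "..."` and `infix "..."` spelling used by SmallEiffel. The names `next_line` and `repeated`, and the `#*` operator, are assumptions for illustration. Input comes from an in-memory array rather than standard input. The argument-less `#<>` is used as a prefix on `p`. `#*` is an infix that takes a single manifest tuple, mimicking Perl's `"ab" x 3`. The loop captures each result with an `attached ... as` object test instead of the `/= void` comparison shown above, so the line can be used without calling `#<>` a second time.

```eiffel
class
	PERLISH_SKETCH
	-- Hypothetical sketch of the `p` convention; not the real PERLISH class.
	-- Lines come from an in-memory array instead of files or standard input.

create
	make

feature {NONE} -- Initialization

	make
			-- Print every input line, then a repeated string.
		local
			done: BOOLEAN
		do
			input := <<"first", "second">>
			from
			until
				done
			loop
				if attached #<> p as l_line then
					io.put_string (l_line)
					io.put_new_line
				else
					done := True
				end
			end
			-- Infix operator whose single argument is a manifest tuple.
			io.put_string (p #* ["ab", 3])
			io.put_new_line
		end

feature -- Perlish-style operators

	p: PERLISH_SKETCH
			-- Target for the operators; here simply Current.
		do
			Result := Current
		end

	next_line alias "#<>": detachable STRING
			-- Next input line, or Void once input is exhausted (like Perl's <>).
			-- Note: a query with a side effect, i.e. not strict CQS.
		do
			if position < input.count then
				position := position + 1
				Result := input.item (position)
			end
		end

	repeated alias "#*" (args: TUPLE [text: STRING; times: INTEGER]): STRING
			-- `args.text` repeated `args.times` times, like Perl's "ab" x 3.
		local
			i: INTEGER
		do
			create Result.make (args.text.count * args.times.max (0))
			from
				i := 1
			until
				i > args.times
			loop
				Result.append (args.text)
				i := i + 1
			end
		end

feature {NONE} -- Implementation

	input: ARRAY [STRING]
	position: INTEGER

end
```

Compilers differ in which character sequences they accept as free operators (compare the percent-sign problem described below), so the operator spellings may need adjusting for a particular toolchain.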
It would be interesting to extend operator syntax to be more flexible, allowing free-standing operators, operators that aren't functions, parameterless operators, and so forth. I am not pushing strongly for this, since it's unlikely that Eiffel as a whole would benefit from this flexibility.
Another interesting problem I've encountered with infix operators: not all Eiffel tools are well-equipped to handle odd sequences of characters. For example, an operator declaration whose name contains a percent sign can cause problems, because the percent sign is an escape character in Eiffel's literal strings.
Development time. Anyone who works with Perl is certainly familiar with the Perl debugger. One can spend hours debugging cryptic and mysterious Perl constructs that aren't working for mysterious and cryptic reasons. This is especially true when one is trying to write large-scale Perl scripts. (To me, a "large-scale Perl script" is anything over 10 lines.) :-) On the other hand, it can be very easy to debug and run an Eiffel program that takes full advantage of assertions and Design by Contract. And once you're confident that the program is running correctly, you can turn off the assertions, optimize the program and let it rip.
Library development. The rich set of libraries available for Perl is a testament to the dedication of Perl programmers everywhere. I consider Perl library development to be unusually challenging, especially if you're trying to develop that library as a Perl class.
Everything in Eiffel is a class, and indeed the language provides enormous support for developing good class interfaces.
Performance. I work a lot with large structured text files. Once I needed to find a few records that had a particular pattern in a particular field in a file that was just over 1 GB in size. My first attempt to do this with Perl didn't work. Perl's memory consumption grew to the limits of the PC (considerably less than 1 GB) and then Perl crashed.
I re-implemented the search script with Perlish and had my answer six minutes later.
The Compilation Cycle.
To Be Done
Of course I'd like to continue extending Perlish, and will do so for my own purposes. If enough people express substantial interest (i.e., "show me the money") in certain extensions, I'll work on those as well.
Formatting is obviously incomplete. There's no support for formatted floating-point numbers, multiline fields, or filled fields (where one variable's content may populate multiple fields); all of these should be supported.
When SmallEiffel provides support for setting environment variables, I'd like to implement an equivalent to %ENV.
I could replace the rt.pl test scripts with an Eiffel version today, but this must wait until I decide how best to do it and have the time.
".. in open source, software lives on if there are enough believers to keep it alive .." (WSJ, 20 Jul 2003)
Dec 04, 2003, 00:26 UTC