I wrote a little spreadsheet thingie a few days ago.
Of course, it's a toy, not the real thing at all, but it was a nice hack,
since it is a real, recalculating, extensible, dependency-checking, loop-avoiding
spreadsheet engine in about 50 lines of code.
That's because I was using Python, which is a seriously cool language to write that
kind of thing in, since all you have to do to evaluate an expression is call eval()
on it.
Sure, that's nice, but the real core of the spreadsheet engine was that you could
also create a dictionary-like object that recalculated on-demand its contents.
That way, when you ask for sheet['a1'], custom code goes to see what a1 has in it
(a formula), calculates it if needed, and maybe trigger a few extra recalculations
if another cell depends on a1.
But as anyone who uses spreadsheets can tell you, weird things exist in ssheet land.
For example, if you copy something, then you paste it, it gets modified in the process.
What other app does that???
Here's an example you can check in any spreadsheet:
In A1, type "1".
In B1, type "A1+1" (should display 2)
In A2, type 2
Copy B1 to B2, and it will display 3
Further, if you look at the formula in B2, it says A2+1 now.
That's called relative cellnames (I think).
In order to do that trick, you have to parse the formula in B1, and then, when you paste it into B2,
take into account the displacement and modify accordingly. Usually, if you want absolute names, you
use $ A1 instead, and that would stay unmodified.
Now, that throws a nice monkeywrench into my neat little spreadsheet because now it suddenly
looks not like a spreadsheet at all!
So, I started thinking, how the hell could this be done? The whole advantage of a python sheet is
using eval(), so switching to a parser (like if this were a C[++] sheet) would be silly.
I delved into the python standard lib. As every python programmer knows, almost everyhting is there.
If you write python, you read the library reference every day, and memorize chunks of it, because
it's one of the things that make python cool. It's just chockfull of useful stuff!
And here I was reading about the compiler module, and the parser module, which can be used to do
wondrous stuff with python code. But I couldn't understand jackshit about them. I'm a simple coder.
And just as I was going to say, let's write instead about the connection between free software and
the sex life of frogs I found tokenize.
Tokenize is a module that parses python and turns it into tokens. Here's how a+2 looks after you
tokenize it:
1,0-1,1: NAME 'a'
1,1-1,2: OP '+'
1,2-1,3: NUMBER '2'
2,0-2,0: ENDMARKER ''
The numbers on the left side are positions in the text stream where the tokens were.
It has just enough information that you can tokenize a piece of code, and then reassemble it. There's
code to do just that, it's called regurgitate and it's written by Ka-Ping Yee.
So, the solution is obvious. When copying a formula:
Tokenize the formula to be copied
Look for tokens of type NAME
See if it looks like a cellname, or _cellname
If it's _cellname, leave as is. That will be our notation for absolute cells
If it's cellname, displace it nicely
Regurgitate it
Later, when evaluating a formula, if someone asks for cell _a1 give him cell a1.
And voilà, relative cells.
This works, and it works well (ok, I had to introduce some ugly globals, I need to learn more stuff),
and it is guaranteed to tokenize in the same way python does it. It's not even really slow
I touched a bunch of other things, including support for all the functions in python's math module so
you can use them in cells. Here's the code to do that:
for name in dir(math):
if name[0]<>"_":
self.tools[name]=eval('math.'+name)
Freaky stuff, isn't it?
What's the main issue? Performance. To put it simply, I seriously doubt a sheet written in python
can be fast enough for general use. But hey, it's extensible, it's nice, and depending on what you
are trying to do, it may be good enough.
And here's today's version of StupidSheet including relative cells. Don't worry, it's a small download ;-)