Ir al contenido principal

Ralsina.Me — El sitio web de Roberto Alsina

This is why dynamic languages are cool

I wrote a lit­tle spread­sheet thingie a few days ago. [1]

Of course, it's a toy, not the re­al thing at al­l, but it was a nice hack, since it is a re­al, re­cal­cu­lat­ing, ex­ten­si­ble, de­pen­den­cy-check­ing, loop-avoid­ing spread­sheet en­gine in about 50 lines of code.

That's be­cause I was us­ing Python, which is a se­ri­ous­ly cool lan­guage to write that kind of thing in, since all you have to do to eval­u­ate an ex­pres­sion is call eval() on it.

Sure, that's nice, but the re­al core of the spread­sheet en­gine was that you could al­so cre­ate a dic­tio­nary-­like ob­ject that re­cal­cu­lat­ed on-de­mand its con­tents.

That way, when you ask for sheet['a1'], cus­tom code goes to see what a1 has in it (a for­mu­la), cal­cu­lates it if need­ed, and maybe trig­ger a few ex­tra re­cal­cu­la­tions if an­oth­er cell de­pends on a1. [2]

But as any­one who us­es spread­sheets can tell you, weird things ex­ist in ssheet land.

For ex­am­ple, if you copy some­thing, then you paste it, it gets mod­i­fied in the process.

What oth­er app does that???

Here's an ex­am­ple you can check in any spread­sheet:

  • In A1, type "1".

  • In B1, type "A1+1" (should dis­­­play 2)

  • In A2, type 2

  • Copy B1 to B2, and it will dis­­­play 3

Fur­ther, if you look at the for­mu­la in B2, it says A2+1 now.

That's called rel­a­tive cell­names (I think).

In or­der to do that trick, you have to parse the for­mu­la in B1, and then, when you paste it in­to B2, take in­to ac­count the dis­place­ment and mod­i­fy ac­cord­ing­ly. Usu­al­ly, if you want ab­so­lute names, you use $ A1 in­stead, and that would stay un­mod­i­fied.

Now, that throws a nice mon­key­wrench in­to my neat lit­tle spread­sheet [3] be­cause now it sud­den­ly looks not like a spread­sheet at al­l!

So, I start­ed think­ing, how the hell could this be done? The whole ad­van­tage of a python sheet is us­ing eval(), so switch­ing to a pars­er (like if this were a C[++] sheet) would be sil­ly.

I delved in­to the python stan­dard lib. As ev­ery python pro­gram­mer knows, al­most ev­ery­ht­ing is there. If you write python, you read the li­brary ref­er­ence ev­ery day, and mem­o­rize chunks of it, be­cause it's one of the things that make python cool. It's just chock­full of use­ful stuff!

And here I was read­ing about the com­pil­er mod­ule, and the pars­er mod­ule, which can be used to do won­drous stuff with python code. But I could­n't un­der­stand jack­shit about them. I'm a sim­ple coder.

And just as I was go­ing to say, let's write in­stead about the con­nec­tion be­tween free soft­ware and the sex life of frogs [4] I found to­k­enize.

To­k­enize is a mod­ule that pars­es python and turns it in­to to­ken­s. Here's how a+2 looks af­ter you to­k­enize it:

1,0-1,1:        NAME    'a'
1,1-1,2:        OP      '+'
1,2-1,3:        NUMBER  '2'
2,0-2,0:        ENDMARKER       ''

The num­bers on the left side are po­si­tions in the text stream where the to­kens were.

It has just enough in­for­ma­tion that you can to­k­enize a piece of code, and then re­assem­ble it. There's code to do just that, it's called re­gur­gi­tate and it's writ­ten by Ka-Ping Yee.

So, the so­lu­tion is ob­vi­ous. When copy­ing a for­mu­la:

  • To­k­­enize the for­­mu­la to be copied

  • Look for to­kens of type NAME

  • See if it looks like a cel­l­­name, or _cel­l­­name

  • If it's _cel­l­­name, leave as is. That will be our no­­ta­­tion for ab­­so­­lute cells

  • If it's cel­l­­name, dis­­­place it nice­­ly

  • Re­gur­gi­­tate it

Lat­er, when eval­u­at­ing a for­mu­la, if some­one asks for cell _a1 give him cell a1.

And voilà, rel­a­tive cell­s.

This work­s, and it works well (ok, I had to in­tro­duce some ug­ly glob­al­s, I need to learn more stuff), and it is guar­an­teed to to­k­enize in the same way python does it. It's not even re­al­ly slow [5]

I touched a bunch of oth­er things, in­clud­ing sup­port for all the func­tions in python's math mod­ule so you can use them in cell­s. Here's the code to do that:

for name in dir(math):
        if name[0]<>"_":
                self.tools[name]=eval('math.'+name)

Freaky stuff, is­n't it?

What's the main is­sue? Per­for­mance. To put it sim­ply, I se­ri­ous­ly doubt a sheet writ­ten in python can be fast enough for gen­er­al use. But hey, it's ex­ten­si­ble, it's nice, and de­pend­ing on what you are try­ing to do, it may be good enough.

And here's to­day's ver­sion of Stupid­Sheet in­clud­ing rel­a­tive cell­s. Don't wor­ry, it's a small down­load ;-)

sergio / 2006-04-03 14:54:

A SpreedSheet in 500 lines of code... a P2P client write in 15 lines of code... Python is amazing. :)

Roberto Alsina / 2006-04-03 14:55:

Well, it really isn't a *good* spreadsheet ;-)


Contents © 2000-2024 Roberto Alsina