Ir al contenido principal

Ralsina.Me — El sitio web de Roberto Alsina

A graph is a graph is a graph.

Af­ter hack­ing for about two hours the cell de­pen­den­cies yes­ter­day us­ing dict­s, I found my­self say­ing "how can I check if the de­pen­den­cy graph is cycli­cal?"

And of course it hit me. That's the de­pen­den­cy graph. Use a graph li­brary!

I looked for about 2 min­utes be­fore find­ing one that's prob­a­bly not op­ti­mal, but is damn sim­ple and in the pub­lic do­main: graph_lib.py by Nathan Den­ny.

First in­ter­est­ing da­ta point, the first two lines on the file (yes, I found out there is a lat­er ver­sion):

#--Version 1.0.0
#--Nathan Denny, May 27, 1999

Yup. In two days this piece of code is turn­ing 8 years old and un­touched. But it works just fine and dandy!

Here's the piece of code that makes the en­gine run which has grown from a hum­ble 10 LOC to al­most a whoop­ing 40 LOC:

class SpreadSheet(QtCore.QObject):
    _cells = {}
    tools = {}
    graph=Graph()

    def __init__(self,parent):
        QtCore.QObject.__init__(self,parent)
        for name in dir(functions):
                self.tools[name]=eval('functions.'+name)


    def __setitem__(self, key, formula):
        key=key.lower()
        c=traxcompile('%s=%s;'%(key,formula))
        self._cells[key] = [c[key][0],False,compile(c[key][0],"Formula for %s"%key,'eval')]

        # Dependency graph
        if not self.graph.has_node(key):
                self.graph.add_node(key)
        for edge in self.graph.in_arcs(key):
                self.graph.delete_edge(edge)
        for cell in c[key][1]:
                self.graph.add_edge(cell,key)
        try:
                print 'GRAPH(TOPO): ',self.graph.topological_sort()
                self._cells[key][1]=False
                print 'GRAPH(BFS) : ',self.graph.bfs(key)
                for cell in self.graph.bfs(key)[1:]:
                        self.emit(QtCore.SIGNAL('changed'),(cell))
        except Graph_topological_error:
                # We made the graph cyclic
                # So, mark this cell as evil
                self._cells[key][1]=True
                # And remove all incoming edges to go back to
                # status quo
                for edge in self.graph.in_arcs(key):
                        self.graph.delete_edge(edge)

    def getformula(self, key):
        key=key.lower()
        return self._cells[key][0]
    def __getitem__(self, key ):
        print self._cells[key]
        if self._cells[key][1]:
                return "ERROR: cyclic dependency"
        else:
                print 'evaluating [%s]: '%key,type(self._cells[key][0]),self._cells[key][0]
              return eval(self._cells[key][0], self.tools, self)

This en­gine sup­port­s:

  • Us­er de­fined func­­tions (just add them in for­­mu­la.py)

  • Cell de­pen­­den­­cies with de­tec­­tion of ci­­cles, of any length

  • Un­lim­it­ed size spread­­sheets

  • No­ti­­fi­­ca­­tion of cell changes to oth­­er mod­­ules

  • Au­­to­­mat­ic re­­cal­cu­la­­tion

  • A cus­­tom for­­mu­la lan­guage (ok, that's not in this piece of code ;-)

The whole project is now 1167 LOC, of which 591 are gen­er­at­ed or graph_lib, which means ev­ery­thing in­clud­ing the trax­ter "com­pil­er" and the UI is less than 600 lines of code.

Not too shab­by.

My goal is a func­tion­al spread­sheet with a work­ing GUI and sup­port­ing the most com­mon func­tions and pieces of the for­mu­la lan­guage in .... 2000 LOC, and I am will­ing to work on it for about two week­s.

Let's see how it turns out. Of course any­one with 5 free min­utes can con­trib­ute his favourite spread­sheet func­tion (I al­ready have sum ;-)

sebastien / 2007-05-27 09:56:

Hi Roberto,

Do you know this small library http://cheeseshop.python.or... to implement lazy evaluation and track dependencies ?

It could be an interesting tool for stupidsheet.

best,

Seb

Roberto Alsina / 2007-05-27 12:45:

No, I had not found that one :-)

It has the same problem as the original recipe in that lazyness is not really something you want in a spreadsheet, but if it can be told to trigger calculations, then maybe it's useful.

I'll have to take a look.

Matthew Marshall / 2007-05-30 17:02:

Actually, lazyness is something you might want. For example, if A depends on B and a script changes B multiple times before A needs to be read, you should only have to calculate A once. This starts to be a big deal if you have a very deep dependency graph.

And as far as forcing recalculation, that's no problem. Check out the 'observers' module in Cellulose for an example of how it can be done. Whenever an observer cell receives notification that a dependency has possibly changed, it adds itself to a set which can later be 'flushed'.

You spreadsheet project sounds great!

MWM

Matthew Marshall / 2007-05-30 17:08:

BTW, Cellulose also solves the cyclic dependency problem. The cells that are currently being calculated are stored in a stack. By default an assert is made to make sure that a cell is not already being calculated.

MWM

Roberto Alsina / 2007-05-30 17:11:

MWM: thanks for the tips, I will chek it out.

However, that's not the real problem with cyclic dependencies in a spreadsheet.

What that approach does is avoid infinite loops. The real problem is that all the cells in the cycle are nonsensical or undefined, so you shouldn't ever let the user enter a formula that creates a loop.

Matthew Marshall / 2007-05-30 18:02:

Isn't that the same thing? When the user enters a formula you try evaluating it. If evaluating it fails due to cyclic dependencies you give feedback similar to how you would for a syntax error or something else.

Or, are you talking about a formula that is sometimes cyclic and sometimes not? (Due to branching.) Using raw cellulose cells wouldn't work for that.

(I haven't actually read you post yet, as I don't have much time online. I'll read it and be back in a day or two.)

MWM

Roberto Alsina / 2007-05-30 18:25:

You are right, if I try to evaluate a cell's formula, and reject the formula when it fails, the effect would be the same as my current approach.

The only difference would be that I detect it on parsing, and cellulose needs to do an evaluation, but that's not important.


Contents © 2000-2024 Roberto Alsina