Official RaSPF page
Ok, time to go a little more public with this.
Here's a page for it (click on "read more") and I will ask the openspf guys to put it on the implementations list (let's see how that goes).
I have been able to work some more on RaSPF and the results are encouraging.
Thanks to valgrind and test suites, I am pretty confident it doesn't leak memory, or at least that it doesn't leak except in very rare cases.
I think I found a neat way to simplify memory management, though, and that's what I wanted to mention.
This is probably trivial for everyone reading, but I am a limited C programmer, so whenever something works unexpectedly right, I am happy ;-)
One problem with C memory management is that if you have many exit points for your functions, releasing everything you allocate is rather annoying, since you may have to do it in several different locations.
I compounded this problem because I am using exceptions (yeah, C doesn't have them. I used this).
Now I have to worry not only about my own returns, but also about my own throws and about any uncaught exception thrown by something I called!
Hell, right?
Nope: what exceptions complicated, exceptions fixed. Look at this function:
    bstring spf_query_get_explanation(spf_query *q, bstring spec)
    {
        bstring txt=0;
        struct bstrList *l=0;
        bstring expanded=0;
        bstring result=0;
        struct tagbstring s=bsStatic("");

        try {
            // Expand an explanation
            if (spec && spec->slen) {
                expanded=spf_query_expand(q,spec,1);
                l=spf_query_dns_txt(q,expanded);
                if (l) {
                    txt=bjoin(l,&s);
                } else {
                    txt=bfromcstr("");
                }
                result=spf_query_expand(q,txt,0);
                throw(EXC_OK,0);
            } else {
                result=bfromcstr("explanation: Required option is missing");
                throw(EXC_OK,0);
            }
        }
        except {
            if(expanded) bdestroy(expanded);
            if(txt) bdestroy(txt);
            if(l) bstrListDestroy(l);
            on (EXC_OK) {
                return result;
            }
            if(result) bdestroy(result);
            throw(EXCEPTION.type,EXCEPTION.param1);
        }
    }
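It doesn't matter if spf_query_expand or spf_query_dns_txt throws an exception: this will not leak, because every path, including the successful one (which throws EXC_OK), ends up in the except block, and that is the only place where anything gets freed.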
Nice, I think :-)
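For anyone who wants to play with the idea without an exception library, here is a minimal sketch of the same "single cleanup point" pattern using only setjmp/longjmp. It is not the library I used and not RaSPF code: throw_exc, risky_dup, build, run_build and freed_count are all made-up names for illustration, and the handler stack is just one global pointer.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical stand-ins, not RaSPF or library code. */
enum { EXC_OK = 0, EXC_FAIL = 1 };

static jmp_buf *current_handler = NULL; /* innermost active "try" */
static int pending = EXC_OK;            /* code of the exception in flight */
static int freed_count = 0;             /* counts releases, for demonstration */

static void throw_exc(int code)
{
    pending = code;
    if (current_handler)
        longjmp(*current_handler, 1);
    abort(); /* nobody is listening: uncaught exception */
}

static void release(char *p)
{
    if (p) { free(p); freed_count++; }
}

/* May throw instead of returning, like spf_query_expand in the post. */
static char *risky_dup(const char *s, int fail)
{
    char *copy;
    assert(s != NULL);
    if (fail) throw_exc(EXC_FAIL);
    copy = malloc(strlen(s) + 1);
    if (!copy) throw_exc(EXC_FAIL);
    strcpy(copy, s);
    return copy;
}

/* Every path, normal or exceptional, goes through the same cleanup code. */
char *build(const char *s, int fail_second)
{
    /* volatile: these change between setjmp and a possible longjmp */
    char *volatile tmp = NULL;
    char *volatile result = NULL;
    jmp_buf here;
    jmp_buf *outer = current_handler;

    current_handler = &here;
    if (setjmp(here) == 0) {
        tmp = risky_dup(s, 0);
        result = risky_dup(tmp, fail_second);
        pending = EXC_OK; /* plays the role of throw(EXC_OK,0) */
    }
    /* the "except" part: reached after success and after any throw */
    current_handler = outer;
    release(tmp);
    if (pending == EXC_OK)
        return result;
    release(result);
    throw_exc(pending); /* rethrow to whoever called us */
    return NULL;        /* not reached */
}

/* Top-level catcher: turns an exception into a status code. */
int run_build(const char *s, int fail_second, char **out)
{
    jmp_buf top;
    jmp_buf *outer = current_handler;
    int status = EXC_OK;

    *out = NULL;
    current_handler = &top;
    if (setjmp(top) == 0)
        *out = build(s, fail_second);
    else
        status = pending;
    current_handler = outer;
    return status;
}
```

Calling run_build("abc", 0, &out) hands back a copy the caller must free, while run_build("abc", 1, &out) reports EXC_FAIL and leaves out as NULL. In both cases the temporary inside build is released exactly once, in one place.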
RaSPF, my C port of PySPF, is pretty much functional right now.
Here's what I mean:
It passes 75 internal unit tests (ok, 74, but that one is arguable).
It passes 137 of 145 tests of the SPF official test suite.
It agrees with PySPF in 181 of the 183 cases of the libspf2 live DNS suite.
It segfaults in none of the 326 test cases.
So, while there are still some corner cases to debug, it's looking very good.
I even spent some time with valgrind to plug some leaks (the internal test suite runs almost leakless, the real app is a sieve ;-)
All in all, if I can spend a little while with it during the week, I should be able to make a release that actually works.
Then, I can rewrite my SPF plugin for qmail, which was what sent me on this month-long tangent.
As a language wars comparison:
The sloccount of RaSPF is 2557 (or 2272 if we use the ragel grammar source instead of the generated file).
The sloccount of PySPF is 993.
So, a 2.6:1 or 2.29:1 code ratio.
However, I used 4 non-standard C libraries: bstrlib, udns, and helpers for hashes and exceptions, which add another 5794 LOCs.
So, it could be argued as an 8:1 ratio, too, but my C code is probably verbose in the extreme, and many C lines are not really "logic" but declarations and such.
Also, I did not write PySPF, so its author's code may be more concise than mine would have been, but I tried my best to copy the flow as closely as possible, line by line.
In short, you need to write, according to this case, between 2 and 8 times more code than you do in Python.
That's a bit much!
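If you want to check the arithmetic, here it is as a few lines of C. The constants are the sloccount figures quoted above; the names (sloc_ratio and the enum members) are mine, just for illustration.

```c
/* Line counts quoted in the post (sloccount output). */
enum {
    RASPF_SLOC       = 2557, /* counting the ragel-generated file */
    RASPF_SLOC_RAGEL = 2272, /* counting the ragel source instead */
    PYSPF_SLOC       = 993,
    HELPER_LIBS_SLOC = 5794  /* bstrlib, udns, hash and exception helpers */
};

/* How many lines of C per line of Python. */
double sloc_ratio(int c_lines, int python_lines)
{
    return (double)c_lines / (double)python_lines;
}
```

sloc_ratio(RASPF_SLOC, PYSPF_SLOC) comes out at roughly 2.6, sloc_ratio(RASPF_SLOC_RAGEL, PYSPF_SLOC) at roughly 2.3, and sloc_ratio(RASPF_SLOC + HELPER_LIBS_SLOC, PYSPF_SLOC) at roughly 8.4.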
In my previous post, I mentioned how PySPF does something using a regular expression which I couldn't easily reproduce in C.
So, I started looking at parser generators to use the original SPF RFC's grammar.
But that had its own problems.... and then came ragel.
Ragel is a finite state machine compiler, and you can use it to generate simple parsers and validators.
The syntax is very simple, the results are powerful, and here's the main chunk of code that lets you parse an SPF domain-spec (it works, too!):

    machine domain_spec;

    name          = ( alpha ( alpha | digit | '-' | '_' | '.' )* );
    macro_letter  = 's' | 'l' | 'o' | 'd' | 'i' | 'p' | 'h' | 'c' | 'r' | 't';
    transformers  = digit* 'r'?;
    delimiter     = '.' | '-' | '+' | ',' | '/' | '_' | '=';
    macro_expand  = ( '%{' macro_letter transformers delimiter* '}' ) | '%%' | '%_' | '%-';
    toplabel      = ( alnum* alpha alnum* ) | ( alnum{1,} '-' ( alnum | '-' )* alnum );
    domain_end    = ( '.' toplabel '.'? ) | macro_expand;
    macro_literal = 0x21 .. 0x24 | 0x26 .. 0x7E;
    macro_string  = ( macro_expand | macro_literal )*;

    domain_spec := macro_string domain_end 0 @{ res = 1; };
And in fact, it's simpler than the ABNF grammar used in the RFC:
    name          = ALPHA *( ALPHA / DIGIT / "-" / "_" / "." )
    macro-letter  = "s" / "l" / "o" / "d" / "i" / "p" / "h" / "c" / "r" / "t"
    transformers  = *DIGIT [ "r" ]
    delimiter     = "." / "-" / "+" / "," / "/" / "_" / "="
    macro-expand  = ( "%{" macro-letter transformers *delimiter "}" )
                    / "%%" / "%_" / "%-"
    toplabel      = ( *alphanum ALPHA *alphanum ) /
                    ( 1*alphanum "-" *( alphanum / "-" ) alphanum )
    domain-end    = ( "." toplabel [ "." ] ) / macro-expand
    macro-literal = %x21-24 / %x26-7E
    macro-string  = *( macro-expand / macro-literal )
    domain-spec   = macro-string domain-end
So, thumbs up for ragel!
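To get a feel for what ragel saves you from writing, here is a hand-written C sketch of just one of those rules, macro-expand. It is not what ragel generates and not RaSPF code; match_macro_expand is a name I made up. It uses the delimiter set from the ABNF and accepts lowercase macro letters only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns how many characters one macro-expand
 * takes up at the start of s, or 0 if s does not start with one. */
size_t match_macro_expand(const char *s)
{
    static const char letters[] = "slodiphcrt";
    static const char delimiters[] = ".-+,/_=";
    size_t i;

    if (s[0] != '%')
        return 0;
    /* the three two-character forms: "%%", "%_" and "%-" */
    if (s[1] == '%' || s[1] == '_' || s[1] == '-')
        return 2;
    if (s[1] != '{')
        return 0;

    /* exactly one macro letter */
    i = 2;
    if (s[i] == '\0' || strchr(letters, s[i]) == NULL)
        return 0;
    i++;

    /* transformers: any number of digits, then an optional 'r' */
    while (isdigit((unsigned char)s[i]))
        i++;
    if (s[i] == 'r')
        i++;

    /* any number of delimiters */
    while (s[i] != '\0' && strchr(delimiters, s[i]) != NULL)
        i++;

    /* closing brace is mandatory */
    if (s[i] != '}')
        return 0;
    return i + 1;
}
```

That is about thirty lines for one rule out of ten, and it only answers "does it match here?". Doing the whole domain-spec by hand, with the repetition in macro-string and the alternatives in domain-end, is where I would start making mistakes.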
Update:
The code looks very bad on python or aggregators.
This piece of code alone fixed 20 test cases from the SPF suite, and now only 8 fail. Neat!
Working on my SPF library, I ran into a problem. I needed to validate a specific element, and the python code is a little hairy (it splits based on a large regexp, and it's tricky to convert to C).
So, I asked, and was told: maybe you should start from the RFC's grammar.
Ok. I am not much into grammars and parsers, but what the heck. So I checked it. It's an ABNF grammar.
So, I looked for the obvious thing: an ABNF parser generator.
There are very few of those, and none of them seems very solid, which is scary, because almost all the RFCs define everything in terms of ABNF (except for some that do worse and define things in prose. Did you know there is no formal, verifiable definition of what an IPv6 address looks like?).
So, after hours of googling...
Does anyone know a good ABNF parser generator? I am trying abnf2c, but it's not strict enough (I am getting a parser that doesn't work).
Does anyone know why those very important documents that rule how most of us make a living/work/have fun are so ... hazy?