Ralsina.Me — Roberto Alsina's website

Writing a simple parser using a finite state machine

I wrote a lit­er­ate pro­gram­ming tool called Cryc­co and at its core is a very sim­ple pars­er which ex­tracts com­ment blocks from source code and or­ga­nizes them in­to a doc­u­men­t.

If you want to see an ex­am­ple, Cryc­co's web­site is made out of its own code by pro­cess­ing it with it­self.

The for­mat is very sim­ple: you write a com­ment block, then some code, then an­oth­er com­ment block, and so on. The com­ment blocks are tak­en as mark­down, and the code blocks are tak­en as source code.

The document is then split into sections (one chunk of docs, one chunk of code) and processed to make it look nice, with syntax highlighting and so on.

The pars­er was just a cou­ple dozen lines of code, but when I want­ed to add more fea­tures I ran in­to a wal­l, so I rewrote it us­ing a fi­nite state ma­chine (F­S­M) ap­proach.

Since, again, Crycco is literate code, you can see the parser by just looking at its own source code, which is in the Document class.

Of course, that may not be enough, so let's go in­to de­tail­s.

The state ma­chine has a few states:

    enum State
      CommentBlock
      EnclosingCommentBlock
      CodeBlock
    end

A com­ment block is some­thing like this:

    # This is a comment
    # More comment

An en­clos­ing com­ment block is a "mul­ti­line" com­men­t, like this:

    /* This is a comment
    More comment
    */

Code blocks are lines that are not com­ments :-)

So, suppose you are in the CommentBlock state. If you see a line that starts with #, you stay in the same state.

If you see a line that does not start with #, you switch to the CodeBlock state.

When you are in the CommentBlock state, the line you are in is a comment. If you are in the CodeBlock state, the line is code.

Here are the pos­si­ble tran­si­tions in this ma­chine:

    state_machine State, initial: State::CodeBlock do
      event :comment, from: [State::CodeBlock], to: State::CommentBlock
      event :enclosing_comment_start, from: [State::CodeBlock], to: State::EnclosingCommentBlock
      event :enclosing_comment_end, from: [State::EnclosingCommentBlock], to: State::CodeBlock
      event :code, from: [State::CommentBlock], to: State::CodeBlock
    end
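To make the table concrete, here is a minimal sketch of the same transitions as a plain lookup table. It is written in Python rather than Crystal, and `TRANSITIONS` and `fire` are hypothetical names for illustration, not part of Crycco or of the state machine library it uses. It assumes that an event fired from a state outside its `from` list simply leaves the state alone.

```python
from enum import Enum, auto


class State(Enum):
    COMMENT_BLOCK = auto()
    ENCLOSING_COMMENT_BLOCK = auto()
    CODE_BLOCK = auto()


# event name -> (states the event may fire from, state it leads to)
TRANSITIONS = {
    "comment": ({State.CODE_BLOCK}, State.COMMENT_BLOCK),
    "enclosing_comment_start": ({State.CODE_BLOCK}, State.ENCLOSING_COMMENT_BLOCK),
    "enclosing_comment_end": ({State.ENCLOSING_COMMENT_BLOCK}, State.CODE_BLOCK),
    "code": ({State.COMMENT_BLOCK}, State.CODE_BLOCK),
}


def fire(state, event):
    """Return (new_state, changed).

    Assumption: an event fired from a state that is not in its `from`
    set is not an error, it just leaves the state untouched.
    """
    sources, target = TRANSITIONS[event]
    if state in sources:
        return target, True
    return state, False


state = State.CODE_BLOCK  # same initial state as the table above
state, changed = fire(state, "comment")  # CodeBlock -> CommentBlock
print(state.name, changed)  # COMMENT_BLOCK True
state, changed = fire(state, "comment")  # already in a comment block
print(state.name, changed)  # COMMENT_BLOCK False
```

The `changed` flag is what lets the caller tell "a new comment block just started" apart from "still inside the same comment block".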

Then, to parse the doc­u­men­t, we go line by line, and call the ap­pro­pri­ate event de­pend­ing on the line we are read­ing. That event may change the state or not.

For ex­am­ple:

        if is_comment.match(line) && !NOT_COMMENT.match(line)
          self.comment {
            # These blocks only execute when transitions are successful.
            #
            # So, this block is executed when we are transitioning
            # to a comment block, which means we are starting
            # a new section
            @sections << Section.new(@language)
          }
          # Because the docs section is supposed to be markdown, we need
          # to remove the comment marker from the line.
          processed_line = processed_line.sub(@language.match, "") unless @literate

And that's it! We send the prop­er events to the ma­chine, the ma­chine changes state, we han­dle the line ac­cord­ing to what state we are in, and we end up with a nice­ly parsed doc­u­men­t.
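The whole loop can be sketched in a few lines. This is a simplified Python illustration, not Crycco's actual code: it assumes `#` is the only comment marker, ignores enclosing comments, and `parse` and `COMMENT` are made-up names.

```python
import re

# Assumption: a comment line is optional whitespace followed by '#'.
COMMENT = re.compile(r"^\s*#\s?")


def parse(source):
    """Split source text into sections of {'docs': [...], 'code': [...]}."""
    sections = []
    state = "code"  # start in the code state, like the machine above
    for line in source.splitlines():
        if COMMENT.match(line):
            if state == "code":
                # Successful code -> comment transition: a new section starts.
                sections.append({"docs": [], "code": []})
                state = "comment"
            # Docs are markdown, so drop the comment marker.
            sections[-1]["docs"].append(COMMENT.sub("", line, count=1))
        else:
            state = "code"
            if not sections:
                # Code before any comment still needs a section to live in.
                sections.append({"docs": [], "code": []})
            sections[-1]["code"].append(line)
    return sections


example = """# Greet the user
# politely
print("hello")
# Say goodbye
print("bye")
"""

for section in parse(example):
    print(section)
```

Running it prints two sections: the first has two lines of docs and one line of code, the second has one of each.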

Parsers are some­what scary, but they don't have to be. A fi­nite state ma­chine is a very sim­ple way to write a pars­er that is easy to un­der­stand and main­tain, and of­ten is enough.

Have fun pars­ing!

Literate version of grafito code (WIP)

In the past cou­ple of weeks I have start­ed (and pret­ty much fin­ished) a tool called Grafi­to and the end re­sult is un­der two thou­sand lines of code, in­clud­ing HTML and CSS.

For small­er code­bas­es, I think it makes sense to make them lit­er­ate. Just a cou­ple hours writ­ing around the code can make it per­fect­ly clear to un­der­stand for any­one start­ing with the code­base and (to be hon­est) al­so for the me of the fu­ture who will re­mem­ber noth­ing about it.

So I am publishing the commented codebase of Grafito processed through Crycco, a literate programming tool I wrote. Yes, the website for Crycco is Crycco's source code. That's traditional :-)

The code is not yet ful­ly com­ment­ed and I have found a cou­ple bugs in Cryc­co al­ready:

  • Links in the side­bar are wrong in some cas­es
  • There is no way to pub­lish a lit­er­ate HTML file!

In any case, I ex­pect no­body cares, but I think it's nice and it's not a ton of ef­fort so that makes it worth do­ing, so I did it.

Creating a demo site for a service

Re­cent­ly I wrote an app called Grafi­to to view sys­temd/jour­nald logs (those are the logs in most Lin­ux sys­tem­s) and be able to fil­ter them, see de­tails of a spe­cif­ic en­try, etc.

One problem with this kind of tool is that I can't just open it to the world, because then everyone would be able to see the logs of a real machine. While that is usually not a problem because the information is not terribly useful (sure, you will know what's running, big whoops), it may display a dangerous piece of data which I may not want to expose.

So, the right way to do this is to create a demo site. It could show the real data from a throwaway system (like a virtual machine) or, like I did, show fake data.

To show fake da­ta you can use a fak­er. Fak­ers are fun! I am us­ing askn/­fak­er which is a Crys­tal one. Fak­ers let you ask for, you guessed it... fake da­ta.

For example, you can ask for an address or a credit card number and it will give you something random that matches the obvious patterns of what you asked for.

One I love is to ask for say_something_smart which gives you smart things!

Faker::Hacker.say_something_smart
# => "Try to compress the SQL interface, maybe it will program the back-end hard drive!"

So, I wrote a function that works like journalctl but is totally fake. The source code is just a quick hack.
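To give an idea of how little such a function needs, here is a rough Python sketch using only the standard library. It is not Grafito's code: the unit names, messages, field names and the `fake_journal` function are all invented for illustration.

```python
import random
from datetime import datetime, timedelta

# All of these values are made up for the demo.
UNITS = ["nginx.service", "sshd.service", "cron.service"]
MESSAGES = ["Started session", "Connection closed", "Reloading configuration"]
PRIORITIES = ["info", "warning", "err"]


def fake_journal(count, seed=None, start=None):
    """Return `count` fake log entries, oldest first."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    when = start or datetime(2025, 1, 1)
    entries = []
    for _ in range(count):
        # Move the clock forward a little for every entry.
        when += timedelta(seconds=rng.randint(1, 120))
        entries.append({
            "timestamp": when.isoformat(),
            "unit": rng.choice(UNITS),
            "priority": rng.choice(PRIORITIES),
            "message": rng.choice(MESSAGES),
        })
    return entries


for entry in fake_journal(3, seed=42):
    print(entry["timestamp"], entry["unit"], entry["priority"], entry["message"])
```

Passing a seed is handy for a demo site, because the same "logs" show up every time.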

Then, I used a con­di­tion­al com­pile flag to route the in­fo re­quests in­to that fake func­tion:

{% if flag?(:fake_journal) %}
  require "./fake_journal_data" # For fake data generation
{% end %}

{% if flag?(:fake_journal) %}
    Log.info { "Journalctl.known_service_units: Using FAKE service units." }
    fake_units = FakeJournalData::SAMPLE_UNIT_NAMES.compact.uniq.sort
    Log.debug { "Returning #{fake_units.size} fake service units." }
    return fake_units
{% else %}
    # Return actual good stuff
{% end %}

And that's it! If I compile with -Dfake_journal it builds a binary that uses only fake data. Then I had it run on my home server and voilà: a demo!
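Python has no compile-time flags, so the closest equivalent there is a run-time switch. The sketch below uses an environment variable as a stand-in; `FAKE_JOURNAL` and all the function names are hypothetical.

```python
import os


def real_service_units():
    # Placeholder for the real lookup, which would query the system journal.
    raise NotImplementedError("real journal access is out of scope here")


def fake_service_units():
    # Made-up unit names, deduplicated and sorted.
    return sorted({"nginx.service", "sshd.service", "cron.service"})


def known_service_units(env=os.environ):
    """Pick the fake or the real source depending on a flag."""
    if env.get("FAKE_JOURNAL") == "1":
        return fake_service_units()
    return real_service_units()


print(known_service_units({"FAKE_JOURNAL": "1"}))
# ['cron.service', 'nginx.service', 'sshd.service']
```

The difference matters: with the compile flag the real code path is not even in the demo binary, while a run-time switch keeps both paths around.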

See it in ac­tion! grafi­to-de­mo.ralsi­na.me

Revisiting the RPU (Ralsina Programmatic Universe)

A while back I no­ticed I had start­ed many projects us­ing the Crys­tal lan­guage, be­cause it re­al­ly made me want to code more.

Of course those projects are not standalone: many are libraries or tools used by other projects, some are forks of other people's tools I made minor changes to ("forks"), some are websites, and so on.

Well, I semi-automated the generation of a chart showing how things connect. First, this is the chart:

RPU Chart

And here is a hacky Python script I used to generate it via mermaid (it assumes you have all your repos cloned and will be useful for NOBODY):

from glob import glob

print("graph LR")
sites = [
    "faaso.ralsina.me",
    "nicolino.ralsina.me",
    "nombres.ralsina.me",
    "ralsina.me",
    "tapas.ralsina.me",
]

for site in sites:
    print(f"  {site}>{site}]")

nicolino_sites = [
    "faaso.ralsina.me",
    "nicolino.ralsina.me",
]
faaso_sites = [
    "nombres.ralsina.me",
    "tapas.ralsina.me",
]
caddy_sites = [
    "faaso.ralsina.me",
    "nicolino.ralsina.me",
    "ralsina.me",
]

planned = [
    ("nicolino", "markd"),
    ("nicolino", "cr-wren"),
    ("crycco", "libctags.cr"),
    ("crycco", "crystal-ctags"),
]

hace_repos = [
    "nicolino",
    "tartrazine",
    "crycco",
    "markterm",
    "sixteen",
]

for repo in glob("*/"):
    repo = repo.strip("/")
    if repo == "forks":
        continue
    print(f"  {repo}(({repo}))")

for repo in glob("forks/*/"):
    repo = repo.split("/")[-2]
    print(f"  {repo}([{repo}])")

for s in nicolino_sites:
    print(f"  {s} ---> nicolino")
for s in faaso_sites:
    print(f"  {s} ---> faaso")
for s in caddy_sites:
    print(f"  {s} ---> caddy-static")


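for shard in glob("**/shard.yml"):
    repo = shard.split("/")[-2]
    # Close the file when done instead of leaking the handle.
    with open(shard) as shard_file:
        for line in shard_file:
            if "ralsina/" in line or "markd" in line:
                # Keep only the repo name and drop the trailing newline,
                # which would otherwise end up in the mermaid output.
                dest = line.split(":")[-1].split("/")[-1].strip()
                if not dest:
                    continue
                print(f"  {repo} ---> {dest}")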

for a, b in planned:
    print(f"  {a} -.-> {b}")


for repo in hace_repos:
    print(f"  {repo} ---> hace")

Contents © 2000-2025 Roberto Alsina