Ralsina.Me — Roberto Alsina's website

Posts about sysadmin

With iterpipes, python is ready to replace bash for scripting. Really.

This has been a pet peeve of mine for years: programming shell scripts sucks. They are ugly and error prone. The only reason why we still do it? There is no real replacement.

Or at least that was the case until today, when I met iterpipes at python.reddit.com.

Iter­pipes is "A li­brary for run­ning shell pipe­lines us­ing shel­l-­like syn­tax" and guess what? It's bril­liant.

Here's an example from its PyPI page:

# Total lines in *.py files under /path/to/dir,
# use safe shell parameters formatting:

>>> total = cmd(
...     'find {} -name {} -print0 | xargs -0 wc -l | tail -1 | awk {}',
...     '/path/to/dir', '\*.py', '{print $1}')
>>> run(total | strip() | join | int)
315

Here's how that would look in shel­l:

find /path/to/dir -name '*.py' -print0 | xargs -0 wc -l | tail -1 | awk '{print $1}'

You may say the shell ver­sion looks bet­ter. That's an il­lu­sion caused by the evil that is shell script­ing: the shell ver­sion is bug­gy.

Why is it buggy? Because if I control what's inside /path/to/dir I can make that neat little shell command fail [1], but at least in python I can handle errors!

Al­so, in most ver­sions you could at­tempt to write, this com­mand would be un­safe be­cause quot­ing and es­cap­ing in shell is in­sane!

The iter­pipes ver­sion us­es the equiv­a­lent of SQL pre­pared state­ments which are much safer.
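The prepared-statement idea can be sketched with nothing but the standard library. This is not how iterpipes is implemented; format_command is a hypothetical helper, written only to show each {} being filled with a shell-quoted argument so that hostile input stays a single word:

```python
import shlex


def format_command(template, *args):
    """Fill each {} in template with a shell-quoted argument."""
    return template.format(*(shlex.quote(arg) for arg in args))


# A directory name that tries to smuggle in a second command
hostile_dir = '/path/to/dir; echo pwned'
command = format_command('find {} -name {} -print0', hostile_dir, '*.py')
print(command)
```

The semicolon ends up inside single quotes, so the shell would see one odd directory name instead of two commands.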

It's near­ly im­pos­si­ble to do such a com­mand in pure shell and be sure it's safe.

Al­so, the shell ver­sion pro­duces a string in­stead of an in­te­ger, which sucks if you in­tend to do any­thing with it.

And the most im­por­tant ben­e­fit is, of course, not when you try to make python act like a shel­l, but when you can stop pre­tend­ing shell is a re­al pro­gram­ming lan­guage.
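To make that concrete, here is a minimal sketch of the same line count done without any external commands. The name count_lines is made up for this example, and it assumes Python 3 with pathlib:

```python
from pathlib import Path


def count_lines(root, pattern='*.py'):
    """Count lines in every file under root whose name matches pattern."""
    total = 0
    for path in Path(root).rglob(pattern):
        if path.is_file():
            # Binary mode avoids decoding errors on odd files
            with path.open('rb') as handle:
                total += sum(1 for _ in handle)
    return total
```

Calling count_lines('/path/to/dir') gives back an integer, and weird file names are never parsed by a shell. One small difference from wc -l: a last line without a trailing newline is still counted here.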

Consider this gem from Arch Linux's /etc/rc.shutdown script. Here, DAEMONS is a list of things that started on boot, and this script is trying to shut them down in reverse order, unless the daemon name starts with "!":

# Shutdown daemons in reverse order
let i=${#DAEMONS[@]}-1
while [ $i -ge 0 ]; do
        if [ "${DAEMONS[$i]:0:1}" != '!' ]; then
                ck_daemon ${DAEMONS[$i]#@} || stop_daemon ${DAEMONS[$i]#@}
        fi
        let i=i-1
done

Nice, huh?

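Now, how would that look in python? (I am assuming ck_daemon returns a true value on success, like the shell function's zero exit status.)

# Shutdown daemons in reverse order
for daemon in reversed(DAEMONS):
    # Entries starting with '!' are disabled: skip them
    if daemon.startswith('!'):
        continue
    # Drop the leading '@' marker, like ${DAEMONS[$i]#@} does
    name = daemon[1:] if daemon.startswith('@') else daemon
    # Mirror "ck_daemon X || stop_daemon X": stop only when the check fails
    if not ck_daemon(name):
        stop_daemon(name)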

Where stop_­dae­mon used to be this:

stop_daemon() {
    /etc/rc.d/$1 stop
}

And will now be this:

def stop_daemon(daemon):
    run(cmd('/etc/rc.d/{} stop',daemon))
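If you don't have iterpipes around, roughly the same thing can be sketched with the standard subprocess module. This is only a sketch: daemons_to_stop is a hypothetical helper, and the DAEMONS list below is made-up sample data, not anyone's real rc.conf:

```python
import subprocess


def daemons_to_stop(daemons):
    """Yield names in reverse order, skipping '!' entries and stripping '@'."""
    for entry in reversed(daemons):
        if entry.startswith('!'):
            continue
        yield entry[1:] if entry.startswith('@') else entry


def stop_daemon(name):
    """Run /etc/rc.d/<name> stop without going through a shell."""
    return subprocess.call(['/etc/rc.d/' + name, 'stop'])


# Hypothetical DAEMONS array, only for illustration
DAEMONS = ['syslog-ng', '@network', '!netfs', 'crond']
order = list(daemons_to_stop(DAEMONS))
```

Keeping the "which daemons, in which order" logic separate from the part that actually runs commands means you can check the order without stopping anything.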

So, come on, peo­ple, we are in the 21st cen­tu­ry, and shell script­ing sucked in the 20th al­ready.

Migrating from Haloscan to Disqus (if you can comment on it, it worked ;-)

Introduction

If you are a Haloscan user and are starting to wonder what you can do, this page explains a way to take your comments to Disqus, another free comment service.

A few days ago, Haloscan an­nounced they were stop­ping their free com­ment ser­vice for blogs. Guess what ser­vice has in it the com­ments of the last 9 years of this blog? Yes, Haloscan.

They of­fered a sim­ple mi­gra­tion to their Echo plat­for­m, which you have to pay for. While Echo looks like a per­fect­ly nice com­ment plat­for­m, I am not go­ing to spend any mon­ey on this blog if I can help it, since it al­ready eats a lot of my time.

Luck­i­ly, the guys at Haloscan al­low ex­port­ing the com­ments (that used to be on­ly for their pre­mi­um ac­counts), so thanks Haloscan, it has been nice!

So, I started researching where I could run to. There seem to be two large free comment systems: Disqus and Intense Debate.

Keep in mind that my main interest lies in not losing almost ten years of comments, not in how great the service is. That being said, they both seem to offer roughly the same features.

Let's con­sid­er how you can im­port com­ments to each ser­vice:

  • Disqus: It can import from Blogger and some other hosted blog services. Not from Haloscan.

  • In­­tense De­­bate: Can im­­port from some host­ed ser­vices, and from some files. Not from the file Haloscan gave me.

So, what is a guy to do? Write a python pro­gram, of course! Here's where Dis­qus won: they have a pub­lic API for post­ing com­ments.

So, all I have to do then is:

  1. Grok the Dis­­qus API

  2. Grok the Haloscan com­­ments file (it's XM­L)

  3. Cre­ate the nec­es­sary threads and what­ev­er in Dis­­qus

  4. Post the com­­ments from Haloscan to Dis­­qus

  5. Hack the blog so the links to Haloscan now work for Dis­­qus

Piece of cake. It only took me half a day, which at my current rates is what 3 years of Echo would have cost me, but where's the fun in paying?

So, let's go step by step.

1. Grok the Disqus API

Luckily, there is a reasonable Disqus Python Client library and docs for the API, so this was not hard.

Just get the li­brary and in­stall it:

hg clone https://IanLewis@bitbucket.org/IanLewis/disqus-python-client/
cd disqus-python-client
python setup.py install

The API usage we need is really simple, so study the API docs for 15 minutes if you want. I got almost all the tips I needed from this pyblosxom import script.

Ba­si­cal­ly:

  1. Get your API key

  2. Log in

  3. Get the right "forum" (you can use a Disqus account for more than one blog)

  4. Post to the right thread

2. Grok the Haloscan comments file

Not on­ly is it XM­L, it's pret­ty sim­ple XM­L!

Here's a taste:

<?xml version="1.0" encoding="iso-8859-1" ?>
<comments>
    <thread id="BB546">
      <comment>
        <datetime>2007-04-07T10:21:54-05:00</datetime>
        <name>superstoned</name>
        <email>josje@aaaaaa.nl</email>
        <uri></uri>
        <ip>86.92.111.236</ip>
        <text><![CDATA[that is one hell of a cool website ;-)]]></text>
      </comment>
      <comment>
        <datetime>2007-04-07T16:14:53-05:00</datetime>
        <name>Remi Villatel</name>
        <email>maxilys@aaaaaa.fr</email>
        <uri></uri>
        <ip>77.216.206.65</ip>
        <text><![CDATA[Thank you for these rare minutes of sweetness in this rough world...]]></text>
      </comment>
    </thread>
</comments>

So, a com­ments tag that con­tains one or more thread tags, which con­tain one or more com­ment tags. Piece of cake to tra­verse us­ing El­e­ment­Tree!
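Here is a minimal sketch of that traversal, assuming Python 3 and a trimmed-down copy of the sample above (the XML declaration and some fields are left out to keep it short). The helper name iter_comments is made up for this example:

```python
from xml.etree import ElementTree

SAMPLE = """<comments>
  <thread id="BB546">
    <comment>
      <datetime>2007-04-07T10:21:54-05:00</datetime>
      <name>superstoned</name>
      <text><![CDATA[that is one hell of a cool website ;-)]]></text>
    </comment>
  </thread>
</comments>"""


def iter_comments(root):
    """Yield (thread id, author name, comment text) for every comment."""
    for thread in root.findall('thread'):
        for comment in thread.findall('comment'):
            yield (thread.attrib['id'],
                   comment.findtext('name'),
                   comment.findtext('text'))


root = ElementTree.fromstring(SAMPLE)
rows = list(iter_comments(root))
```

The CDATA wrapper is handled by the parser, so the text comes out as a plain string.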

There is an ob­vi­ous match be­tween com­ments and threads in Haloscan and Dis­qus. Good.

3. Create the necessary threads and whatever in Disqus

This is the tricky part, re­al­ly, be­cause it re­quires some things from your blog.

  • You must have a per­ma­link for each post

  • Each per­ma­link should be a sep­a­rate page. You can't have per­ma­links with # in the URL

  • You need to know what haloscan id you used for each post's com­­ments, and what the per­ma­link for each post is.

For example, suppose you have a post at //ralsina.me/weblog/posts/ADV0.html and it has a Haloscan comments link like this:

<a href="javascript:HaloScan('ADV0');" target="_self"> <script type="text/javascript">postCount('ADV0');</script></a>

You know where else that 'ADV0' appears? In Haloscan's XML file, of course! It's the "id" attribute of a thread.

Al­so, the ti­tle of this post is "Ad­voga­to post for 2000-01-17 17:19:57" (hey, it's my blog ;-)

Got that?

Then we want to cre­ate a thread in Dis­qus with that ex­act same da­ta:

  • URL

  • Thread ID

  • Ti­­tle

The bad news is... you need to gath­er this in­for­ma­tion for your en­tire blog and store it some­where. If you are luck­y, you may be able to get it from a database, as I did. If not... well, it's go­ing to be a lot of work :-(

For the pur­pose of this ex­pla­na­tion, I will as­sume you got that da­ta nice­ly in a dic­tio­nary in­dexed by thread id:

{
  id1: (url, title),
  id2: (url, title)
}
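If your blog data does live in a database, building that dictionary could look something like the sketch below. Everything about the schema is an assumption: the posts table and its haloscan_id, url and title columns are hypothetical, and an in-memory SQLite database stands in for the real one:

```python
import sqlite3


def load_threads(connection):
    """Return {haloscan_id: (url, title)} from a hypothetical posts table."""
    rows = connection.execute('SELECT haloscan_id, url, title FROM posts')
    return {haloscan_id: (url, title) for haloscan_id, url, title in rows}


# In-memory stand-in for the blog database, for illustration only
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE posts (haloscan_id TEXT, url TEXT, title TEXT)')
connection.execute(
    'INSERT INTO posts VALUES (?, ?, ?)',
    ('ADV0', '//ralsina.me/weblog/posts/ADV0.html', 'My first post'),
)
threads = load_threads(connection)
```

Adapt the query to whatever your blog software actually stores; the only thing that matters is ending up with one (url, title) pair per Haloscan thread id.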

4. Post the comments from Haloscan to Disqus

Here's the code. It's not really tested, because I had to make several attempts and fixes, but it should be close to ok.

#!/usr/bin/python
# -*- coding: utf-8 -*-

# Read all comments from a CAIF file, the XML haloscan exports

from disqus import DisqusService
from xml.etree import ElementTree
from datetime import datetime
import time


# Obviously these should be YOUR comment threads ;-)
threads={
    'ADV0': ('//ralsina.me/weblog/posts/ADV0.html','My first post'),
    'ADV1': ('//ralsina.me/weblog/posts/ADV1.html','My second post'),
    }

key='USE YOUR API KEY HERE'
ds=DisqusService()
ds.login(key)
forum=ds.get_forum_list()[0]

def importThread(node):
    t_id=node.attrib['id']

    # Your haloscan thread data
    thr_data=threads[t_id]

    # A Disqus thread: it will be created if needed
    thread=ds.thread_by_identifier(forum,t_id,t_id)['thread']

    # Set the disqus thread data to match your blog
    ds.update_thread(forum, thread, url=thr_data[0], title=thr_data[1])


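    # Now post all the comments in this thread
    # (the loop variable gets its own name so it does not shadow "node")
    for comment in node.findall('comment'):
        # [:19] drops the UTC offset, which this strptime format cannot parse
        dt=datetime.strptime(comment.find('datetime').text[:19],'%Y-%m-%dT%H:%M:%S')
        name=comment.find('name').text or 'Anonymous'
        email=comment.find('email').text or ''
        uri=comment.find('uri').text or ''
        text=comment.find('text').text or 'No text'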

        print '-'*80
        print 'Name:', name
        print 'Email:', email
        print 'Date:', dt
        print 'URL:', uri
        print
        print 'Text:'
        print text

        print ds.create_post(forum, thread, text, name, email,
                                   created_at=dt, author_url=uri)
        time.sleep(1)

def importComments(fname):
    tree=ElementTree.parse(fname)
    for node in tree.findall('thread'):
        importThread(node)


# Replace comments.xml with the file you downloaded from Haloscan
importComments('comments.xml')
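
One detail worth knowing: the script cuts each timestamp to its first 19 characters, which throws away the UTC offset. If you would rather keep it, here is a minimal sketch assuming Python 3.7 or later (the script itself is Python 2). The helper name parse_haloscan_datetime is made up, and whether the Disqus client expects UTC or local time is an assumption you should check:

```python
from datetime import datetime, timezone


def parse_haloscan_datetime(text):
    """Parse a Haloscan timestamp and return it as an aware UTC datetime."""
    return datetime.fromisoformat(text).astimezone(timezone.utc)


created_at = parse_haloscan_datetime('2007-04-07T10:21:54-05:00')
```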

Now, if we are luck­y, you al­ready have a nice and ful­ly func­tion­ing col­lec­tion of com­ments in your Dis­qus ac­coun­t, and you should be calm know­ing you have not lost your da­ta. Ready for the fi­nal step?

Why I STILL use Arch Linux

Yes­ter­day I had one of those mo­ments where I feel very hap­py about my dis­tro of choice, Arch Lin­ux. Since the last time I post­ed about Arch seems to have been over two years ago (time flies when you are hav­ing fun!), I think it's time to ex­plain it.

I want­ed to test rst2pdf against re­port­lab from SVN, wor­daxe from SVN and do­cu­tils from SVN, and I want­ed it to be sim­ple.

So­lu­tion: I just pack­aged them in AUR!

Now, whenever I need to check rst2pdf against wordaxe trunk, I just need to yaourt -S python-wordaxe-svn and I can go back to stable wordaxe with yaourt -S python-wordaxe.

The svn pack­age will al­ways be the cur­rent trunk with­out any mod­i­fi­ca­tion­s, and I can switch back and forth in about 45 sec­ond­s, with­out mess­ing up my sys­tem's pack­ages.

Also, I can keep my installed SVN packages updated by doing yaourt -Su --devel every now and then.

How would I have done that using Debian or an RPM distro? I suppose by going around the packaging system (which I hate) or by doing a private repo (which is so ... lame?) or by doing a public repo (which is freaking work).

Re­al­ly, if you are a coder, I can't think of a Lin­ux dis­tro that makes life eas­i­er than Arch. Pret­ty much ev­ery­thing is there (12K pack­ages in un­sup­port­ed!) and if it is­n't, it's a 5-minute job to slap it in­to AUR and help the com­mu­ni­ty.

Sup­pose you are do­ing a KDE ap­p. On most dis­tros you need to in­stall your own from-­source copy of kdelibs to have the lat­est and make sure it's not screwed by dis­tro-spe­cif­ic patch­es.

On Arch? Patch­ing up­stream is frowned up­on. Not hav­ing the lat­est ver­sion is frowned up­on. So it's pret­ty much the ide­al en­vi­ron­ment to de­vel­op against KDE, or GNOME, or PyQt or what­ev­er.

If my life was not 150% com­mit­ted al­ready, I would try to be­come an Arch de­vel­op­er, or at least a TU (Trust­ed User). Maybe next life!

Outlook, IMAP and Exchange

Ok, I will post this just in case some poor soul needs to try do­ing it. You tell me if it makes any sense.

  • When in­­stalling Out­­look 2000, you can choose sup­­port for "Group­ware on­­ly", "In­ter­net on­­ly" or "both­­".

  • In­­ter­net On­­ly lets you cre­ate IMAP or POP3 ac­­counts, but no Ex­change ac­­counts.

  • Group­ware On­­ly lets you use Ex­change, but no IMAP or POP3.

  • Both gives you on­­ly Ex­change and POP3, but no IMAP.

Here is how you can have Ex­change and IMAP at the same time:

  1. Close Out­­look.

  2. Go to con­trol pan­el.

  3. There you have a "Mail" icon that was in­­stalled by Of­­fice.

  4. Use that icon to cre­ate the IMAP ac­­coun­t.

And some peo­ple dare say win­dows is easy to man­age :-P

Yes, I know, old software, whatever. It's not my fault if upgrading the freaking mail client costs so much money they won't do it because they have to upgrade the whole puddlejumping office suite.

PS: Thanks to the guy that told me how it's done ;-)

PS2: How to make Outlook show more than 100 results from LDAP: http://support.microsoft.com/kb/262848

A hard-to-block spammer: I need help.

Many of my clients have been spammed by La Cap­i­tana Re­al Es­tate late­ly. And I mean many. Hun­dred­s.

How­ev­er, they seem to have found a way to spam that work­s. And that suck­s.

They have cre­at­ed a Google Group, added all their vic­tims there, and let google do the dirty work.

What's the prob­lem?

  1. Google group mails are not blockable at SMTP-level because their senders contain a sort of hash and the recipient address, and no group name. That's incredibly stupid on google's part.

  2. The mes­sages they send are huge (6MB and up) so spa­­mas­sas­sin can not process them. The SA docs say this will not hap­pen be­­cause of "the eco­nom­ics of spam". Well, it hap­pens when you make google do it!

  3. I don't want to go back to the old days of keep­­ing a lo­­cal queue-lev­el ad­­dress black­­list. That's aw­­ful!

I have complained to google, I have complained to the spammers, even by phone. They use the standard defense: "We are just inviting people." "They can unsubscribe if they want to." "This is not spam."

No one does anything.

What's the next step? I can't black­list google group­s!


Contents © 2000-2024 Roberto Alsina