Building a large text corpus (Part I)

December 12, 2017

Getting good text data for language-model training isn’t as easy as it sounds. First, you have to find a large corpus. Second, you must clean it up!


Finding dependencies for Make

April 25, 2017

How hard is it to get dependencies for your project to use in a Makefile?

Well, it depends.


r4nd0m pa$$w0rd

March 14, 2017

Let’s take it easy this week. How about we generate random passwords? That should be fun, right?
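As a taste of what that can look like, here is a minimal sketch in Bash. It is not necessarily the method used in the full entry: the function name `random_password` and its defaults are made up for illustration, and it assumes a system that provides `/dev/urandom`, `tr`, and `head`.

```bash
#!/usr/bin/env bash

# random_password [LENGTH] [CHARSET]
# Hypothetical helper: prints LENGTH characters drawn from CHARSET,
# where CHARSET is a set in tr(1) notation (default: letters and digits).
random_password() {
    local length=${1:-16}
    local charset=${2:-'A-Za-z0-9'}

    # LC_ALL=C makes tr treat /dev/urandom as raw bytes; -dc deletes
    # every byte that is NOT in the set, and head keeps the first LENGTH.
    LC_ALL=C tr -dc "$charset" < /dev/urandom | head -c "$length"
    echo
}

random_password 20
```

Filtering the random bytes rather than mapping them with a modulo keeps every allowed character equally likely, at the price of discarding most of what is read.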



Parsing GPS data with Bash

May 7, 2013

Last time we looked at how to get the data from the GPS, and now we will look at how to parse it. It turns out that, except for the checksum, everything is pretty straightforward, even in Bash.


So, why Bash in the first place? Well, there’s no real reason, except that for the other project I’m working on, it’s the ideal glue-code language: it lets me simply invoke other programs that I do not want to re-code (or take parts of) to do what I want. I must say that I even have a C# version of the GPS data grabber, but while fancier, it does not bring much more than the Bash version.
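To give an idea of why the checksum is the fiddly part, here is a rough sketch. It assumes the GPS emits NMEA 0183 sentences, where the checksum is the XOR of every character between the leading `$` and the `*`, written as two hexadecimal digits; the excerpt does not say which format the device uses, and the function names `nmea_checksum` and `nmea_is_valid` are hypothetical.

```bash
#!/usr/bin/env bash

# nmea_checksum SENTENCE
# Prints the two-digit hex XOR of the characters between '$' and '*'.
nmea_checksum() {
    local payload=${1#\$}      # drop the leading '$', if present
    payload=${payload%%\**}    # drop '*' and everything after it
    local sum=0 i code

    for (( i = 0; i < ${#payload}; i++ )); do
        # printf with a leading quote yields the character's numeric code
        printf -v code '%d' "'${payload:i:1}"
        sum=$(( sum ^ code ))
    done
    printf '%02X\n' "$sum"
}

# nmea_is_valid SENTENCE
# Succeeds when the checksum after '*' matches the computed one.
nmea_is_valid() {
    local claimed=${1##*\*}    # text after the last '*'
    claimed=${claimed%$'\r'}   # tolerate a trailing carriage return
    [[ ${claimed^^} == "$(nmea_checksum "$1")" ]]
}
```

A reader loop could then be as simple as `while IFS= read -r line; do nmea_is_valid "$line" && handle "$line"; done`, with `handle` standing in for whatever does the actual field splitting.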


Log Watching

November 3, 2009

Very often, you have to keep an eye on a log, or maybe more than one log, and a couple of other things while a long-running simulation is under way. GNU/Linux distributions offer the program watch, which periodically executes a command in the current interactive shell. While watch is convenient, you still have the problem of displaying the needed information in a way that respects the terminal geometry. It turns out that there are tools to query the terminal geometry, and we can use them to write simple, effective, well-displayed scripts.


So let us see how we can make Bash somewhat aware of the terminal it runs in.
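As a starting point, here is a small sketch of one way to do it, using `tput cols` and `tput lines` to query the geometry. The helper names (`term_cols`, `term_rows`, `show_tail`) and the fallback to 80x24 are assumptions made for this example, not necessarily what the full entry uses.

```bash
#!/usr/bin/env bash

# Width of the terminal in columns; falls back to $COLUMNS, then 80,
# when tput is missing or there is no usable terminal description.
term_cols() {
    local n
    n=$(tput cols 2>/dev/null) || n=${COLUMNS:-80}
    printf '%s\n' "${n:-80}"
}

# Height of the terminal in rows, with the same kind of fallback.
term_rows() {
    local n
    n=$(tput lines 2>/dev/null) || n=${LINES:-24}
    printf '%s\n' "${n:-24}"
}

# show_tail FILE
# Prints as many trailing lines of FILE as fit on screen (leaving two
# rows for a header), each cut to the terminal width so nothing wraps.
show_tail() {
    local rows=$(( $(term_rows) - 2 ))
    (( rows < 1 )) && rows=1
    tail -n "$rows" "$1" | cut -c "1-$(term_cols)"
}
```

Saved as a script that ends with a call such as `show_tail "$1"`, this can be handed to watch, so the geometry is queried again on every refresh and the display follows a resized terminal.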
