Getting good text data for language-model training isn’t as easy as it sounds. First, you have to find a large corpus. Second, you must clean it up!
Let’s take it easy this week. What about we generate random passwords? That should be fun, right?
Last time we looked at how to get the data to the GPS and now we will have a look at how to parse the data. Turns out that except for the check-sum, everything is pretty straight forward, even in Bash.
So, why bash in the first place? Well, there’s not real reason except that for the something else I’m working on, it’s the ideal glue-code language, allowing me to invoke simply other programs that I do not want to re-code (or take parts of) to do what I want. I must say that I even have a C# version of the GPS data grabber, but while fancier, it does not bring much more than the Bash version.