Learn Chinese writing with our StatusNet account

Ryan Weal

February 27, 2012

Have you ever wanted to learn Chinese? I took a course in university and I did not do all that well! Truth be told, I spent most of my time in the course researching linguistics, and I still find the subject fascinating today. I really enjoy learning new languages!

Recently I was thinking about how the first 1500-3000 characters in Chinese are all you really need to get by. That number is much lower than the equivalent for English or French, so there are many published lists of the "basic" characters out there. This weekend I searched for such a list and found one whose author made the contents available for re-use. Excellent.

Shortly thereafter I realized I could parse that file and make it into a flash card program... using StatusNet. The cron-bot I built posts a random character from the list of roughly 2700 characters every 10 minutes. Why so often? So you always see new stuff coming in. Also, because at this rate the bot makes enough posts to cover the *entire* list in less than 20 days.

Check it out: http://status.kafei.ca/dawei

If you have an account on identi.ca or another StatusNet service you can subscribe to this URL. RSS is also available. You could also just visit the page every now and then. Note that there is a "play" button with a pop-up button beside it. Those can be handy for watching live updates if you want to keep it open on your office desktop. ^_^

Eventually I might mirror it to Twitter. Let me know what you think on my contact form at http://kafei.ca/contact

How it was done

I found a listing of the 3000 basic characters (well, more like 2700+) at this website: http://www.zein.se/patrick/3000char.html and they even mention that re-use of the materials for flash cards is OK! Excellent. I pasted the list into an OpenOffice Calc spreadsheet and cropped everything down to just the rows and columns I needed. I then exported the file to CSV format and ran a few regex operations on it to get the data structure just right. I actually did this twice - the second time after launch, to fix a bug. Once the CSV file had been successfully mutated into a file compatible with the command line tool "fortune", I was ready to go. I compiled fortune's "dat" file and then created a cron job to post random selections to my StatusNet site. Now the cron job runs every 10 minutes, asks fortune for a random entry from the dictionary of the first 3000ish characters, and posts it to StatusNet.
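For anyone curious what the CSV-to-fortune step might look like, here is a rough sketch in Python rather than the regex operations I actually used. The three-column layout (character, pinyin, meaning), the output format of each entry, and the function names are all assumptions made for illustration. The one firm detail is that fortune expects entries separated by a line containing only "%".

```python
import csv
import io
import random


def csv_to_fortune(csv_text):
    """Turn CSV rows into fortune's plain-text format.

    fortune expects entries separated by a line containing only "%".
    The column layout (character, pinyin, meaning) is an assumption.
    """
    entries = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3 or not row[0].strip():
            continue  # skip blank or incomplete rows
        character, pinyin, meaning = (cell.strip() for cell in row[:3])
        entries.append(f"{character} [{pinyin}] {meaning}")
    return "".join(entry + "\n%\n" for entry in entries)


def pick_random_entry(fortune_text, rng=random):
    """Roughly what fortune does for us: return one entry at random."""
    entries = [e.strip() for e in fortune_text.split("\n%\n") if e.strip()]
    return rng.choice(entries)


if __name__ == "__main__":
    # Tiny made-up sample standing in for the exported spreadsheet.
    sample = "的,de,(possessive particle)\n一,yī,one\n是,shì,to be\n"
    fortune_text = csv_to_fortune(sample)
    print(fortune_text, end="")
    print("Random pick:", pick_random_entry(fortune_text))
```

In the real setup the random pick is done by fortune itself: the text file is compiled with strfile, which writes the ".dat" index next to it, and a crontab schedule of `*/10 * * * *` triggers the post every 10 minutes.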

Why do it this way?

Every day you read your StatusNet feeds - sometimes many, many, many times over the course of the day. It makes sense to put repetitive information like this into a feed that you are going to read over and over. It also makes sense on the level of knowing when you learned something, that is, in what order you learned it: scrolling back, you eventually hit the point you have already read.

It also isn't particularly important information. If you miss it... so what. It will eventually come around again. The cycle should run in roughly 20 days if nothing repeats, but repeats are possible, so actually seeing every character will take longer. The selection is totally random... as much as the fortune program can be random.

It is also possible to comment on the postings to practice using the text. That could be pretty cool, especially if working with a translator to make corrections as you go.

http://status.kafei.ca/dawei
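A quick back-of-the-envelope check on the 20-day figure, as a sketch. It assumes a list of exactly 2700 characters and that each pick is uniform and independent of the previous ones (I have not verified exactly how fortune picks, so treat that as an assumption). The second function is the standard "coupon collector" estimate for how long random picks take to cover everything at least once.

```python
POSTS_PER_DAY = 24 * 60 // 10  # one post every 10 minutes -> 144 per day


def days_for_one_pass(n_chars):
    """Days needed if every character were posted exactly once."""
    return n_chars / POSTS_PER_DAY


def expected_days_to_see_all(n_chars):
    """Expected days until every character has appeared at least once,
    assuming uniform random picks with repeats allowed (coupon collector:
    n * (1 + 1/2 + ... + 1/n) picks on average)."""
    harmonic = sum(1 / k for k in range(1, n_chars + 1))
    return n_chars * harmonic / POSTS_PER_DAY


if __name__ == "__main__":
    n = 2700  # approximate list size; the real list is "2700+"
    print(f"one pass, no repeats: {days_for_one_pass(n):.2f} days")
    print(f"expected, random picks: {expected_days_to_see_all(n):.0f} days")
```

So "roughly 20 days" is the time to make as many posts as there are characters. With repeats in the mix, covering every single character is likely to take several months, which is fine for a feed you dip into casually.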

Written by:
Ryan Weal @ryan_weal
Web developer based in Montréal. I run Kafei Interactive Inc. Node.js, Vue.js, Cassandra. Distributed data. Hire us to help with your data-driven projects.