Monday, August 18, 2008

Diacritics starvation

A big part of our development work on Teamness is centered on internationalization. This involves many factors, and one of the most important is translation. At the moment we support only Romanian besides English, so every word that appears on the pages or in the emails sent from Teamness must be translated into Romanian.

A particular aspect of the Romanian alphabet, shared with many other languages around the world, is that it has some special letters beyond those available in ASCII. These letters are: ă, î, â, ş and ţ.

Sadly, I have noticed that a lot of Romanian websites leave out the diacritics. In my opinion, skipping the special Romanian letters is acceptable in personal blog posts, comments on other blogs or forum posts, because it is faster to type. But it's completely unacceptable for online businesses. Besides making the text harder to read and follow, it sometimes changes the meaning, as in these examples: peste (over, beyond) vs peşte (fish) or vina (guilt) vs vîna (to hunt).

I think most Romanian speakers use an English input language on their keyboards. This makes it easier to type Romanian words by replacing the special letters mentioned above with their plain counterparts: s for ş, t for ţ, a for ă and â, and i for î. I also find it easier to write the text without diacritics in the first place, but if that text needs to appear on a presentation page, in a document or sometimes even in a comment, I make sure I put the special characters back in place.
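To see what this substitution does to a text, here is a minimal sketch, assuming Python 3 and its standard `unicodedata` module. The function name `strip_diacritics` is hypothetical and chosen only for this illustration; it is not taken from Teamness or any other site. It decomposes each letter into a base letter plus combining marks and then drops the marks, so â comes out as a, its base letter.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with diacritical marks removed (hypothetical helper)."""
    # NFD splits a letter such as "ş" into "s" plus a combining cedilla.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("peşte"))  # peste
print(strip_diacritics("vîna"))   # vina
```

Once the marks are gone, "peşte" and "peste" are the same string, so the difference in meaning cannot be recovered from the text alone.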

Recently I found out about a tool that makes this task even easier. You give it a Romanian text without the special letters and it gives the text back with the diacritics in place.

I won't list here any Romanian website that doesn't write properly in Romanian (because in the end that is what it comes down to), but I have to mention ComunicateMedia's website. We recently sent them a press release in Romanian about the Teamness website. The press release was written with diacritics, but they were stripped out. It's one thing not to be willing to use them yourself, but at least don't modify the text submitted by your users.

