Theppitak Karoonboonyanan
 

One major change in GNOME 3.6 is the replacement of Pango’s shaper engines with HarfBuzz. Only language engines (for word break analysis, for example) are retained. So, I’m checking how this affects Thai/Lao rendering and what to do next.

Overall, Behdad has put a lot of effort into getting it right. Most Uniscribe behaviors have been reproduced for compatibility. He even cared enough to cover some widespread Thai fonts in which the script tag 'latn' is used instead of 'thai', as seen in Mozilla #719366. Unfortunately, this font set has been declared as standard fonts in official documents, so the workaround seems inevitable.

Supported Fonts

In my experiments with some existing Thai OpenType fonts, the new Pango still renders well without regression.

Loma font from fonts-tlwg (glyph positioning with GPOS, rearrangement with GSUB):

Arundina Sans font from Fonts-SIPA-Arundina (positioning by substitution, only GSUB, no GPOS):

But for legacy fonts without OpenType features, it renders badly:

In addition, according to Behdad, PUA glyphs in legacy fonts are not supported yet. This means there will be regressions with fonts designed for Windows XP or earlier. But modern fonts designed for Windows 7 should be fine.

Changes on Bugs

Replacing the engine from scratch certainly affects existing bugs. Some become obsolete, some still remain. Here is the summary for the Thai/Lao engine, as resolved upstream:

Closed bugs:

  • GNOME #616495 (Debian #620001) regarding Lao MAI KONG rendering, which was caused by a flaw in my code. I proposed a patch a while ago, but no action has been taken yet, and patched debs have been distributed locally as a workaround. However, with the HarfBuzz replacement, the bug is now gone.
  • GNOME #378001 regarding minority language support. I hadn’t worked on this because I was waiting for WTT 3.0, a local standard, to be drafted. Anyway, with the HarfBuzz replacement, the old WTT 2.0 clustering has been dumped and replaced with Unicode guidelines. Therefore, I assume it should now be possible to render minority languages written in Thai script, provided that the font has the required OpenType features.
  • GNOME #393307, #677090 regarding wrong rendering of zero-width characters like ZWJ and ZWSP. These bugs have also gone away with the HarfBuzz replacement.

Questionable bugs:

  • GNOME #583718 (Debian #620002) regarding the rendition of Thai SARA AM (U+0E33) on VTE with an excessive dotted circle. So far, I have disagreed with Behdad on whether this bug should be treated along with Indic scripts. IMO, there is an easy path for Thai by rendering monospace fonts differently, which is also in accordance with widespread practice everywhere else, be it XTerm, Emacs, or framebuffer TTYs. But Behdad doesn’t like the idea and insists that it should be treated along with Indic scripts, which would complicate things a lot. So, the bug has been there for many years. Meanwhile, I have also provided a workaround in the aforementioned patched debs.

    BTW, the situation has changed a little after the HarfBuzz replacement. Firstly, let’s see the problem with the current Pango:

    One can easily spot the dotted circle glitches. And here is how I work around it, which matches how it's rendered on other terminals:

    With the HarfBuzz replacement, here is how it renders:

    That is, although it’s still wrong, it’s more tolerable. So, the question for users is: could they tolerate this until VTE is redesigned to support Indic scripts?

Remaining bugs:

  • GNOME #576156 (Debian #620004) regarding weird cursor movements caused by Unicode UAX #29. Many amendment proposals were pushed to Unicode from different sources, until a change was finally accepted in Unicode 6.1.0. However, no action has been taken in Pango yet. We still have to push it further. Again, a fix has also been provided in the patched debs.

2nd-Oct-2011 03:30 pm - Myanmar Visit

Quite a belated English blog (after the Thai version), due to a busy personal life lately.

I visited Yangon during 4-11 Sep. to give some talks and tutorials on Debian packaging and mirroring. And I shared some information with the community.

The visit was initiated by Ngwe Tun and the Myanmar L10N team. I found later that a Facebook event had been created for this.

Localization

The first day was a comparison between Myanmar and Thai support in GNU/Linux, in which I briefed the status on the Thai side, and Thura Hlaing on the Myanmar side. It was nice that the Myanmar Computer Federation (MCF) director presided over the meeting until the end. That means there is awareness of GNU/Linux support at the executive level.

According to Thura, Burmese has gained quite good support in GNU/Linux. On the rendering side, all the reordering from the logical order is normally done with pure GSUB in the fonts, without special processing in the rendering engine. This is suboptimal in principle, but it's the most effective way, as the Windows rendering engine itself does not yet support Myanmar, either.

For input methods, the Myanmar XKB map has been available in xkb-data for a long time, but to serve users' familiarity with visual order typing, some reordering input methods have been developed, based on keymagic and ibus. But none of them is context-sensitive like what's done for Thai in other frameworks. Fortunately, with the surrounding text API recently added to ibus, this has become possible.

One unusual requirement for Myanmar script editing is the caret movement. It needs to move syllable-wise, neither character-wise nor cluster-wise. So, I suggested that they have a look at UAX #29 to see how it should be amended.

Myanmar locales are already done, both for the GNU C library and CLDR. And even a GNOME applet for the Myanmar lunar calendar is available. The latter is something Thai can learn from.

Burmese word segmentation is not supported in general. But R&D work on it has been done at the Myanmar NLP lab.

A serious issue left to solve is the existing abuses of Unicode. In Myanmar, there exist at least 14 variations of font hacks, abusing some free slots in Unicode charts as pre-composed clusters for information interchange (not for font internals), making plain text interchange impossible without the proper font for rendering.

For program translation, the new Myanmar L10N team is trying to request a mass submission to the current GNOME team. And for Debian, Thura Hlaing and Ngwe Tun have already started the translation process with Christian's help.

During my stay, I could see the team actively discussing the IT glossary, trying to settle on the translated terms. It looked like a lot of fun.

Debian

Then, the next three days were a workshop on Debian packaging, where I presented the basics of Debian package building, uploading, quality-controlling, modifying, creating and delivering. This was aimed at the development of a local distribution based on Debian.

Each afternoon was the time for setting up a Debian mirror, not only for convenient local distro development, but also for general users. This is important because internet penetration is still low in Myanmar. The main medium for software distribution is CD/DVD, which means only the stable version of Debian can be spread, which is not good for desktop users. Having a mirror should improve the situation somewhat. It should make dist-upgrading to testing/unstable easier. And it should make CD snapshotting for local distributions easier, too.

For this, I also presented another quick slide set on Debian mirroring & caching.

On the last day, I was introduced to the staff of the Myanmar NLP Lab and their projects, which include Myanmar OCR (based on tesseract), information retrieval, machine translation, and other linguistic resources like a dictionary, a lexicon and a text corpus.

Furthermore, I was also offered technical help on developing a Lao/Esaan Tham font for a Lao and North Eastern Thailand variation of Tham script, which is Mon-based and is closely related to Myanmar script. (See some sample transliterations if you are curious what it looks like. It was part of my hacking while travelling to DebConf11.) Currently, its OpenType support is quite sufficient, but it still renders poorly on Mac OSX. To cope with this, I was given a Mac Mini as a present from Myanmar for its development, as well as some explanations of AAT features from a Myanmar font developer. And I am very grateful for that.

24th-Mar-2010 08:47 pm - The Mini-DebCamp in Khon Kaen

So, Thailand Mini-DebCamp 2010 in Khon Kaen has now ended. It's another memorable event I've joined, and for this one in particular, I was on the organizing team. We owe its success to many people.

I'd like to thank our guest DDs for their talks, many of which were improvised. Special thanks to Paul Wise and Yakiharu Yabuki for preparing the talks on the Debian Social Contract and Debian packaging in one night, so our audience could prepare themselves for the Bug Squashing Party over the next 3 consecutive days. Thanks to Paul Wise, Andrew Lee, Yakiharu Yabuki, Daiki Ueno, Christian Perrier and our local participants for their efforts in tackling more than 50 bugs during the BSP, 30 of which have been closed and 14 of which have proposed patches.

Thanks to Christian Perrier for several talks in the last two days. We also had Andrew Lee's talk on the Debian EzGo project, along with talks from our local distro developers (Linux SIS, Linux TLE) on what is being worked on and what can be pushed into Debian. And Neutron Soutmun presented some future plans for RahuNAS, a captive portal software based on Debian.

A special agenda was arranged to improve Debian mirroring in Thailand. Chatchai Jantaraprim, the ftp.th.debian.org maintainer, shared with us the background and motivations behind the official mirror setup for Thailand. Andrew Lee, the ftp.tw.debian.org maintainer, introduced us to the Debian mirroring infrastructure, and encouraged the local Debian mirror maintainers to do it the Debian way. We exchanged experiences and problems found among the current mirror sites, which can be very useful for their improvement, as well as for cooperation in the future.

Christian Perrier also introduced us to the Debian translation workflow and how to coordinate translation via mailing list. This can be useful in the future for Thai if we can form a team, rather than the single-handed translation we have at present.

Christian's talk on Debian contribution paths, along with their fresh hands-on experience in the BSP, indeed motivated many local people to join Debian. I've been told by some people that they wanted to actually join Debian after this event, after just having a wish to do so for a long time.

And yet another talk by Christian, on key signing with a live demonstration, was really helpful for the Thai audience, as few of us were familiar with the concept and practice. Yes, taking care of a PGP key does require special care!

Night chats and parties were also cool. We enjoyed the drinks (especially Debian wine!), snacks, and chats together, and exchanged many stories. For me personally, it made me feel that Debian is a living community, with people living in it.

For the record, we even had a real bug squashing party, as fried bugs are among well-known Esaan dishes. And we immediately got new voluntary vegetarians because of it! Hee hee.. Hello Christian, I witnessed it. ;-) And thumbs up to Yabuki for his bravery!

Yes, it was a wonderful event for me indeed. Thanks to Khon Kaen University (Kitt Tientanopajai et al.) for hosting it. Thanks to the NECTEC Information and Mobile Applications Program and Science Park KKU for the financial support. Thanks to the NECTEC people for taking care of foreign participants in traveling between Bangkok and Khon Kaen. And thanks to all participants for their contributions in making this event a great one!

Picture credits: Supphachoke Suntiwichaya, DebConf Gallery

22nd-Jan-2010 03:33 pm - Thailand Mini-DebCamp 2010

Hello, Planet Debian. I'm pleased to announce the first Debian event in Thailand: Thailand MiniDebCamp 2010.

This is a follow-up to Taiwan MiniDebConf 2009, held last September in Taipei. At the end of that event, we discussed having a mini-DebCamp in Thailand during the New Year holidays. However, due to many factors, it has been moved to this March. The date has been settled on 13-19 March. The place will be Khon Kaen University (KKU) in Khon Kaen, a north-eastern province of Thailand (close to Lao, if you look for it on a map).

In this event, we plan to arrange a Bug Squashing Party for the Squeeze release, and some meetings between DDs and local people, to encourage more Debian activities in Thailand and nearby countries.

Thailand started its GNU/Linux development and usage no later than 1997. Currently, we have projects like the Thai Linux Working Group for upstream Thai resource development; OpenTLE.Org and thaiopensource.org for local distros (Ubuntu-based); and some communities like ubuntuclub, debianclub, and others for other distros. The user base is growing slowly, while the developer base is growing even more slowly. We have had domestic events occasionally. But certainly, an international one like this would effectively encourage more prospective contributors, as well as create closer cooperation with the global projects.

We also hope this event will be a meeting point for developers from nearby regions, where internationalization, among other issues, can be discussed and worked out.

So, you are invited to join and discuss. Please make sure to add your name to the Wiki page so we can prepare things for you, including accommodation. And feel free to propose your agenda in BarCamp style. It will be summarized some time before the event date.

We also have a mailing list for updates and discussions.

8th-Jan-2010 05:18 pm - My Recent Deb Migrations Finished

I've just finished migrations on Debian packages under my maintenance:

  • Update my e-mail address to @debian.org
  • Switch to DebSrc 3.0 format
  • Drop defoma use on font packages
  • Run autoreconf on packaging time for autotools-based packages, instead of just config.{sub,guess} replacement
  • Fix piuparts error for thaixfonts package, which involved fixing Debian #543512 in xfonts-utils and debhelper before rebuilding the package. This solves the same problem on other xfonts-* packages as well.

The last upload was done on New Year's Day (2010-01-01). I've only now had time to blog about it.

When learning to code in C++, I was convinced by the advantages of iostream over C printf, such as better type checking. C++ syntactic features have been used to devise the stream operators and manipulators to match what printf provides. But one important thing is missing: format string localization.

Some examples are shown in GNOME #548950 for Ekiga trunk.

For example, to print how many users are found online, with printf and gettext we get (plural form is omitted here for simplicity):

  printf (_("%d users found\n"), nUsers);

But with iostream we get:

  std::cout << nUsers << " " << _("users found") << std::endl;

In the printf case, translators are free to reorder words according to different grammars. For example, in the case of Thai, it can be translated like:

  msgid "%d users found\n"
  msgstr "พบผู้ใช้ %d คน\n"

(Literally, the Thai msgstr reads "found users %d persons".)

But this is impossible in the iostream case. One ends up with very weird language usage like:

  msgid "users found"
  msgstr "ผู้ใช้ถูกพบ"

(Thai msgstr here literally reads "users are found".)

And "3 ผู้ใช้ถูกพบ" sounds weird and unnatural to Thai readers. That is, with iostream, the word order in messages is tied to English.

I don't know how to solve this with iostream. It seems beyond what C++ syntax can achieve.
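
For comparison, here is a minimal sketch of the printf approach as a complete function. It assumes a stand-in `fake_gettext()` (hypothetical, used only so that the example runs without a message catalog) in place of the real gettext(), and a made-up helper name `users_found_message()`. The point it illustrates is that the whole sentence is a single format string, so the translation decides where the number goes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical stand-in for gettext(). A real program would call gettext()
// from <libintl.h>, which looks the msgid up in a compiled message catalog.
static const char *fake_gettext(const char *msgid)
{
    if (std::strcmp(msgid, "%d users found\n") == 0)
        return "พบผู้ใช้ %d คน\n";   // Thai msgstr from the example above
    return msgid;                    // fall back to the untranslated text
}

#define _(msgid) fake_gettext(msgid)

// Format the whole sentence in one step. The position of %d comes from the
// translated format string, not from the order of operations in the code.
std::string users_found_message(int nUsers)
{
    char buf[128];
    int n = std::snprintf(buf, sizeof buf, _("%d users found\n"), nUsers);
    // The buffer is assumed to be large enough for this short message.
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf);
}
```

Calling `users_found_message(3)` returns "พบผู้ใช้ 3 คน\n", with the number in the middle of the sentence, which is the word order the iostream version cannot express.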

Combined with another problem I found when working with POSIX file descriptors, for which the fstream constructor has been dropped from the C++ standard library, using iostream was simply the wrong decision in the first place. The solution is to migrate the file-manipulating code to the plain C stdio library.
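
As a rough sketch of the stdio route, assuming a POSIX system: `fdopen()` wraps an existing file descriptor in a `FILE *`, after which the usual `fprintf()` works on it. The helper name `write_line_to_fd()` is made up for illustration and is not taken from the code discussed here.

```cpp
#include <stdio.h>    // fdopen() is declared here on POSIX systems
#include <string>
#include <unistd.h>   // POSIX: pipe(), read(), close()

// Wrap an already-open POSIX file descriptor in a stdio stream and write
// one line to it. On success the FILE* takes ownership of the descriptor,
// so fclose() also closes fd. If fdopen() fails, fd is left open and the
// caller remains responsible for it.
bool write_line_to_fd(int fd, const char *text)
{
    FILE *fp = fdopen(fd, "w");
    if (!fp)
        return false;
    bool ok = fprintf(fp, "%s\n", text) >= 0;
    // fclose() flushes the buffer, so a write error may only show up here.
    return (fclose(fp) == 0) && ok;
}
```

The descriptor could come from `open()`, `pipe()`, `socket()` or similar; the function only needs it to be writable.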

Lesson learned: Don't use C++ iostream unless you really have limited use cases.

FOSS is great. It allows any interested party to join and modify software to serve their needs, including local language support.

In many cultures, including Thai, people are happy reading English messages in applications, but require applications to allow creating and editing "contents" in their own languages. For these cultures, language support in the infrastructure for text input, text rendering and printing, and some internal operations like sorting and text analysis, is essential, while message translation is optional.

So, when asked if Thai is supported by GNOME, I'd answer yes, because, through many development efforts in the past, you can now:

  • Type Thai text in GNOME applications, with input sequence checking, and even correction (with Thai XIM and GTK+ im-xim bridge).
  • See Thai text with quality rendering, with even OpenType font support (thanks to the Pango project).
  • Read Thai text with proper line wrapping, or even word selection, despite the fact that Thai words are not delimited by spaces nor any punctuation marks (thanks to the LibThai project, with the wise API provision by Pango itself).
  • Sort Thai words in applications according to the standard dictionary, as well as other cultural conventions such as date/time format (thanks to the Thai Locale project).

Unfortunately, Thai message translation in GNOME 2.10 is still far from complete. So, it's not listed as a "supported" language. But the fact is that many Thai people just don't care about message translation.

Nothing to complain here. Just a notice that my definition of "language support" might be different from the official one. :-)

11th-Mar-2005 11:19 pm - Welcome to my blog
Welcome to my live journal. This is the first post.
Welcome to my blog. This is a test (from drivel).