Nafees Nastalique-Character-Based Nastalique Font for Urdu
Project Title:
Nafees Nastalique-Character-Based Nastalique Font for Urdu
Progress Report
Dr. Sarmad Hussain, Mr. Shafiq-ur-Rahman, Mr. Belal Hashmi
Center for Research In Urdu Language Processing
National University of Computer and Emerging Sciences
Lahore, Pakistan
23rd December, 2002
Synthesis (of Project, Progress and Future Work)
Urdu is the national language of Pakistan and has more than 60 million speakers in more than 20 countries. Even with such an extensive readership, very little information is published on the Internet in Urdu. A significant limiting factor has been the absence of a character-based font for Urdu. Urdu is written in the Nasta'leeq script, which is highly context-sensitive and cannot be realized using earlier font specifications (e.g. TrueType fonts). Therefore, Urdu websites are made either by using a Naskh font (normally used for Arabic, and unnatural for Urdu readers) or by posting scanned images of text written in Nastalique (which take a large amount of memory and make the websites very slow to access). Thus, to make Urdu web and other publishing more effective and efficient, a character-based Nastalique font for Urdu is being developed through this project. An orthographic analysis of Nastalique has been conducted. A document detailing the contextual variation and joining rules for Nasta'leeq is included with this report. Work is currently in progress to write the character shapes and vectorize them for inclusion in the font. In parallel, physical modeling of the orthographic rules of Nasta'leeq is being performed using the OpenType Font (OTF) formalism. Thorough testing and application development using this font will be performed once the font has been developed. The project is progressing on schedule and will be finished in August 2003.
Research Problem
Urdu is traditionally written and read in the Nasta'leeq style. A Nasta'leeq font is computationally complex for several reasons. First, each letter has precise writing rules, defined relative to the length of the flat nib of the pen being used to write, as described in Figure 1 below.
Figure 1: Rules for writing mad, alif, bay, tay and jim letters of Urdu in Nasta'leeq
Second, this cursive font is highly context-sensitive. The shape of a letter depends on multiple neighboring characters. The tablet (Takhti) in Figure 2 shows the different shapes of the letter Bay as it connects with other characters of Urdu. Dots and the baseline add further complexity, as discussed in later sections.

Figure 2: Combining rules for writing Bay with other characters of Urdu in Nasta'leeq, as shown through a Takhti
Because of the complex nature of the Nastalique style of writing, it has been difficult to model. Therefore, electronic publishing (including on the Internet) was done either using the easier Naskh font or using ligature-based systems. As a character-based (application-independent) Nasta'leeq font was not available, publishing in Nasta'leeq (especially for the Internet) was done by uploading images of text, which has impeded the widespread usage of Urdu on this medium.
Recent advances in font technology, especially the change from TrueType Fonts to OpenType Fonts, have now made it possible to model complex cursive writing styles like Nasta'leeq, and therefore to develop character-based, application-independent fonts.
This project has two research and development objectives. First, we proposed to conduct an orthographic analysis of Nasta'leeq and to develop a quantitative model for it. This would set the basis for the second phase, which would implement this model using the OpenType Font formalism to develop a character-based Nasta'leeq font for Urdu.
Research Findings
We have completed the orthographic analysis of Nasta'leeq to a level of detail at which it can be used by us and by others to develop character-based fonts for Urdu.
The major findings are the basic repository of shapes of the different characters and the contexts in which these shapes occur. In addition, we have been able to group characters into classes according to similarity in their contextual behavior. This has enabled us to formulate a conservative repository of rules defining the cursive behavior of this writing style, aligned with the intuition of the calligrapher.
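As a minimal illustration of how character classes and contextual rules of this kind might be represented, consider the following Python sketch. The class names, shape labels and rules are hypothetical and chosen only for illustration; they are not taken from the logical model prepared in this project, and the sketch looks only at the immediately following letter, whereas the actual rules may depend on more context.

```python
# Hypothetical sketch: class names, shape labels and rules are
# illustrative assumptions, not the project's actual logical model.

# Letters with similar contextual behaviour share one class.
CHAR_CLASS = {
    "bay": "BAY", "pay": "BAY", "tay": "BAY",
    "jim": "JIM", "chay": "JIM", "khay": "JIM",
    "alif": "ALIF",
}

# (class of letter, class of following letter or None) -> shape label.
# None means the letter is the last one in its ligature.
RULES = {
    ("BAY", "JIM"): "bay.before_jim",
    ("BAY", "ALIF"): "bay.before_alif",
    ("BAY", None): "bay.final",
}


def select_shape(letter, next_letter=None):
    """Pick a shape label for `letter` given the letter that follows it."""
    cls = CHAR_CLASS[letter]
    next_cls = CHAR_CLASS[next_letter] if next_letter is not None else None
    # Fall back to a default shape for the class when no rule matches.
    return RULES.get((cls, next_cls), cls.lower() + ".default")


def shape_ligature(letters):
    """Select a shape for every letter of a ligature, in logical order."""
    shapes = []
    for i, letter in enumerate(letters):
        nxt = letters[i + 1] if i + 1 < len(letters) else None
        shapes.append(select_shape(letter, nxt))
    return shapes
```

Because rules are keyed by class rather than by individual letter, one rule written for Bay also covers every other letter placed in the same class, which keeps the rule repository small.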
In addition to contextual shape inventory and rules, we have also analyzed and documented the baseline shifting rules, Nuqta (Mark) placement rules and rules used for proportional spacing of ligatures to keep the writing balanced. Details of these findings are given in the attached document titled "Logical Model of Nasta'leeq for Urdu." This document has been prepared as part of the current project and, as the name suggests, details the logical model of Nasta'leeq writing style. The analysis is limited to the characters found in Urdu.
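One possible way to picture baseline shifting, assumed here for illustration and not taken from the attached document, is to give each shape an entry height and an exit height and to raise earlier letters so that each stroke meets the next while the last letter rests on the baseline. The shape names and heights in the following Python sketch are made-up values.

```python
# Hypothetical sketch: shape names and anchor heights are made-up
# values for illustration, not measurements from the project.

# For each shape: (entry_y, exit_y), the heights at which the stroke
# arrives from the previous letter and leaves toward the next one.
ANCHORS = {
    "bay.initial": (0, -40),
    "jim.medial": (30, -50),
    "alif.final": (20, 0),
}


def baseline_offsets(shapes):
    """Return one vertical offset per shape so that each exit point
    meets the next entry point and the last shape has offset 0."""
    offsets = [0] * len(shapes)
    # Walk backwards: the last shape stays on the baseline.
    for i in range(len(shapes) - 2, -1, -1):
        _, exit_y = ANCHORS[shapes[i]]
        entry_y, _ = ANCHORS[shapes[i + 1]]
        offsets[i] = offsets[i + 1] + entry_y - exit_y
    return offsets
```

With these assumed values, the first letter of a three-letter ligature ends up highest and the final letter sits on the baseline, which mirrors the descending appearance of Nasta'leeq ligatures.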
Parallel prototype work was also performed on Naskh, which shows less contextual variation and has a single baseline. This work was done because we also wanted to test the OTF specification. In this prototyping exercise, we were able to define best practices for implementing rules within OTF. For example, we have established that TTF-level joins should be avoided in the OTF formalism. This work will greatly help us in designing and developing a robust Nasta'leeq font, work which will be started in January 2003.
Finally, the prototype work also helped point out the current limitations of the OTF formalism and of its implementations on various platforms.
Fulfillment of Objectives
Currently the project is on track. We have fulfilled our proposed objective of developing the logical model of Urdu Nasta'leeq. In addition, as stated earlier, we have found limitations of the OTF formalism and its implementations by developing a prototype Naskh OTF font, which is similar to the Nasta'leeq writing style albeit with limited contextual variation. Most of these limitations pertain to aspects of the Nasta'leeq writing style which are outside the scope of the current work, e.g. a mechanism to implement justification and visual balancing. We have taken up documenting these limitations as an additional exercise, as they must eventually be rectified by the concerned standards bodies and solution providers.
We are positive that the second objective of the project, which is to develop and freely distribute an OTF font for Urdu Nasta'leeq, will also be met in the specified time (over the next nine months).
Project Design and Implementation
The project was designed to go through four phases of development:
· Orthographic Analysis of Nasta'leeq for Urdu
o In order to develop the logical model, the Nasta'leeq writing style was discussed in detail with an expert calligrapher to extract the calligrapher's intuition and devise joining and placement rules for it. This work was scheduled to be completed in the 11th month after the start of the project (i.e. February 2003). However, we rescheduled the tasks, moving vectorization of ligatures to after the modeling instead of doing the two activities in parallel as originally proposed. As a result, this work has already been completed.
o The context-dependent shape inventory has also been finalized.
· Nasta'leeq Modeling using OTF Specification
o As per the updated project plan (discussed above), the subtask of producing the vectorized font will be completed in February 2003. The overall task will still be completed on schedule (June 2003).
· Verification and Validation
o This task is scheduled to be started in June 2003
· Application of Nafees Nasta'leeq
o This task is scheduled to be started in August 2003
Project Outputs, Dissemination and Capacity Building
The major output of this work so far has been the development of a logical model for Nasta'leeq. Published work on Nasta'leeq has been largely qualitative, and the quantitative work that exists is imprecise because it is aimed at human users. Therefore, anyone who wanted to develop a font in Nasta'leeq would need to do all the work from scratch. This was a significant obstacle to the development of Nasta'leeq fonts. We have documented all the natural classes and associated joining rules for characters in Nasta'leeq and will soon publish this document on our website for free distribution. This will significantly accelerate the font development process and therefore also attract others (including vendors) to invest in this work. Thus, we expect that variations in Nasta'leeq style will also emerge soon after this work. This work will also set the basis for more formal teaching of Nasta'leeq, as we have documented in detail what was normally passed down through generations of calligraphers (although this document cannot replace the calligrapher). Therefore, this work will give an impetus to Nasta'leeq and its font development, and in turn to Internet and other electronic publishing in Urdu.
In addition, our work so far and in future will also test the current OTF formalism and its implementations on various platforms (e.g. Microsoft Windows XP, Unix). This will generate feedback which will help improve the formalism and its implementations.
This project has also helped develop interest in Nasta'leeq and OTF and triggered more research work in this area. For example, three graduate students, who have been working on the project, are continuing on advanced issues of Nasta'leeq and its implementation for their thesis work.
Finally, the project has been instrumental in developing a small albeit trained and effective workforce. The students who have been part of this project are traveling to various relevant seminars being held around Pakistan to give Nasta'leeq font development training to others.
Impact
As this work has not been formally published yet, there is no significant impact. However, it has been informally presented at various forums, which has sparked appreciation and invitations to share the work already done.
Overall Assessment
Overall the project has been extremely effective. In nine months we have completed the orthographic analysis of Nasta'leeq for Urdu. We have also been able to develop manpower with expertise in Nasta'leeq and OTF development. All this has been done at minimal cost (half the total project cost, so far). With our current expertise and knowledge, we are confident that we will complete the project and will have developed a character-based Nasta'leeq font for Urdu by the end of the project (in August 2003). With a contribution of US$30,000 from APDIP, we will develop a font that should greatly benefit the publishing and dissemination of information in Urdu, especially on the Internet. Its free distribution upon completion is what will make it effective. The groundwork on Nasta'leeq (which is also being freely published) will help other researchers and developers to develop their own Nasta'leeq versions much more quickly and with less investment.
Last modified 2004-06-03 03:50 PM




