Rendering Arabic Text in Python with FreeType

Contents
1. Use Case
2. Language Background
3. Environment Setup
4. Program Structure
5. Code

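1. Use Case

Our company's business has always been dot-matrix display panels. We recently decided to replace our existing bitmap font library with vector glyphs rendered by FreeType, and the new solution also has to support Arabic. For the feasibility study I used Python first, because its libraries are very comfortable to work with; once the approach is validated, it will be reimplemented in C++.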

2. Language Background

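Arabic differs from most languages in that it is a complex-text script. It has three characteristics:
1. It is read from right to left.
2. A letter is written differently depending on whether it appears at the start, in the middle, or at the end of a word.
3. It carries diacritical marks.
Many thanks to 建國雄心 for his articles. If Arabic script is unfamiliar to you, I recommend reading his blog.
http://blog.sina.com.cn/s/articlelist_1569506881_0_1.html

Because of these characteristics, an Arabic string cannot simply be read and rendered one character at a time. Before rendering, the string needs a special preprocessing pass that converts it into the correct sequence of Unicode code points; FreeType then renders that converted sequence.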
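
To make the second characteristic concrete, the sketch below uses Python's standard unicodedata module to list the four presentation forms that Unicode defines for one letter. The choice of beh (U+0628) as the example and the helper name describe are mine, not part of the original project.

```python
import unicodedata

# The four presentation forms of ARABIC LETTER BEH (U+0628) live in the
# "Arabic Presentation Forms-B" block, at U+FE8F through U+FE92.
BEH_FORMS = [0xFE8F, 0xFE90, 0xFE91, 0xFE92]

def describe(code_point):
    """Return the code point in hex together with its official Unicode name."""
    return "U+%04X %s" % (code_point, unicodedata.name(chr(code_point)))

for cp in BEH_FORMS:
    print(describe(cp))

# Compatibility normalization (NFKC) folds every presentation form back
# to the base letter, which shows they are all the same logical character.
folded = {unicodedata.normalize("NFKC", chr(cp)) for cp in BEH_FORMS}
print(folded == {chr(0x0628)})
```

The parsing module in section 5 performs the opposite mapping: it starts from the base letter and picks one of these forms from the letter's neighbours.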

3. Environment Setup

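Three Python libraries are needed:
1. numpy
Why: makes the bitmap arrays easy to manipulate, and freetype-py's example scripts appear to rely on it as well
Install: pip install numpy
2. freetype-py
Why: the vector font rendering module
Install: pip install freetype-py
3. matplotlib
Why: displays the rendered result; not needed if you display the output some other way
Install: pip install matplotlib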
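
A quick way to confirm the setup is to ask Python whether each module can be found. This is only a convenience sketch; the function name missing_packages is hypothetical, and the one assumption worth noting is that freetype-py is imported under the name freetype.

```python
import importlib.util

# pip package name -> import name (freetype-py is imported as "freetype").
REQUIRED = {"numpy": "numpy", "freetype-py": "freetype", "matplotlib": "matplotlib"}

def missing_packages(required=REQUIRED):
    """Return the pip package names whose import name cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]

missing = missing_packages()
if missing:
    print("Missing, install with: pip install " + " ".join(missing))
else:
    print("All three libraries are installed.")
```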

4. Program Structure

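Rendering multilingual text breaks down into roughly two modules: a parsing module and a rendering module. The parsing module processes the raw Unicode code points; the rendering module looks up the glyph bitmap for each code point and composes the bitmaps into a dot-matrix array.
The flow is roughly: raw string -> parsing module -> processed Unicode sequence -> rendering module -> dot-matrix array -> display.
I wrote the parsing module here myself, but I recommend using HarfBuzz instead.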
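
The stage boundaries in that flow can be sketched in a few lines of plain Python. Everything here is a stand-in: the 3x3 TOY_GLYPHS replace real font glyphs, the function names parse, render and display are hypothetical, and parse only performs the right-to-left reversal, not the real shaping done in section 5.

```python
# Hypothetical 3x3 bitmaps standing in for glyphs fetched from a font.
TOY_GLYPHS = {
    ord("A"): [[0, 1, 0], [1, 1, 1], [1, 0, 1]],
    ord("B"): [[1, 1, 0], [1, 1, 1], [1, 1, 0]],
}

def parse(text):
    """Parsing stage: string -> code points in display order.
    Only the right-to-left reversal is shown; real shaping does much more."""
    return [ord(ch) for ch in reversed(text)]

def render(code_points, glyphs=TOY_GLYPHS, rows=3):
    """Rendering stage: code points -> one dot-matrix array (a list of rows)."""
    matrix = [[] for _ in range(rows)]
    for cp in code_points:
        for y in range(rows):
            matrix[y].extend(glyphs[cp][y])
    return matrix

def display(matrix):
    """Display stage: print lit pixels as '#' and dark pixels as '.'."""
    for row in matrix:
        print("".join("#" if px else "." for px in row))

display(render(parse("AB")))
```

Keeping the two stages behind separate functions is what makes it possible to swap the hand-written parser for HarfBuzz later without touching the renderer.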

[Figure: the rendered result displayed with matplotlib]

5. Code

Parsing module

class ArabicText(object):
    # Column order: final, initial, medial, isolated (one row per letter, starting at 0x621)
    __arabic_Positions=[[ 0xfe80, 0xfe80, 0xfe80, 0xfe80],         #0x621
        [ 0xfe82, 0xfe81, 0xfe82, 0xfe81], 
        [ 0xfe84, 0xfe83, 0xfe84, 0xfe83],
        [ 0xfe86, 0xfe85, 0xfe86, 0xfe85],
        [ 0xfe88, 0xfe87, 0xfe88, 0xfe87],
        [ 0xfe8a, 0xfe8b, 0xfe8c, 0xfe89],    
        [ 0xfe8e, 0xfe8d, 0xfe8e, 0xfe8d],    
        [ 0xfe90, 0xfe91, 0xfe92, 0xfe8f],    
        [ 0xfe94, 0xfe93, 0xfe94, 0xfe93],    
        [ 0xfe96, 0xfe97, 0xfe98, 0xfe95],    
        [ 0xfe9a, 0xfe9b, 0xfe9c, 0xfe99],    
        [ 0xfe9e, 0xfe9f, 0xfea0, 0xfe9d],    
        [ 0xfea2, 0xfea3, 0xfea4, 0xfea1],    
        [ 0xfea6, 0xfea7, 0xfea8, 0xfea5],    
        [ 0xfeaa, 0xfea9, 0xfeaa, 0xfea9],    
        [ 0xfeac, 0xfeab, 0xfeac, 0xfeab],    
        [ 0xfeae, 0xfead, 0xfeae, 0xfead],    
        [ 0xfeb0, 0xfeaf, 0xfeb0, 0xfeaf],    
        [ 0xfeb2, 0xfeb3, 0xfeb4, 0xfeb1],    
        [ 0xfeb6, 0xfeb7, 0xfeb8, 0xfeb5],    
        [ 0xfeba, 0xfebb, 0xfebc, 0xfeb9],    
        [ 0xfebe, 0xfebf, 0xfec0, 0xfebd],    
        [ 0xfec2, 0xfec3, 0xfec4, 0xfec1],    
        [ 0xfec6, 0xfec7, 0xfec8, 0xfec5],    
        [ 0xfeca, 0xfecb, 0xfecc, 0xfec9],    
        [ 0xfece, 0xfecf, 0xfed0, 0xfecd],    
        [ 0x63b, 0x63b, 0x63b, 0x63b],    
        [ 0x63c, 0x63c, 0x63c, 0x63c],    
        [ 0x63d, 0x63d, 0x63d, 0x63d],    
        [ 0x63e, 0x63e, 0x63e, 0x63e],    
        [ 0x63f, 0x63f, 0x63f, 0x63f],    
        [ 0x640, 0x640, 0x640, 0x640],    
        [ 0xfed2, 0xfed3, 0xfed4, 0xfed1],    
        [ 0xfed6, 0xfed7, 0xfed8, 0xfed5],    
        [ 0xfeda, 0xfedb, 0xfedc, 0xfed9],    
        [ 0xfede, 0xfedf, 0xfee0, 0xfedd],    
        [ 0xfee2, 0xfee3, 0xfee4, 0xfee1],    
        [ 0xfee6, 0xfee7, 0xfee8, 0xfee5],    
        [ 0xfeea, 0xfeeb, 0xfeec, 0xfee9],    
        [ 0xfeee, 0xfeed, 0xfeee, 0xfeed],    
        [ 0xfef0, 0xfef3, 0xfef4, 0xfeef],    
        [0xfef2, 0xfef3, 0xfef4, 0xfef1]]
        
    __preSet = [0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
            0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,        
            0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626, 0x640] 
                   
    __nextSet = [0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
            0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,        
            0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626,        
            0x627, 0x623, 0x625, 0x622, 0x62f, 0x630, 0x631, 0x632,        
            0x648, 0x624, 0x629, 0x649, 0x640]
    __replaceSet = [[0xFEF5,0xFEF6],[0xFEF7,0xFEF8],[0xFEF9,0xFEFA],[0xFEFB,0xFEFC]]        
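    # Convert the input string into the list of code points to display; FreeType can then
    # take each value from the list in order for layout and rendering.
    # The Arabic text is already reversed on return (Arabic is written right to left).
    @staticmethod
    def Translate(text):
        retArr = []
        textLen = len(text)
        lastIdx = -3    # index in retArr of the most recent non-Arabic character
        begIdxs = []    # start indices of non-Arabic runs
        endIdxs = []    # end indices of non-Arabic runs

        # A while loop is used so the index can really skip the second half of a ligature
        # (reassigning the loop variable inside "for i in range(...)" has no effect)
        i = 0
        while i < textLen:
            charCode = ord(text[i])
            # Non-Arabic characters are appended unchanged
            if charCode not in range(0x621,0x6ff):
                retArr.append(charCode)
                arrLen = len(retArr)
                # Not contiguous with the previous non-Arabic character: a new run starts
                if arrLen - 1 != lastIdx + 1:
                    begIdxs.append(arrLen - 1)
                # The last character of the text is non-Arabic: close the run
                if i == textLen - 1:
                    endIdxs.append(arrLen - 1)
                lastIdx = arrLen - 1
                i += 1
                continue
            # The element just before this Arabic character is non-Arabic: close the run
            if lastIdx == len(retArr) - 1:
                endIdxs.append(lastIdx)

            # Code points of the previous, current and next characters (0 at the string edges)
            preCh = ord(text[i-1]) if i > 0 else 0
            ch = charCode
            nextCh = ord(text[i+1]) if i < textLen - 1 else 0
            #----rule 2: lam followed by an alef is replaced by a single ligature----
            replace = ArabicText.__GetContinuousWriting(preCh,ch,nextCh)
            if replace > 0:
                retArr.append(replace)
                i += 2      # skip the alef that the ligature absorbed
                continue
            #----rule 1: pick the positional form----
            retArr.append(ArabicText.__GetTransform(preCh,ch,nextCh))
            i += 1
        # Arabic is displayed right to left, so reverse the result
        retArr.reverse()
        ArabicText.__NonArabicReverse(retArr,begIdxs,endIdxs)
        return retArr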
    # Non-Arabic characters must not be reversed, so flip each non-Arabic run back
    @classmethod
    def __NonArabicReverse(cls,charArr=[],begIdxs=[],endIdxs=[]):
        lastIdx = len(charArr) - 1   # last valid index
        loopCnt = len(begIdxs)
        for i in range(0,loopCnt):
            # After the full reversal, the run [begIdxs[i], endIdxs[i]] sits at [beg, end]
            beg = (lastIdx - endIdxs[i])
            end = (lastIdx - begIdxs[i])
            switchTimes = (end + 1 - beg) // 2
            for j in range(0,switchTimes):
                charArr[beg+j], charArr[end-j] = charArr[end-j], charArr[beg+j]
                
    # Handle ligatures: lam (0x644) followed by an alef variant must be replaced by a single ligature character
    @classmethod    
    def __GetContinuousWriting(cls,preCh=0,ch=0,nextCh=0):        
        retVal = 0        
        nextChArr = [0x622,0x623,0x625,0x627]        
        positionIdx = -1        
        charIdx = 0        
        if (ch == 0x644) and (nextCh in nextChArr):            
            charIdx = nextChArr.index(nextCh)            
            if preCh in cls.__preSet:                
                positionIdx = 1            
            else:                
                positionIdx = 0            
            retVal = cls.__replaceSet[charIdx][positionIdx]                
        return retVal
        
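    # Pick the positional form of a character from how it joins its neighbours
    @classmethod
    def __GetTransform(cls,preCh=0,ch=0,nextCh=0):
        preConnect = False
        nextConnect = False
        positionIdx = -1
        charIdx = 0
        # The previous character joins to this one
        if preCh in cls.__preSet:
            preConnect = True
            positionIdx = 0
        # The next character can be joined to
        if nextCh in cls.__nextSet:
            nextConnect = True
            positionIdx = 1
        # Joined on both sides: the character sits in the middle
        if preConnect and nextConnect:
            positionIdx = 2
        # Joined on neither side: the character is displayed on its own
        elif (preConnect == False) and (nextConnect == False):
            positionIdx = 3

        charIdx = ch - 0x621
        # The table only covers 0x621-0x64a; pass anything else (e.g. diacritics) through unchanged
        if charIdx >= len(cls.__arabic_Positions):
            return ch
        retVal = cls.__arabic_Positions[charIdx][positionIdx]
        return retVal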
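
The index selection in __GetTransform is easier to follow on a single letter. The sketch below reuses the beh row of the __arabic_Positions table and shapes a run of identical letters; the names position_index and shape_beh_run are hypothetical and exist only for this illustration.

```python
# Presentation forms of beh (U+0628), in the same column order as the beh row
# of __arabic_Positions: final, initial, medial, isolated.
BEH = [0xFE90, 0xFE91, 0xFE92, 0xFE8F]

def position_index(joins_prev, joins_next):
    """Map the two joining flags to a column of the forms table."""
    if joins_prev and joins_next:
        return 2  # medial
    if joins_prev:
        return 0  # final
    if joins_next:
        return 1  # initial
    return 3      # isolated

def shape_beh_run(length):
    """Shape a word made of `length` beh letters, in logical (reading) order."""
    return [BEH[position_index(i > 0, i < length - 1)] for i in range(length)]

print([hex(cp) for cp in shape_beh_run(3)])
print([hex(cp) for cp in shape_beh_run(1)])
```

A three-letter run comes out as initial, medial, final, and a single letter as the isolated form. Letters that never join to the following letter, such as alef, need no special case in the code because their table rows repeat the final and isolated forms in the medial and initial columns.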

Entry point and rendering module

# -*- coding: utf-8 -*-
import freetype
import numpy
import matplotlib.pyplot as plt
import ArabicTextHelper as Arabic  # the parsing module above, saved as ArabicTextHelper.py
def main():
    text = u'شبح ، شبح الشيوعية ، يتجول في جميع أنحاء القارة الأوروبية'    
    textArr = []    
    # Process the raw string and produce the converted list
    textArr = Arabic.ArabicText.Translate(text=text)
    
    # Display the converted list
    FreeTypeDisplay(textArr,0x33,0xe4,0xff)    

def FreeTypeDisplay(textArr=[],R=255,G=255,B=255):    
    RGB = [('R',numpy.uint8), ('G',numpy.uint8), ('B',numpy.uint8)]    
    face = freetype.Face('Fonts/ARIALUNI.TTF')    
    face.set_char_size( 48*64 )    
    slot = face.glyph
    
    # First pass to compute bbox
    width, height, = 0, 0    
    previous = 0    
    # Compute the total width and height
    for c in textArr:        
        face.load_char(c)        
        bitmap = slot.bitmap        
        height = max(height, (face.size._FT_Size_Metrics.height >> 6))        
        kerning = face.get_kerning(previous, c)        
        width += (slot.advance.x >> 6) + (kerning.x >> 6)        
        previous = c
        
    imgBuf = numpy.zeros((height,width), dtype=numpy.ubyte)    
    colorBuf = numpy.zeros((height,width),dtype=RGB)
    
    # Second pass for actual rendering
    xBeg, yBeg = 0, 0    
    previous = 0    
    # Add each glyph to imgBuf
    for c in textArr:        
        face.load_char(c)        
        # Descender in pixels; think of it as a vertical correction value
        descender = (-face.size._FT_Size_Metrics.descender) >> 6        
        bitmap = slot.bitmap        
        # Distance from the baseline to the top of the glyph bitmap
        top = slot.bitmap_top        
        w = bitmap.width        
        h = bitmap.rows        
        yBeg = height - top - descender        
        # Kerning between the previous glyph and this one
        kerning = face.get_kerning(previous, c)        
        xBeg += (kerning.x >> 6)        
        newChar = numpy.array(bitmap.buffer, dtype='ubyte').reshape(h,w)        
        yEnd = yBeg+h        
        xEnd = xBeg+w        
        # Add the glyph to imgBuf
        imgBuf[yBeg:yEnd,xBeg:xEnd] += newChar       
        xBeg += (slot.advance.x >> 6)        
        previous = c
        
    FillColor(imgBuf,colorBuf,R,G,B)
    
    # Display the contents of imgBuf
    plt.figure(figsize=(10, 10*imgBuf.shape[0]/float(imgBuf.shape[1])))    
    showing = colorBuf.view(dtype=numpy.uint8).reshape(colorBuf.shape[0],colorBuf.shape[1],3)    
    plt.imshow(showing, interpolation='nearest', origin='upper')    
    plt.xticks([]), plt.yticks([])    
    plt.show()
    
def FillColor(srcBuf,colorBuf,R,G,B):    
    rows = srcBuf.shape[0]    
    columns = srcBuf.shape[1]    
    for y in range(0,rows):        
        for x in range(0,columns):            
            if srcBuf[y][x] > 0:                
                colorBuf[y][x] = (R,G,B)

if __name__ == '__main__':    
    main()
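
The rendering code passes 48*64 to set_char_size and shifts metrics right by 6 bits because FreeType expresses these values in 26.6 fixed-point, that is, in units of 1/64. Below is a minimal sketch of that arithmetic. The helper names are mine, and the 72 dpi default is my assumption about the resolution that applies when none is passed to set_char_size.

```python
def to_26_6(value):
    """Convert a size in whole units (points or pixels) to 26.6 fixed-point."""
    return int(round(value * 64))

def from_26_6(value):
    """Convert 26.6 fixed-point to whole units, rounding toward minus infinity."""
    return value >> 6

def points_to_pixels(points, dpi=72):
    """Nominal pixel size of a font: one point is 1/72 inch."""
    return points * dpi / 72

print(to_26_6(48))           # the value handed to set_char_size
print(from_26_6(3072))       # back to whole units
print(from_26_6(-100))       # negative metrics (e.g. a descender) round downward
print(points_to_pixels(48))  # pixel size at the assumed 72 dpi
```

Because the right shift always rounds downward, a fractional advance loses up to one pixel per glyph. If the total width of a long string matters, one option is to accumulate advances in 26.6 units and convert once at the end.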

References
https://www.cnblogs.com/8335IT/p/8053850.html
https://blog.csdn.net/wuxinyanzi/article/details/12912533
