OpenFOAM code reading: wchar in the base code

The path src/OpenFOAM/primitives/chars contains a second folder, wchar. Let's see what is in it.

The header wchar.H reads as follows:

#include <cwchar>
#include <string>

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

namespace Foam
{

class Istream;
class Ostream;

// * * * * * * * * * * * * * * * IOstream Operators  * * * * * * * * * * * * //

//- Output wide character (Unicode) as UTF-8
Ostream& operator<<(Ostream&, const wchar_t);

//- Output wide character (Unicode) string as UTF-8
Ostream& operator<<(Ostream&, const wchar_t*);

//- Output wide character (Unicode) string as UTF-8
Ostream& operator<<(Ostream&, const std::wstring&);


// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

} // End namespace Foam

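The content is similar to the char folder covered in the previous post, with one difference: Istream and Ostream are both forward-declared, but only the output operator << is overloaded here. There is no operator>> for wide characters. As the comments say, each overload will "output wide character (Unicode) as UTF-8", so the value is transcoded to UTF-8 as it is written.

Now for the implementation of these three overloads, in wcharIO.C: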

#include "error.H"

#include "wchar.H"
#include "IOstreams.H"

// * * * * * * * * * * * * * * * IOstream Operators  * * * * * * * * * * * * //

Foam::Ostream& Foam::operator<<(Ostream& os, const wchar_t wc)
{
    if (!(wc & ~0x0000007F))
    {
        // 0x00000000 - 0x0000007F: (1-byte output)
        // 0xxxxxxx
        os.write(char(wc));
    }
    else if (!(wc & ~0x000007FF))
    {
        // 0x00000080 - 0x000007FF: (2-byte output)
        // 110bbbaa 10aaaaaa
        os.write(char(0xC0 | ((wc >> 6) & 0x1F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else if (!(wc & ~0x0000FFFF))
    {
        // 0x00000800 - 0x0000FFFF: (3-byte output)
        // 1110bbbb 10bbbbaa 10aaaaaa
        os.write(char(0xE0 | ((wc >> 12) & 0x0F)));
        os.write(char(0x80 | ((wc >> 6) & 0x3F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else if (!(wc & ~0x001FFFFF))
    {
        // 0x00010000 - 0x001FFFFF: (4-byte output)
        // 11110ccc 10ccbbbb 10bbbbaa 10aaaaaa
        os.write(char(0xF0 | ((wc >> 18) & 0x07)));
        os.write(char(0x80 | ((wc >> 12) & 0x3F)));
        os.write(char(0x80 | ((wc >> 6) & 0x3F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else if (!(wc & ~0x03FFFFFF))
    {
        // 0x00200000 - 0x03FFFFFF: (5-byte output)
        // 111110dd 10cccccc 10ccbbbb 10bbbbaa 10aaaaaa
        os.write(char(0xF8 | ((wc >> 24) & 0x03)));
        os.write(char(0x80 | ((wc >> 18) & 0x3F)));
        os.write(char(0x80 | ((wc >> 12) & 0x3F)));
        os.write(char(0x80 | ((wc >> 6) & 0x3F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else if (!(wc & ~0x7FFFFFFF))
    {
        // 0x04000000 - 0x7FFFFFFF: (6-byte output)
        // 1111110d 10dddddd 10cccccc 10ccbbbb 10bbbbaa 10aaaaaa
        os.write(char(0xFC | ((wc >> 30) & 0x01)));
        os.write(char(0x80 | ((wc >> 24) & 0x3F)));
        os.write(char(0x80 | ((wc >> 18) & 0x3F)));
        os.write(char(0x80 | ((wc >> 12) & 0x3F)));
        os.write(char(0x80 | ((wc >> 6) & 0x3F)));
        os.write(char(0x80 | ((wc) & 0x3F)));
    }
    else
    {
        // according to man page utf8(7)
        // the Unicode standard specifies no characters above 0x0010FFFF,
        // so Unicode characters can only be up to four bytes long in UTF-8.

        // report anything unknown/invalid as replacement character U+FFFD
        os.write(char(0xEF));
        os.write(char(0xBF));
        os.write(char(0xBD));
    }

    os.check("Ostream& operator<<(Ostream&, const wchar_t)");
    return os;
}


Foam::Ostream& Foam::operator<<(Ostream& os, const wchar_t* wstr)
{
    if (wstr)
    {
        for (const wchar_t* iter = wstr; *iter; ++iter)
        {
            os  << *iter;
        }
    }

    return os;
}


Foam::Ostream& Foam::operator<<(Ostream& os, const std::wstring& wstr)
{
    for
    (
        std::wstring::const_iterator iter = wstr.begin();
        iter != wstr.end();
        ++iter
    )
    {
        os  << *iter;
    }

    return os;
}

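Wow, that is a bit long. The first include is error.H, found under src/OpenFOAM/db/error, but we will not look at it for now. IOstreams.H is skipped for the moment as well.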

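The variable wc has type wchar_t; for background see https://baike.baidu.com/item/wchar_t/8562830?fr=aladdin. wchar_t is a distinct built-in wide-character type that can represent far more characters than char, at the cost of more bytes per character. Its size is implementation-defined and differs between platforms: typically 2 bytes on Windows and 4 bytes on Linux.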
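Since the width of wchar_t depends on the platform, it can be worth checking it directly. The following is a minimal sketch; wcharWidthReport and demoWcharWidth are hypothetical helper names that do not appear in OpenFOAM.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: describe the width of wchar_t on this platform.
std::string wcharWidthReport()
{
    const std::size_t bits = sizeof(wchar_t) * CHAR_BIT;
    return "wchar_t: " + std::to_string(sizeof(wchar_t))
         + " byte(s), " + std::to_string(bits) + " bits";
}

// Usage sketch: the exact numbers vary, so only portable facts are checked.
void demoWcharWidth()
{
    assert(sizeof(wchar_t) >= sizeof(char));
    assert(!wcharWidthReport().empty());
}
```

On a 2-byte platform the 4-, 5- and 6-byte branches of the operator can never be reached, because wc cannot hold values that large.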

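A char is 1 byte, so before converting we first need to check whether the value fits in a single byte of output. The check uses bit operations. 0x0000007F is a hexadecimal number written out to four bytes; in binary it is: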

0b00000000,00000000,00000000,01111111

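Now look at the condition !(wc & ~0x0000007F). The mask is first inverted with ~ and then combined with wc using bitwise AND (&), that is, wc is ANDed with: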

0b11111111,11111111,11111111,10000000

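So if wc has any bit set above its lowest 7 bits, the AND result is nonzero, and the logical ! turns it into false. In other words, the condition is true exactly when wc fits in 7 bits (0x00 to 0x7F), which is the range that UTF-8 encodes as a single byte.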
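The mask test can be tried in isolation. This is a small sketch that uses std::uint32_t instead of wchar_t so that it behaves the same on every platform; fitsInSevenBits and demoSevenBitCheck are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// True when no bit above the low 7 bits is set, i.e. cp is in 0x00..0x7F.
// Same shape as the test !(wc & ~0x0000007F) in wcharIO.C.
bool fitsInSevenBits(std::uint32_t cp)
{
    return !(cp & ~std::uint32_t(0x0000007F));
}

// Usage sketch: values on either side of the 7-bit boundary.
void demoSevenBitCheck()
{
    assert(fitsInSevenBits(0x41));     // 'A'
    assert(fitsInSevenBits(0x7F));     // largest 7-bit value
    assert(!fitsInSevenBits(0x80));    // bit 7 is set
    assert(!fitsInSevenBits(0x4E2D));  // a CJK code point
}
```

The same test could be written as cp <= 0x7F; the mask form has the advantage that it reads identically for every branch once the constant is swapped.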

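The later conditions work the same way with wider masks. A value that does not fit in one byte is split into several char values that are written one after another. (Can you really do that? Yes, this is exactly how UTF-8 works: the high bits of the lead byte announce the length of the sequence, and every following byte starts with the bits 10 and carries 6 bits of payload, as the bit patterns in the code comments show.)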
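To see the byte splitting outside of OpenFOAM, here is a standalone sketch of the same shift-and-mask pattern that collects the bytes in a std::string. encodeUtf8 and demoEncodeUtf8 are hypothetical names, not OpenFOAM functions. As deliberate simplifications, the sketch uses range comparisons instead of masks, stops at the 4-byte form (U+10FFFF) rather than reproducing the 5- and 6-byte branches, and does not reject surrogate code points.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one code point as UTF-8.
std::string encodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp <= 0x7F)
    {
        out += char(cp);                           // 0xxxxxxx
    }
    else if (cp <= 0x7FF)
    {
        out += char(0xC0 | ((cp >> 6) & 0x1F));    // 110xxxxx
        out += char(0x80 | (cp & 0x3F));           // 10xxxxxx
    }
    else if (cp <= 0xFFFF)
    {
        out += char(0xE0 | ((cp >> 12) & 0x0F));   // 1110xxxx
        out += char(0x80 | ((cp >> 6) & 0x3F));
        out += char(0x80 | (cp & 0x3F));
    }
    else if (cp <= 0x10FFFF)
    {
        out += char(0xF0 | ((cp >> 18) & 0x07));   // 11110xxx
        out += char(0x80 | ((cp >> 12) & 0x3F));
        out += char(0x80 | ((cp >> 6) & 0x3F));
        out += char(0x80 | (cp & 0x3F));
    }
    else
    {
        out += "\xEF\xBF\xBD";                     // replacement char U+FFFD
    }
    return out;
}

// Usage sketch: one code point for each sequence length.
void demoEncodeUtf8()
{
    assert(encodeUtf8(0x41) == "A");
    assert(encodeUtf8(0xE9) == "\xC3\xA9");            // U+00E9
    assert(encodeUtf8(0x4E2D) == "\xE4\xB8\xAD");      // U+4E2D
    assert(encodeUtf8(0x1F600) == "\xF0\x9F\x98\x80"); // U+1F600
}
```

Working U+4E2D through by hand: its 16 bits are 0100 111000 101101, which the 3-byte branch distributes as 1110 0100, 10 111000 and 10 101101, giving E4 B8 AD.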

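The other two operators in the file are overloads as well (function overloading, rather than polymorphism in the virtual-function sense). They take either a pointer to the first element of a null-terminated wchar_t array or a std::wstring (see https://blog.csdn.net/qq_28388835/article/details/81172675), which is essentially a string whose character type is wchar_t. Both simply loop over the characters and pass each one to the single-character overload.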
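How the compiler chooses among three such overloads can be illustrated with stand-in functions. In this sketch, which is a hypothetical function taking the same parameter types as the three operators; it returns a label naming the overload that was selected.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins with the same parameter types as the operators.
std::string which(wchar_t)             { return "wchar_t"; }
std::string which(const wchar_t*)      { return "const wchar_t*"; }
std::string which(const std::wstring&) { return "std::wstring"; }

// Usage sketch: the static type of the argument selects the overload.
void demoOverloads()
{
    const std::wstring ws = L"foam";

    assert(which(L'f') == "wchar_t");
    assert(which(L"foam") == "const wchar_t*");  // literal decays to pointer
    assert(which(ws) == "std::wstring");
}
```

The choice is made entirely at compile time from the argument types, which is why this is overloading and involves no virtual dispatch.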

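One interesting detail is that the wchar_t array version is traversed with a variable named iter, a pattern that appears all over the STL. This is not an iterator class implemented by OpenFOAM, though: iter is a plain const wchar_t* pointer, and the loop stops when *iter reaches the terminating null character. Raw pointers support ++ and *, so they already behave as iterators over an array. "Essential C++" introduces the same idea before presenting the STL.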
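The pointer-as-iterator idea can be checked with a short sketch that reuses the loop shape from the const wchar_t* overload. countWide and demoPointerIterator are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cwchar>

// Count characters by advancing a raw pointer until the terminating L'\0',
// guarding against a null pointer as the OpenFOAM overload does.
std::size_t countWide(const wchar_t* wstr)
{
    std::size_t n = 0;
    if (wstr)
    {
        for (const wchar_t* iter = wstr; *iter; ++iter)
        {
            ++n;
        }
    }
    return n;
}

// Usage sketch: raw pointers also work directly with standard algorithms.
void demoPointerIterator()
{
    const wchar_t text[] = L"foam";

    assert(countWide(text) == 4);
    assert(countWide(text) == std::wcslen(text));
    assert(std::count(text, text + 4, L'o') == 1);
    assert(countWide(nullptr) == 0);
}
```

No container or iterator class is involved anywhere: std::count accepts the two pointers because they satisfy the iterator requirements on their own.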
