I don't know C

C is purer than C++. It has fewer obscure features and a less ambiguous grammar. Knowing this, I thought there was nothing more for me to learn about the C language itself. I believed that until I ran into some open-source projects written in C, namely x264 and ffmpeg. In this article I will not talk about x264 techniques, only about the C language.

A colleague poked me yesterday and asked how to read the array declaration below, which comes from the x264 implementation. I have simplified it for the explanation:

int16_t (*mv[2][2])[2];

I had rarely seen an array declaration written like this. I paused for a few seconds and recalled the spiral rule I had learnt in college (probably 5 years ago). Back then I did not pay much attention to it, because I lacked the coding experience to understand it. At first I could not decipher the declaration in a way that both of us could follow, so I went back through the spiral rule.

Following the spiral rule, we can draw it this way:

                     +-----------+
                     | +---+     |
                     | ^   |     |
            int16_t (*mv[2][2])[2];
             ^       ^     |     |
             |       +-----+     |
             +-------------------+

Read aloud, that gives the following English statement:

mv is a 2x2 2D array of pointers to int16_t[2].

That may still be unclear, so let me expand it:

mv is a 2x2 2D array. Each element of the array is a pointer. Each pointer points to an int16_t[2], and since a pointer can be indexed like an array, it can point to the first of a whole run of int16_t[2] elements.

Now it should no longer look weird that x264 accesses mv with patterns such as mv[0][1][6376][1].
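To make the reading concrete, here is a minimal sketch in C. The names mv_storage and mv_demo are hypothetical and the sizes are made up for illustration; only the shape of mv matches the declaration above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical backing storage: 8 motion vectors, each an (x, y) pair. */
static int16_t mv_storage[8][2];

/* Same shape as the declaration discussed above:
   a 2x2 array of pointers to int16_t[2]. */
static int16_t (*mv[2][2])[2];

/* Point one slot of mv at the storage, write through it, read it back. */
int16_t mv_demo(void)
{
    /* int16_t[8][2] decays to int16_t (*)[2], so the types match. */
    mv[0][1] = mv_storage;

    /* Third index picks the pair, fourth index picks the component. */
    mv[0][1][5][1] = 42;

    /* mv holds 2x2 pointers; each pointee is a pair of int16_t. */
    assert(sizeof(mv) / sizeof(mv[0][0]) == 4);
    assert(sizeof(*mv[0][0]) == 2 * sizeof(int16_t));

    return mv_storage[5][1];
}
```

The four-index access is two array subscripts (into mv), one pointer subscript (which pair), and one more array subscript (which component of the pair).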


While debugging, we found that x264 sometimes uses negative indexes into an array, e.g.:

int t = some_random_int_array[-1]; 

My first reaction was: why on earth can an index be negative? It turns out to be totally legal. Unlike in Python, a negative index does not count from the end; it refers to an element before the one the pointer or array name points at. This works because array[idx] is equivalent to *(array + idx), and it is valid as long as the resulting address is still inside the underlying array. This SO thread explains it and quotes the following from C99 §6.5.2.1/2:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
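A small sketch of the idea, using hypothetical names (previous_element, negative_index_demo). The negative index is well defined here only because the pointer points into the middle of an array, so the element before it exists.

```c
#include <assert.h>

/* Hypothetical helper: returns the element just before *p.
   Only valid when p points past the first element of an array. */
int previous_element(const int *p)
{
    return p[-1];               /* identical to *(p - 1) */
}

int negative_index_demo(void)
{
    int row[5] = {10, 20, 30, 40, 50};
    const int *mid = &row[2];   /* points at 30 */

    assert(mid[-1] == *(mid - 1));
    assert(mid[-2] == row[0]);

    return previous_element(mid);
}
```

Indexing the array name itself with -1 (for example row[-1] above) would step outside the array and is undefined behavior, so the trick only makes sense for pointers into the interior of a larger buffer.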


The story of learning new facts went on, and then I met designated initializers. I do not want to repeat every detail of the specification here. As a short example, I saw this in ffmpeg:

AVCodec ff_libx264_encoder = {
    .name             = "libx264",
    .long_name        = NULL_IF_CONFIG_SMALL("libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10"),
    .type             = AVMEDIA_TYPE_VIDEO,
    .id               = AV_CODEC_ID_H264,
    .priv_data_size   = sizeof(X264Context),
    .init             = X264_init,
    .encode2          = X264_frame,
    .close            = X264_close,
    .capabilities     = CODEC_CAP_DELAY | CODEC_CAP_AUTO_THREADS,
    .priv_class       = &x264_class,
    .defaults         = x264_defaults,
    .init_static_data = X264_init_static,
};

I could guess what the dot-prefixed member names mean, but I had never imagined C could do something like this!
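For readers who have not used the feature, here is a reduced sketch with a hypothetical struct (codec_desc is much smaller than AVCodec, and all names are invented for illustration). It shows the two properties that matter in practice: members can be named in any order, and anything left out is zero-initialized.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical descriptor, loosely modelled on the idea of AVCodec. */
struct codec_desc {
    const char *name;
    int id;
    int priv_data_size;
    int (*init)(void);
};

static int dummy_init(void) { return 0; }

/* Members are named explicitly and may appear in any order;
   priv_data_size is not mentioned, so it is initialized to 0. */
static const struct codec_desc demo_codec = {
    .init = dummy_init,
    .name = "demo",
    .id   = 7,
};

/* Arrays have a matching form: [index] = value. */
static const int weights[5] = { [1] = 10, [3] = 30 };

int designated_demo(void)
{
    assert(strcmp(demo_codec.name, "demo") == 0);
    assert(demo_codec.priv_data_size == 0);
    assert(demo_codec.init() == 0);
    assert(weights[0] == 0 && weights[4] == 0);
    return demo_codec.id + weights[1] + weights[3];
}
```

This is C99 syntax, so it needs a compiler running in C99 mode or later.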

All of these facts, new and old, refreshed my attitude towards C. I knew C++ is a language whose details are hard to master, but I had always underestimated C as well. Languages keep evolving, even C.


References

The Clockwise/Spiral Rule

Negative array indexes in c

Designated Initializers

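A simple facedetect library

Over the past couple of days I built a face detection project based on a Cascade Classifier and put it on Github.

Its main features are:

  1. Cascade files preloaded into memory
  2. A few small tools for reading I420 images and videos
  3. An API designed so that the user passes in the raw pointer to the image data, the image color format, and the image size
  4. OpenCL support

The implementation is straightforward, but a few points are interesting and worth noting: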

####xml2header.cmake

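The idea behind preloading the cascade file into memory is to process the cascade xml file with the project's xml2header.cmake script, generate a .h header file containing a long string, and then include that header from a .cpp file. One problem is that some xml files are very large, for example the commonly used haarcascade_frontalface_alt.xml. If such a file is compiled directly into a single static long string, the compiler is likely to fail. So in the cmake script I split the file into several smaller std::string pieces, and at program initialization I use std::accumulate to join them back into the complete cascade string. Also note that when reading the file into a string, \\\ in the file must be converted to \\\\\\, and an extra \n must be appended at the end of every line.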
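The escaping step can be sketched in C, although the real script is written in CMake and is not shown here. The helper name escape_line is hypothetical. The sketch assumes that escaping means prefixing each backslash with another backslash and appending a literal \n to every line; also escaping double quotes is an extra assumption that the text above does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turn one line of XML into text that can be pasted
   between the quotes of a C string literal.
   Returns the number of characters written (without the terminating NUL),
   or 0 if the output buffer is too small. */
size_t escape_line(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    for (; *in != '\0'; ++in) {
        /* Keep room for up to 2 chars now plus the trailing "\n" and NUL. */
        if (n + 5 > out_size)
            return 0;
        if (*in == '\\' || *in == '"')
            out[n++] = '\\';    /* escape backslashes and quotes */
        out[n++] = *in;
    }

    if (n + 3 > out_size)
        return 0;
    out[n++] = '\\';            /* literal backslash followed by 'n' ... */
    out[n++] = 'n';             /* ... becomes a newline once compiled   */
    out[n] = '\0';
    return n;
}
```

Each escaped line would then be wrapped in quotes by the generator, and the lines grouped into several smaller strings as described above.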

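####Reading video sources

In testing I used two kinds of I420 video sources: the .y4m format, which has a header, and .yuv files, which have no header. For .y4m, we can follow the descriptions of the y4m format available online to read the video frame by frame.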
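As a rough illustration of frame-by-frame reading, here is a sketch in C; it is not the code used in the project. The function names are hypothetical. It assumes the usual YUV4MPEG2 layout (one text header line starting with "YUV4MPEG2", then for each frame a line starting with "FRAME" followed by the raw planes), 4:2:0 sampling, even dimensions, and header lines shorter than 255 characters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse the stream header line, e.g. "YUV4MPEG2 W352 H288 F30:1 ...\n".
   Returns 0 on success, -1 on failure. */
int y4m_read_header(FILE *f, int *width, int *height)
{
    char line[256];
    char *tok;

    if (fgets(line, sizeof line, f) == NULL)
        return -1;
    if (strncmp(line, "YUV4MPEG2 ", 10) != 0)
        return -1;

    *width = 0;
    *height = 0;
    /* Parameters are space separated and tagged by their first letter. */
    for (tok = strtok(line + 10, " \n"); tok != NULL; tok = strtok(NULL, " \n")) {
        if (tok[0] == 'W')
            *width = atoi(tok + 1);
        else if (tok[0] == 'H')
            *height = atoi(tok + 1);
    }
    return (*width > 0 && *height > 0) ? 0 : -1;
}

/* Read one I420 frame (width * height * 3 / 2 bytes) into buf.
   Returns 0 on success, -1 on end of file or malformed input. */
int y4m_read_frame(FILE *f, int width, int height, unsigned char *buf)
{
    char line[256];
    size_t frame_size = (size_t)width * (size_t)height * 3 / 2;

    if (fgets(line, sizeof line, f) == NULL)
        return -1;
    if (strncmp(line, "FRAME", 5) != 0)
        return -1;
    return fread(buf, 1, frame_size, f) == frame_size ? 0 : -1;
}
```

A headerless .yuv file can be read with the same fread call alone, provided the width and height are supplied by the caller. The file should be opened in binary mode ("rb") in either case.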

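####Reading the cascade string from memory

To process the cascade string, we can create a stream with FileStorage and pass it to the read function of OpenCV's cv::CascadeClassifier class. During implementation, however, I found that read only supports new-style cascade files, that is, ones trained with traincascade (see the OpenCV API documentation). To work around this, I rewrote a small part of the load function so that old cascade files can also be read from memory.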

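Pretty-printing C++ STL containers when debugging with gdb/Eclipse

Recently I have had to work on a project on Ubuntu. To make development easier I used Eclipse's C++ plugin for debugging. In day-to-day use, though, I kept hitting an annoying problem: Eclipse's debugger (that is, gdb) has poor support for the C++ STL. For example, when I want to inspect the contents of a std::vector, the Visual Studio debugger conveniently shows the container's size and the value of every element, and Microsoft even lets users customize how the debugger displays containers. By default, however, Eclipse/gdb shows the following pile of stuff that is of little use for debugging: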

bar {...}
    std::_Vector_base<TSample<MyTraits>, std::allocator<TSample<MyTraits> > >
        _M_impl {...}   
            std::allocator<TSample<MyTraits> >  {...}   
            _M_start    0x00007ffff7fb5010  
            _M_finish   0x00007ffff7fd4410  
            _M_end_of_storage   0x00007ffff7fd5010

So I found a solution on SO. It relies on a plugin called _Python libstdc++ printers_ to do the pretty-printing.

1. Install python2.7 and python-gdb

$> sudo apt-get install python2.7
$> sudo apt-get install gdb python2.7-dbg

2. Download the Python libstdc++ printers code.

$> mkdir ~/python_printer
$> cd ~/python_printer
$> svn co svn://gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/python

3. Add the following script to the gdb configuration file ~/.gdbinit, creating the file if it does not exist. Mine looks like this:

python
import sys
sys.path.insert(0, '/home/pengx17/python_printer/python')
from libstdcxx.v6.printers import register_libstdcxx_printers
register_libstdcxx_printers (None)
end

4. Set the path of the gdb configuration file in Eclipse.

Go to Run -> Debug Configurations... -> Debugger and set GDB command file to /home/pengx17/.gdbinit

Done! \o/

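Calling a member function through a null pointer

A couple of days ago I ran into a rather interesting question:

Suppose there is a class A with a member function func, but we do not know A's exact declaration and definition. If we have a null pointer of type A, A *a = NULL, what might happen when we call a->func()?

Leave aside for the moment whether calling through a null pointer is undefined behavior. Looking at how C++ member function calls actually work, such a call may well not raise an exception.

I summarized a few different cases below (Visual Studio 2012):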

#include <iostream>
#include <Windows.h>
#include <exception>

using namespace std;

class A
{
public:
    void func()
    {
        cout << "wtf?" << endl;
    }
    void func_this()
    {
        cout << "wtf: " << this->data << endl;
    }
    static void func_static()
    {
        cout << "static wtf?" << endl;
    }
    virtual void func_virtual()
    {
        cout << "virtual wtf?" << endl;
    }
    A():data(0){}
    int data;
};

int main()
{
    A *a = NULL;
    a->func();
    __try
    {
        a->func_this();
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        cout << "cannot invoke func_this" << endl;
    }
    a->func_static();
    __try
    {
        a->func_virtual();
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        cout << "cannot invoke func_virtual" << endl;
    }
    return 0;
}

The command-line output is:

wtf?
cannot invoke func_this
static wtf?
cannot invoke func_virtual

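####Analysis

Let's first look at the two calls that run normally, func() and func_static():

  • Calling func() does not actually need the A pointer. The type A is known at compile time, so the address of func() is already resolved.
  • Likewise, calling the static function func_static() does not need an actual instance.

As for func_this() and func_virtual(), which raise exceptions:

  • func_this() uses the this pointer, and this is NULL here, so an exception is raised.
  • Calling the virtual function func_virtual() requires a usable virtual table (vtable) pointer, which obviously cannot be obtained here, so an exception is raised.

Still, in real development you should do your best to avoid this situation.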

####References

Why does calling method through null pointer “work” in C++?

C++, __try and try/catch/finally