
Why size_t matters


Numerous functions in the Standard C library accept arguments or return values that represent object sizes in bytes. For example, the lone argument in malloc(n) specifies the size of the object to be allocated, and the last argument in memcpy(s1, s2, n) specifies the size of the object to be copied. The return value of strlen(s) yields the length of (the number of characters in) null-terminated character array s excluding the null character, which isn’t exactly the size of s, but it’s in the ballpark.


You might reasonably expect these parameters and return types that represent sizes to be declared with type int (possibly long and/or unsigned), but they aren’t. Rather, the C standard declares them as type size_t. According to the standard, the declaration for malloc should appear in <stdlib.h> as something equivalent to:


void *malloc(size_t n);

and the declarations for memcpy and strlen should appear in <string.h> looking much like:


void *memcpy(void *s1, void const *s2, size_t n);
size_t strlen(char const *s);
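To make these declarations concrete, here is a minimal sketch of how the three functions pass sizes around as size_t. The helper duplicate_string is a hypothetical name invented for this illustration; it is not part of the standard library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a standard function): returns a heap-allocated
   copy of the null-terminated string s, or NULL if allocation fails.
   The caller is responsible for calling free() on the result. */
char *duplicate_string(char const *s)
{
    assert(s != NULL);

    size_t len = strlen(s);      /* strlen returns a size_t */
    size_t size = len + 1;       /* one extra byte for the null character */
    char *copy = malloc(size);   /* malloc's parameter is a size_t */

    if (copy != NULL) {
        memcpy(copy, s, size);   /* memcpy's third parameter is a size_t */
    }
    return copy;
}
```

Every size in this sketch stays in a size_t from the moment strlen produces it until memcpy consumes it, so no conversion to or from int ever takes place.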

The type size_t also appears throughout the C++ standard library. In addition, the C++ library uses a related symbol size_type, possibly even more than it uses size_t.


In my experience, most C and C++ programmers are aware that the standard libraries use size_t, but they really don’t know what size_t represents or why the libraries use size_t as they do. Moreover, they don’t know if and when they should use size_t themselves.


In this column, I’ll explain what size_t is, why it exists, and how you should use it in your code.



Classic C (the early dialect of C described by Brian Kernighan and Dennis Ritchie in The C Programming Language, Prentice-Hall, 1978) didn’t provide size_t. The C standards committee introduced size_t to eliminate a portability problem, illustrated by the following example.


Let’s examine the problem of writing a portable declaration for the standard memcpy function. We’ll look at a few different declarations and see how well they work when compiled for different architectures with different-sized address spaces and data paths.


Recall that calling memcpy(s1, s2, n) copies the first n bytes from the object pointed to by s2 to the object pointed to by s1, and returns s1. The function can copy objects of any type, so the pointer parameters and return type should be declared as "pointer to void." Moreover, memcpy doesn’t modify the source object, so the second parameter should really be "pointer to const void." None of this poses a problem.
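The behavior just described can be sketched in a few lines. The struct point type and the copy_point function are assumptions made up for this example; only memcpy itself comes from the standard library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record type, used only for illustration. */
struct point {
    int x;
    int y;
};

/* Copies *src into *dst with memcpy and reports whether memcpy
   returned its first argument, as the standard says it does. */
int copy_point(struct point *dst, struct point const *src)
{
    assert(dst != NULL && src != NULL);

    /* The pointers convert implicitly to void * and void const *,
       which is why memcpy can copy objects of any type. */
    void *result = memcpy(dst, src, sizeof *src);

    return result == dst;   /* 1 if memcpy returned the destination */
}
```

Because the second parameter is declared as a pointer to const, copy_point can accept a source object that the caller has declared const.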


The real concern is how to declare the function’s third parameter, which represents the size of the source object. I suspect many programmers would choose plain int, as in:


void *memcpy(void *s1, void const *s2, int n);

which works fine most of the time, but it’s not as general as it could be. Plain int is signed, so it can represent negative values. However, sizes are never negative. Using unsigned int instead of int as the type of the third parameter lets memcpy copy larger objects, at no additional cost.
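As a rough illustration of that trade-off, the sketch below compares the two candidate parameter types using the limits from <limits.h>. Both function names are hypothetical, and the claim that unsigned int offers strictly more range holds on typical implementations rather than being an absolute guarantee.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: how many more byte counts an unsigned int
   parameter can express beyond the largest count an int can express. */
unsigned int extra_counts_with_unsigned(void)
{
    /* The C standard guarantees that unsigned int can represent
       every nonnegative value of int. */
    assert(UINT_MAX >= (unsigned int)INT_MAX);
    return UINT_MAX - (unsigned int)INT_MAX;
}

/* Hypothetical helper: returns 1 when n is a usable byte count.
   An int parameter admits negative values, which must be rejected. */
int is_valid_int_count(int n)
{
    return n >= 0;
}
```

Since int and unsigned int occupy the same amount of storage, the extra range comes from reusing the sign bit, which is why it costs nothing.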




