http://www.almaden.ibm.com/cs/people/marksmith/sendmsg.html

Be careful with the sendmsg() family of functions

While optimizing the network filesystem I work on here at Almaden, I made a somewhat interesting discovery.

The semantics of thread-safety with respect to the sendmsg() family of functions, encompassing send(), sendmsg(), and sendto(), are not what one might think. These functions may actually internally interleave data from a send() on one thread with data from a send() on another thread without violating the strict definition of thread-safe. The prevailing wisdom on the web (and I made this mistake as well) confuses "atomic" with "thread-safe" for these functions. It is interesting to note, however, that some operating systems do seem to implement these functions to have atomic semantics.

The POSIX/SUSv3 specification defines "thread-safe" as follows:

"A function that may be safely invoked concurrently by multiple threads. Each function defined in the System Interfaces volume of IEEE Std 1003.1-2001 is thread-safe unless explicitly stated otherwise. Examples are any "pure" function, a function which holds a mutex locked while it is accessing static storage, or objects shared among threads."

It does not explicitly define "atomic", but in the definition of read() in the System Interfaces volume, it gives the following rationale:

"The standard developers considered adding atomicity requirements to a pipe or FIFO, but recognized that due to the nature of pipes and FIFOs there could be no guarantee of atomicity of reads of {PIPE_BUF} or any other size that would be an aid to applications portability. ... I/O is intended to be atomic to ordinary files and pipes and FIFOs. Atomic means that all the bytes from a single operation that started out together end up together, without interleaving from other I/O operations. It is a known attribute of terminals that this is not honored, and terminals are explicitly (and implicitly permanently) excepted, making the behavior unspecified. The behavior for other device types is also left unspecified, but the wording is intended to imply that future standards might choose to specify atomicity (or not)."

What does thread-safety mean with respect to sendmsg()?

The Linux man page indicates:

"If space is not available at the sending socket to hold the message to be transmitted, and the socket file descriptor does not have O_NONBLOCK set, send() shall block until space is available."

One might infer that this description, together with the guarantee of thread-safety, implies the following: if two or more threads call sendmsg() on the same socket descriptor with different buffers, the order in which those buffers are sent cannot be guaranteed, but the contents of each buffer will be sent intact, without any intermixing of data from the other buffers.

Unfortunately, one would be wrong.

The following programs implement a condition in which that semantic is shown not to hold.

The server program listens for a TCP connection. When one is established, it forks. The first thread repeatedly calls one of the family of functions (send() in the example) with a 32KB buffer of zeros, as fast as it can. The second thread sends a 10-byte packet of ones ten times per second. The client program connects to the server and repeatedly calls recv(), consuming data as quickly as possible and verifying that it is receiving either 32KB of zeros or 10 bytes of ones.

At some point, the 10 bytes of ones appear in the middle of a 32KB string of zeros and the client reports an error. By using Ethereal (now Wireshark), one can verify that the bytes are actually transmitted on the wire in this order.
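The client's check can be sketched roughly as follows. This is not the original client program, which is not reproduced here; stream_check, stream_check_feed(), BIG_LEN, and SMALL_LEN are hypothetical names, and the sketch assumes the zeros are byte value 0 and the ones are byte value 1. Because TCP is a byte stream, a single recv() may return part of a message or several messages, so the checker keeps state across calls rather than assuming one recv() per send().

```c
#include <assert.h>
#include <stddef.h>

#define BIG_LEN   (32 * 1024)  /* assumed flood message: 32KB of 0x00 */
#define SMALL_LEN 10           /* assumed periodic message: 10 bytes of 0x01 */

/* Parser state carried across recv() calls. Initialize to {0, 0}. */
struct stream_check {
    unsigned char kind;  /* byte value of the message currently being read */
    size_t remaining;    /* bytes still expected in it; 0 means "at a boundary" */
};

/* Feed one received chunk into the checker.
 * Returns 0 if every byte is consistent with whole, uninterleaved messages,
 * or -1 at the first byte that belongs to a different message. */
int stream_check_feed(struct stream_check *st,
                      const unsigned char *buf, size_t len)
{
    assert(st != NULL);

    for (size_t i = 0; i < len; i++) {
        if (st->remaining == 0) {
            /* At a message boundary: the first byte decides the type. */
            if (buf[i] == 0) {
                st->kind = 0;
                st->remaining = BIG_LEN;
            } else if (buf[i] == 1) {
                st->kind = 1;
                st->remaining = SMALL_LEN;
            } else {
                return -1;  /* neither a zero nor a one */
            }
        }
        if (buf[i] != st->kind)
            return -1;      /* foreign byte inside a message: interleaving */
        st->remaining--;
    }
    return 0;
}
```

A receive loop would call stream_check_feed() on each buffer returned by recv() and report an error on the first -1.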

"Thread-safe", when applied to this family of functions, simply guarantees that the data in a particular call to send(), if received, will arrive on the correct socket, and that the first byte will arrive before the second byte, the second byte before the third, and so on. It does not mean that another thread may not slip a few bytes between, say, the second and third bytes of the data in this particular call to send().

Why not atomic?

Using Linux kernel 2.6.11 as a reference, because it's the latest kernel processed for easy web cross-referencing at LXR ...

When a connected TCP send(), sendto(), or sendmsg() arrives in the Linux kernel, it eventually comes through tcp_sendmsg(). tcp_sendmsg() protects itself by acquiring a lock at invocation by calling lock_sock(). tcp_sendmsg() then loops over the buffers in the iovec, allocating associated sk_buff's and cache pages for use in the actual send. As it does so, it pushes the data out to TCP for actual transmission. However, if one of those allocations fails (because a large number of large sends is being processed, for example), it must wait for memory to become available. It does so by jumping to wait_for_sndbuf or wait_for_memory, both of which eventually cause a call to sk_stream_wait_memory(). sk_stream_wait_memory() contains a code path that calls sk_wait_event(). Finally, sk_wait_event() contains the call to release_sock().

At this point, any one of the threads that were heretofore serialized at the initial call to lock_sock() in tcp_sendmsg() can proceed. Memory may either become available, or a small enough send may not require enough memory to block and may proceed immediately, thus intermixing data from one call to send() with another.
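The effect of releasing the lock while waiting can be imitated in user space. The following is a toy model, not kernel code: toy_sock, toy_send(), toy_free_space(), and toy_demo() are hypothetical names, "space" stands in for send-buffer memory, and pthread_cond_wait() (which releases the mutex while it sleeps) stands in for the release_sock() inside sk_wait_event(). The sizes are chosen so that the large sender blocks halfway and the small sender fits into the space that is left.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>

struct toy_sock {
    pthread_mutex_t lock;      /* plays the role of the socket lock */
    pthread_cond_t  mem_free;  /* signalled when space is returned */
    size_t space;              /* stand-in for available send-buffer memory */
    int    waiters;            /* senders currently asleep waiting for space */
    char   wire[64];           /* order in which bytes were queued */
    size_t wire_len;
};

/* Queue len copies of byte c, chunk bytes at a time, under the lock. */
void toy_send(struct toy_sock *s, char c, size_t len, size_t chunk)
{
    pthread_mutex_lock(&s->lock);
    while (len > 0) {
        size_t n = len < chunk ? len : chunk;
        while (s->space < n) {
            s->waiters++;
            /* Sleeping drops the lock, so another sender can get in. */
            pthread_cond_wait(&s->mem_free, &s->lock);
            s->waiters--;
        }
        assert(s->wire_len + n < sizeof s->wire);
        memset(s->wire + s->wire_len, c, n);
        s->wire_len += n;
        s->space -= n;
        len -= n;
    }
    pthread_mutex_unlock(&s->lock);
}

/* Return n units of space and wake any sleeping senders. */
void toy_free_space(struct toy_sock *s, size_t n)
{
    pthread_mutex_lock(&s->lock);
    s->space += n;
    pthread_cond_broadcast(&s->mem_free);
    pthread_mutex_unlock(&s->lock);
}

static void *big_sender(void *arg)
{
    toy_send(arg, 'A', 4, 2);  /* 4 bytes in chunks of 2 */
    return NULL;
}

/* Run the scenario and return the resulting byte order ("AABAA"). */
const char *toy_demo(struct toy_sock *s)
{
    pthread_t t;
    int blocked = 0;

    memset(s->wire, 0, sizeof s->wire);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->mem_free, NULL);
    s->space = 3;
    s->waiters = 0;
    s->wire_len = 0;

    pthread_create(&t, NULL, big_sender, s);
    while (!blocked) {           /* wait until the big sender is asleep */
        pthread_mutex_lock(&s->lock);
        blocked = s->waiters > 0;
        pthread_mutex_unlock(&s->lock);
        if (!blocked)
            sched_yield();
    }
    toy_send(s, 'B', 1, 1);      /* fits in the 1 unit left: no wait */
    toy_free_space(s, 2);        /* now the big sender can finish */
    pthread_join(t, NULL);
    return s->wire;
}
```

The 'B' lands between the two halves of the 'A' message even though every byte was queued with the lock held, which is the same shape as the failure observed on the real socket.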

What to do?

Make sure that you serialize your calls to this family of functions. Serialization must occur at least at a per-socket granularity.
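One way to do this is a per-socket mutex held for the whole message, as in the minimal sketch below. The names locked_sock, locked_sock_init(), and locked_send() are hypothetical, and the sketch assumes a blocking stream socket. It loops because send() may accept only part of the buffer, and the lock must cover all of the partial sends, not just one.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* A socket descriptor paired with its own send lock. */
struct locked_sock {
    int fd;
    pthread_mutex_t send_lock;
};

/* Returns 0 on success, or an error number from pthread_mutex_init(). */
int locked_sock_init(struct locked_sock *ls, int fd)
{
    assert(ls != NULL);
    ls->fd = fd;
    return pthread_mutex_init(&ls->send_lock, NULL);
}

/* Send all len bytes of buf without letting another locked_send() caller
 * on the same locked_sock insert data in between.
 * Returns 0 on success, or -1 with errno set by send(). */
int locked_send(struct locked_sock *ls, const void *buf, size_t len, int flags)
{
    const char *p = buf;
    int rc = 0;
    int saved_errno = 0;

    pthread_mutex_lock(&ls->send_lock);
    while (len > 0) {
        ssize_t n = send(ls->fd, p, len, flags);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before sending: retry */
            saved_errno = errno;
            rc = -1;
            break;
        }
        p += n;                  /* partial send: advance and keep going */
        len -= (size_t)n;
    }
    pthread_mutex_unlock(&ls->send_lock);

    if (rc < 0)
        errno = saved_errno;
    return rc;
}
```

This only helps if every sender on the socket goes through locked_send(). If it fails partway through a message, some bytes of that message may already have been sent, so the caller should generally treat the connection as unusable.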

As a big-picture issue, the POSIX/SUSv3 standard developers indicate that atomicity for socket I/O is "unspecified". I cannot think of any condition in which non-atomic semantics would be desired for the sendmsg() family of functions. Certainly, one could argue that the common case is a single thread sending on a given socket, and that therefore sendmsg() should be optimized for that case and should not incur any overhead from maintaining atomicity. However, the Linux kernel is already incurring lock overhead through its call to lock_sock() at the beginning of tcp_sendmsg(). If this lock were used to make the function atomic, the serialization overhead would be moved into the kernel. This would actually make multi-threaded socket applications faster because the code execution required to get from user-space send() to kernel-space tcp_sendmsg() could be done outside of the lock, resulting in less lock contention and more concurrency. A few operating systems seem to implement an atomic send() already; they are tabulated under "For Reference" below.

It would be excellent if the Linux kernel or network developers were inclined to make the changes required to give this family of functions atomic semantics as well. One way to achieve this would be to add a finer-grained send-lock to the sock structure. This lock would be acquired at tcp_sendmsg() invocation and released upon its return. Otherwise, it may be possible to wait for socket memory to become available without releasing the existing socket lock.

For Reference:

Note that these functions can only be proven not to be atomic through experimentation. A "Yes" in the "Atomicity" column only indicates that the given operating system did not prove to have a non-atomic implementation of send() as observed when running my example programs.

It is interesting to note that none of Solaris, Windows, or AIX documents atomicity.

Atomicity of sendmsg() family of functions

System                  Atomicity
AIX 5.2                 Yes
FreeBSD 6.1             No
HP-UX B.11.23           No
IRIX 6.5.27             No
Linux 2.6.x             No
Mac OS X 10.4 Tiger     No
Solaris 10              Yes
Windows 2000/XP         Yes, but with caveats