
This file documents the CONFIG_PACKET_MMAP option available with the PACKET
socket interface on 2.4 and 2.6 kernels. This type of socket is used to
capture network traffic with utilities like tcpdump, or by any other program
that needs raw access to a network interface.

You can find the latest version of this document at:
    http://pusa.uv.es/~ulisses/packet_mmap/

A howto can be found at:
    http://wiki.gnu-log.net (packet_mmap)

Please send your comments to
    Ulisses Alonso Camaró

--------------------------------------------------------------------------------
+ Why use PACKET_MMAP
[...]
to capture each packet, and two if you want to get the packet's
timestamp (as libpcap always does).

On the other hand, PACKET_MMAP is very efficient. PACKET_MMAP provides a
circular buffer of configurable size, mapped in user space, that can be used
to either send or receive packets. This way, reading packets just means
waiting for them; most of the time there is no need to issue a single system
call. For transmission, multiple packets can be sent through one system call
to get the highest bandwidth. Using a buffer shared between the kernel and
the user process also has the benefit of minimizing packet copies.

It's fine to use PACKET_MMAP to improve the performance of the capture and
transmission process, but it isn't everything. At least, if you are capturing
at high speeds (this is relative to the CPU speed), you should check whether
the device driver of your network interface card supports some sort of
interrupt load mitigation or (even better) NAPI, and make sure it is enabled.
For transmission, check the MTU (Maximum Transmission Unit) used and
supported by the devices on your network.
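
Checking the MTU can also be done from a program. The sketch below shows one way to do it on Linux. It is not part of PACKET_MMAP itself. The helper name get_mtu() is hypothetical. The sketch assumes glibc headers and the SIOCGIFMTU ioctl issued on an ordinary UDP socket, which needs no special privileges.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <net/if.h>       /* struct ifreq, IFNAMSIZ */
#include <string.h>
#include <sys/ioctl.h>    /* ioctl(), SIOCGIFMTU */
#include <sys/socket.h>   /* socket(), AF_INET */
#include <unistd.h>       /* close() */

/* Hypothetical helper: MTU of interface `ifname`, or -1 on error. */
static int get_mtu(const char *ifname)
{
    struct ifreq ifr;
    int fd, mtu = -1;

    assert(ifname != NULL);
    if (strlen(ifname) >= IFNAMSIZ)   /* name would not fit in ifr_name */
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    strcpy(ifr.ifr_name, ifname);

    /* SIOCGIFMTU can be issued on any socket; a UDP one needs no privileges */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    if (ioctl(fd, SIOCGIFMTU, &ifr) == 0)
        mtu = ifr.ifr_mtu;

    close(fd);
    return mtu;
}
```

For example, get_mtu("eth0") returns the MTU of eth0, or -1 if the interface does not exist. A transmission frame then has to be large enough for its frame header plus a full link-layer packet.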

--------------------------------------------------------------------------------
+ How to use CONFIG_PACKET_MMAP to improve capture process
--------------------------------------------------------------------------------

From the user standpoint, you should use the higher level libpcap library, which
[...]
the low level details or want to improve libpcap by including PACKET_MMAP
support.

--------------------------------------------------------------------------------
+ How to use CONFIG_PACKET_MMAP directly to improve capture process
--------------------------------------------------------------------------------

From the system calls standpoint, the use of PACKET_MMAP involves
the following process:

[setup]     socket() -------> creation of the capture socket
            setsockopt() ---> allocation of the circular buffer (ring)
                              option: PACKET_RX_RING
            mmap() ---------> mapping of the allocated buffer to the
                              user process

[...]
Next I will describe the PACKET_MMAP settings and their constraints, as well
as the mapping of the circular buffer in the user process and the use of
this buffer.

--------------------------------------------------------------------------------
+ How to use CONFIG_PACKET_MMAP directly to improve transmission process
--------------------------------------------------------------------------------

The transmission process is similar to capture, as shown below.

[setup]          socket() -------> creation of the transmission socket
                 setsockopt() ---> allocation of the circular buffer (ring)
                                   option: PACKET_TX_RING
                 bind() ---------> bind transmission socket with a network interface
                 mmap() ---------> mapping of the allocated buffer to the
                                   user process

[transmission]   poll() ---------> wait for free packets (optional)
                 send() ---------> send all packets that are set as ready in
                                   the ring
                                   The flag MSG_DONTWAIT can be used to return
                                   before end of transfer.

[shutdown]       close() --------> destruction of the transmission socket and
                                   deallocation of all associated resources.
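
The setup steps in the diagram can be combined into a single function. This is a minimal sketch under stated assumptions, not a reference implementation:

- The name tx_ring_open() is hypothetical.
- The caller passes an already valid struct tpacket_req.
- The socket is opened with SOCK_RAW and ETH_P_ALL.
- The interface index is looked up with SIOCGIFINDEX before bind().
- The process has CAP_NET_RAW (typically root).

```c
#define _GNU_SOURCE
#include <arpa/inet.h>          /* htons() */
#include <assert.h>
#include <linux/if_ether.h>     /* ETH_P_ALL */
#include <linux/if_packet.h>    /* struct tpacket_req, struct sockaddr_ll, PACKET_TX_RING */
#include <net/if.h>             /* struct ifreq, IFNAMSIZ */
#include <string.h>
#include <sys/ioctl.h>          /* SIOCGIFINDEX */
#include <sys/mman.h>           /* mmap(), MAP_FAILED */
#include <sys/socket.h>
#include <unistd.h>             /* close() */

/*
 * Hypothetical helper: open a PACKET socket on `ifname`, attach a TX ring
 * described by `req` and map it. Returns the socket and stores the mapping
 * in *ring on success; returns -1 on failure. Needs CAP_NET_RAW.
 */
static int tx_ring_open(const char *ifname, const struct tpacket_req *req,
                        void **ring)
{
    struct sockaddr_ll addr;
    struct ifreq ifr;
    size_t len;
    int fd;

    assert(ifname != NULL && req != NULL && ring != NULL);
    if (strlen(ifname) >= IFNAMSIZ)
        return -1;
    len = (size_t)req->tp_block_size * req->tp_block_nr;

    /* socket(): creation of the transmission socket */
    fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;

    /* interface index needed by bind() */
    memset(&ifr, 0, sizeof(ifr));
    strcpy(ifr.ifr_name, ifname);
    if (ioctl(fd, SIOCGIFINDEX, &ifr) < 0)
        goto fail;

    /* setsockopt(): allocation of the circular buffer (ring) */
    if (setsockopt(fd, SOL_PACKET, PACKET_TX_RING, req, sizeof(*req)) < 0)
        goto fail;

    /* bind(): bind the transmission socket to the network interface */
    memset(&addr, 0, sizeof(addr));
    addr.sll_family = AF_PACKET;
    addr.sll_protocol = htons(ETH_P_ALL);
    addr.sll_ifindex = ifr.ifr_ifindex;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;

    /* mmap(): mapping of the allocated buffer to the user process */
    *ring = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (*ring == MAP_FAILED) {
        *ring = NULL;
        goto fail;
    }
    return fd;

fail:
    close(fd);
    return -1;
}
```

After a successful call, frames are filled in the returned mapping and sent with send(fd, NULL, 0, 0). munmap() and close() then undo the setup, which matches the [shutdown] step.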

Binding the socket to your network interface is mandatory (with zero copy),
so that the kernel knows the header size of the frames used in the circular
buffer.

As with capture, each frame contains two parts:

 --------------------
| struct tpacket_hdr | Header. It contains the status
|                    | of this frame.
|--------------------|
| data buffer        |
.                    .  Data that will be sent over the network interface.
.                    .
 --------------------

bind() associates the socket with your network interface through the
sll_ifindex field of struct sockaddr_ll.

Initialization example:

    struct sockaddr_ll my_addr;
    struct ifreq s_ifr;
    ...

    /* copy the interface name; the size limit keeps ifr_name NUL-terminated */
    memset(&s_ifr, 0, sizeof(s_ifr));
    strncpy(s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name) - 1);

    /* get interface index of eth0 (fd is the PACKET socket) */
    ioctl(fd, SIOCGIFINDEX, &s_ifr);

    /* fill sockaddr_ll struct to prepare binding; sll_protocol is in
       network byte order */
    memset(&my_addr, 0, sizeof(my_addr));
    my_addr.sll_family = AF_PACKET;
    my_addr.sll_protocol = htons(ETH_P_ALL);
    my_addr.sll_ifindex = s_ifr.ifr_ifindex;

    /* bind socket to eth0 */
    bind(fd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));

A complete tutorial is available at: http://wiki.gnu-log.net/

--------------------------------------------------------------------------------
+ PACKET_MMAP settings
--------------------------------------------------------------------------------

Setting up PACKET_MMAP from user-level code is done with a call like

 - Capture process
       setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req, sizeof(req))
 - Transmission process
       setsockopt(fd, SOL_PACKET, PACKET_TX_RING, (void *) &req, sizeof(req))

The most significant argument in the previous call is the req parameter;
this parameter must have the following structure:
[...]
    };

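The constraints on these four fields are not shown in this excerpt. The sketch below assumes the checks made by the kernel's packet_set_ring():

- tp_block_size is a multiple of the page size.
- tp_frame_size is a multiple of TPACKET_ALIGNMENT and at least TPACKET_HDRLEN.
- tp_frame_nr equals the number of frames per block times tp_block_nr.

The helper name build_req() is hypothetical.

```c
#include <assert.h>
#include <linux/if_packet.h>   /* struct tpacket_req, TPACKET_ALIGNMENT, TPACKET_HDRLEN */
#include <unistd.h>            /* sysconf() */

/*
 * Hypothetical helper: fill *req for `block_nr` blocks of `block_size` bytes
 * split into frames of `frame_size` bytes. Returns 0 on success, -1 if the
 * values break the constraints assumed above.
 */
static int build_req(struct tpacket_req *req, unsigned int block_size,
                     unsigned int block_nr, unsigned int frame_size)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(req != NULL);
    if (page <= 0 || block_size == 0 || block_nr == 0)
        return -1;
    if (block_size % (unsigned long)page != 0)   /* whole pages per block */
        return -1;
    if (frame_size < TPACKET_HDRLEN || frame_size % TPACKET_ALIGNMENT != 0)
        return -1;
    if (frame_size > block_size)                 /* at least one frame per block */
        return -1;

    req->tp_block_size = block_size;
    req->tp_block_nr = block_nr;
    req->tp_frame_size = frame_size;
    /* redundant but checked: frames per block times number of blocks */
    req->tp_frame_nr = (block_size / frame_size) * block_nr;
    return 0;
}
```

For example, with 4096-byte pages, build_req(&req, 4096, 4, 2048) gives two frames per block and tp_frame_nr = 8.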
This structure is defined in /usr/include/linux/if_packet.h and establishes a
circular buffer (ring) of unswappable memory. Being mapped in the capture
process allows reading the captured frames and related meta-information,
like timestamps, without requiring a system call.
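
Reading frames without a system call means locating them in the mapping by hand. This sketch makes two assumptions. First, frames are laid out back to back from the start of each block, tp_block_size/tp_frame_size per block. Second, the whole ring of tp_block_size * tp_block_nr bytes was mapped at base. frame_at() is a hypothetical name.

```c
#include <assert.h>
#include <linux/if_packet.h>   /* struct tpacket_req, struct tpacket_hdr */
#include <stddef.h>            /* size_t */

/*
 * Hypothetical helper: address of frame i (0 <= i < tp_frame_nr) in a ring
 * mapped at `base`. Frames are packed from the start of each block, so a
 * block tail smaller than one frame stays unused.
 */
static struct tpacket_hdr *frame_at(void *base, const struct tpacket_req *req,
                                    unsigned int i)
{
    unsigned int per_block, block, slot;

    assert(base != NULL && req != NULL && req->tp_frame_size != 0);
    per_block = req->tp_block_size / req->tp_frame_size;
    assert(per_block > 0 && i < req->tp_frame_nr);

    block = i / per_block;   /* which block holds frame i */
    slot = i % per_block;    /* position of frame i inside that block */
    return (struct tpacket_hdr *)((char *)base
                                  + (size_t)block * req->tp_block_size
                                  + (size_t)slot * req->tp_frame_size);
}
```

With 4096-byte blocks and 1536-byte frames, each block holds two frames. Frame 2 therefore starts at offset 4096, and the last 1024 bytes of every block are never used.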

Frames are grouped in blocks. Each block is a physically contiguous
region of memory and holds tp_block_size/tp_frame_size frames. The total number
of blocks is tp_block_nr. Note that tp_frame_nr is a redundant parameter because

[...]
struct tpacket_hdr). If this field is 0, the frame is ready
to be used by the kernel. If not, there is a frame the user can read
and the following flags apply:

++ Capture process

    from include/linux/if_packet.h

    #define TP_STATUS_COPY          2
[...]

First checking the status value and then polling for frames does not
cause a race condition.

++ Transmission process

The following defines are used for transmission:

    #define TP_STATUS_AVAILABLE      0 // Frame is available
    #define TP_STATUS_SEND_REQUEST   1 // Frame will be sent on next send()
    #define TP_STATUS_SENDING        2 // Frame is currently in transmission
    #define TP_STATUS_WRONG_FORMAT   4 // Frame format is not correct

First, the kernel initializes all frames to TP_STATUS_AVAILABLE. To send a
packet, the user fills the data buffer of an available frame, sets tp_len to
the size of the data and sets the frame's status field to
TP_STATUS_SEND_REQUEST. This can be done on multiple frames. Once the user is
ready to transmit, it calls send(), and all frames whose status equals
TP_STATUS_SEND_REQUEST are forwarded to the network device. While a frame is
being transmitted, the kernel sets its status to TP_STATUS_SENDING; at the end
of the transfer, the status returns to TP_STATUS_AVAILABLE.

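    /* header points to an available frame whose data buffer is filled */
    header->tp_len = in_i_size;                  /* size of the data in bytes */
    header->tp_status = TP_STATUS_SEND_REQUEST;  /* hand the frame to the kernel */
    retval = send(fd, NULL, 0, 0);               /* send all requested frames */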
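
Filling a frame before send() can be wrapped in a small helper. This minimal sketch rests on several assumptions:

- tx_queue() is a hypothetical name.
- Frames use the struct tpacket_hdr layout.
- The data buffer starts TPACKET_HDRLEN - sizeof(struct sockaddr_ll) bytes after the frame header. Newer kernels offer other header versions and options that change this layout.
- __sync_synchronize() is a GCC/Clang builtin, used here as a full memory barrier so the packet bytes are written before the status changes.

```c
#include <assert.h>
#include <linux/if_packet.h>   /* struct tpacket_hdr, TPACKET_HDRLEN, TP_STATUS_* */
#include <string.h>            /* memcpy() */

/*
 * Hypothetical helper: copy `len` bytes of `pkt` into frame `hdr` (of
 * `frame_size` bytes) and mark it for the next send(). Returns 0 on success,
 * -1 if the frame is not available or the packet does not fit.
 * Assumed data offset: TPACKET_HDRLEN - sizeof(struct sockaddr_ll).
 */
static int tx_queue(struct tpacket_hdr *hdr, unsigned int frame_size,
                    const void *pkt, unsigned int len)
{
    size_t off = TPACKET_HDRLEN - sizeof(struct sockaddr_ll);

    assert(hdr != NULL && pkt != NULL);
    if (hdr->tp_status != TP_STATUS_AVAILABLE)
        return -1;
    if (off + len > frame_size)
        return -1;

    memcpy((char *)hdr + off, pkt, len);
    hdr->tp_len = len;
    __sync_synchronize();   /* data before status, as assumed above */
    hdr->tp_status = TP_STATUS_SEND_REQUEST;
    return 0;
}
```

Several frames can be queued this way. A single send(fd, NULL, 0, 0) then transmits all of them.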

The user can also use poll() to wait until a buffer becomes available again,
for example while its status is still TP_STATUS_SENDING:

    struct pollfd pfd;
    pfd.fd = fd;
    pfd.revents = 0;
    pfd.events = POLLOUT;
    retval = poll(&pfd, 1, timeout);

--------------------------------------------------------------------------------
+ THANKS
--------------------------------------------------------------------------------