Decoding AF_Netlink Packets from strace

Recently I came across a problem where the OpenStack Neutron linuxbridge-agent was executing bridge fdb show $dev to update its internal database, but these processes were not returning. Instead, they were stacking up and increasing the load on the server. Stracing a sample of the processes, a colleague and I noticed that they were all blocked on receiving from an AF_NETLINK socket. However, they were not hung: they were intermittently receiving packets.

I wanted to know if the contents of these packets could help us determine why the bridge processes were not returning, so I looked for a way to decode them. It turns out it's actually quite easy, as there is a Python module for this called pyroute2.

Here's an example receive message line from strace. I used -s10000 to prevent strace from truncating the packet, but I have truncated it here for display.

recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"t\4\0\0\20\0\2\0\20 \351V\317L\0\0\0\0\1\0\4\0\0\0C\20\1\0\0\0\0\0\t\0\3\0eth2\0\0\0\0\10\0\r\0\350\3\0\0\5\0\20\0\6\0\0\0\5\0\21\0\0\0\0\0\10\0\4\0\334\5\0\0\10\0\33\0\0\0\0\0\10\0\36\0\0\0\0\0\10\0\37\0\1\0\0\0\10\0 \0\1\0\0\0\5\0!\0\1\0\0\0\17\0\6\0pfifo_fast\0\0$\0\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\n\0\1\0\274vN\10\245\263\0\0\n\0\2\0\377\377\377\377\377\377\0\0`\0\7\0\0\0\0\0\n\0\0\0\0\0\0\0\10\3\0\0...", 16384}], msg_controllen=0, msg_flags=0}, 0) = 3420
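If there are many such lines to look at, copying the quoted payload out by hand gets tedious. Here is a minimal sketch of pulling it out with a regular expression and turning the C-style escapes into bytes. The function name and the pattern are my own illustration and assume the recvmsg format shown above. Other strace versions may format the line differently, in which case the pattern would need adjusting. The sample is shortened to just the first 16 bytes of the payload.

```python
import re

# Matches the first quoted msg_iov payload in a recvmsg line shaped like the one above.
PAYLOAD_RE = re.compile(r'msg_iov\(\d+\)=\[\{"((?:[^"\\]|\\.)*)"')


def strace_payload_to_bytes(line):
    """Return the first msg_iov payload of an strace recvmsg line as bytes."""
    match = PAYLOAD_RE.search(line)
    if match is None:
        raise ValueError("no msg_iov payload found")
    escaped = match.group(1)
    # strace prints C-style escapes (\0, \351, \t, ...). The unicode_escape
    # codec understands the same syntax, and latin-1 maps code points 0-255
    # one-to-one onto byte values.
    return escaped.encode("latin-1").decode("unicode_escape").encode("latin-1")


# Shortened, illustrative sample: only the first 16 bytes of the payload.
sample = (
    r'recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, '
    r'msg_iov(1)=[{"t\4\0\0\20\0\2\0\20 \351V\317L\0\0"..., 16384}], '
    r'msg_controllen=0, msg_flags=0}, 0) = 3420'
)
header_bytes = strace_payload_to_bytes(sample)
print(len(header_bytes), header_bytes)
```

The resulting bytes object can be handed to a parser directly instead of pasting the escaped string into a Python literal.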

The raw recvmsg line is a fascinating read, I'm sure you'll agree. Here's the Python snippet for decoding it, followed by the output:

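import pyroute2.netlink.rtnl.iprsocket
# The payload string copied from the strace line (snipped here).
# On Python 3, write this as a bytes literal: b"..."
raw_packet="t\4\0\0\20\0\2\0\20 \351V\317L\0\0\0\0\1........snipped"
# MarshalRtnl maps rtnetlink message types to their parser classes.
marshal = pyroute2.netlink.rtnl.iprsocket.MarshalRtnl()
marshal.parse(raw_packet)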
[{'__align': 0,
'attrs': [['IFLA_IFNAME', 'eth2'],
['IFLA_TXQLEN', 1000],
['IFLA_OPERSTATE', 'UP'],
['IFLA_LINKMODE', 0],
['IFLA_MTU', 1500],
['IFLA_GROUP', 0],
['IFLA_PROMISCUITY', 0],
['IFLA_NUM_TX_QUEUES', 1],
['IFLA_NUM_RX_QUEUES', 1],
['IFLA_CARRIER', 1],
['IFLA_QDISC', 'pfifo_fast'],
['IFLA_MAP',
{'base_addr': 0,
'dma': 0,
'irq': 0,
'mem_end': 0,
'mem_start': 0,
'port': 0}],
['IFLA_ADDRESS', 'bc:76:4e:08:a5:b3'],
['IFLA_BROADCAST', 'ff:ff:ff:ff:ff:ff'],
['IFLA_STATS',
{'collisions': 0,
'multicast': 0,
'rx_bytes': 0,
'rx_compressed': 0,
'rx_crc_errors': 0,
'rx_dropped': 0,
'rx_errors': 0,
'rx_fifo_errors': 0,
'rx_frame_errors': 0,
'rx_length_errors': 0,
'rx_missed_errors': 0,
'rx_over_errors': 0,
'rx_packets': 0,
'tx_aborted_errors': 0,
'tx_bytes': 776,
'tx_carrier_errors': 0,
'tx_compressed': 0,
'tx_dropped': 0,
'tx_errors': 0,
'tx_fifo_errors': 0,
'tx_heartbeat_errors': 0,
'tx_packets': 10,
'tx_window_errors': 0}],
['IFLA_STATS64',
{'collisions': 0,
'multicast': 0,
'rx_bytes': 0,
'rx_compressed': 0,
'rx_crc_errors': 0,
'rx_dropped': 0,
'rx_errors': 0,
'rx_fifo_errors': 0,
'rx_frame_errors': 0,
'rx_length_errors': 0,
'rx_missed_errors': 0,
'rx_over_errors': 0,
'rx_packets': 0,
'tx_aborted_errors': 0,
'tx_bytes': 776,
'tx_carrier_errors': 0,
'tx_compressed': 0,
'tx_dropped': 0,
'tx_errors': 0,
'tx_fifo_errors': 0,
'tx_heartbeat_errors': 0,
'tx_packets': 10,
'tx_window_errors': 0}],
['IFLA_NUM_VF', 0],
['IFLA_AF_SPEC',
{'attrs': [['AF_INET',
{'accept_local': 0,
'accept_redirects': 1,
'accept_source_route': 1,
'arp_accept': 0,
'arp_announce': 0,
'arp_ignore': 0,
'arp_notify': 0,
'arpfilter': 0,
'bootp_relay': 0,
'dummy': 65652,
'force_igmp_version': 0,
'forwarding': 1,
'igmpv2_unsolicited_report_interval': 10000,
'igmpv3_unsolicited_report_interval': 1000,
'log_martians': 0,
'mc_forwarding': 0,
'medium_id': 0,
'nopolicy': 0,
'noxfrm': 0,
'promote_secondaries': 0,
'proxy_arp': 0,
'proxy_arp_pvlan': 0,
'route_localnet': 0,
'rp_filter': 1,
'secure_redirects': 1,
'send_redirects': 1,
'shared_media': 1,
'src_vmark': 0,
'tag': 0}],
['AF_INET6',
{'attrs': [['IFLA_INET6_FLAGS', 2147483664],
['IFLA_INET6_CACHEINFO',
{'max_reasm_len': 65535,
'reachable_time': 40552,
'retrans_time': 1000,
'tstamp': 909}],
['IFLA_INET6_CONF',
{'accept_dad': 1,
'accept_ra': 1,
'accept_ra_defrtr': 1,
'accept_ra_pinfo': 1,
'accept_ra_rt_info_max_plen': 0,
'accept_ra_rtr_pref': 1,
'accept_redirects': 1,
'accept_source_route': 0,
'autoconf': 1,
'dad_transmits': 1,
'disable_ipv6': 0,
'force_mld_version': 0,
'force_tllao': 0,
'forwarding': 0,
'hop_limit': 64,
'max_addresses': 16,
'max_desync_factor': 600,
'mc_forwarding': 0,
'mtu': 1500,
'ndisc_notify': 0,
'optimistic_dad': 0,
'proxy_ndp': 0,
'regen_max_retry': 3,
'router_probe_interval': 60000,
'router_solicitation_delay': 1000,
'router_solicitation_interval': 4000,
'router_solicitations': 3,
'temp_prefered_lft': 86400,
'temp_valid_lft': 604800,
'use_tempaddr': 2}],
['IFLA_INET6_STATS',
{'cepkts': 0,
'csumerrors': 0,
'ect0pkts': 0,
'ect1pkts': 0,
'fragcreates': 0,
'fragfails': 0,
'fragoks': 0,
'inaddrerrors': 0,
'inbcastoctets': 0,
'inbcastpkts': 0,
'indelivers': 0,
'indiscards': 0,
'inhdrerrors': 0,
'inmcastoctets': 0,
'inmcastpkts': 0,
'innoroutes': 0,
'inoctets': 0,
'inpkts': 0,
'intoobigerrors': 0,
'intruncatedpkts': 0,
'inunknownprotos': 0,
'noectpkts': 0,
'num': 36,
'outbcastoctets': 0,
'outbcastpkts': 0,
'outdiscards': 0,
'outforwdatagrams': 0,
'outmcastoctets': 912,
'outmcastpkts': 13,
'outnoroutes': 0,
'outoctets': 608,
'outpkts': 9,
'reasmfails': 0,
'reasmoks': 0,
'reasmreqds': 0,
'reasmtimeout': 0}],
['IFLA_INET6_ICMP6STATS',
{'csumerrors': 0,
'inerrors': 0,
'inmsgs': 0,
'num': 6,
'outerrors': 0,
'outmsgs': 9}],
['IFLA_INET6_TOKEN', '::']]}]]}]],
'change': 0,
'event': 'RTM_NEWLINK',
'family': 0,
'flags': 69699,
'header': {'error': None,
'flags': 2,
'length': 1140,
'pid': 19663,
'sequence_number': 1458118672,
'type': 16},
'ifi_type': 1,
'index': 4},

... This structure is repeated for each system network interface

]
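As a sanity check on the output above, the fixed-size parts of the message can also be unpacked by hand with the standard struct module. This is only a sketch covering the first 44 bytes of the packet: the netlink header, the ifinfomsg that follows it, and the first attribute. The variable names are mine, and the "<" format prefix assumes the capture came from a little-endian host, since netlink uses host byte order.

```python
import struct

# First 44 bytes of the packet from the strace line, as a bytes literal.
packet = (
    b"t\4\0\0\20\0\2\0\20 \351V\317L\0\0"   # nlmsghdr, 16 bytes
    b"\0\0\1\0\4\0\0\0C\20\1\0\0\0\0\0"      # ifinfomsg, 16 bytes
    b"\t\0\3\0eth2\0\0\0\0"                  # first attribute, padded to 12 bytes
)

# nlmsghdr: u32 length, u16 type, u16 flags, u32 sequence number, u32 pid.
# "<" is an assumption: little-endian capture host.
length, msg_type, hdr_flags, sequence_number, pid = struct.unpack_from(
    "<IHHII", packet, 0
)

# ifinfomsg: u8 family, one pad byte, u16 type, s32 index, u32 flags, u32 change.
family, ifi_type, index, ifi_flags, change = struct.unpack_from(
    "<BxHiII", packet, 16
)

# Each attribute starts with a 4-byte header: u16 length (header included), u16 type.
nla_len, nla_type = struct.unpack_from("<HH", packet, 32)
ifname = packet[36:32 + nla_len].rstrip(b"\0").decode("ascii")

print(length, msg_type, hdr_flags, sequence_number, pid)
print(family, ifi_type, index, ifi_flags, change)
print(nla_len, nla_type, ifname)
```

The unpacked values line up with the 'header', 'family', 'ifi_type', 'index', 'flags' and 'change' fields in the pyroute2 output, and attribute type 3 is IFLA_IFNAME, carrying eth2.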

So there you have it, a netlink packet decoded.