-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Description
Hi - not sure if this is an rdkafka issue or an actual kafka issue. (Not sure which one I want..)
I'm finding that when I send a lot of small messages, I frequently get message timeouts. I see this in both using the api as well as using the sample programs.
Example:
./rdkafka_performance -P -t TENSECONDLOGS -b masked.com:5757 -s 2 -a 2 -z gzip -p 1 -c 1000000 -X queue.buffering.max.messages=1000000
Message delivered: 471000 remain^M
Message delivered: 470000 remain^M
Message delivered: 469000 remain^M
Message delivered: 468000 remain^M
Message delivered: 467000 remain^M
Message delivered: 466000 remain^M
Message delivered: 465000 remain^M
Message delivered: 464000 remain^M
Message delivered: 463000 remain^M
Message delivered: 462000 remain^M
Message delivered: 461000 remain^M
Message delivered: 460000 remain^M
(sits and hangs here for a long time...and then:)
%4|1391743416.628|METADATA|rdkafka#producer-0| d146537-021.masked.com:5757/1052189105: Metadata request failed: Local: Message timed out^M
%4|1391743416.628|METADATA|rdkafka#producer-0| d146537-021.masked.com:5757/1052189105: Metadata request failed: Local: Message timed out^M
%4|1391743416.628|METADATA|rdkafka#producer-0| d146537-021.masked.com:5757/1052189105: Metadata request failed: Local: Message timed out^M
%4|1391743416.628|METADATA|rdkafka#producer-0| d146537-021.masked.com:5757/1052189105: Metadata request failed: Local: Message timed out^M
%4|1391743416.628|METADATA|rdkafka#producer-0| d146537-021.masked.com:5757/1052189105: Metadata request failed: Local: Message timed out^M
%4|1391743416.628|METADATA|rdkafka#producer-0| d146537-021.masked.com:5757/1052189105: Metadata request failed: Local: Message timed out^M
%4|1391743417.429|METADATA|rdkafka#producer-0| d146537-021.masked.com:5757/1052189105: Metadata request failed: Local: Message timed out^M
%4|1391743417.429|METADATA|rdkafka#producer-0| d146537-021.masked.com:5757/1052189105: Metadata request failed: Local: Message timed out^M
%4|1391743417.429|METADATA|rdkafka#producer-0| d146537-021.masked.com:5757/1052189105: Metadata request failed: Local: Message timed out^M
%4|1391743417.429|METADATA|rdkafka#producer-0| d146537-021.masked.com:5757/1052189105: Metadata request failed: Local: Message timed out^M
if I strace while its hanging, I see:
futex(0x1374160, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x137418c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 7175, {1391707106, 253292000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x1374160, FUTEX_WAKE_PRIVATE, 1) = 0
I sometimes see this on the cluster (but definitely not always):
[2014-02-06 22:02:54,585] INFO Client session timed out, have not heard from server in 4815ms for sessionid 0x343c4f6383400a7, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2014-02-06 22:02:54,689] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
[2014-02-06 22:02:55,201] INFO Opening socket connection to server zookeeperlog1.masked.com/10.52.189.106:5700 (org.apache.zookeeper.ClientCnxn)
[2014-02-06 22:02:55,202] INFO Socket connection established to zookeeperlog1.masked.com/10.52.189.106:5700, initiating session (org.apache.zookeeper.ClientCnxn)
[2014-02-06 22:02:55,207] INFO Session establishment complete on server zookeeperlog1.masked.com/10.52.189.106:5700, sessionid = 0x343c4f6383400a7, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2014-02-06 22:02:55,208] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
do you think I should simply increase the zookeeper timeout? (note it's running on the same host as the cluster node). Have you seen this before?