[T.S] When Kafka Connect can't fetch connect-offsets

휘발성 기억력 2025. 3. 29. 22:09

kafka connect "Fail to list offsets"

The following error kept recurring,

org.apache.kafka.common.errors.TimeoutException: Timed out while waiting to get end offsets for topic 'connect-offsets' on brokers at bitnami-kafka-controller-headless:9092
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=listOffsets(api=METADATA), deadlineMs=1743252113013, tries=477920, nextAllowedTryMs=1743252113116) timed out at 1743252113016 after 477920 attempt(s)
Caused by: org.apache.kafka.common.errors.DisconnectException: Cancelled listOffsets(api=METADATA) request with correlation id 5767168 due to node 0 being disconnected

until the worker finally died with the error below.

[2025-03-29 12:43:03,028] ERROR [Worker clientId=connect-1, groupId=lecture] Uncaught exception in herder work thread, exiting:  (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
org.apache.kafka.connect.errors.ConnectException: Fail to list offsets for topic partitions after 13 attempts.  Reason: Timed out while waiting to get end offsets for topic 'connect-offsets' on brokers at bitnami-kafka-controller-headless:9092
    at org.apache.kafka.connect.util.RetryUtil.retryUntilTimeout(RetryUtil.java:106)
    at org.apache.kafka.connect.util.RetryUtil.retryUntilTimeout(RetryUtil.java:56)
    at org.apache.kafka.connect.util.TopicAdmin.retryEndOffsets(TopicAdmin.java:778)
    at org.apache.kafka.connect.util.KafkaBasedLog.readEndOffsets(KafkaBasedLog.java:514)
    at org.apache.kafka.connect.util.KafkaBasedLog.readToLogEnd(KafkaBasedLog.java:466)
    at org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:281)
    at org.apache.kafka.connect.storage.KafkaOffsetBackingStore.start(KafkaOffsetBackingStore.java:242)
    at org.apache.kafka.connect.runtime.Worker.start(Worker.java:232)
    at org.apache.kafka.connect.runtime.AbstractHerder.startServices(AbstractHerder.java:150)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:365)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out while waiting to get end offsets for topic 'connect-offsets' on brokers at bitnami-kafka-controller-headless:9092
Caused by: org.apache.kafka.common.errors.TimeoutException: Call(callName=listOffsets(api=METADATA), deadlineMs=1743252183022, tries=472752, nextAllowedTryMs=1743252183125) timed out at 1743252183025 after 472752 attempt(s)
Caused by: org.apache.kafka.common.errors.DisconnectException: Cancelled listOffsets(api=METADATA) request with correlation id 6239923 due to node 0 being disconnected
[2025-03-29 12:43:03,040] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect)
[2025-03-29 12:43:03,040] INFO Stopping REST server (org.apache.kafka.connect.runtime.rest.RestServer)
[2025-03-29 12:43:03,055] INFO Stopped http_8083@641d4880{HTTP/1.1, (http/1.1)}{0.0.0.0:8083} (org.eclipse.jetty.server.AbstractConnector)
[2025-03-29 12:43:03,056] INFO node0 Stopped scavenging (org.eclipse.jetty.server.session)
[2025-03-29 12:43:03,060] INFO REST server stopped (org.apache.kafka.connect.runtime.rest.RestServer)
[2025-03-29 12:43:03,060] INFO [Worker clientId=connect-1, groupId=lecture] Herder stopping (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
[2025-03-29 12:43:49,070] ERROR Executor java.util.concurrent.ThreadPoolExecutor@7a4a2a78[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0] did not terminate in time (org.apache.kafka.common.utils.ThreadUtils)
[2025-03-29 12:43:49,073] INFO [Worker clientId=connect-1, groupId=lecture] Herder stopped (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
[2025-03-29 12:43:49,073] INFO Kafka Connect stopped (org.apache.kafka.connect.runtime.Connect)

 

Looking at the broker logs, I saw entries like the one below, so the connection itself doesn't seem to have failed outright.

[2025-03-29 12:28:02,825] INFO [QuorumController id=0] CreateTopics result(s): CreatableTopic(name='connect-offsets', numPartitions=25, replicationFactor=1, assignments=[], configs=[CreatableTopicConfig(name='cleanup.policy', value='compact')]): TOPIC_ALREADY_EXISTS (Topic 'connect-offsets' already exists.) (org.apache.kafka.controller.ReplicationControlManager)

 

Let's connect to the broker and inspect the topics.

# check if the topic exists
$ kafka-topics.sh --list --bootstrap-server=localhost:9092
__consumer_offsets
configTopic
connect-configs
connect-offsets
connect-status

$ topic=connect-offsets
$ kafka-get-offsets.sh --bootstrap-server localhost:9092 --topic $topic
#### No response.... ####

$ kafka-topics.sh --describe --bootstrap-server localhost:9092  --topic $topic
Topic: connect-offsets	TopicId: vMbllBRmTFS-EMtso6copQ	PartitionCount: 25	ReplicationFactor: 1	Configs: cleanup.policy=compact
	Topic: connect-offsets	Partition: 0	Leader: none	Replicas: 2	Isr: 2	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 1	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 2	Leader: none	Replicas: 1	Isr: 1	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 3	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 4	Leader: none	Replicas: 1	Isr: 1	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 5	Leader: none	Replicas: 2	Isr: 2	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 6	Leader: none	Replicas: 2	Isr: 2	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 7	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 8	Leader: none	Replicas: 1	Isr: 1	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 9	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 10	Leader: none	Replicas: 1	Isr: 1	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 11	Leader: none	Replicas: 2	Isr: 2	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 12	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 13	Leader: none	Replicas: 1	Isr: 1	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 14	Leader: none	Replicas: 2	Isr: 2	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 15	Leader: none	Replicas: 2	Isr: 2	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 16	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 17	Leader: none	Replicas: 1	Isr: 1	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 18	Leader: none	Replicas: 2	Isr: 2	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 19	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 20	Leader: none	Replicas: 1	Isr: 1	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 21	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 22	Leader: none	Replicas: 2	Isr: 2	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 23	Leader: none	Replicas: 1	Isr: 1	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 24	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:

So the topic exists, but querying its offsets got no response.

Describing the topic showed that some partitions had no leader.

The replica assignments were spread across broker IDs 0, 1, and 2, apparently assuming three brokers. But with only a single broker actually running, the topic doesn't seem to have been created properly.
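With 25 partitions in the listing, it helps to filter the describe output down to just the broken ones. A minimal sketch, with a few sample lines inlined for illustration; in practice you would pipe the real `kafka-topics.sh --describe` output into the awk filter:

```shell
# Print only the partitions that currently have no leader, plus their
# replica assignment. Sample `--describe` lines are inlined here.
describe='	Topic: connect-offsets	Partition: 0	Leader: none	Replicas: 2	Isr: 2
	Topic: connect-offsets	Partition: 1	Leader: 0	Replicas: 0	Isr: 0
	Topic: connect-offsets	Partition: 2	Leader: none	Replicas: 1	Isr: 1'

# Fields are tab-separated: $3 = "Partition: N", $5 = "Replicas: N"
printf '%s\n' "$describe" | awk -F'\t' '/Leader: none/ {print $3, "->", $5}'
```

On this sample it flags partitions 0 and 2, whose replicas live on brokers 2 and 1, i.e. brokers that no longer exist.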

 

$ kubectl scale deployment kafka-connect --replicas=0
deployment.apps/kafka-connect scaled

Kill Kafka Connect by scaling it down to 0,

$ kafka-topics.sh --delete --bootstrap-server=localhost:9092 --topic connect-configs
$ kafka-topics.sh --delete --bootstrap-server=localhost:9092 --topic connect-offsets
$ kafka-topics.sh --delete --bootstrap-server=localhost:9092 --topic connect-status

delete the Kafka Connect internal topics,

 

and check the Kafka Connect configuration.

- name: CONNECT_CONFIG_STORAGE_TOPIC
  value: "connect-configs"
- name: CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR
  value: "1"
- name: CONNECT_OFFSET_STORAGE_TOPIC
  value: "connect-offsets"
- name: CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR
  value: "1"
- name: CONNECT_STATUS_STORAGE_TOPIC
  value: "connect-status"
- name: CONNECT_STATUS_STORAGE_REPLICATION_FACTOR
  value: "1"

After confirming in the YAML that all of the replication factor values above are 1,
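As a quick sanity check, those three values can also be pulled out of the Deployment spec with a bit of awk. A rough sketch with the env snippet inlined; in practice the input might come from `kubectl get deployment kafka-connect -o yaml` (deployment name assumed from the scale commands above):

```shell
# Extract the three *_REPLICATION_FACTOR values from the Deployment env.
# The snippet is inlined here for illustration.
env_yaml='- name: CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR
  value: "1"
- name: CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR
  value: "1"
- name: CONNECT_STATUS_STORAGE_REPLICATION_FACTOR
  value: "1"'

# For each REPLICATION_FACTOR name line, read the following value line
# and strip everything but the digits.
printf '%s\n' "$env_yaml" | awk '/REPLICATION_FACTOR/ {getline; gsub(/[^0-9]/, ""); print}'
```

All three should print 1 on a single-broker cluster; anything higher than the broker count would leave the internal topics unable to place their replicas.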

$ kubectl scale deployment kafka-connect --replicas=1
deployment.apps/kafka-connect scaled

then scale replicas back up to 1 to bring it up again.

 

Checking the topic again, it had been recreated correctly with a replication factor of 1.

$ topic=connect-offsets
$ kafka-topics.sh --describe --bootstrap-server localhost:9092  --topic $topic
Topic: connect-offsets	TopicId: fB5RHJB2RImnhDEVCkNM-A	PartitionCount: 25	ReplicationFactor: 1	Configs: cleanup.policy=compact
	Topic: connect-offsets	Partition: 0	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 1	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 2	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 3	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 4	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 5	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 6	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 7	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 8	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 9	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 10	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 11	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 12	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 13	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 14	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 15	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 16	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 17	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 18	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 19	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 20	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 21	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 22	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 23	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:
	Topic: connect-offsets	Partition: 24	Leader: 0	Replicas: 0	Isr: 0	Elr: 	LastKnownElr:

$ kafka-get-offsets.sh --bootstrap-server localhost:9092 --topic $topic
connect-offsets:0:0
connect-offsets:1:0
connect-offsets:10:0
connect-offsets:11:0
connect-offsets:12:0
connect-offsets:13:0
connect-offsets:14:0
connect-offsets:15:0
connect-offsets:16:0
connect-offsets:17:0
connect-offsets:18:0
connect-offsets:19:0
connect-offsets:2:0
connect-offsets:20:0
connect-offsets:21:0
connect-offsets:22:0
connect-offsets:23:0
connect-offsets:24:0
connect-offsets:3:0
connect-offsets:4:0
connect-offsets:5:0
connect-offsets:6:0
connect-offsets:7:0
connect-offsets:8:0
connect-offsets:9:0

 

Most likely, when Kafka Connect was first created there were three Kafka brokers,

and the replication factor had been set to 3 to match...

then recently, when the Kafka brokers were scaled down to a single replica to save resources, the topics' replica assignments were left pointing at the old three brokers, which seems to be why those partitions had no leader.
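The failure mode above boils down to a simple check: given the set of broker IDs still alive, any partition whose replica sits on a dead broker is stranded. A sketch with the assignment pairs inlined; in a real cluster both inputs would come from the broker (the pairs mirror the describe output earlier):

```shell
# Broker IDs still alive in the cluster (only broker 0 after the scale-down).
alive="0"

# "partition replica-broker" pairs, as seen in the describe output.
assignments='0 2
1 0
2 1'

printf '%s\n' "$assignments" | while read -r part replica; do
  case " $alive " in
    *" $replica "*) ;;  # replica lives on a surviving broker: fine
    *) echo "partition $part stranded on broker $replica" ;;
  esac
done
```

Deleting and recreating the topics, as done above, is the blunt fix; on a cluster with data worth keeping, reassigning the stranded partitions to live brokers would be the gentler option.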
