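Yesterday I ran into a problem: a Eureka cluster that ran fine outside k8s stopped working once it was deployed to k8s.
Here is a record of how I tracked it down.
Checking the logs with kubectl logs -f XXX -n XXX showed this error: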
com.netflix.discovery.shared.transport.TransportException: Cannot execute request on any known server
    at com.netflix.discovery.shared.transport.decorator.RetryableEurekaHttpClient.execute(RetryableEurekaHttpClient.java:111) ~[eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.getApplications(EurekaHttpClientDecorator.java:134) ~[eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator$6.execute(EurekaHttpClientDecorator.java:137) ~[eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.shared.transport.decorator.SessionedEurekaHttpClient.execute(SessionedEurekaHttpClient.java:77) ~[eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.getApplications(EurekaHttpClientDecorator.java:134) ~[eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.DiscoveryClient.getAndStoreFullRegistry(DiscoveryClient.java:1013) [eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.DiscoveryClient.getAndUpdateDelta(DiscoveryClient.java:1055) [eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.DiscoveryClient.fetchRegistry(DiscoveryClient.java:929) [eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.DiscoveryClient.refreshRegistry(DiscoveryClient.java:1451) [eureka-client-1.6.2.jar:1.6.2]
    at com.netflix.discovery.DiscoveryClient$CacheRefreshThread.run(DiscoveryClient.java:1418) [eureka-client-1.6.2.jar:1.6.2]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
My Eureka YAML file is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eureka01-deployment
  namespace: wx-prod
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: eureka01-prod
        regcenter: eureka
        track: stable
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchExpressions:
              - key: regcenter
                operator: In
                values:
                - eureka
      containers:
      - name: eureka
        image: harbor.prod.com/kube-prod/eureka:1.0
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "500m"
            memory: "1024Mi"
          limits:
            cpu: "1000m"
            memory: "2048Mi"
        ports:
        - containerPort: 8761
        env:
        - name: eureka.server.enable-self-preservation
          value: "false"
        - name: eureka.client.service-url.defaultZone
          value: http://wx:[email protected]:8761/eureka/,http://vass_wx:[email protected]:8761/eureka/
      imagePullSecrets:
      - name: harbor-secret-name
  selector:
    matchLabels:
      app: eureka01-prod
---
apiVersion: v1
kind: Service
metadata:
  name: eureka01-service
  namespace: wx-prod
  labels:
    app: eureka01-svc
spec:
  type: NodePort
  selector:
    app: eureka01-prod
  ports:
  - port: 8761
    targetPort: 8761
    nodePort: 30001
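Taken literally, the error means the Eureka nodes cannot find each other. This was a k8s environment I had just built, and the Eureka URLs use k8s internal domain names, so my first thought was to check whether CoreDNS was working properly.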
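"Cannot execute request on any known server" is raised once the client has tried every URL in defaultZone without success, so it can help to check each peer individually. Below is a minimal sketch, assuming it is run from a pod that has Python available; the function names parse_peers, is_reachable and report are made up for illustration and are not part of Eureka or Spring Cloud.

```python
import socket
from urllib.parse import urlsplit


def parse_peers(default_zone):
    """Split a comma-separated defaultZone value into (host, port) pairs."""
    peers = []
    for url in default_zone.split(","):
        parts = urlsplit(url.strip())
        # urlsplit separates the user:password@ prefix from the hostname.
        port = parts.port or (443 if parts.scheme == "https" else 80)
        peers.append((parts.hostname, port))
    return peers


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def report(default_zone):
    """Print one line per peer saying whether it accepted a TCP connection."""
    for host, port in parse_peers(default_zone):
        state = "reachable" if is_reachable(host, port) else "UNREACHABLE"
        print(f"{host}:{port} {state}")
```

Calling report() with the real defaultZone string shows which peers accept connections. A peer that is unreachable here points at DNS or networking, while a peer that is reachable but still fails in Eureka points at configuration or credentials.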
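Troubleshooting steps:
I followed the steps from this post: https://blog.csdn.net/alva_xu/article/details/85160552
1. Run kubectl get pod -n kube-system and check that the CoreDNS pods are healthy.
2. Pull busybox and start it inside the k8s cluster.
3. Run kubectl exec -ti busybox -- nslookup kubernetes.default to confirm whether domain name resolution works.
Everything turned out to be completely normal.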
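Resolving kubernetes.default only shows that cluster DNS works in general; it is also worth resolving the Eureka Service names themselves. The sketch below builds the fully qualified Service name and resolves it. The service name eureka01-service and namespace wx-prod come from the manifest above; the cluster domain cluster.local is the usual default and is an assumption here, as is having Python available inside a pod. The helper names are hypothetical.

```python
import socket


def service_fqdn(service, namespace, cluster_domain="cluster.local"):
    """Build the in-cluster DNS name of a Service (cluster domain is assumed)."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve(name):
    """Return the sorted IP addresses a name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    name = service_fqdn("eureka01-service", "wx-prod")
    print(name, resolve(name))
```

An empty list for a Service name, while kubernetes.default resolves fine, would suggest the Service does not exist in that namespace or the name in defaultZone is misspelled.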
The environment was fine, so the problem could only be a mismatch between the environment and the code. Next I checked the registry's bootstrap file:
server:
  port: ${hostPort}
eureka:
  client:
    service-url:
      defaultZone: http://wx:wx@${eureka.node01.name}:${eureka.node01.port}/eureka/,http://wx:wx@${eureka.node02.name}:${eureka.node02.port}/eureka/
    fetch-registry: true
    register-with-eureka: true
  instance:
    # hosts must be configured
    #hostname: ${eureka.hostname}
    instance-id: ${spring.application.name}:${server.port}
    prefer-ip-address: true
    ip-address: ${ipAddress}
  server:
    peer-node-read-timeout-ms: 1000
    #### self-preservation; set to true in production
    enable-self-preservation: ${selfPreservation:true}
spring:
  application:
    name: eureka
security:
  basic:
    enabled: true
  user:
    name: wx
    password: wx
Then it hit me: if all three Eureka nodes register with the same application.name:port as their instance ID, could that be what causes this problem? So I changed the file to:
server:
  port: 8761
eureka:
  client:
    service-url:
      defaultZone: http://localhost:8761/eureka/
    fetch-registry: true
    register-with-eureka: true
  instance:
    instance-id: ${spring.cloud.client.ipAddress}:${server.port}
    prefer-ip-address: true
  server:
    peer-node-read-timeout-ms: 1000
    #### self-preservation; set to true in production
    enable-self-preservation: false
spring:
  application:
    name: eureka
security:
  basic:
    enabled: true
  user:
    name: wx
    password: wx
Because I deploy inside k8s with ClusterIP, each Eureka instance should have a different IP, so using IP + port as the instance ID keeps every ID unique.
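The ID collision suspected above can be illustrated with a toy model. This is only a sketch, assuming a registry that keeps one entry per (application name, instance ID) pair; it is not Eureka's actual implementation, and the pod IPs are invented for the example.

```python
def register(registry, app_name, instance_id, address):
    """Register an instance; an existing entry with the same ID is overwritten."""
    registry.setdefault(app_name, {})[instance_id] = address


pod_ips = ["10.244.1.5", "10.244.2.7", "10.244.3.9"]  # hypothetical pod IPs
port = 8761

# Old scheme: ${spring.application.name}:${server.port}
by_name = {}
for ip in pod_ips:
    register(by_name, "EUREKA", f"eureka:{port}", ip)

# New scheme: ${spring.cloud.client.ipAddress}:${server.port}
by_ip = {}
for ip in pod_ips:
    register(by_ip, "EUREKA", f"{ip}:{port}", ip)

print(len(by_name["EUREKA"]))  # 1: all three nodes share the ID eureka:8761
print(len(by_ip["EUREKA"]))    # 3: each node has its own ID
```

With the name-based ID, every node listens on 8761 in k8s, so all three collapse into a single entry and only the last registration survives. With the IP-based ID, each pod keeps its own entry.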
After restarting, the problem was completely solved. Noting it down here for future reference.