This post summarizes what I learned in the K8s Advanced Network Study (KANS) run by CloudNet@.
During the study we installed Calico from Manifests, so here I write up how to install it with the Operator instead.
It was not an assignment, but these days everyone seems to be stir-frying Operators with the Operator Framework the way they would a pan of malaxiangguo, so I wrote this up out of curiosity.
Note that when installing from Manifests, Typha must be configured once the cluster exceeds 50 nodes[1].
Calico installation environment: AWS EC2 (no EKS), kubeadm[2], pod-network-cidr=172.16.0.0/16, IPIP Mode
1. Calico Routing Mode
To understand the IPIP Mode mentioned above, I first needed to understand Calico's routing modes.
They can be divided by the encapsulation strategy used between nodes for pod-to-pod traffic.
- IPIP Mode (tunl interface): the packet is encapsulated in an outer IP header, and the outer header is removed on the receiving node.
- VXLAN Mode (vxlan interface): the packet is encapsulated in an outer UDP header, and the outer header is removed on the receiving node.
- Direct Mode: the original packet is sent as-is. On a CSP, the Src/Dest Check feature must be disabled on the NIC.
In addition, there is (network-level) pod traffic encryption[3].
On Azure, IPIP is blocked inside a VNet. With IPIP Mode you also have to account for user-defined values such as the pod network CIDR set in kubeadm rather than at the CSP level, which becomes a management issue, so VXLAN Mode looks like the better choice in many respects. Of course, I would have to actually use Azure to confirm this.
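The cost of each mode can be made concrete with a little arithmetic on the MTU. The sketch below assumes the commonly cited IPv4 overheads of 20 bytes for IPIP (one extra IP header) and 50 bytes for VXLAN (outer IP, UDP, VXLAN header, inner Ethernet). `pod_mtu` is a hypothetical helper name, not something Calico ships.

```shell
# pod_mtu MODE NIC_MTU
# Prints the largest pod-interface MTU that avoids fragmentation.
# Overheads are assumptions for an IPv4 underlay: IPIP = 20 bytes, VXLAN = 50 bytes.
pod_mtu() {
  case "$1" in
    IPIP)   overhead=20 ;;
    VXLAN)  overhead=50 ;;
    Direct) overhead=0 ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
  echo $(( $2 - overhead ))
}

pod_mtu IPIP 1500    # 1480
pod_mtu VXLAN 1500   # 1450
pod_mtu Direct 1500  # 1500
```

The numbers only show why the tunnel modes need a smaller pod MTU than Direct Mode; the actual NIC MTU depends on the instance type and network.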
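2. Calico Operator Installation and Configuration
Docs: Install Calico/Operator
Skimming through the docs, all you need is to install the CRDs for the Operator and apply a custom configuration.
That is all there is to it, and that is the problem(?).
(1) Installing the CRDs
I usually prefer to download the file first and then apply it…
Having tried it myself, I recommend simply using create here. The Calico docs note that the CRD bundle is large enough that kubectl apply may exceed request limits. I was a little taken aback.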
# Set CALICO_VERSION_TAG
# ref. https://github.com/projectcalico/calico/tags
CALICO_VERSION_TAG=v3.28.2 && echo $CALICO_VERSION_TAG
# v3.28.2
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/${CALICO_VERSION_TAG}/manifests/tigera-operator.yaml
This creates the tigera-operator Namespace along with the CRDs, ServiceAccount, and Deployment.
CoreDNS, however, is naturally still Pending.
(⎈|HomeLab:default) root@k8s-m:~# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-55cb58b774-62vtz 0/1 Pending 0 21m
coredns-55cb58b774-l8znv 0/1 Pending 0 21m
(2) Applying the Custom Configuration
I used yq for the edits: mikefarah/yq@v4
Download the custom-resources.yaml file as shown below and configure Calico[4].
curl https://raw.githubusercontent.com/projectcalico/calico/${CALICO_VERSION_TAG}/manifests/custom-resources.yaml -sSo custom-resources-$(date --iso-8601).yaml
ls | grep custom-resources
# custom-resources-2024-09-22.yaml
The parts most often modified are blockSize, cidr, and encapsulation under calicoNetwork.ipPools.
# mikefarah/yq pre-installed (>=v4)
yq '(select(.kind == "Installation") | .spec.calicoNetwork.ipPools[0] | (.blockSize, .cidr, .encapsulation))' custom-resources-2024-09-23.yaml
26
192.168.0.0/16
VXLANCrossSubnet
The trouble I ran into when applying the configuration was finding out which pod-network-cidr had been set at kubeadm init;
it can be looked up from the ConfigMap as shown below.
kubectl get configmap -n kube-system kubeadm-config -o yaml | grep podSubnet
# podSubnet: 172.16.0.0/16
- blockSize: the prefix length of each IPAM block carved out of the IP pool. 26 means 64 IPs per block, so I change it to 24 (256 IPs).
- cidr: the pod-network-cidr set at kubeadm init (in this case, 172.16.0.0/16)
- encapsulation[5]: one of the following can be chosen.
- IPIP, VXLAN, IPIPCrossSubnet, VXLANCrossSubnet, None(Optional)
yq 'with(select(.kind == "Installation").spec.calicoNetwork.ipPools[0] ; .blockSize = 24 | .cidr = "172.16.0.0/16" | .encapsulation = "IPIP")' custom-resources-2024-09-23.yaml -i
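The blockSize numbers above follow directly from the prefix length. As a minimal sketch for IPv4 only, with `ips_per_block` and `blocks_in_pool` being hypothetical helper names:

```shell
# ips_per_block BLOCK_SIZE
# Number of addresses in one IPv4 block with the given prefix length.
ips_per_block() {
  echo $(( 1 << (32 - $1) ))
}

# blocks_in_pool POOL_PREFIX BLOCK_SIZE
# Number of blocks of that size that fit in a pool with the given prefix length.
blocks_in_pool() {
  echo $(( 1 << ($2 - $1) ))
}

ips_per_block 26       # 64
ips_per_block 24       # 256
blocks_in_pool 16 26   # 1024
blocks_in_pool 16 24   # 256
```

So moving from 26 to 24 on a /16 pool trades 1024 small blocks for 256 larger ones.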
Now the configuration can finally be applied.
kubectl create -f custom-resources-2024-09-23.yaml
# installation.operator.tigera.io/default created
# apiserver.operator.tigera.io/default created
Phew. Looks like I can get some sleep tonight.

kubectl get pod -A --sort-by=.metadata.creationTimestamp
# NAMESPACE NAME READY STATUS RESTARTS AGE
# kube-system kube-scheduler-k8s-m 1/1 Running 0 30h
# kube-system kube-controller-manager-k8s-m 1/1 Running 0 30h
# kube-system kube-apiserver-k8s-m 1/1 Running 0 30h
# kube-system etcd-k8s-m 1/1 Running 0 30h
# kube-system coredns-55cb58b774-62vtz 1/1 Running 0 30h
# kube-system kube-proxy-zj6tv 1/1 Running 0 30h
# kube-system coredns-55cb58b774-l8znv 1/1 Running 0 30h
# kube-system kube-proxy-ct678 1/1 Running 0 30h
# kube-system kube-proxy-qbp9m 1/1 Running 0 30h
# kube-system kube-proxy-gqzw9 1/1 Running 0 30h
# tigera-operator tigera-operator-576646c5b6-z6kkb 1/1 Running 0 30h
# calico-system calico-node-rdvhh 1/1 Running 0 103s
# calico-system csi-node-driver-hjms8 2/2 Running 0 103s
# calico-system csi-node-driver-hf2md 2/2 Running 0 103s
# calico-system csi-node-driver-cvrsj 2/2 Running 0 103s
# calico-system csi-node-driver-8bm8w 2/2 Running 0 103s
# calico-system calico-typha-64b97658dd-2nfhq 1/1 Running 0 103s
# calico-system calico-node-q5x2w 1/1 Running 0 103s
# calico-system calico-node-hx2xv 1/1 Running 0 103s
# calico-system calico-node-grtwc 1/1 Running 0 103s
# calico-system calico-kube-controllers-66fd48f858-xbrhp 1/1 Running 0 103s
# calico-system calico-typha-64b97658dd-g7c29 1/1 Running 0 94s
# calico-apiserver calico-apiserver-69f798bcb-g6gmq 1/1 Running 0 38s
# calico-apiserver calico-apiserver-69f798bcb-tst89 1/1 Running 0 38s
If the cidr does not match the pod-network-cidr set in kubeadm, an error occurs.
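A mismatch like this can be caught before running kubectl create. The sketch below is a plain-shell check that the pool cidr falls inside the kubeadm podSubnet. It assumes IPv4 and that containment is the condition that matters; `ip_to_int` and `cidr_contains` are hypothetical helper names.

```shell
# ip_to_int A.B.C.D
# Converts a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# cidr_contains OUTER INNER
# Succeeds when the INNER CIDR lies entirely within the OUTER CIDR.
cidr_contains() {
  outer_ip=${1%/*}; outer_len=${1#*/}
  inner_ip=${2%/*}; inner_len=${2#*/}
  # The inner prefix must be at least as long as the outer one.
  [ "$inner_len" -ge "$outer_len" ] || return 1
  mask=$(( (4294967295 << (32 - $outer_len)) & 4294967295 ))
  outer_net=$(( $(ip_to_int "$outer_ip") & $mask ))
  inner_net=$(( $(ip_to_int "$inner_ip") & $mask ))
  [ "$outer_net" -eq "$inner_net" ]
}

cidr_contains 172.16.0.0/16 172.16.0.0/16  && echo "pool fits podSubnet"
cidr_contains 172.16.0.0/16 192.168.0.0/16 || echo "pool is outside podSubnet"
```

The second call uses the 192.168.0.0/16 default from custom-resources.yaml, which is the value that has to be changed for this cluster.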

(Note) A Look at the Manifests Defaults
Let's briefly look at the Manifests installation method.
As of v3.28.2, lines 4924-4935 show that IPIP Mode is enabled by default.
curl https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/calico.yaml -sSq | sed -n '4924,4935p'
# Auto-detect the BGP IP address.
- name: IP
value: "autodetect"
# Enable IPIP
- name: CALICO_IPV4POOL_IPIP
value: "Always"
# Enable or Disable VXLAN on the default IP pool.
- name: CALICO_IPV4POOL_VXLAN
value: "Never"
# Enable or Disable VXLAN on the default IPv6 IP pool.
- name: CALICO_IPV6POOL_VXLAN
value: "Never"
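To relate these manifest defaults to the operator's encapsulation field, the two pool variables can be mapped to a single value. This is a rough sketch of that correspondence, not code from Calico; `encap_from_env` is a hypothetical helper, and it assumes the variables only take the values Always, CrossSubnet, or Never.

```shell
# encap_from_env IPIP_MODE VXLAN_MODE
# Maps CALICO_IPV4POOL_IPIP / CALICO_IPV4POOL_VXLAN values to the
# encapsulation name used in the operator's Installation resource.
encap_from_env() {
  case "$1/$2" in
    Always/Never)      echo IPIP ;;
    CrossSubnet/Never) echo IPIPCrossSubnet ;;
    Never/Always)      echo VXLAN ;;
    Never/CrossSubnet) echo VXLANCrossSubnet ;;
    Never/Never)       echo None ;;
    *) echo "unsupported combination: $1/$2" >&2; return 1 ;;
  esac
}

encap_from_env Always Never   # IPIP (the manifest default shown above)
```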
3. Retina Installation Attempt (Failed)
Below is what I tried before stopping: the error log[6] pointed to an unsupported interface.
IPIP Mode uses the tunl interface, and it turned out that this interface is not supported.
# kubectl logs -n kube-system retina-agent-866h7
ts=2024-09-23T15:53:59.761Z level=error caller=linuxutil/ethtool_stats_linux.go:78 msg="Error while getting ethtool:" ifacename=tunl0 error="interface not supported while retrieving stats: operation not supported" errorVerbose="operation not supported\ninterface not supported while retrieving stats\ngithub.com/microsoft/retina/pkg/plugin/linuxutil.(*CachedEthtool).Stats\n\t/go/src/github.com/microsoft/retina/pkg/plugin/linuxutil/ethtool_handle_linux.go:45\ngithub.com/microsoft/retina/pkg/plugin/linuxutil.(*EthtoolReader).readInterfaceStats\n\t/go/src/github.com/microsoft/retina/pkg/plugin/linuxutil/ethtool_stats_linux.go:73\ngithub.com/microsoft/retina/pkg/plugin/linuxutil.(*EthtoolReader).readAndUpdate\n\t/go/src/github.com/microsoft/retina/pkg/plugin/linuxutil/ethtool_stats_linux.go:43\ngithub.com/microsoft/retina/pkg/plugin/linuxutil.(*linuxUtil).run.func2\n\t/go/src/github.com/microsoft/retina/pkg/plugin/linuxutil/linuxutil_linux.go:109\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"
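Long log lines like this are easier to read once the interface name is pulled out. The following is a small sketch that assumes the agent keeps the `ifacename=... error="interface not supported ..."` layout seen above; `unsupported_ifaces` is a hypothetical helper name.

```shell
# unsupported_ifaces
# Reads retina-agent log lines on stdin and prints each interface name
# reported with "interface not supported", de-duplicated.
unsupported_ifaces() {
  sed -n 's/.*ifacename=\([^ ]*\) error="interface not supported.*/\1/p' | sort -u
}

# Example with a shortened copy of the log line above.
printf '%s\n' 'level=error msg="Error while getting ethtool:" ifacename=tunl0 error="interface not supported while retrieving stats: operation not supported"' | unsupported_ifaces
```

Against a live cluster the same function could be fed from `kubectl logs -n kube-system retina-agent-866h7`.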
I tried installing Retina, a network monitoring tool.
- Helm is required. The official docs are the most accurate reference.
(1) Installing the Helm Chart
I went with Basic Mode.
# Set the version to a specific version here or get latest version from GitHub API.
VERSION=$( curl -sL https://api.github.com/repos/microsoft/retina/releases/latest | jq -r .name)
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
--version $VERSION \
--set image.tag=$VERSION \
--set operator.tag=$VERSION \
--set logLevel=info \
--set enabledPlugin_linux="\[dropreason\,packetforward\,linuxutil\,dns\]"
The output looks like this.
Release "retina" does not exist. Installing it now.
Pulled: ghcr.io/microsoft/retina/charts/retina:v0.0.16
Digest: sha256:384e4b45d37ab49b6e2e742012e3d49230ce2be102895dccb504b42540091419
NAME: retina
LAST DEPLOYED: Sun Sep 15 19:29:03 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
1. Installing retina service using helm: helm install retina ./deploy/legacy/manifests/controller/helm/retina/ --namespace kube-system --dependency-update
2. Cleaning up/uninstalling/deleting retina and dependencies related:
(2) Installing Prometheus
Running the command from NOTES.1 in the output above as-is fails, as expected, because the directory and files it refers to have not been downloaded.
- The error log also points to the documentation.
- This uses the Prometheus community chart. I went with the Legacy mode, but judging from the GitHub repository there also seems to be a way that uses Hubble.
- Path of the file mentioned above: https://github.com/microsoft/retina/blob/3d2c7a55f8c0388df271453f5fc7b166c2f275be/deploy/legacy/prometheus/values.yaml
mkdir -p deploy/legacy/prometheus
touch deploy/legacy/prometheus/values.yaml
# cat <<EOF> deploy/legacy/prometheus/values.yaml
# > COPY AND PASTE (roughly, the contents of the values.yaml above)
# EOF
# Add the Prometheus Community chart repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus -n kube-system -f deploy/legacy/prometheus/values.yaml prometheus-community/kube-prometheus-stack
# NAME: prometheus
# LAST DEPLOYED: Sun Sep 15 19:59:33 2024
# NAMESPACE: kube-system
# STATUS: deployed
# REVISION: 1
# NOTES:
# kube-prometheus-stack has been installed. Check its status by running:
# kubectl --namespace kube-system get pods -l "release=prometheus"
#
# Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
kubectl --namespace kube-system get pods -l "release=prometheus"
# NAME READY STATUS RESTARTS AGE
# prometheus-kube-prometheus-operator-64c9474db-sr5bp 1/1 Running 0 67s
# prometheus-kube-state-metrics-688d66b5b8-xn7kp 1/1 Running 0 67s
# prometheus-prometheus-node-exporter-5lvgp 1/1 Running 0 66s
# prometheus-prometheus-node-exporter-98drk 1/1 Running 0 67s
# prometheus-prometheus-node-exporter-dfss9 1/1 Running 0 67s
# prometheus-prometheus-node-exporter-zr44x 1/1 Running 0 67s
To try access through a NodePort, I added the two values below to values.yaml[6] and upgraded the release.
- prometheus.service.type: NodePort
- grafana.service.type: NodePort
helm upgrade prometheus -n kube-system -f deploy/legacy/prometheus/values.yaml prometheus-community/kube-prometheus-stack
kubectl get secret -n kube-system prometheus-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
# prom-operator
kubectl get svc -n kube-system | grep NodePort
# prometheus-grafana NodePort 10.200.1.23 <none> 80:32496/TCP 116m
# prometheus-kube-prometheus-prometheus NodePort 10.200.1.36 <none> 9090:30090/TCP,8080:31038/TCP 116m
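The NodePort for a given service port can be read off the PORT(S) column mechanically. This sketch only parses the text format shown in the listing above; `node_port` is a hypothetical helper name.

```shell
# node_port PORT PORTS_COLUMN
# Given the PORT(S) column of `kubectl get svc`
# (e.g. "9090:30090/TCP,8080:31038/TCP"), prints the NodePort
# mapped to the given service port.
node_port() {
  echo "$2" | tr ',' '\n' | sed -n "s|^$1:\([0-9]*\)/.*|\1|p"
}

node_port 80 "80:32496/TCP"                      # 32496
node_port 9090 "9090:30090/TCP,8080:31038/TCP"   # 30090
```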
To reach Grafana I connected to PUBLIC_IP:32496 and could see the web dashboard, but had the rare.. experience of not seeing a single metric.
As for Prometheus (p8s), far from reaching the web UI, even cURL did not work, so I am unsure whether this is a network interface problem or a configuration error.
Either way, I wrote this down to record that the tunl interface does not work.

4. Things I Tried While Installing the Calico Operator
# Essential for checking the error log
kubectl logs deployment/tigera-operator -n tigera-operator
# The error log led me to where the fix was
kubectl get configmap -n kube-system kubeadm-config -o yaml
# Running kubectl create and then delete did not resolve it; the conflict was in the CRDs
# The --server-side --force-conflicts options are needed to force-overwrite the values
kubectl apply --server-side --force-conflicts -f tigera-operator.yaml
kubectl apply --server-side --force-conflicts -f custom-resources.yaml
Reference
[1] https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises#install-calico-with-kubernetes-api-datastore-more-than-50-nodes
[2] https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#options
[3] https://docs.tigera.io/calico/latest/network-policy/encrypt-cluster-pod-traffic
[4] https://docs.tigera.io/calico/latest/reference/installation/api#operator.tigera.io/v1.IPPool
[5] https://docs.tigera.io/calico/latest/reference/installation/api#operator.tigera.io/v1.EncapsulationType
[6] https://medium.com/@muppedaanvesh/a-hands-on-guide-to-kubernetes-monitoring-using-prometheus-grafana-%EF%B8%8F-b0e00b1ae039
[Others]
- https://stackoverflow.com/questions/69190171/calico-kube-controllers-and-calico-node-are-not-ready-crashloopbackoff
- https://serverfault.com/questions/1138767/calico-node-and-kube-proxy-crashed-permanently-on-a-new-node
- https://mrmaheshrajput.medium.com/deploy-kubernetes-cluster-on-aws-ec2-instances-f3eeca9e95f1
- https://tech.osci.kr/%EC%BF%A0%EB%B2%84%EB%84%A4%ED%8B%B0%EC%8A%A4-%EB%84%A4%ED%8A%B8%EC%9B%8C%ED%81%AC-calico/
- https://github.com/projectcalico/calico/issues/7538
- https://github.com/projectcalico/calico/issues/6407
- https://github.com/projectcalico/calico/issues/3878
- https://github.com/projectcalico/calico/issues/7890
- https://github.com/projectcalico/calico/issues/4218
- https://github.com/projectcalico/calico/issues/7826
- https://dev.to/prakashvra/how-to-setup-a-kubernetes-cluster-on-aws-ec2-using-kubeadm-containerd-and-calico-560o
kkumtree
Source code on GitHub
© 2025 kkumtree and contributors. All rights reserved.
Licensed under CC BY-NC-ND 4.0