Monitoring CoreDNS in EKS with Grafana Cloud

  • kkumtree

2024-10-30T23:44:01+09:00

kans
eks
grafana
otel
coredns

My first time using Grafana Cloud

This post summarizes what I learned in the K8s Advanced Network Study (hereafter KANS) run by CloudNet@.

It hasn't really sunk in yet, but this is the final week of the study.
So, using the EKS we all know and love, I'm going to follow a hands-on for monitoring CoreDNS issues, step by step.

์œ„์˜ Blog๋ฅผ ๊ทธ๋Œ€๋กœ ๋”ฐ๋ผํ•ด๋ณผ ๊ฒ๋‹ˆ๋‹ค.

0. Creating the EKS cluster

I'll create the EKS cluster with the CloudFormation template provided by the study.
Since eksctl is mentioned, I have a nagging feeling that I'll end up rolling everything back and starting over from square one with an eksctl-based CloudFormation deployment, but let's give it a go (?).

Hmm, so far that worry was unfounded. Thinking back, I believe we were told the template already scripts everything up to creating the EKS cluster with eksctl on the bastion host.

(Screenshot: the stack takes about 15 minutes to create)

(Screenshot: instances created by the CloudFormation stack)

์ƒ์„ฑ๋œ bastion์— ์ ‘์†ํ•ด์„œ, ํ™˜๊ฒฝ๋ณ€์ˆ˜ ๋“ฑ์„ ํ™•์ธํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

ssh -i ~/.ssh/id_ed25519 <USER>@<BASTION-HOST-IP>
# The authenticity of host '43.201.85.169 (43.201.85.169)' can't be established.
# ED25519 key fingerprint is SHA256:efFNF+24E7UUEzXzhqBDU0ss74yBmhGiaOI25XOVG9A.
# This key is not known by any other names.
# Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
# Warning: Permanently added '43.201.85.169' (ED25519) to the list of known hosts.
#    ,     #_
#    ~\_  ####_        Amazon Linux 2
#   ~~  \_#####\
#   ~~     \###|       AL2 End of Life is 2025-06-30.
#   ~~       \#/ ___
#    ~~       V~' '->
#     ~~~         /    A newer version of Amazon Linux is available!
#       ~~._.   _/
#          _/ _/       Amazon Linux 2023, GA and supported until 2028-03-15.
#        _/m/'           https://aws.amazon.com/linux/amazon-linux-2023/

# 10 package(s) needed for security, out of 13 available
# Run "sudo yum update" to apply all updates.
# (cm112@myeks:N/A) [root@myeks-bastion ~]# clear
tail -f /var/log/cloud-init-output.log
#  66 †php8.1                   available    [ =stable ]
#  67  awscli1                  available    [ =stable ]
#  68 †php8.2                   available    [ =stable ]
#  69  dnsmasq                  available    [ =stable ]
#  70  unbound1.17              available    [ =stable ]
#  72  collectd-python3         available    [ =stable ]
# † Note on end-of-support. Use 'info' subcommand.
# Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
# cloudinit End!
# Cloud-init v. 19.3-46.amzn2.0.2 finished at Sat, 02 Nov 2024 09:44:24 +0000. Datasource DataSourceEc2.  Up 91.81 seconds
^C
tail -f /root/create-eks.log
    "availabilityZones": [
        "ap-northeast-2c",
        "ap-northeast-2b",
        "ap-northeast-2a"
    ],
    "cloudWatch": {
        "clusterLogging": {}
    }
}

^C
kubectl ns default
# Context "<CONTEXT-NAME>" modified.
# Active namespace is "default".
eksctl get cluster
# NAME	REGION		EKSCTL CREATED
# myeks	ap-northeast-2	True
eksctl get nodegroup --cluster $CLUSTER_NAME
# CLUSTER	NODEGROUP	STATUS	CREATED			MIN SIZE	MAX SIZE	DESIRED CAPACITY	INSTANCE TYPE	IMAGE ID	ASG NAME		TYPE
# myeks	ng1		ACTIVE	2024-11-02T09:55:58Z	3		3		3			t3.medium	AL2_x86_64	eks-ng1-2cc97626-bf01-5bcc-d680-091e003bd586	managed
export | egrep 'ACCOUNT|AWS_|CLUSTER|KUBERNETES|VPC|Subnet' | egrep -v 'SECRET|KEY'
# declare -x ACCOUNT_ID="<ACCOUNT-ID>"
# declare -x AWS_DEFAULT_REGION="ap-northeast-2"
# declare -x AWS_PAGER=""
# declare -x AWS_REGION="ap-northeast-2"
# declare -x CLUSTER_NAME="myeks"
# declare -x KUBERNETES_VERSION="1.30"
# declare -x PrivateSubnet1="subnet-044cf8b34576820ea"
# declare -x PrivateSubnet2="subnet-0ac2f3cd52e1ae640"
# declare -x PrivateSubnet3="subnet-0e5b144c0039c348b"
# declare -x PubSubnet1="subnet-0fef215562a97f319"
# declare -x PubSubnet2="subnet-0ca12b8db356bd486"
# declare -x PubSubnet3="subnet-01628d89d7c34590b"
# declare -x VPCID="vpc-0bcfa9363c4ff0069"
kubectl get node --label-columns=node.kubernetes.io/instance-type,eks.amazonaws.com/capacityType,topology.kubernetes.io/zone

# NAME                                               STATUS   ROLES    AGE   VERSION               INSTANCE-TYPE   CAPACITYTYPE   ZONE
# ip-192-168-1-219.ap-northeast-2.compute.internal   Ready    <none>   12m   v1.30.4-eks-a737599   t3.medium       ON_DEMAND      ap-northeast-2a
# ip-192-168-2-198.ap-northeast-2.compute.internal   Ready    <none>   12m   v1.30.4-eks-a737599   t3.medium       ON_DEMAND      ap-northeast-2b
# ip-192-168-3-85.ap-northeast-2.compute.internal    Ready    <none>   12m   v1.30.4-eks-a737599   t3.medium       ON_DEMAND      ap-northeast-2c

eksctl get iamidentitymapping --cluster myeks
# ARN											USERNAME				GROUPS					ACCOUNT
# arn:aws:iam::<ACCOUNT-ID>:role/eksctl-myeks-nodegroup-ng1-NodeInstanceRole-bU6W7Cr0ugY5	system:node:{{EC2PrivateDNSName}}	system:bootstrappers,system:nodes	

1. Setting up the environment for the hands-on

Next, let's set the additional environment variables that the hands-on lists as prerequisites.

export EKS_CLUSTER_NAME="$CLUSTER_NAME"
export SERVICE=prometheusservice  
export ACK_SYSTEM_NAMESPACE=ack-system
# export RELEASE_NAME=`curl -sL https://api.github.com/repos/aws-controllers-k8s/$SERVICE-controller/releases/latest | grep '"tag_name":' | cut -d'"' -f4`

This is where an issue with the older post shows up.
I suspect the GitHub REST API has become stricter: the request now returns 404, so fetching RELEASE_NAME this way has become close to impossible!

curl -sL https://api.github.com/repos/aws-controllers-k8s/$SERVICE-controller/releases/latest 
# {
#   "message": "Not Found",
#   "documentation_url": "https://docs.github.com/rest/releases/releases#get-the-latest-release",
#   "status": "404"
# }

Batman! What can we do without any superpowers?
(Screenshot: the GitHub REST API returning 404)

Normally you would clone the repo and check with git tag, but since this is a hands-on, just open the link and look up the latest tag.

์ €๋„ ๋ฒˆ๊ฑฐ๋กœ์›Œ์„œ 302 Found ์ฒ˜๋ฆฌํ•˜์—ฌ ํƒœ๊ทธ ๋ฐ›์•„์™”์Šต๋‹ˆ๋‹ค.

# tr -d '\r' strips the carriage return that ends every HTTP header line
curl -sSI https://github.com/aws-controllers-k8s/$SERVICE-controller/releases/latest | grep -i '^location:' | awk -F'/' '{print $NF}' | tr -d '\r'
# v1.2.15
export RELEASE_NAME=$(curl -sSI https://github.com/aws-controllers-k8s/$SERVICE-controller/releases/latest | grep -i '^location:' | awk -F'/' '{print $NF}' | tr -d '\r')
echo $RELEASE_NAME
# v1.2.15
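If you'd rather take the `git tag` route mentioned above, you don't actually need a clone: `git ls-remote --tags --refs` lists a remote's tags directly. Below is a minimal bash sketch, assuming the highest version-sorted tag is the one you want (this can differ from what GitHub marks as the "latest release"). `latest_tag` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `git ls-remote --tags --refs` output on stdin
# ("<sha><TAB>refs/tags/<tag>" per line) and prints the highest tag
# according to version sort (`sort -V`).
latest_tag() {
  awk '{ sub("^refs/tags/", "", $2); print $2 }' | sort -V | tail -n 1
}

# Usage (needs network access to GitHub, no clone required):
# git ls-remote --tags --refs \
#   "https://github.com/aws-controllers-k8s/${SERVICE}-controller.git" | latest_tag
```

Version sort matters here: a plain `sort` would put `v1.2.9` after `v1.2.15`.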

2. Following the hands-on as-is

(a) Create an Amazon Managed Prometheus workspace

aws amp create-workspace --alias blog-workspace --region $AWS_REGION
# {
#     "arn": "arn:aws:aps:ap-northeast-2:<ACCOUNT-ID>:workspace/ws-0d032a51-2b98-43b1-90cb-f5069329f1af",
#     "status": {
#         "statusCode": "CREATING"
#     },
#     "tags": {},
#     "workspaceId": "ws-0d032a51-2b98-43b1-90cb-f5069329f1af"
# }

(Screenshot: creating the workspace)
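Copying the workspace ID by hand into the remote-write endpoint (as done in step (g)) is easy to get wrong. A minimal bash sketch, assuming the URL pattern this post uses later (`https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write`); `amp_remote_write_url` and `WORKSPACE_ID` are hypothetical names. `--query workspaceId --output text` just pulls the `workspaceId` field out of the JSON shown above.

```shell
# Hypothetical helper: builds the AMP remote-write URL from a region and a
# workspace ID, following the URL pattern used elsewhere in this post.
amp_remote_write_url() {
  local region="$1" workspace_id="$2"
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s/api/v1/remote_write\n' \
    "$region" "$workspace_id"
}

# Usage: capture the ID when creating the workspace (instead of the
# command above), then derive the endpoint from it.
# WORKSPACE_ID=$(aws amp create-workspace --alias blog-workspace \
#   --region "$AWS_REGION" --query workspaceId --output text)
# export AMP_REMOTE_WRITE_ENDPOINT=$(amp_remote_write_url "$AWS_REGION" "$WORKSPACE_ID")
```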

(b) Deploy the Prometheus ethtool exporter

  • Let's write the exporter manifest as instructed.
cat << EOF > ethtool-exporter.yaml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ethtool-exporter
  labels:
    app: ethtool-exporter
spec:
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 100%
  selector:
    matchLabels:
      app: ethtool-exporter
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9417'      
      labels:
        app: ethtool-exporter
    spec:
      hostNetwork: true
      terminationGracePeriodSeconds: 0
      containers:
      - name: ethtool-exporter
        env:
        - name: IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP      
        image: drdivano/ethtool-exporter@sha256:39e0916b16de07f62c2becb917c94cbb3a6e124a577e1325505e4d0cdd550d7b
        command:
          - "sh"
          - "-exc"
          - "python3 /ethtool-exporter.py -l \$(IP):9417 -I '(eth|em|eno|ens|enp)[0-9s]+'"
        ports:
        - containerPort: 9417
          hostPort: 9417
          name: http
          
        resources:
          limits:
            cpu: 250m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 50Mi

      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: ethtool-exporter
  name: ethtool-exporter
spec:
  clusterIP: None
  ports:
    - name: http
      port: 9417
  selector:
    app: ethtool-exporter
EOF
kubectl apply -f ethtool-exporter.yaml

It's just an exporter, so the deployment looks fine.

kubectl get pods,svc -owide
# NAME                         READY   STATUS    RESTARTS   AGE   IP              NODE                                               NOMINATED NODE   READINESS GATES
# pod/ethtool-exporter-b62vt   1/1     Running   0          51s   192.168.2.198   ip-192-168-2-198.ap-northeast-2.compute.internal   <none>           <none>
# pod/ethtool-exporter-jbdlx   1/1     Running   0          51s   192.168.1.219   ip-192-168-1-219.ap-northeast-2.compute.internal   <none>           <none>
# pod/ethtool-exporter-pj2r7   1/1     Running   0          51s   192.168.3.85    ip-192-168-3-85.ap-northeast-2.compute.internal    <none>           <none>

# NAME                       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE   SELECTOR
# service/ethtool-exporter   ClusterIP   None         <none>        9417/TCP   51s   app=ethtool-exporter
# service/kubernetes         ClusterIP   10.100.0.1   <none>        443/TCP    16h   <none>
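The `-I '(eth|em|eno|ens|enp)[0-9s]+'` argument in the DaemonSet decides which NICs get exported. To see what the pattern selects, here is a quick bash check. It assumes the exporter matches the pattern against the whole interface name (`grep -x` does the same here), which I haven't verified against the exporter's source. On EKS nodes this should keep `eth*`/`ens*` and skip `lo` and the `eni…` veth interfaces the VPC CNI creates for pods.

```shell
# The interface pattern passed to the exporter with -I in the DaemonSet.
IFACE_PATTERN='(eth|em|eno|ens|enp)[0-9s]+'

# Assumption: the pattern must match the *whole* interface name,
# which is what `grep -Ex` checks here.
matching_ifaces() {
  grep -Ex "$IFACE_PATTERN"
}

# Example with typical node interface names:
printf '%s\n' eth0 eth1 ens5 eni0a1b2c3d4e5 lo | matching_ifaces
# eth0
# eth1
# ens5
```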

(c) Check the ADOT (AWS Distro for OpenTelemetry) Collector requirements

Pay attention to the prerequisites.

kubectl version | grep "Server Version"
# Server Version: v1.30.6-eks-7f9249a
  • Check the compatible ADOT add-on versions: unless you are on v0.62.1 or earlier, no extra work is needed.
aws eks describe-addon-versions --addon-name adot --kubernetes-version 1.30 --query 'addons[0].addonVersions[*].addonVersion' 
# [
#     "v0.102.1-eksbuild.2",
#     "v0.102.1-eksbuild.1",
#     "v0.102.0-eksbuild.1"
# ]
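The "v0.62.1 or earlier" check can also be scripted. A minimal bash sketch using GNU `sort -V` (version sort), which compares the numeric parts numerically, so `v0.102.1` correctly sorts after `v0.62.1`; `version_le` is a hypothetical helper name.

```shell
# Minimal sketch: succeeds when version $1 sorts at or before version $2
# under GNU `sort -V` (version sort).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage with the newest version listed above:
if version_le v0.102.1-eksbuild.2 v0.62.1; then
  echo "v0.62.1 or earlier: extra setup needed"
else
  echo "newer than v0.62.1: no extra setup needed"
fi
```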

(d) Install cert-manager for the ADOT Collector

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml  
kubectl get pod -n cert-manager  
# NAME                                       READY   STATUS    RESTARTS   AGE
# cert-manager-cainjector-5dbdc949c4-r2wpn   1/1     Running   0          29s
# cert-manager-d68cffc95-wsx5c               1/1     Running   0          29s
# cert-manager-webhook-759ddb6555-fzl24      1/1     Running   0          29s
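Rather than eyeballing `kubectl get pod` until everything says Running, you can let `kubectl wait` block until cert-manager's Deployments report the `Available` condition. A minimal bash sketch; `wait_for_cert_manager` and its 120s default timeout are my own assumptions, not part of the hands-on.

```shell
# Hypothetical helper: block until every Deployment in the cert-manager
# namespace reports the Available condition (default timeout 120s).
wait_for_cert_manager() {
  local timeout="${1:-120s}"
  kubectl wait --for=condition=Available deployment --all \
    -n cert-manager --timeout="$timeout"
}

# Usage:
# wait_for_cert_manager        # or: wait_for_cert_manager 300s
```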

(e) Create IRSA for the ADOT Collector

For your own peace of mind, at least check that the policy ARN actually exists before creating the service account.
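One way to do that check: `aws iam get-policy` exits non-zero when the ARN is unknown (the same `NoSuchEntity` error on GetPolicy shows up in section 9). A minimal bash sketch; `policy_exists` is a hypothetical helper name.

```shell
# Minimal sketch: succeed only if IAM can find the given policy ARN.
# Output and errors are discarded; only the exit status matters.
policy_exists() {
  aws iam get-policy --policy-arn "$1" >/dev/null 2>&1
}

# Usage:
# policy_exists arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
#   && echo "policy found" || echo "policy missing"
```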

echo :$AWS_REGION:$EKS_CLUSTER_NAME:
# :ap-northeast-2:myeks:
eksctl create iamserviceaccount \
--name adot-collector \
--namespace default \
--region $AWS_REGION \
--cluster $EKS_CLUSTER_NAME \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
--approve \
--override-existing-serviceaccounts
# 2024-11-03 11:27:22 [ℹ]  1 iamserviceaccount (default/adot-collector) was included (based on the include/exclude rules)
# 2024-11-03 11:27:22 [!]  metadata of serviceaccounts that exist in Kubernetes will be updated, as --override-existing-serviceaccounts was set
# 2024-11-03 11:27:22 [ℹ]  1 task: { 
#     2 sequential sub-tasks: { 
#         create IAM role for serviceaccount "default/adot-collector",
#         create serviceaccount "default/adot-collector",
#     } }2024-11-03 11:27:22 [ℹ]  building iamserviceaccount stack "eksctl-myeks-addon-iamserviceaccount-default-adot-collector"
# 2024-11-03 11:27:22 [ℹ]  deploying stack "eksctl-myeks-addon-iamserviceaccount-default-adot-collector"
# 2024-11-03 11:27:22 [ℹ]  waiting for CloudFormation stack "eksctl-myeks-addon-iamserviceaccount-default-adot-collector"
# 2024-11-03 11:27:52 [ℹ]  waiting for CloudFormation stack "eksctl-myeks-addon-iamserviceaccount-default-adot-collector"
# 2024-11-03 11:27:52 [ℹ]  created serviceaccount "default/adot-collector"
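To confirm that IRSA is wired up, the service account should now carry the `eks.amazonaws.com/role-arn` annotation pointing at the role eksctl created. A minimal bash sketch; `irsa_role_arn` is a hypothetical helper, and the dots in the annotation key are escaped for kubectl's JSONPath.

```shell
# Hypothetical helper: print the IAM role ARN that IRSA attached to a
# service account via the eks.amazonaws.com/role-arn annotation.
irsa_role_arn() {
  local sa="$1" ns="${2:-default}"
  kubectl get serviceaccount "$sa" -n "$ns" \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}

# Usage:
# irsa_role_arn adot-collector default
```

An empty result means the annotation is missing and the collector would not be able to assume the role.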

(f) Install the ADOT add-on

We already checked the versions, but let's do it again.

aws eks describe-addon-versions --addon-name adot --kubernetes-version 1.30 \
--query "addons[].addonVersions[].[addonVersion, compatibilities[].defaultVersion]" --output text  
# v0.102.1-eksbuild.2
# True
# v0.102.1-eksbuild.1
# False
# v0.102.0-eksbuild.1
# False
aws eks create-addon --addon-name adot --addon-version v0.102.1-eksbuild.2 --cluster-name $EKS_CLUSTER_NAME
# {
#     "addon": {
#         "addonName": "adot",
#         "clusterName": "myeks",
#         "status": "CREATING",
#         "addonVersion": "v0.102.1-eksbuild.2",
#         "health": {
#             "issues": []
#         },
#         "addonArn": "arn:aws:eks:ap-northeast-2:<ACCOUNT-ID>:addon/myeks/adot/eec977ee-84a1-85fe-ecbe-a2f51c90e9e7",
#         "createdAt": "2024-11-03T11:31:33.678000+09:00",
#         "modifiedAt": "2024-11-03T11:31:33.694000+09:00",
#         "tags": {}
#     }
# }
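Instead of copying the default version by hand, the alternating "version / True|False" lines from the text output above can be parsed. A minimal bash sketch, assuming each version has exactly one compatibility entry (so each version is followed by a single True/False line, as in the output above); `default_addon_version` and `ADOT_VERSION` are hypothetical names. `aws eks wait addon-active` then blocks until the add-on is ACTIVE.

```shell
# Hypothetical helper: reads the alternating "version / True|False" lines
# from `describe-addon-versions ... --output text` and prints the
# version(s) whose defaultVersion is True.
default_addon_version() {
  awk 'NR % 2 == 1 { version = $0 } NR % 2 == 0 && $0 == "True" { print version }'
}

# Usage:
# ADOT_VERSION=$(aws eks describe-addon-versions --addon-name adot --kubernetes-version 1.30 \
#   --query "addons[].addonVersions[].[addonVersion, compatibilities[].defaultVersion]" \
#   --output text | default_addon_version)
# aws eks create-addon --addon-name adot --addon-version "$ADOT_VERSION" --cluster-name "$EKS_CLUSTER_NAME"
# aws eks wait addon-active --cluster-name "$EKS_CLUSTER_NAME" --addon-name adot
```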

Let's check whether it was deployed properly.

kubectl get po -n opentelemetry-operator-system
# NAME                                     READY   STATUS    RESTARTS   AGE
# opentelemetry-operator-b7dbbdf7c-tqvfl   2/2     Running   0          64s

(g) Configure the ADOT Collector

Write collector-config-amp.yaml as below and deploy it.

  • Double-check the environment variables.
    • AMP_REMOTE_WRITE_ENDPOINT: yes, the one from the workspace created earlier.
    • AWS_REGION
    • EKS_CLUSTER_NAME
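Because the heredoc below is unquoted, an unset variable is silently replaced with an empty string and you only notice when the collector misbehaves. A minimal bash guard to run first; `require_vars` is a hypothetical helper that relies on bash indirect expansion (`${!name}`), so it won't work in plain `sh`.

```shell
# Minimal sketch: fail (non-zero) if any of the named variables is unset
# or empty, printing each missing name to stderr.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# require_vars AMP_REMOTE_WRITE_ENDPOINT AWS_REGION EKS_CLUSTER_NAME
```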
# export AMP_REMOTE_WRITE_ENDPOINT=<AMP_REMOTE_WRITE_ENDPOINT>
export AMP_REMOTE_WRITE_ENDPOINT=https://aps-workspaces.ap-northeast-2.amazonaws.com/workspaces/ws-0d032a51-2b98-43b1-90cb-f5069329f1af/api/v1/remote_write
echo $AMP_REMOTE_WRITE_ENDPOINT
# https://aps-workspaces.ap-northeast-2.amazonaws.com/workspaces/ws-0d032a51-2b98-43b1-90cb-f5069329f1af/api/v1/remote_write
cat > collector-config-amp.yaml <<EOF
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: my-collector-amp
spec:
  mode: deployment
  serviceAccount: adot-collector
  podAnnotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '8888'
  resources:
    requests:
      cpu: "1"
    limits:
      cpu: "1"
  config: |
    extensions:
      sigv4auth:
        region: $AWS_REGION
        service: "aps"

    receivers:
      #
      # Scrape configuration for the Prometheus Receiver
      # This is the same configuration used when Prometheus is installed using the community Helm chart
      # 
      prometheus:
        config:
          global:
            scrape_interval: 60s
            scrape_timeout: 30s
            external_labels:
              cluster: $EKS_CLUSTER_NAME

          scrape_configs:
          - job_name: kubernetes-pods
            scrape_interval: 15s
            scrape_timeout: 5s
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape
            - action: replace
              regex: (https?)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: \$\$1:\$\$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
              replacement: __param_\$\$1
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: kubernetes_namespace
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: kubernetes_pod_name
            - action: drop
              regex: Pending|Succeeded|Failed|Completed
              source_labels:
              - __meta_kubernetes_pod_phase
                                
    processors:
      batch/metrics:
        timeout: 60s         

    exporters:
      prometheusremotewrite:
        endpoint: $AMP_REMOTE_WRITE_ENDPOINT
        auth:
          authenticator: sigv4auth

    service:
      extensions: [sigv4auth]
      pipelines:   
        metrics:
          receivers: [prometheus]
          processors: [batch/metrics]
          exporters: [prometheusremotewrite]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-prometheus-role
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - nonResourceURLs:
      - /metrics
    verbs:
      - get

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-prometheus-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-prometheus-role
subjects:
  - kind: ServiceAccount
    name: adot-collector
    namespace: default

EOF
grep remote_write collector-config-amp.yaml
#         endpoint: https://aps-workspaces.ap-northeast-2.amazonaws.com/workspaces/ws-0d032a51-2b98-43b1-90cb-f5069329f1af/api/v1/remote_write
  • ๋ฐฐํฌ…!
kubectl apply -f collector-config-amp.yaml
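Why the relabel rules say `\$\$1`: the heredoc delimiter `EOF` is unquoted, so the shell expands `$AWS_REGION`, `$EKS_CLUSTER_NAME` and `$AMP_REMOTE_WRITE_ENDPOINT`, and turns each `\$` into a literal `$`. The file therefore contains `$$1`; as I understand it, the OpenTelemetry Collector's config loader treats `$$` as an escaped `$`, so the Prometheus receiver finally sees the usual `$1` capture reference. A tiny bash demo of the shell half (`render_replacement` is a hypothetical name):

```shell
# Demonstrates what an unquoted heredoc writes for the relabel replacement.
render_replacement() {
  cat <<EOF
replacement: \$\$1:\$\$2
EOF
}

render_replacement   # prints: replacement: $$1:$$2
```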

3. Setting up Grafana Cloud

Instead of AMG, I'll use Grafana Cloud. It's very simple to use!

(a) Sign up for Grafana Cloud

Yes, I'm writing this down because I had never signed up for it before.

(b) Enable the plugin with a single click

(Screenshot: one click enables AMP as a data source)

(c) Prometheus Datasource ์„ค์ •

  • Path: Home > Connections > Data sources > grafana-amazonprometheus-datasource

  • Example URL: https://aps-workspaces.ap-northeast-2.amazonaws.com/workspaces/ws-0d032a51-2b98-43b1-90cb-f5069329f1af

(Screenshot: data source URL setting)

  • Auth: AWS SigV4 is supported.
    • Enter the Access Key ID / Secret Access Key.
    • For Assume Role, I didn't have time to file the ticket needed for preview access, so I skipped it.

(Screenshot: data source access key settings)

  • Additional: region and TLS settings

(Screenshot: additional data source settings)
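The data source URL above is just the remote-write endpoint from step (g) without its `/api/v1/remote_write` suffix (compare the two URLs). A one-line bash helper using suffix removal; `amp_query_url` is a hypothetical name.

```shell
# Hypothetical helper: strip the /api/v1/remote_write suffix from an AMP
# remote-write endpoint to get the workspace URL for the Grafana data source.
amp_query_url() {
  printf '%s\n' "${1%/api/v1/remote_write}"
}

# Usage:
# amp_query_url "$AMP_REMOTE_WRITE_ENDPOINT"
```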

4. Writing a query and checking the result

(Screenshot: the query in Grafana Cloud)

As in the blog, I confirmed that the results display correctly.

9. Vaporware

At one point I panicked and wondered whether I had to deploy the Grafana Cloud Agent to connect to Grafana Cloud, but the worry was unfounded. It doesn't seem to be necessary.

(a) Granting permissions

curl https://gist.githubusercontent.com/rfratto/b6c5888e89faed3b04fa2533e0bec1a2/raw/bb9aa5e560009e98b48861d0b2ce54fc8a4303e6/script.bash -o agent-permissions-aks.bash
sed -i "s/YOUR_EKS_CLUSTER_NAME/${EKS_CLUSTER_NAME}/g" agent-permissions-aks.bash
cat agent-permissions-aks.bash | head -n 4
# ##!/bin/bash
# CLUSTER_NAME=myeks # SEE THIS LINE IF CHANGED TO YOUR CLUSTER NAME  
# AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
# OIDC_PROVIDER=$(aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")

Let's run it.

ls -al agent-permissions-aks.bash 
# -rw-r--r-- 1 root root 4341 Nov  3 12:09 agent-permissions-aks.bash
chmod u+x agent-permissions-aks.bash
ls -al agent-permissions-aks.bash
# -rwxr--r-- 1 root root 4341 Nov  3 12:09 agent-permissions-aks.bash
./agent-permissions-aks.bash

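A few errors along the way gave me a chill, but everything seems to have been created. The NoSuchEntity errors come from the script first checking whether the role and policy already exist; it then creates them, as the later lines show.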

# ./agent-permissions-aks.bash
Creating a new trust policy

An error occurred (NoSuchEntity) when calling the GetRole operation: The role with name EKS-GrafanaAgent-AMP-ServiceAccount-Role cannot be found.
Appending to the existing trust policy

An error occurred (NoSuchEntity) when calling the GetPolicy operation: Policy arn:aws:iam::<ACCOUNT-ID>:policy/AWSManagedPrometheusWriteAccessPolicy was not found.
Creating a new permission policy AWSManagedPrometheusWriteAccessPolicy
{
    "Policy": {
        "PolicyName": "AWSManagedPrometheusWriteAccessPolicy",
        "PolicyId": "ANPASTWNT54JUITZSLWOX",
        "Arn": "arn:aws:iam::<ACCOUNT-ID>:policy/AWSManagedPrometheusWriteAccessPolicy",
        "Path": "/",
        "DefaultVersionId": "v1",
        "AttachmentCount": 0,
        "PermissionsBoundaryUsageCount": 0,
        "IsAttachable": true,
        "CreateDate": "2024-11-03T03:16:10+00:00",
        "UpdateDate": "2024-11-03T03:16:10+00:00"
    }
}

An error occurred (NoSuchEntity) when calling the GetRole operation: The role with name EKS-GrafanaAgent-AMP-ServiceAccount-Role cannot be found.
EKS-GrafanaAgent-AMP-ServiceAccount-Role role does not exist. Creating a new role with a trust and permission policy
arn:aws:iam::<ACCOUNT-ID>:role/EKS-GrafanaAgent-AMP-ServiceAccount-Role
2024-11-03 12:16:16 [ℹ]  IAM Open ID Connect provider is already associated with cluster "myeks" in "ap-northeast-2"

(b) Deploy the Grafana Cloud Agent

I tried to follow the blog here, but the install-sigv4.sh script is only available up to v0.18.4, so I'll try it once and stop if it doesn't work.

kubectl create namespace grafana-agent; \
WORKSPACE="ws-0d032a51-2b98-43b1-90cb-f5069329f1af" \
ROLE_ARN="arn:aws:iam::<ACCOUNT-ID>:role/EKS-GrafanaAgent-AMP-ServiceAccount-Role" \
REGION="ap-northeast-2" \
NAMESPACE="grafana-agent" \
REMOTE_WRITE_URL="https://aps-workspaces.$REGION.amazonaws.com/workspaces/$WORKSPACE/api/v1/remote_write" \
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/grafana/agent/v0.18.4/production/kubernetes/install-sigv4.sh)" | kubectl apply -f -
# namespace/grafana-agent created
# serviceaccount/grafana-agent created
# configmap/grafana-agent created
# configmap/grafana-agent-deployment created
# daemonset.apps/grafana-agent created
# deployment.apps/grafana-agent-deployment created
# resource mapping not found for name: "grafana-agent" namespace: "" from "STDIN": no matches for kind "ClusterRole" in version "rbac.authorization.k8s.io/v1beta1"
# ensure CRDs are installed first
# resource mapping not found for name: "grafana-agent" namespace: "" from "STDIN": no matches for kind "ClusterRoleBinding" in version "rbac.authorization.k8s.io/v1beta1"
# ensure CRDs are installed first

So I rebuilt those two resources by hand as below and deployed them again (the rbac.authorization.k8s.io/v1beta1 API was removed in Kubernetes 1.22, and this cluster runs 1.30).

  • What changed
    • Before: v1beta1
    • After: v1
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-agent
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-agent
subjects:
- kind: ServiceAccount
  name: grafana-agent
  namespace: grafana-agent
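Since the only failures were the two `v1beta1` RBAC kinds, a stream filter could patch the API version before `kubectl apply` instead of rewriting them by hand. A minimal bash sketch; `fix_rbac_api_version` is a hypothetical helper that only rewrites the `rbac.authorization.k8s.io/v1beta1` string and leaves `roleRef.apiGroup` untouched. I haven't checked that the rest of the v0.18.4 manifests work correctly on 1.30.

```shell
# Hypothetical helper: rewrite the removed RBAC API version on stdin.
# Only "rbac.authorization.k8s.io/v1beta1" changes; the bare API group
# used in roleRef.apiGroup has no "/v1beta1" suffix and is left as-is.
fix_rbac_api_version() {
  sed 's#rbac.authorization.k8s.io/v1beta1#rbac.authorization.k8s.io/v1#g'
}

# Usage: put it in front of `kubectl apply -f -` in the install-sigv4.sh
# pipeline from step (b), keeping the same environment variables:
#   /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/grafana/agent/v0.18.4/production/kubernetes/install-sigv4.sh)" \
#     | fix_rbac_api_version | kubectl apply -f -
```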


© 2025 kkumtree and contributors All rights reserved.
Licensed under
CC BY-NC-ND 4.0