Notes.

A memo.

descheduler, take two: a record of it actually working

In the previous article I mostly just translated the docs and tried the descheduler out, but it ended without ever working properly.

This time I'll write up the results of a run that actually worked.

Environment

GKE

version: 1.13.11-gke.23

machine type: n1-standard-1

number of nodes: 3

Since I wanted the resources, Stackdriver Logging is disabled, so there is no fluentd running on the nodes.
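(For the record, legacy Stackdriver Logging can be switched off per cluster with gcloud; the cluster name wakashiyo below is inferred from the node names, so treat it as an assumption.)

gcloud container clusters update wakashiyo --logging-service none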

First up, trying LowNodeUtilization

Here is the Pod list before the run.

As you can see, despite there being three nodes, all of the Pods are placed on only two of them.

 kubectl get pods -n wakashiyo -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP           NODE                                       NOMINATED NODE   READINESS GATES
wakashiyo-deployment-75b89bd54b-69bkl   1/1     Running   0          29m   10.48.1.16   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-hpdr7   1/1     Running   0          29m   10.48.1.19   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-khgvv   1/1     Running   0          29m   10.48.1.18   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-khvgc   1/1     Running   0          29m   10.48.1.17   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-vkn9h   1/1     Running   0          29m   10.48.0.15   gke-wakashiyo-default-pool-8b0ec5ba-2w40   <none>           <none>
wakashiyo-deployment-75b89bd54b-z2xfj   1/1     Running   0          29m   10.48.0.16   gke-wakashiyo-default-pool-8b0ec5ba-2w40   <none>           <none>

I also looked at the node info before the run.

LowNodeUtilization is something you run after setting thresholds for node CPU, memory, and Pod count, so these need to be checked beforehand.

In real operation you would presumably have baseline values decided in advance, but this time I just want to see it actually work, and I think the flow is easier to follow this way, so I'll check the current state first and derive the settings from it.
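Each of the per-node blocks below is the output of kubectl describe, trimmed to the relevant sections:

kubectl describe node gke-wakashiyo-default-pool-8b0ec5ba-x29p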

gke-wakashiyo-default-pool-8b0ec5ba-x29p

ProviderID:                  gce://wakashiyo-playground/asia-northeast1-a/gke-wakashiyo-default-pool-8b0ec5ba-x29p
Non-terminated Pods:         (8 in total)
  Namespace                  Name                                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                   ------------  ----------  ---------------  -------------  ---
  kube-system                heapster-86f6474897-stbl9                              63m (6%)      63m (6%)    215840Ki (7%)    215840Ki (7%)  140m
  kube-system                kube-proxy-gke-wakashiyo-default-pool-8b0ec5ba-x29p    100m (10%)    0 (0%)      0 (0%)           0 (0%)         140m
  kube-system                metrics-server-v0.3.1-57c75779f-xg59s                  48m (5%)      143m (15%)  105Mi (3%)       355Mi (13%)    140m
  kube-system                prometheus-to-sd-wmcsp                                 1m (0%)       3m (0%)     20Mi (0%)        20Mi (0%)      140m
  wakashiyo                  wakashiyo-deployment-75b89bd54b-69bkl                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     31m
  wakashiyo                  wakashiyo-deployment-75b89bd54b-hpdr7                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     31m
  wakashiyo                  wakashiyo-deployment-75b89bd54b-khgvv                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     31m
  wakashiyo                  wakashiyo-deployment-75b89bd54b-khvgc                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     31m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests        Limits
  --------                   --------        ------
  cpu                        612m (65%)      609m (64%)
  memory                     651040Ki (24%)  1009440Ki (37%)
  ephemeral-storage          0 (0%)          0 (0%)
  attachable-volumes-gce-pd  0               0

gke-wakashiyo-default-pool-8b0ec5ba-2w40

ProviderID:                  gce://wakashiyo-playground/asia-northeast1-a/gke-wakashiyo-default-pool-8b0ec5ba-2w40
Non-terminated Pods:         (6 in total)
  Namespace                  Name                                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                   ------------  ----------  ---------------  -------------  ---
  kube-system                kube-dns-79868f54c5-7cc5w                              260m (27%)    0 (0%)      110Mi (4%)       170Mi (6%)     141m
  kube-system                kube-dns-autoscaler-bb58c6784-8ms5s                    20m (2%)      0 (0%)      10Mi (0%)        0 (0%)         141m
  kube-system                kube-proxy-gke-wakashiyo-default-pool-8b0ec5ba-2w40    100m (10%)    0 (0%)      0 (0%)           0 (0%)         141m
  kube-system                prometheus-to-sd-qxg9g                                 1m (0%)       3m (0%)     20Mi (0%)        20Mi (0%)      141m
  wakashiyo                  wakashiyo-deployment-75b89bd54b-vkn9h                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     32m
  wakashiyo                  wakashiyo-deployment-75b89bd54b-z2xfj                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     32m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests     Limits
  --------                   --------     ------
  cpu                        581m (61%)   203m (21%)
  memory                     290Mi (10%)  390Mi (14%)
  ephemeral-storage          0 (0%)       0 (0%)
  attachable-volumes-gce-pd  0            0

gke-wakashiyo-default-pool-8b0ec5ba-654k

ProviderID:                  gce://wakashiyo-playground/asia-northeast1-a/gke-wakashiyo-default-pool-8b0ec5ba-654k
Non-terminated Pods:         (4 in total)
  Namespace                  Name                                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                   ------------  ----------  ---------------  -------------  ---
  kube-system                kube-dns-79868f54c5-jvwg8                              260m (27%)    0 (0%)      110Mi (4%)       170Mi (6%)     142m
  kube-system                kube-proxy-gke-wakashiyo-default-pool-8b0ec5ba-654k    100m (10%)    0 (0%)      0 (0%)           0 (0%)         142m
  kube-system                l7-default-backend-fd59995cd-jfvwb                     10m (1%)      10m (1%)    20Mi (0%)        20Mi (0%)      142m
  kube-system                prometheus-to-sd-d6vmv                                 1m (0%)       3m (0%)     20Mi (0%)        20Mi (0%)      142m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests    Limits
  --------                   --------    ------
  cpu                        371m (39%)  13m (1%)
  memory                     150Mi (5%)  210Mi (7%)
  ephemeral-storage          0 (0%)      0 (0%)
  attachable-volumes-gce-pd  0           0

I summarized the describe results into a table.

node                                      cpu         memory          pods
gke-wakashiyo-default-pool-8b0ec5ba-x29p  612m (65%)  651040Ki (24%)  8 (kube-system: 4, wakashiyo: 4)
gke-wakashiyo-default-pool-8b0ec5ba-2w40  581m (61%)  290Mi (10%)     6 (kube-system: 4, wakashiyo: 2)
gke-wakashiyo-default-pool-8b0ec5ba-654k  371m (39%)  150Mi (5%)      4 (kube-system: 4, wakashiyo: 0)

This time the Pods are skewed onto gke-wakashiyo-default-pool-8b0ec5ba-x29p, so I want the descheduler to treat this node as the eviction target.

And gke-wakashiyo-default-pool-8b0ec5ba-654k, which has hardly any Pods on it, should be the under-utilized node that Pods get rescheduled onto.

With that in mind, I settled on the following ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemoveDuplicates":
        enabled: true
      "RemovePodsViolatingInterPodAntiAffinity":
        enabled: true
      "LowNodeUtilization":
        enabled: true
        params:
          nodeResourceUtilizationThresholds:
            thresholds:
              "cpu" : 50
              "memory": 50
              "pods": 5
            targetThresholds:
              "cpu" : 60
              "memory": 30
              "pods": 7

The thing to be careful about is that the thresholds have to account for the CPU utilization, memory utilization, and Pod count of everything on the node, kube-system included.

Also, the policy that selects the under-utilized nodes to reschedule onto (thresholds) requires a node to be below every one of the values, whereas the policy that selects the eviction targets (targetThresholds) is met if any one of the three values is exceeded.

The ConfigMap above was written with that in mind.
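Concretely, plugging the numbers from the table into these rules works out like this (my own walkthrough, not actual descheduler log output):

# thresholds (a node is under-utilized only if it is below ALL of them):
#   654k: cpu 39% < 50, memory 5% < 50, pods 4 < 5  -> under-utilized (destination)
#   2w40: cpu 61% >= 50, pods 6 >= 5                -> not under-utilized
#   x29p: cpu 65% >= 50, pods 8 >= 5                -> not under-utilized
# targetThresholds (a node is over-utilized if it exceeds ANY of them):
#   x29p: cpu 65% > 60, pods 8 > 7                  -> over-utilized (evict from here)
#   2w40: cpu 61% > 60                              -> also over-utilized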

Applying this ConfigMap, I ran the descheduler as a one-shot Job. A sketch of the Job manifest is below, followed by the Pod list after the run.
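(The descheduler binary itself comes from the kubernetes-sigs/descheduler repo; this is a sketch along the lines of the example Job there. The image and ServiceAccount names are assumptions, not something from this article.)

apiVersion: batch/v1
kind: Job
metadata:
  name: descheduler-job
  namespace: kube-system
spec:
  template:
    spec:
      serviceAccountName: descheduler-sa   # assumed: an SA bound to the RBAC rules from the repo
      restartPolicy: Never
      containers:
        - name: descheduler
          image: descheduler:v0.9.0        # assumed: however you built/pushed the image
          command:
            - /bin/descheduler
            - --policy-config-file=/policy-dir/policy.yaml
            - --v=4
          volumeMounts:
            - mountPath: /policy-dir
              name: policy-volume
      volumes:
        - name: policy-volume
          configMap:
            name: descheduler-policy-configmap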

kubectl get pods -n wakashiyo -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP           NODE                                       NOMINATED NODE   READINESS GATES
wakashiyo-deployment-75b89bd54b-69bkl   1/1     Running   0          41m   10.48.1.16   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-dts4h   1/1     Running   0          30s   10.48.2.18   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-75b89bd54b-f72ch   1/1     Running   0          30s   10.48.2.19   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-75b89bd54b-jg9vv   1/1     Running   0          30s   10.48.1.20   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-vkn9h   1/1     Running   0          41m   10.48.0.15   gke-wakashiyo-default-pool-8b0ec5ba-2w40   <none>           <none>
wakashiyo-deployment-75b89bd54b-xzffg   1/1     Running   0          30s   10.48.0.17   gke-wakashiyo-default-pool-8b0ec5ba-2w40   <none>           <none>

Looks like it worked. Two Pods on each node now, which is what I wanted.

Trying RemovePodsViolatingNodeAffinity

I'll skip the explanation of the policy itself.

First, check the list of node labels.

kubectl get nodes -o json | jq ".items[] | .metadata.labels"
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/fluentd-ds-ready": "true",
  "beta.kubernetes.io/instance-type": "n1-standard-1",
  "beta.kubernetes.io/os": "linux",
  "cloud.google.com/gke-nodepool": "default-pool",
  "cloud.google.com/gke-os-distribution": "cos",
  "failure-domain.beta.kubernetes.io/region": "asia-northeast1",
  "failure-domain.beta.kubernetes.io/zone": "asia-northeast1-a",
  "kubernetes.io/hostname": "gke-wakashiyo-default-pool-8b0ec5ba-2w40"
}
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/fluentd-ds-ready": "true",
  "beta.kubernetes.io/instance-type": "n1-standard-1",
  "beta.kubernetes.io/os": "linux",
  "cloud.google.com/gke-nodepool": "default-pool",
  "cloud.google.com/gke-os-distribution": "cos",
  "failure-domain.beta.kubernetes.io/region": "asia-northeast1",
  "failure-domain.beta.kubernetes.io/zone": "asia-northeast1-a",
  "kubernetes.io/hostname": "gke-wakashiyo-default-pool-8b0ec5ba-654k",
  "sample": "bias"
}
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/fluentd-ds-ready": "true",
  "beta.kubernetes.io/instance-type": "n1-standard-1",
  "beta.kubernetes.io/os": "linux",
  "cloud.google.com/gke-nodepool": "default-pool",
  "cloud.google.com/gke-os-distribution": "cos",
  "failure-domain.beta.kubernetes.io/region": "asia-northeast1",
  "failure-domain.beta.kubernetes.io/zone": "asia-northeast1-a",
  "kubernetes.io/hostname": "gke-wakashiyo-default-pool-8b0ec5ba-x29p"
}

Beforehand, I gave gke-wakashiyo-default-pool-8b0ec5ba-654k the label sample=bias.
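(For reference, that is a plain kubectl label command, along these lines:)

kubectl label nodes gke-wakashiyo-default-pool-8b0ec5ba-654k sample=bias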

Then I made the Pods schedule only onto nodes carrying that label, using nodeAffinity:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  namespace: wakashiyo
  name: wakashiyo-deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.12
          ports:
            - containerPort: 80
          resources:
            requests:
              memory: 75Mi
              cpu: 100m
            limits:
              memory: 100Mi
              cpu: 100m
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: sample
                    operator: In
                    values:
                      - bias

 kubectl get pods -n wakashiyo -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP           NODE                                       NOMINATED NODE   READINESS GATES
wakashiyo-deployment-54664b6fc8-44qf2   1/1     Running   0          20s   10.48.2.20   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-54664b6fc8-5wlzm   1/1     Running   0          16s   10.48.2.24   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-54664b6fc8-nm5rh   1/1     Running   0          20s   10.48.2.22   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-54664b6fc8-qnc2l   1/1     Running   0          20s   10.48.2.21   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-54664b6fc8-wljgc   1/1     Running   0          16s   10.48.2.23   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>

Every Pod landed on gke-wakashiyo-default-pool-8b0ec5ba-654k.

Now I move the sample=bias label to a different node: remove it from 654k and attach it to x29p.

kubectl label nodes gke-wakashiyo-default-pool-8b0ec5ba-654k sample-
node/gke-wakashiyo-default-pool-8b0ec5ba-654k labeled
kubectl label nodes gke-wakashiyo-default-pool-8b0ec5ba-x29p sample=bias
node/gke-wakashiyo-default-pool-8b0ec5ba-x29p labeled

kubectl get nodes -o json | jq ".items[] | .metadata.labels"
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/fluentd-ds-ready": "true",
  "beta.kubernetes.io/instance-type": "n1-standard-1",
  "beta.kubernetes.io/os": "linux",
  "cloud.google.com/gke-nodepool": "default-pool",
  "cloud.google.com/gke-os-distribution": "cos",
  "failure-domain.beta.kubernetes.io/region": "asia-northeast1",
  "failure-domain.beta.kubernetes.io/zone": "asia-northeast1-a",
  "kubernetes.io/hostname": "gke-wakashiyo-default-pool-8b0ec5ba-2w40"
}
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/fluentd-ds-ready": "true",
  "beta.kubernetes.io/instance-type": "n1-standard-1",
  "beta.kubernetes.io/os": "linux",
  "cloud.google.com/gke-nodepool": "default-pool",
  "cloud.google.com/gke-os-distribution": "cos",
  "failure-domain.beta.kubernetes.io/region": "asia-northeast1",
  "failure-domain.beta.kubernetes.io/zone": "asia-northeast1-a",
  "kubernetes.io/hostname": "gke-wakashiyo-default-pool-8b0ec5ba-654k"
}
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/fluentd-ds-ready": "true",
  "beta.kubernetes.io/instance-type": "n1-standard-1",
  "beta.kubernetes.io/os": "linux",
  "cloud.google.com/gke-nodepool": "default-pool",
  "cloud.google.com/gke-os-distribution": "cos",
  "failure-domain.beta.kubernetes.io/region": "asia-northeast1",
  "failure-domain.beta.kubernetes.io/zone": "asia-northeast1-a",
  "kubernetes.io/hostname": "gke-wakashiyo-default-pool-8b0ec5ba-x29p",
  "sample": "bias"
}

kube-scheduler applies these rules when a Pod is created, but it never reschedules Pods that are already running.

So even after swapping the label, nothing changes:

kubectl get pods -n wakashiyo -o wide
NAME                                    READY   STATUS    RESTARTS   AGE     IP           NODE                                       NOMINATED NODE   READINESS GATES
wakashiyo-deployment-54664b6fc8-44qf2   1/1     Running   0          3m40s   10.48.2.20   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-54664b6fc8-5wlzm   1/1     Running   0          3m36s   10.48.2.24   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-54664b6fc8-nm5rh   1/1     Running   0          3m40s   10.48.2.22   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-54664b6fc8-qnc2l   1/1     Running   0          3m40s   10.48.2.21   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-54664b6fc8-wljgc   1/1     Running   0          3m36s   10.48.2.23   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>

To get the Pods rescheduled, I applied the ConfigMap below and ran the Job again.
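(One operational note: a completed Job can't simply be re-run in place, so each time I deleted the previous one and applied the manifest again. The names below follow the Job sketch earlier and are assumptions:)

kubectl delete job descheduler-job -n kube-system
kubectl apply -f job.yaml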

configMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemoveDuplicates":
        enabled: true
      "RemovePodsViolatingInterPodAntiAffinity":
        enabled: true
      "RemovePodsViolatingNodeAffinity":
        enabled: true
        params:
          nodeAffinityType:
          - "requiredDuringSchedulingIgnoredDuringExecution"

Result

 kubectl get pods -n wakashiyo -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP           NODE                                       NOMINATED NODE   READINESS GATES
wakashiyo-deployment-54664b6fc8-5hn99   1/1     Running   0          23s   10.48.1.22   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-54664b6fc8-6xp6k   1/1     Running   0          23s   10.48.1.25   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-54664b6fc8-7gvqt   1/1     Running   0          23s   10.48.1.26   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-54664b6fc8-jtqcd   1/1     Running   0          23s   10.48.1.23   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-54664b6fc8-x8tkg   1/1     Running   0          23s   10.48.1.24   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>

The Pods were rescheduled onto the node that now carries the label.

Trying RemoveDuplicates

With this policy, I'll work toward each node having a Pod placed on it.

The Pod list before the run:

 kubectl get pods -n wakashiyo -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP           NODE                                       NOMINATED NODE   READINESS GATES
wakashiyo-deployment-75b89bd54b-77cvb   1/1     Running   0          32s   10.48.2.27   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-75b89bd54b-7sdxt   1/1     Running   0          33s   10.48.0.19   gke-wakashiyo-default-pool-8b0ec5ba-2w40   <none>           <none>
wakashiyo-deployment-75b89bd54b-8dspm   1/1     Running   0          35s   10.48.0.18   gke-wakashiyo-default-pool-8b0ec5ba-2w40   <none>           <none>
wakashiyo-deployment-75b89bd54b-qk5hp   1/1     Running   0          35s   10.48.2.26   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-75b89bd54b-rxqm8   1/1     Running   0          32s   10.48.0.20   gke-wakashiyo-default-pool-8b0ec5ba-2w40   <none>           <none>
wakashiyo-deployment-75b89bd54b-vgw69   1/1     Running   0          35s   10.48.2.25   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>

Not a single one is placed on gke-wakashiyo-default-pool-8b0ec5ba-x29p...

I set up and applied the ConfigMap like this:

configMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "RemoveDuplicates":
        enabled: true

Then, as a result of running the Job, every node now has at least one Pod.

kubectl get pods -n wakashiyo -o wide
NAME                                    READY   STATUS    RESTARTS   AGE     IP           NODE                                       NOMINATED NODE   READINESS GATES
wakashiyo-deployment-75b89bd54b-6bs4z   1/1     Running   0          36s     10.48.1.31   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-77cvb   1/1     Running   0          3m15s   10.48.2.27   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-75b89bd54b-7bth7   1/1     Running   0          36s     10.48.1.30   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-7sdxt   1/1     Running   0          3m16s   10.48.0.19   gke-wakashiyo-default-pool-8b0ec5ba-2w40   <none>           <none>
wakashiyo-deployment-75b89bd54b-ddt28   1/1     Running   0          36s     10.48.1.28   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-jnpgz   1/1     Running   0          36s     10.48.1.29   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>

From how I read the docs, I took RemoveDuplicates to mean that a Deployment (or similar) gets only one of its Pods per node, so I wondered what happens when the replica count is higher than the node count. Judging from this run, though, interpreting it as "every node ends up with at least one" also seems about right...
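For what it's worth, my rough mental model of the strategy after this run (paraphrased from the docs, not the actual source):

# RemoveDuplicates, roughly:
#   for each node:
#     group the node's Pods by owner (ReplicaSet / ReplicationController / Job)
#     if one owner has more than one Pod on that node, evict the extras
# The evicted replicas are recreated and placed by kube-scheduler wherever
# there is room, which would explain why x29p, previously empty, picked up
# four Pods at once here.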

Addendum

I realized later that the ConfigMap used for the first LowNodeUtilization check had both RemoveDuplicates and LowNodeUtilization enabled in its policy, so honestly I couldn't tell which of them had produced that nicely balanced scheduling.

So I want to reuse the environment as it stands after the RemoveDuplicates run above and confirm once more that LowNodeUtilization works on its own.

Once again, the node status before the run:

gke-wakashiyo-default-pool-8b0ec5ba-654k

ProviderID:                  gce://wakashiyo-playground/asia-northeast1-a/gke-wakashiyo-default-pool-8b0ec5ba-654k
Non-terminated Pods:         (5 in total)
  Namespace                  Name                                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                   ------------  ----------  ---------------  -------------  ---
  kube-system                kube-dns-79868f54c5-jvwg8                              260m (27%)    0 (0%)      110Mi (4%)       170Mi (6%)     169m
  kube-system                kube-proxy-gke-wakashiyo-default-pool-8b0ec5ba-654k    100m (10%)    0 (0%)      0 (0%)           0 (0%)         169m
  kube-system                l7-default-backend-fd59995cd-jfvwb                     10m (1%)      10m (1%)    20Mi (0%)        20Mi (0%)      169m
  kube-system                prometheus-to-sd-d6vmv                                 1m (0%)       3m (0%)     20Mi (0%)        20Mi (0%)      169m
  wakashiyo                  wakashiyo-deployment-75b89bd54b-77cvb                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     5m52s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests    Limits
  --------                   --------    ------
  cpu                        471m (50%)  113m (12%)
  memory                     225Mi (8%)  310Mi (11%)
  ephemeral-storage          0 (0%)      0 (0%)
  attachable-volumes-gce-pd  0           0

gke-wakashiyo-default-pool-8b0ec5ba-2w40

ProviderID:                  gce://wakashiyo-playground/asia-northeast1-a/gke-wakashiyo-default-pool-8b0ec5ba-2w40
Non-terminated Pods:         (5 in total)
  Namespace                  Name                                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                   ------------  ----------  ---------------  -------------  ---
  kube-system                kube-dns-79868f54c5-7cc5w                              260m (27%)    0 (0%)      110Mi (4%)       170Mi (6%)     170m
  kube-system                kube-dns-autoscaler-bb58c6784-8ms5s                    20m (2%)      0 (0%)      10Mi (0%)        0 (0%)         170m
  kube-system                kube-proxy-gke-wakashiyo-default-pool-8b0ec5ba-2w40    100m (10%)    0 (0%)      0 (0%)           0 (0%)         170m
  kube-system                prometheus-to-sd-qxg9g                                 1m (0%)       3m (0%)     20Mi (0%)        20Mi (0%)      170m
  wakashiyo                  wakashiyo-deployment-75b89bd54b-7sdxt                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     7m11s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests    Limits
  --------                   --------    ------
  cpu                        481m (51%)  103m (10%)
  memory                     215Mi (8%)  290Mi (10%)
  ephemeral-storage          0 (0%)      0 (0%)
  attachable-volumes-gce-pd  0           0

gke-wakashiyo-default-pool-8b0ec5ba-x29p

ProviderID:                  gce://wakashiyo-playground/asia-northeast1-a/gke-wakashiyo-default-pool-8b0ec5ba-x29p
Non-terminated Pods:         (8 in total)
  Namespace                  Name                                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                   ------------  ----------  ---------------  -------------  ---
  kube-system                heapster-86f6474897-stbl9                              63m (6%)      63m (6%)    215840Ki (7%)    215840Ki (7%)  171m
  kube-system                kube-proxy-gke-wakashiyo-default-pool-8b0ec5ba-x29p    100m (10%)    0 (0%)      0 (0%)           0 (0%)         171m
  kube-system                metrics-server-v0.3.1-57c75779f-xg59s                  48m (5%)      143m (15%)  105Mi (3%)       355Mi (13%)    171m
  kube-system                prometheus-to-sd-wmcsp                                 1m (0%)       3m (0%)     20Mi (0%)        20Mi (0%)      171m
  wakashiyo                  wakashiyo-deployment-75b89bd54b-6bs4z                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     5m36s
  wakashiyo                  wakashiyo-deployment-75b89bd54b-7bth7                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     5m36s
  wakashiyo                  wakashiyo-deployment-75b89bd54b-ddt28                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     5m36s
  wakashiyo                  wakashiyo-deployment-75b89bd54b-jnpgz                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     5m36s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests        Limits
  --------                   --------        ------
  cpu                        612m (65%)      609m (64%)
  memory                     651040Ki (24%)  1009440Ki (37%)
  ephemeral-storage          0 (0%)          0 (0%)
  attachable-volumes-gce-pd  0               0

node                                      cpu         memory          pods
gke-wakashiyo-default-pool-8b0ec5ba-x29p  612m (65%)  651040Ki (24%)  8 (kube-system: 4, wakashiyo: 4)
gke-wakashiyo-default-pool-8b0ec5ba-2w40  481m (51%)  215Mi (8%)      5 (kube-system: 4, wakashiyo: 1)
gke-wakashiyo-default-pool-8b0ec5ba-654k  471m (50%)  225Mi (8%)      5 (kube-system: 4, wakashiyo: 1)

The Pods are piled up on gke-wakashiyo-default-pool-8b0ec5ba-x29p and I want them rescheduled onto gke-wakashiyo-default-pool-8b0ec5ba-2w40 and gke-wakashiyo-default-pool-8b0ec5ba-654k, so I applied the following ConfigMap:

configMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "LowNodeUtilization":
        enabled: true
        params:
          nodeResourceUtilizationThresholds:
            thresholds:
              "cpu" : 60
              "memory": 20
              "pods": 6
            targetThresholds:
              "cpu" : 60
              "memory": 30
              "pods": 7

Running the Job produced the following.

 kubectl get pods -n wakashiyo -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP           NODE                                       NOMINATED NODE   READINESS GATES
wakashiyo-deployment-75b89bd54b-77cvb   1/1     Running   0          14m   10.48.2.27   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-75b89bd54b-7bth7   1/1     Running   0          11m   10.48.1.30   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-7sdxt   1/1     Running   0          14m   10.48.0.19   gke-wakashiyo-default-pool-8b0ec5ba-2w40   <none>           <none>
wakashiyo-deployment-75b89bd54b-ddt28   1/1     Running   0          11m   10.48.1.28   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-jnpgz   1/1     Running   0          11m   10.48.1.29   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-zhxj9   1/1     Running   0          30s   10.48.2.28   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>

Ideally I wanted two Pods on each node, but that's not what happened.

To figure out why, I checked the info for gke-wakashiyo-default-pool-8b0ec5ba-x29p again.

gke-wakashiyo-default-pool-8b0ec5ba-x29p

ProviderID:                  gce://wakashiyo-playground/asia-northeast1-a/gke-wakashiyo-default-pool-8b0ec5ba-x29p
Non-terminated Pods:         (7 in total)
  Namespace                  Name                                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                   ------------  ----------  ---------------  -------------  ---
  kube-system                heapster-86f6474897-stbl9                              63m (6%)      63m (6%)    215840Ki (7%)    215840Ki (7%)  179m
  kube-system                kube-proxy-gke-wakashiyo-default-pool-8b0ec5ba-x29p    100m (10%)    0 (0%)      0 (0%)           0 (0%)         3h
  kube-system                metrics-server-v0.3.1-57c75779f-xg59s                  48m (5%)      143m (15%)  105Mi (3%)       355Mi (13%)    179m
  kube-system                prometheus-to-sd-wmcsp                                 1m (0%)       3m (0%)     20Mi (0%)        20Mi (0%)      3h
  wakashiyo                  wakashiyo-deployment-75b89bd54b-7bth7                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     14m
  wakashiyo                  wakashiyo-deployment-75b89bd54b-ddt28                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     14m
  wakashiyo                  wakashiyo-deployment-75b89bd54b-jnpgz                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     14m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests        Limits
  --------                   --------        ------
  cpu                        512m (54%)      509m (54%)
  memory                     574240Ki (21%)  907040Ki (33%)
  ephemeral-storage          0 (0%)          0 (0%)
  attachable-volumes-gce-pd  0               0

gke-wakashiyo-default-pool-8b0ec5ba-654k

ProviderID:                  gce://wakashiyo-playground/asia-northeast1-a/gke-wakashiyo-default-pool-8b0ec5ba-654k
Non-terminated Pods:         (6 in total)
  Namespace                  Name                                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                   ------------  ----------  ---------------  -------------  ---
  kube-system                kube-dns-79868f54c5-jvwg8                              260m (27%)    0 (0%)      110Mi (4%)       170Mi (6%)     3h2m
  kube-system                kube-proxy-gke-wakashiyo-default-pool-8b0ec5ba-654k    100m (10%)    0 (0%)      0 (0%)           0 (0%)         3h2m
  kube-system                l7-default-backend-fd59995cd-jfvwb                     10m (1%)      10m (1%)    20Mi (0%)        20Mi (0%)      3h2m
  kube-system                prometheus-to-sd-d6vmv                                 1m (0%)       3m (0%)     20Mi (0%)        20Mi (0%)      3h2m
  wakashiyo                  wakashiyo-deployment-75b89bd54b-77cvb                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     19m
  wakashiyo                  wakashiyo-deployment-75b89bd54b-zhxj9                  100m (10%)    100m (10%)  75Mi (2%)        100Mi (3%)     5m10s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests     Limits
  --------                   --------     ------
  cpu                        571m (60%)   213m (22%)
  memory                     300Mi (11%)  410Mi (15%)
  ephemeral-storage          0 (0%)       0 (0%)
  attachable-volumes-gce-pd  0            0

Maybe, once a single Pod had been evicted, no node matched either thresholds or targetThresholds any more. Indeed, after that one eviction x29p is down to 7 Pods, 54% CPU and 21% memory requests, which no longer exceeds any of the targetThresholds (cpu 60 / memory 30 / pods 7)...

(I forgot to grab the info for gke-wakashiyo-default-pool-8b0ec5ba-2w40, so I can't say anything conclusive. Sorry about that...)

I adjusted the ConfigMap once more.

configMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: descheduler-policy-configmap
  namespace: kube-system
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
    strategies:
      "LowNodeUtilization":
        enabled: true
        params:
          nodeResourceUtilizationThresholds:
            thresholds:
              "cpu" : 60
              "memory": 20
              "pods": 5
            targetThresholds:
              "cpu" : 70
              "memory": 20
              "pods": 7

This time, running the Job left the Pods evenly distributed.

 kubectl get pods -n wakashiyo -o wide
NAME                                    READY   STATUS    RESTARTS   AGE   IP           NODE                                       NOMINATED NODE   READINESS GATES
wakashiyo-deployment-75b89bd54b-5cph9   1/1     Running   0          29s   10.48.0.26   gke-wakashiyo-default-pool-8b0ec5ba-2w40   <none>           <none>
wakashiyo-deployment-75b89bd54b-7sdxt   1/1     Running   0          27m   10.48.0.19   gke-wakashiyo-default-pool-8b0ec5ba-2w40   <none>           <none>
wakashiyo-deployment-75b89bd54b-gjmn9   1/1     Running   0          8m    10.48.2.29   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>
wakashiyo-deployment-75b89bd54b-jnpgz   1/1     Running   0          25m   10.48.1.29   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-sb5dn   1/1     Running   0          8m    10.48.1.32   gke-wakashiyo-default-pool-8b0ec5ba-x29p   <none>           <none>
wakashiyo-deployment-75b89bd54b-zhxj9   1/1     Running   0          13m   10.48.2.28   gke-wakashiyo-default-pool-8b0ec5ba-654k   <none>           <none>