Flexible Migration in Blue-Green Deployments within a Fixed Cost

Eddy Truyen, Bert Lagaisse, Wouter Joosen
imec-DistriNet, Dept. of Computer Science, KU Leuven, Belgium
eddy.truyen@cs.kuleuven.be

Arnout Hoebreckx
Dept. of Computer Science, KU Leuven, Belgium
arnout.hoebreckx@gmail.com

Cédric De Dycker
Dept. of Computer Science, KU Leuven, Belgium
cedric.dedycker@icloud.com
ABSTRACT
This paper presents the concept of a PolyPod, which consists of multiple Pods that run different versions of the same container image on the same node in order to share common libraries in memory. Its novelty is that it proposes a blueprint for blue-green deployments that balances maximum flexibility in the number of migration steps against maximum workload consolidation within a fixed total resource cost. This balance between flexibility and improved resource utilization is important for application areas where users are served by the same application instance but have different time preferences for being upgraded to a new application version. The PolyPod concept is also relevant for a planned feature of Kubernetes that lets Pods be vertically scaled without restarting them, but where scaling actions are aborted if the capacity of the node would be exceeded. We explain how the PolyPod concept supports balancing flexible migration and resource utilization, with and without Pod restarts, by simulating various migration scenarios based on a quantitative cost model.

CCS CONCEPTS
Computer systems organization ~ Architectures ~ Distributed architectures ~ Cloud computing

KEYWORDS
Continuous deployment, Resource management, Container orchestration

ACM Reference format:
Eddy Truyen, Arnout Hoebreckx, Cédric De Dycker, Bert Lagaisse and Wouter Joosen. 2020. Flexible Migration in Blue-Green Deployments within a Fixed Cost. In Proceedings of Containers Workshop on Container Technologies and Container Clouds (WOC'20). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3429885.3429963

1 Introduction
Vertical scaling of Pods without restarting them will likely become a new feature of Kubernetes [1]. The design of this feature is complex because it breaks the architectural invariant that the resources allocated to a Pod are immutable. Therefore, applications must explicitly opt in for no-restart so that they can adapt their configuration when the resource allocation changes. Moreover, pre-emptive scheduling is not supported by this new feature: when scaling up a Pod would cause the sum of requests to exceed the capacity of its node, no eviction of a lower-priority Pod is attempted; instead, the scaling action is aborted [1]. This calls for research into application-specific architectures and abstractions where, by design, vertical scaling never exceeds node resources without leaving nodes unnecessarily under-utilized, and where applications that opt out for no-restart are taken into account as well.

This paper explores such a possible abstraction in the context of continuous deployment. In particular, blue-green deployments run multiple versions of the same application and allow workload to be migrated to newer versions at different times and in different migration sizes. Vertical scaling of Pods (scaling up/down) without restarts is better suited than horizontal scaling (scaling out/in) for facilitating such migration scenarios because it makes it possible to maintain t+1 fault-tolerance requirements for all versions [2]. Moreover, it is more cost-efficient for scaling down and for roll-backs. However, to support zero-downtime upgrades, there is a surge cost when the new version has been scaled up while the old version has not yet been scaled down. When applications opt out for no-restart, this surge cost is even higher.

We define the PolyPod concept, discuss its implementation feasibility, and explore the trade-off between maximum flexibility and workload consolidation within a fixed resource cost. Understanding this trade-off allows for flexible planning of the number of migration steps and their respective workload migration sizes without reaching a step where workload can never be migrated due to a too high surge cost on an already consolidated node.
The remainder of this paper is structured as follows. Section 2 motivates the work and provides the necessary background. Section 3 presents the PolyPod concept and discusses its implementation feasibility. Section 4 analyzes the abovementioned trade-off by simulating all possible migration scenarios based on a quantitative cost model of the surge cost. Section 5 presents related work and Section 6 concludes.

2 Motivation and Background
In the paradigm of agile software development, continuous deployment is often used as a strategy for making applications available to end users. This strategy differs from traditional deployment strategies in the greater frequency with which applications are upgraded to a newer version [3]. It is important that service degradation is minimal during upgrades in order not to violate performance and availability SLAs.
2.1 Blue-green deployments
There are different techniques to perform continuous deployment, but two techniques are important in this paper: rolling upgrades and blue-green deployments [4]. In a rolling upgrade, an application is upgraded to a new version by gradually replacing its application instances. In blue-green deployments, older and newer versions of the application instances co-exist in parallel for a determined time, and all versions can receive user requests in parallel. Over time, users are migrated from older to newer versions in a gradual fashion in multiple upgrade phases. This is especially relevant for multi-tenant SaaS applications where multiple tenants are served by the same application instance. Service degradation or outages that may appear as part of a failed upgrade, or an upgrade of a stateful application, raise the issue that not all tenants prefer an upgrade to happen during the same time slot [5].

In the container-based cluster orchestrator Kubernetes, Pods are the unit of deployment and therefore correspond to a separate application instance. Different Pod versions typically differ in their container image and resource allocation. Blue-green deployments can be set up by creating different sets of replicated Pods, one for each application version.

2.2 Memory sharing
To meet fault-tolerance and availability requirements, replicated Pods of one version must be deployed on different nodes. However, it pays off to co-locate Pods of different versions on the same node to save memory. After all, an advantage of containers is the possibility to share dynamically linked libraries in memory. Ferreira et al. [6] conducted an experiment showing that containers with corresponding images share memory between shared libraries. The research was conducted on the older container technology LXC. However, the finding is also interesting for blue-green deployments in Kubernetes because it motivates placing Pods with different container image versions on the same node.

The original experiment was performed on LXC containers and an AUFS file layer. In this paper, we reproduced this experiment on Kubernetes, where Docker is used as the container technology and overlay2 as a more modern implementation of AUFS [16].

Figure 1: Experiment by Ferreira et al. [6]

The original experiment was set up as follows. There are three containers C1, C2 and C3, as shown in Figure 1. For C1 and C2, the file layers are an AUFS union mount point consisting of several layers, including a shared library. C3 consists of the same components, but these are placed on an ext4 filesystem, which is the default filesystem of many Linux distributions. The results showed that libraries were shared between C1 and C2 but not with C3.

Table 1: Memory usage of Pods on one node (in kB)

                                          1 Pod            2 Pods
Process                              Shared  Private  Shared  Private
/a.out                                    0       12       4        8
/lib/x86_64-linux-gnu/libm-2.19.so        0      524     356      136
/lib/x86_64-linux-gnu/libc-2.19.so      956      136    1036       88
/gnu/gsl/lib/libgslcblas.so.0.0.0         0       72      64        8
/gnu/gsl/lib/libgsl.so.19.3.0             0      684     524      136
/lib/x86_64-linux-gnu/ld-2.19.so        140        8     140        8
Total                                  1096     1436    2124      384
The reproduction of this experiment uses an image containing the same layers as in the original experiment, namely the GNU Scientific Library (GSL) and an Ubuntu base. The memory measurements are done via /proc/<pid>/smaps, which shows the private and shared memory of each process. In the first scenario, one Pod is started up; this situation forms the reference against which the next scenario is compared. In the second scenario, a second identical Pod is started. Table 1 shows the measurements, with the shared and private memory per process in kilobytes (kB).
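As a minimal sketch of how such numbers can be collected from /proc/<pid>/smaps: the split into "shared" and "private" via the kernel's Shared_*/Private_* fields below is our own simplification; the paper does not state its exact aggregation.

    # Sum shared and private memory (kB) of one process from /proc/<pid>/smaps.
    def smaps_totals(pid: int):
        shared = private = 0
        with open(f"/proc/{pid}/smaps") as f:
            for line in f:
                if line.startswith(("Shared_Clean:", "Shared_Dirty:")):
                    shared += int(line.split()[1])   # values are reported in kB
                elif line.startswith(("Private_Clean:", "Private_Dirty:")):
                    private += int(line.split()[1])
        return shared, private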
To convert this into useful numbers, we compare two deployment options: (i) two Pods run on separate nodes and (ii) two Pods run on the same node.

Scenario 1: Pods on 2 nodes
  Total cost = 2 * (Shared + Private) = 2 * (1096 + 1436) = 5064

Scenario 2: Pods on 1 node
  Total cost = Shared + 2 * Private = 2124 + 2 * 384 = 2892

This shows that it is considerably cheaper to run the containers on the same node if they share the same libraries.
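The comparison generalizes to p identical Pods, as the following sketch shows. The 2-Pod smaps totals come from Table 1; extrapolating beyond p = 2 assumes the shared component stays roughly constant, which the paper does not measure, so treat those projections as an assumption rather than a result.

    SHARED_1, PRIVATE_1 = 1096, 1436  # kB, one Pod per node (Table 1)
    SHARED_N, PRIVATE_N = 2124, 384   # kB, co-located Pods (Table 1, 2 Pods)

    def cost_separate_nodes(p: int) -> int:
        # Option (i): every Pod pays its shared and private memory in full.
        return p * (SHARED_1 + PRIVATE_1)

    def cost_same_node(p: int) -> int:
        # Option (ii): shared pages are paid once, private pages per Pod.
        return SHARED_N + p * PRIVATE_N

    print(cost_separate_nodes(2), cost_same_node(2))  # 5064 2892, as in the text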
2.3 Horizontal vs. Vertical Scaling
The most-used approach in Kubernetes for increasing and decreasing resources in blue-green deployments is scaling out and in, i.e., horizontal scaling of Pods. After all, when workload increases drastically, the increasing demand for the primarily stressed resources can only be satisfied by creating new Pods on other nodes.

However, horizontal scaling is not always the optimal option. This is especially the case when running the last set of users on an almost retired version. After all, to meet availability requirements, at least t+1 Pods of that older version must always be running [2]. As such, decreasing below t+1 Pods is not desired. Instead, when the last batches of users are being upgraded to a new version, vertical scaling is preferred. Vertical scaling keeps the resource allocation of the Pods of the older version in an optimal state, i.e., the right amount of resources is allocated so that there is no unnecessary over-provisioning of resources [7].

Another disadvantage of horizontal scaling is that scaling in requires waiting for the to-be-deleted Pods to reach a quiescent state, which may take a long time and therefore consumes a lot of resources during each upgrade phase. Finally, a roll-back to an older version becomes very expensive when the ongoing upgrade to the newer version has almost completed.

The latter two disadvantages also disappear with vertical scaling, provided that vertical scaling can be done without container restarts. Technically speaking, Docker and other container runtimes already allow vertical scaling of containers without restarts [8]. However, the current design of Kubernetes requires that Pods always be rescheduled when their resource allocation is adjusted. This is because the request and limit fields, which respectively specify the resource reservation and the resource quota, are immutable.

However, a lot of work has recently been spent by the Kubernetes community on allowing in-place updates of Pod resources without restarting Pods [1]. The design of this new feature does not depend on the pre-emptive scheduling feature of the Kubernetes scheduler: when the vertical scaling action of a Pod would cause the Pod to no longer fit on its current node, no eviction of a lower-priority Pod is tried by the scheduler. Instead, the scaling action is aborted and retried later by the local Kubelet agent of that node.

3 The PolyPod Concept
The PolyPod concept is not meant as an extension to the concept of a Pod in Kubernetes. Instead, it is intended as a blueprint for application architectures that remains compatible with the existing state of practice of blue-green deployments as outlined in Section 2.1. By using Pod affinity constraints, different versions of the same application Pod can be placed together on the same node, as illustrated by the sketch below.
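As an illustration, the following Pod-spec excerpt (written as a Python dict for consistency with the other sketches; the app label and its value are hypothetical) uses a standard Kubernetes podAffinity rule to require that a Pod land on a node that already hosts another Pod with the same app label, regardless of version.

    # Hypothetical Pod-spec fragment: schedule this Pod onto the node that
    # already runs a Pod labeled app=myapp (any version of the image).
    pod_spec_affinity = {
        "affinity": {
            "podAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [{
                    "labelSelector": {"matchLabels": {"app": "myapp"}},
                    "topologyKey": "kubernetes.io/hostname",
                }]
            }
        }
    }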
We propose the PolyPod concept based on the following three tenets. Firstly, a PolyPod consists of different Pods that run different versions of the same container image; if these versions depend on common libraries, memory can be saved provided the Pods are always co-located on the same node so that the libraries can be shared among the Pods. To define the expected resource capacity of a node, a PolyPod defines the total resource request for all its Pods as immutable values. Secondly, for each Pod in a PolyPod, a reservation is set for the amount of resources needed to serve an atomic unit of work; we term this unit a user. Users can then be migrated among Pods in different sizes (i.e., numbers of users) by means of vertical scaling. A PolyPod can opt in or opt out for no-restart of its Pods during vertical scaling. Thirdly, if the container orchestrator allows managing shared memory among Pods, a PolyPod allows setting an estimated reservation for the shared memory and separate reservations for the private memory of its constituent Pods.

A PolyPod P is thus defined as ({P_1, ..., P_m}, R, M_P\s, {R_1, ..., R_m}, {T_1, ..., T_m}, s), where P_i is a Pod running application version i and all P_i (i: 1..m) must be placed on the same node; R is the immutable request for resources by P, defined as (cpu_P, mem_P); M_P\s is the request for the memory to be shared among the Pods of P; R_i = (cpu_i, mem_i\p) is the request for resources by P_i such that a single user can be served by P_i, where mem_i\p is the request for the private part of the memory of P_i; and T_i is the current number of users served by P_i. Finally, if s = 1, all P_i opt in for no-restart; if s = 2, all P_i opt out for no-restart.
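As an illustration only, this tuple can be transcribed into a small data structure. This is not an existing Kubernetes or PolyPod API; the class and field names are ours, chosen to mirror the symbols of the definition above.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class PodVersion:
        # One Pod P_i: version i of the application container image.
        cpu: int           # cpu_i: millicores reserved per user
        mem_private: int   # mem_i\p: private memory (kB) reserved per user
        users: int         # T_i: users currently served by this Pod

    @dataclass
    class PolyPod:
        # P = ({P_1..P_m}, R, M_P\s, {R_1..R_m}, {T_1..T_m}, s)
        pods: List[PodVersion]  # all P_i, co-located on one node via Pod affinity
        cpu_total: int          # cpu_P: immutable total CPU request (millicores)
        mem_total: int          # mem_P: immutable total memory request (kB)
        mem_shared: int         # M_P\s: estimated shared-memory reservation (kB)
        s: int                  # 1 = Pods opt in for no-restart, 2 = opt out

    # Hypothetical instantiation mirroring the Section 4.2 CPU example
    # (the memory figures are placeholders, not measurements):
    example = PolyPod(
        pods=[PodVersion(cpu=90, mem_private=400, users=9),
              PodVersion(cpu=90, mem_private=400, users=0)],
        cpu_total=1750, mem_total=8_000_000, mem_shared=2_000, s=1)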
A full implementation of the PolyPod concept does not exist yet. The implementation of the PolyPod concept is based on the expectation that resource requests and limits will become mutable fields in a future version of Kubernetes [1] (cf. Section 2.3). We have experimented with an implementation where Pods do not declare resource requests and resource allocation is managed by means of a front-end request scheduler. The concept of a shared memory request M_P\s and a set of private memory requests T_i · mem_i\p could be implemented as part of the Linux kernel cgroups hierarchy [9]. The total amount of memory requested (i.e., M_P\s + T_1 · mem_1\p + ... + T_m · mem_m\p) is passed to the parent of the pause container of every Pod P_i [10]. Kubernetes already relies on the cgroup hierarchy for supporting Pod-level resource management constructs [11].

4 Understanding the Trade-off
This section analyzes the trade-off between maximum migration flexibility, hosting the highest number of users, and never ending up in a situation where, after a number of migration steps, users can no longer be migrated from an older to a newer version due to a too high surge cost.

4.1 Modeling the surge cost
The surge cost is needed to support zero-downtime upgrades and appears when scaling up and down. The surge cost is largest when Pods opt out for no-restart, as illustrated in Figure 2. When migrating from application version 1 to version 2, the resources of version 2 must first be increased. When Pods opt out for no-restart, this must be performed by means of a rolling upgrade, in which Pods are replaced one by one in order to prevent service disruption for application version 2. After users are migrated to version 2, the Pods of version 1 are scaled down, which again involves a rolling upgrade. When Pods do opt in for no-restart, the cost is lower, but there is still a surge cost. We model this surge cost for a deployment on one node as follows.

A migration step thus consists of a scaling-up and a scaling-down phase. A migration step for PolyPod P is defined as G = (P, i, j, n), where n ≤ T_j is the number of users that are upgraded from Pod P_j to Pod P_{j+i}.

Figure 2: Scaling up and down with Pod restarts

The total request cost for scaling up Pod P_{j+i} is then defined as follows:

    U_{j+i}(n) = (R_j · T_j) + s · (R_{j+i} · T_{j+i}) + (R_{j+i} · n)    (1)

The total request cost for scaling down Pod P_j is defined as:

    D_j(n) = s · (R_j · T_j) + (R_{j+i} · T_{j+i}) + (R_{j+i} · n) − (R_j · n)    (2)

The goal is to host as many users as possible and never run into a situation where users cannot be migrated due to a too high surge cost:

    max(U_{j+i}(n)_cpu, D_j(n)_cpu) < cpu_P − Σ_{1≤k≤m, k≠j, k≠j+i} (cpu_k · T_k)    (3)

    max(U_{j+i}(n)_mem, D_j(n)_mem) < mem_P − M_P\s − Σ_{1≤k≤m, k≠j, k≠j+i} (mem_k\p · T_k)    (4)

    maximize Σ_{1≤k≤m} T_k    (5)

Note that formulas (3) and (4) do not express the temporal logic that this constraint must "always" be satisfied, but they allow us to instantiate multiple migration steps over time using the graph calculator Desmos [12].
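The cost model translates almost line for line into code. The sketch below is our own rendering of formulas (1)-(3) for the CPU dimension only; R and T are Python lists holding the per-Pod requests R_k and user counts T_k, and the function names are ours, not part of any existing API.

    # Transcription of formulas (1)-(3), CPU dimension only.
    # R[k] = cpu request per user of Pod k; T[k] = users served by Pod k;
    # j = source Pod index, ji = target Pod index (j+i); s = 1 (no-restart
    # opted in) or 2 (opted out); cpu_total = cpu_P of the PolyPod.

    def scale_up_cost(R, T, j, ji, n, s):
        # Formula (1): request cost while scaling up Pod P_{j+i} by n users.
        return R[j] * T[j] + s * (R[ji] * T[ji]) + R[ji] * n

    def scale_down_cost(R, T, j, ji, n, s):
        # Formula (2): request cost while scaling down Pod P_j by n users.
        return s * (R[j] * T[j]) + R[ji] * T[ji] + R[ji] * n - R[j] * n

    def cpu_fits(R, T, j, ji, n, s, cpu_total):
        # Constraint (3): the surge must stay below cpu_P minus the requests
        # of all Pods k that are not involved in this migration step.
        others = sum(R[k] * T[k] for k in range(len(R)) if k not in (j, ji))
        surge = max(scale_up_cost(R, T, j, ji, n, s),
                    scale_down_cost(R, T, j, ji, n, s))
        return surge < cpu_total - others

    # Example: R1 = R2 = 90 mc, 9 users on P1, migrate all 9 in one step (s=1).
    print(cpu_fits([90, 90], [9, 0], j=0, ji=1, n=9, s=1, cpu_total=1750))  # True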
4.2 Simulating different migration scenarios
We simulate the possible migration steps based on the above formulas using Desmos [12], for CPU resources only. The simulation applies to an example situation of a PolyPod with 2 Pods. Assuming our cluster has nodes with 2 CPU cores, the PolyPod's request R is set to 1750 millicores.

We distinguish between the cases R1 = R2 and R1 ≠ R2. In the case where R1 = R2 = 90 millicores and no-restart is opted in, we see that at most 18 users can be hosted by a single PolyPod, provided the migration size n is always kept at 1 (see Figure 3a). So 18 separate scaling actions are needed to migrate all users. If all users need to be migrated in one step, only 9 users can be hosted. If users can be migrated in two steps (9 users, then 1 user), 10 users can be hosted (see Figure 3b). If no-restart is opted out, a maximum of 9 users can be hosted, but the migration size n does not matter (see Figure 3c-d). Note that when R_j = R_{j+i}, the cost for scaling down, D_j(n), becomes independent of the migration size n.
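Because Desmos only visualizes the constraint, the same exploration can also be scripted. The following self-contained sketch (ours; it restates formulas (1) and (2) inline for the two-Pod case) walks a proposed migration schedule and checks constraint (3) at every step; the assertions replay the R1 = R2 = 90 millicore results quoted above.

    # Brute-force check of a migration schedule for a 2-Pod PolyPod (CPU only).
    def schedule_feasible(R1, R2, N, steps, s, cpu_P):
        # steps: list of migration sizes n, moving users from P1 to P2.
        T1, T2 = N, 0
        for n in steps:
            up = R1 * T1 + s * (R2 * T2) + R2 * n             # formula (1)
            down = s * (R1 * T1) + R2 * T2 + R2 * n - R1 * n  # formula (2)
            if max(up, down) >= cpu_P:                        # constraint (3), m = 2
                return False
            T1, T2 = T1 - n, T2 + n
        return True

    # Replay the R1 = R2 = 90 mc cases on a 1750 mc PolyPod (cf. Figure 3):
    assert schedule_feasible(90, 90, 18, [1] * 18, s=1, cpu_P=1750)  # 18 users, n = 1
    assert not schedule_feasible(90, 90, 19, [1] * 19, s=1, cpu_P=1750)
    assert schedule_feasible(90, 90, 9, [9], s=1, cpu_P=1750)        # one step: 9 users
    assert not schedule_feasible(90, 90, 10, [10], s=1, cpu_P=1750)
    assert schedule_feasible(90, 90, 10, [9, 1], s=1, cpu_P=1750)    # two steps: 10 users
    assert schedule_feasible(90, 90, 9, [9], s=2, cpu_P=1750)        # opt-out: 9 users max
    assert not schedule_feasible(90, 90, 10, [9, 1], s=2, cpu_P=1750)

All assertions hold under the stated model, matching the Figure 3 numbers for this configuration.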
In the case R1 > R2, Pod P2 will be over-provisioned (see Figure 4). Let R1 = 90 and R2 = 65. If no-restart is opted in, then still 18 users can be hosted if n = 1 (see Figure 4a). If all users need to be migrated in one step, at most 11 users can be hosted (see Figure 4b). If no-restart is opted out, also 11 users can be hosted, but the migration size matters: all 11 users must be migrated in at most three steps and the first step must comprise more than 8 users (see Figure 4e), i.e., the following step sequences are possible: (11), (10,1), (9,2), (9,1,1). However, if 9 users are hosted, the migration size does not matter anymore (see Figure 4c-d).

Figure 3: R1 = R2. For all possible combinations of T1 x T2 as scoped by the blue triangle, it is always possible to migrate n users within a fixed cost of 1750 millicores. The optimal capacity is given by the hypotenuse (the longest side of the triangle).

Figure 4: R1 > R2 (cf. Figure 3 for explanation)

In the case R1 < R2, i.e., R1 = 90 and R2 = 115, it is only possible to migrate all users if P1 is under-provisioned from the beginning with 14 users, in case s = 1 and n = 1 (see Figure 5a). If s = 1 and n = 8, only 8 users can be hosted by Pod P1 (see Figure 5b). If s = 2 and n = 8, at most 8 users can be hosted, but again the migration size matters: all 8 users must be migrated in one step, or in two equal steps of 4 users (see Figure 5d-e). If 7 users are hosted, the migration size does not matter (see Figure 5c).

Figure 5: R1 < R2 (cf. Figure 3 for explanation)

4.3 Discussion
Based on the above simulations, we can draw the following insights on the trade-off between maximum flexibility and maximum workload consolidation Σ_{1≤k≤m} T_k within a total fixed cost R.

First, the resources needed for running a single user should be as equal as possible across versions, or at least ∀i: 2..m: R_i ≤ R_1. Otherwise, not all workload can be migrated from old to new versions by means of vertical scaling. A consequence of this is that the request R_i could be relaxed towards an atomic unit of migration size instead of an atomic unit of workload; to meet user demands, the horizontal Pod autoscaler of Kubernetes should then be used to manage service-level objectives.

Secondly, when no-restart is opted in, more workload can be hosted by a single node if the migration size is as small as possible, but this requires a large number of migration steps.
If all workload must be migrated in one step, only a lower amount of workload can be hosted.

Thirdly, if no-restart is opted out, the migration size does not matter when the sum of the requests of the Pods is always constant. Moreover, if ∀i: 2..m: R_i ≤ R_1, the migration size does not matter either, provided some over-provisioning of P_1 is tolerated.

The above insights enable application architects to select an appropriate trade-off between flexible migration and workload consolidation within a total resource cost.

5 Related Work
Shekhar et al. [13] combine vertical scaling and machine learning to support pro-active automated scaling by predicting the workload on the system and its performance. This work has been evaluated on top of Docker, which allows adjusting the resource allocation of containers without restarting them.

The Vertical Pod Autoscaler (VPA) [14] is an add-on for Kubernetes that also supports vertical scaling in an automated way to mitigate over-provisioning. Rattihalli et al. [15] propose RUBAS, an alternative to VPA that can dynamically change the resources of Pods without causing service downtime by relying on Checkpoint/Restore In Userspace (CRIU) functionality. This makes it possible to pause a container, save the complete state of the container, and restart the container at a later time with the correct state. This technique is very effective for stateful applications. The results show that vertical scaling via a migration with RUBAS can be performed faster than when the Pods are restarted by VPA. While the migration time is short, it is still significant enough to cause a service interruption: as long as the Pod is migrating, it cannot process requests, which must be compensated elsewhere.

Wang et al. [16] explore how to enable resource sharing between containers and how to reduce container image sizes. The goal is to achieve efficiency in a different way: a new container management system is proposed in which containers run on top of a shared layer that contains the essential software and libraries for all containers. The paper compares this architecture to Docker in terms of image sizes and startup time. The proposed architecture scores better in both areas, with smaller image sizes and a lower container start-up time. Whether the proposed architecture is also more memory-efficient or guarantees better performance is not presented.

To our knowledge, none of the above works aim to balance migration flexibility and cost in blue-green deployments.

6 Conclusion
This paper has presented the PolyPod concept for migrating workload in blue-green deployments by means of vertical scaling with and without Pod restarts. It supports balancing the trade-off between maximum flexibility in the number of migration steps and maximum workload consolidation within a fixed total resource cost. This keeps blue-green deployments pinned on the same set of nodes in order to benefit from the sharing of common libraries in memory.

REFERENCES
[1] V. Kulkarni, "enhancements/20181106-in-place-update-of-pod-resources.md at master · kubernetes/enhancements." [Online]. Available: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/20181106-in-place-update-of-pod-resources.md. [Accessed: 17-Sep-2020].
[2] F. B. Schneider, "Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial," ACM Comput. Surv., vol. 22, no. 4, pp. 299-319, Jan. 1990.
[3] G. G. Claps, R. Berntsson Svensson, and A. Aurum, "On the journey to continuous deployment: Technical and social challenges along the way," Information and Software Technology, vol. 57, no. 1, pp. 21-31, 2015.
[4] T. A. Limoncelli, S. R. Chalup, and C. J. Hogan, The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems, Volume 2, 2014. [Online]. Available: http://the-cloud-book.com/. [Accessed: 14-Jan-2016].
[5] F. Gey, D. Van Landuyt, and W. Joosen, "Evolving multi-tenant SaaS applications through self-adaptive upgrade enactment and tenant mediation," in International Symposium on Software Engineering for Adaptive and Self-Managing Systems, 2016, pp. 151-157.
[6] J. Bravo Ferreira, M. Cello, and J. O. Iglesias, "More sharing, more benefits? A study of library sharing in container-based infrastructures," in Lecture Notes in Computer Science, vol. 10417 LNCS, 2017, pp. 358-371.
[7] K. Rzadca et al., "Autopilot: workload autoscaling at Google," in EuroSys 2020, 2020.
[8] S. Shekhar, H. Abdel-Aziz, A. Bhattacharjee, A. Gokhale, and X. Koutsoukos, "Performance Interference-Aware Vertical Elasticity for Cloud-Hosted Latency-Sensitive Applications," in 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), 2018.
[9] T. Heo, "linux/cgroup-v2.rst at master · torvalds/linux," 2015. [Online]. Available: https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/cgroup-v2.rst. [Accessed: 21-Sep-2020].
[10] A. Mohan, H. Sane, K. Doshi, S. Edupuganti, N. Nayak, and V. Sukhomlinov, "Agile Cold Starts for Scalable Serverless," in HotCloud 2019, 2019.
[11] "community/pod-resource-management.md at master · kubernetes/community." [Online]. Available: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/pod-resource-management.md. [Accessed: 22-Sep-2020].
[12] Desmos Inc., "Desmos API v1.0 documentation." [Online]. Available: www.desmos.com. [Accessed: 21-Sep-2020].
[13] S. Shekhar, H. Abdel-Aziz, A. Bhattacharjee, A. Gokhale, and X. Koutsoukos, "Performance Interference-Aware Vertical Elasticity for Cloud-Hosted Latency-Sensitive Applications," in 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), 2018.
[14] Cloud Native Computing Foundation, "autoscaler/vertical-pod-autoscaler at master · kubernetes/autoscaler." [Online]. Available: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler. [Accessed: 12-Nov-2018].
[15] G. Rattihalli, M. Govindaraju, H. Lu, and D. Tiwari, "Exploring potential for non-disruptive vertical auto scaling and resource estimation in Kubernetes," in 2019 IEEE International Conference on Cloud Computing (CLOUD), 2019, pp. 33-40.
[16] W. Wang, L. Zhang, D. Guo, S. Wu, H. Cui, and F. Bi, "Reg: An Ultra-lightweight Container that Maximizes Memory Sharing and Minimizes the Runtime Environment," in 2019 IEEE International Conference on Web Services (ICWS), 2019.