
Building High Availability AWS Lambda Servers

- #aws lambda
- #timeout
- #runtime crash
- #high availability
- #fromm
Introduction
At Nomus, we run latency-sensitive services on ECS Fargate, but many of our services run on AWS Lambda.
We manage AWS Lambda deployments with the Serverless Framework and GitHub Actions.
AWS Lambda offers the core benefits of serverless computing:
- Automatic scaling without server management: capacity scales up and down with traffic, which keeps the infrastructure operations burden low.
- Pay-as-you-go billing: you pay only for actual code execution time and request count, which keeps costs efficient.
- Fast deployment and changes: uploaded code takes effect immediately, which speeds up development.
However, one aspect of AWS Lambda is hard to predict: you cannot know when a function will be terminated.
Not knowing when the environment shuts down after a request completes normally is something we can live with
(if normal shutdowns were observable too, we could try features such as a swap-out that reduces the load on Redis),
but abnormal termination is a problem that is hard to overlook when you operate a service.
AWS provides the following documentation on Lambda Failures.
https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html
Based on this documentation, we would like to show how to build a highly available AWS Lambda service.
Lambda Failures
1. Timeout
An AWS Lambda function has a default timeout of 3 seconds, which can be raised to a maximum of 15 minutes.
When a timeout occurs, the function is terminated abnormally, which can cause data loss or service interruption.
When we deploy a new Lambda-based service, we set an appropriate timeout for its request events, but as the service load gradually grows afterwards, timeouts can start occurring without anyone noticing.
We pay particular attention to timeouts because they often showed up as an early sign of DB overload.
When traffic grows, Lambda scales out with few constraints, but the DB did not scale as dynamically as Lambda.
So whenever we found a timeout, we had to revisit the load tests for that service.
2. Runtime Crash
One of the mistakes our team makes most often during development is forgetting to register an Entity in TypeOrmModule.forFeature().
When that happens, the error below is raised. It is a typical example of a runtime crash: Lambda cannot find the entity metadata at startup.
TypeORMError: Entity metadata for <Table>#<Relation> was not found. Check if you specified a correct entity object and if it's connected in the connection options.
at /var/task/src/<WorkSpace>/lambda.js:205408:19
at Array.forEach (<anonymous>)
at EntityMetadataBuilder.computeInverseProperties (/var/task/src/<WorkSpace>/lambda.js:205405:34)
at /var/task/src/<WorkSpace>/lambda.js:204908:58
at Array.forEach (<anonymous>)
at EntityMetadataBuilder.build (/var/task/src/<WorkSpace>/lambda.js:204908:25)
at ConnectionMetadataBuilder.buildEntityMetadatas (/var/task/src/<WorkSpace>/lambda.js:205797:150)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async DataSource.buildMetadatas (/var/task/src/<WorkSpace>/lambda.js:207373:33)
at async DataSource.initialize (/var/task/src/<WorkSpace>/lambda.js:207083:11)
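In another case, we passed an object to the logger without realizing how large it was, and hit an OOM (Out Of Memory) error.
Because it was a Batch Lambda running in the back end of the service, we only learned about the failure after receiving a CS (customer support) inquiry.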
<--- Last few GCs --->
[2:0x558c38855fa0] 22535 ms: Scavenge 892.7 (937.2) -> 892.5 (938.0) MB, 18.35 / 0.00 ms (average mu = 0.900, current mu = 0.768) allocation failure;
[2:0x558c38855fa0] 22559 ms: Scavenge 893.6 (938.0) -> 893.3 (943.0) MB, 3.13 / 0.00 ms (average mu = 0.900, current mu = 0.768) allocation failure;
[2:0x558c38855fa0] 22876 ms: Mark-Compact 896.4 (943.0) -> 895.7 (946.2) MB, 273.55 / 0.00 ms (average mu = 0.798, current mu = 0.432) allocation failure; scavenge might not succeed
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----
1: 0x558c13879cf3 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [/var/lang/bin/node]
2: 0x558c13c452cd v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/var/lang/bin/node]
3: 0x558c13c45699 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/var/lang/bin/node]
4: 0x558c13ebe25a [/var/lang/bin/node]
5: 0x558c13ed8c2e v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/var/lang/bin/node]
6: 0x558c13ea88f2 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/var/lang/bin/node]
7: 0x558c13ea9f74 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/var/lang/bin/node]
8: 0x558c13e8305b v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [/var/lang/bin/node]
9: 0x558c13e6af23 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [/var/lang/bin/node]
10: 0x558c13e6db26 v8::internal::FactoryBase<v8::internal::Factory>::NewRawOneByteString(int, v8::internal::AllocationType) [/var/lang/bin/node]
11: 0x558c143bc88d v8::internal::IncrementalStringBuilder::Extend() [/var/lang/bin/node]
12: 0x558c14020ea8 v8::internal::JsonStringifier::SerializeString(v8::internal::Handle<v8::internal::String>) [/var/lang/bin/node]
13: 0x558c1402201a v8::internal::JsonStringifier::Result v8::internal::JsonStringifier::Serialize_<true>(v8::internal::Handle<v8::internal::Object>, bool, v8::internal::Handle<v8::internal::Object>) [/var/lang/bin/node]
14: 0x558c14026a91 v8::internal::JsonStringifier::Result v8::internal::JsonStringifier::Serialize_<false>(v8::internal::Handle<v8::internal::Object>, bool, v8::internal::Handle<v8::internal::Object>) [/var/lang/bin/node]
15: 0x558c14023ecd v8::internal::JsonStringifier::Result v8::internal::JsonStringifier::Serialize_<true>(v8::internal::Handle<v8::internal::Object>, bool, v8::internal::Handle<v8::internal::Object>) [/var/lang/bin/node]
16: 0x558c140248cf v8::internal::JsonStringifier::SerializeJSReceiverSlow(v8::internal::Handle<v8::internal::JSReceiver>) [/var/lang/bin/node]
17: 0x558c1402619a v8::internal::JsonStringifier::Result v8::internal::JsonStringifier::Serialize_<false>(v8::internal::Handle<v8::internal::Object>, bool, v8::internal::Handle<v8::internal::Object>) [/var/lang/bin/node]
18: 0x558c14027949 v8::internal::JsonStringify(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>) [/var/lang/bin/node]
19: 0x558c13cf015d v8::internal::Builtin_JsonStringify(int, unsigned long*, v8::internal::Isolate*) [/var/lang/bin/node]
20: 0x558c148091b6 [/var/lang/bin/node]
END RequestId: f7bd706c-3c22-44f2-a687-5192a09f71c3
REPORT RequestId: f7bd706c-3c22-44f2-a687-5192a09f71c3 Duration: 27627.15 ms Billed Duration: 27628 ms Memory Size: 1024 MB Max Memory Used: 1024 MB Init Duration: 1330.79 ms Status: error Error Type: Runtime.ExitError
XRAY TraceId: 1-685cb8e4-6ec7902913ad62f42e821e12 SegmentId: fe04f33977304521 Sampled: true
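One way to reduce the risk of this kind of crash is to bound what is handed to the logger before it is serialized. The following is only a minimal sketch under stated assumptions, not the code from the incident: summarizeForLog, SummaryLimits, and the default limits are hypothetical names and values chosen for illustration, and the helper is not a full serializer (for example, it does not handle bigint values).

```typescript
// Hypothetical limits for illustration; tune them to your own log budget.
interface SummaryLimits {
  maxDepth: number;  // how many levels of nesting to keep
  maxItems: number;  // max array elements / object keys kept per level
  maxString: number; // max characters kept per string value
}

const DEFAULT_LIMITS: SummaryLimits = { maxDepth: 3, maxItems: 20, maxString: 200 };

// Returns a size-bounded copy of `value` that is cheap to JSON.stringify.
// The depth limit also stops the walk on circular references.
function summarizeForLog(
  value: unknown,
  limits: SummaryLimits = DEFAULT_LIMITS,
  depth = 0
): unknown {
  if (typeof value === "string") {
    return value.length > limits.maxString
      ? value.slice(0, limits.maxString) + `...(${value.length} chars)`
      : value;
  }
  if (value === null || typeof value !== "object") return value;
  if (depth >= limits.maxDepth) {
    return Array.isArray(value) ? `[Array(${value.length})]` : "[Object]";
  }
  if (Array.isArray(value)) {
    const head: unknown[] = value
      .slice(0, limits.maxItems)
      .map((item) => summarizeForLog(item, limits, depth + 1));
    if (value.length > limits.maxItems) {
      head.push(`...(${value.length - limits.maxItems} more items)`);
    }
    return head;
  }
  const source = value as Record<string, unknown>;
  const keys = Object.keys(source);
  const out: Record<string, unknown> = {};
  keys.slice(0, limits.maxItems).forEach((key) => {
    out[key] = summarizeForLog(source[key], limits, depth + 1);
  });
  if (keys.length > limits.maxItems) {
    out["..."] = `${keys.length - limits.maxItems} more keys`;
  }
  return out;
}
```

A caller would then log JSON.stringify(summarizeForLog(payload)) instead of the raw payload, so the serialized size is bounded by the limits rather than by the size of the object.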
There were also runtime crashes that we had not been aware of before this log metric monitoring system was in place. 😭
The following runtime crash occurred when code tried to store a null value in the API cache.
2025-07-17T09:08:32.363Z 49917c08-5711-4d3a-a945-80cb6ad821f7 ERROR Unhandled Promise Rejection {
"errorType": "Runtime.UnhandledPromiseRejection",
"errorMessage": "Error: \"null\" is not a cacheable value",
"reason": {
"errorType": "Error",
"errorMessage": "\"null\" is not a cacheable value",
"stack": [
"Error: \"null\" is not a cacheable value",
" at /var/task/src/arti/lambda.js:369323:25",
" at new Promise (<anonymous>)",
" at Object.set (/var/task/src/arti/lambda.js:369311:18)",
" at Object.set (/var/task/src/arti/lambda.js:281628:43)",
" at Object.next (/var/task/src/arti/lambda.js:282191:37)",
" at /var/task/src/arti/lambda.js:30379:78",
" at OperatorSubscriber2._this._next (/var/task/src/arti/lambda.js:25392:13)",
" at OperatorSubscriber2.Subscriber2.next (/var/task/src/arti/lambda.js:24521:16)",
" at /var/task/src/arti/lambda.js:26805:24",
" at OperatorSubscriber2._this._next (/var/task/src/arti/lambda.js:25392:13)"
]
},
"promise": {},
"stack": [
"Runtime.UnhandledPromiseRejection: Error: \"null\" is not a cacheable value",
" at process.<anonymous> (file:///var/runtime/index.mjs:1326:17)",
" at process.emit (node:events:524:28)",
" at process.emit (/var/task/src/arti/lambda.js:2227:25)",
" at emitUnhandledRejection (node:internal/process/promises:250:13)",
" at throwUnhandledRejectionsMode (node:internal/process/promises:385:19)",
" at processPromiseRejections (node:internal/process/promises:470:17)",
" at processTicksAndRejections (node:internal/process/task_queues:96:32)"
]
}
REPORT RequestId: 49917c08-5711-4d3a-a945-80cb6ad821f7 Duration: 534.29 ms Billed Duration: 535 ms Memory Size: 512 MB Max Memory Used: 246 MB Status: error Error Type: Runtime.ExitError
XRAY TraceId: 1-6878bd8f-19a92cbd3ea1be424772578d SegmentId: 3bd62ade4f027f28 Sampled: true
Implementation
https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html presents four Lambda Failures log examples in total.
Example CloudWatch Logs log output (runtime or extension crash) - old style
START RequestId: c3252230-c73d-49f6-8844-968c01d1e2e1 Version: $LATEST
RequestId: c3252230-c73d-49f6-8844-968c01d1e2e1 Error: Runtime exited without providing a reason
Runtime.ExitError
END RequestId: c3252230-c73d-49f6-8844-968c01d1e2e1
REPORT RequestId: c3252230-c73d-49f6-8844-968c01d1e2e1 Duration: 933.59 ms Billed Duration: 934 ms Memory Size: 128 MB Max Memory Used: 9 MB
Example CloudWatch Logs log output (function timeout) - old style
START RequestId: b70435cc-261c-4438-b9b6-efe4c8f04b21 Version: $LATEST
2024-03-04T17:22:38.033Z b70435cc-261c-4438-b9b6-efe4c8f04b21 Task timed out after 3.00 seconds
END RequestId: b70435cc-261c-4438-b9b6-efe4c8f04b21
REPORT RequestId: b70435cc-261c-4438-b9b6-efe4c8f04b21 Duration: 3004.92 ms Billed Duration: 3000 ms Memory Size: 128 MB Max Memory Used: 33 MB Init Duration: 111.23 ms
The new format for CloudWatch logs includes an additional status field in the REPORT line. In the case of a runtime or extension crash, the REPORT line also includes a field ErrorType.
Example CloudWatch Logs log output (runtime or extension crash) - new style
START RequestId: 5b866fb1-7154-4af6-8078-6ef6ca4c2ddd Version: $LATEST
END RequestId: 5b866fb1-7154-4af6-8078-6ef6ca4c2ddd
REPORT RequestId: 5b866fb1-7154-4af6-8078-6ef6ca4c2ddd Duration: 133.61 ms Billed Duration: 133 ms Memory Size: 128 MB Max Memory Used: 31 MB Init Duration: 80.00 ms Status: error Error Type: Runtime.ExitError
Example CloudWatch Logs log output (function timeout) - new style
START RequestId: 527cb862-4f5e-49a9-9ae4-a7edc90f0fda Version: $LATEST
END RequestId: 527cb862-4f5e-49a9-9ae4-a7edc90f0fda
REPORT RequestId: 527cb862-4f5e-49a9-9ae4-a7edc90f0fda Duration: 3016.78 ms Billed Duration: 3016 ms Memory Size: 128 MB Max Memory Used: 31 MB Init Duration: 84.00 ms Status: timeout
Log Filter Strings
From the four Lambda Failures examples above, we filter CloudWatch Logs on the strings in the table below and send an alert whenever one of them matches.
style | Timeout | Runtime Crash |
---|---|---|
old | Task timed out after | Error: Runtime exited without providing a reason |
new | Status: timeout | Status: error , Error Type: Runtime.ExitError |
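To make the table concrete, the matching rules can be sketched as a small TypeScript function. This is only an illustration: in the setup described in this post the matching is performed by CloudWatch Logs metric filters, and classifyLogLine is a hypothetical helper that is useful mainly for checking the strings against sample log lines.

```typescript
type LambdaFailure = "timeout" | "runtime-crash" | null;

// Filter strings copied from the table above.
const OLD_TIMEOUT = "Task timed out after";
const NEW_TIMEOUT = "Status: timeout";
const OLD_CRASH = "Error: Runtime exited without providing a reason";
const NEW_CRASH_STATUS = "Status: error";
const NEW_CRASH_TYPE = "Error Type: Runtime.ExitError";

function contains(line: string, needle: string): boolean {
  return line.indexOf(needle) !== -1;
}

// Classifies a single CloudWatch Logs line using the old- and new-style strings.
function classifyLogLine(line: string): LambdaFailure {
  if (contains(line, OLD_TIMEOUT) || contains(line, NEW_TIMEOUT)) {
    return "timeout";
  }
  const newStyleCrash =
    contains(line, NEW_CRASH_STATUS) && contains(line, NEW_CRASH_TYPE);
  if (contains(line, OLD_CRASH) || newStyleCrash) {
    return "runtime-crash";
  }
  return null;
}
```

Running the sample lines from the AWS documentation through this function is a quick way to confirm that each of the four log styles is covered before the filters are registered.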
Architecture
CloudWatch Metric → CloudWatch Alarm → SNS → Lambda (posts an alert to the Slack incident channel)
We used the AWS SDK to register a Metric Filter on CloudWatch Logs and a Metric Alarm on the resulting metric.
import { CloudWatchLogsClient, PutMetricFilterCommand } from '@aws-sdk/client-cloudwatch-logs';
import { CloudWatchClient, PutMetricAlarmCommand } from '@aws-sdk/client-cloudwatch';
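As a rough sketch of what such a registration might look like, the helper below builds plain objects in the shape of the PutMetricFilterCommand and PutMetricAlarmCommand inputs for the timeout case. The namespace, the metric and alarm names, the period, and the threshold are assumptions made for illustration and are not taken from our actual configuration. The `?"..."` OR-pattern is our reading of the CloudWatch Logs filter pattern syntax and should be verified against the AWS documentation before use.

```typescript
// Subset of the PutMetricFilterCommand input fields used in this sketch.
interface MetricFilterInput {
  logGroupName: string;
  filterName: string;
  filterPattern: string;
  metricTransformations: {
    metricName: string;
    metricNamespace: string;
    metricValue: string;
  }[];
}

// Subset of the PutMetricAlarmCommand input fields used in this sketch.
interface MetricAlarmInput {
  AlarmName: string;
  Namespace: string;
  MetricName: string;
  Statistic: "Sum";
  Period: number;
  EvaluationPeriods: number;
  Threshold: number;
  ComparisonOperator: "GreaterThanOrEqualToThreshold";
  TreatMissingData: "notBreaching";
  AlarmActions: string[];
}

const NAMESPACE = "Custom/LambdaFailures"; // hypothetical namespace
const TIMEOUT_PHRASES = ["Task timed out after", "Status: timeout"];

// Joins quoted phrases into an "any of these" pattern, e.g. ?"a" ?"b".
function orPattern(phrases: string[]): string {
  return phrases.map((phrase) => `?"${phrase}"`).join(" ");
}

function buildTimeoutMonitoring(logGroupName: string, snsTopicArn: string) {
  const functionName = logGroupName.replace("/aws/lambda/", "");
  const metricName = `${functionName}-timeout`; // hypothetical naming scheme
  const filter: MetricFilterInput = {
    logGroupName,
    filterName: metricName,
    filterPattern: orPattern(TIMEOUT_PHRASES),
    // Each matching log event adds 1 to the metric.
    metricTransformations: [
      { metricName, metricNamespace: NAMESPACE, metricValue: "1" },
    ],
  };
  const alarm: MetricAlarmInput = {
    AlarmName: `${metricName}-alarm`,
    Namespace: NAMESPACE,
    MetricName: metricName,
    Statistic: "Sum",
    Period: 60, // seconds
    EvaluationPeriods: 1,
    Threshold: 1, // alarm on the first timeout within a period
    ComparisonOperator: "GreaterThanOrEqualToThreshold",
    TreatMissingData: "notBreaching", // no matching logs means no alarm
    AlarmActions: [snsTopicArn],
  };
  return { filter, alarm };
}
```

The two objects would then be sent with `new PutMetricFilterCommand(filter)` on a CloudWatchLogsClient and `new PutMetricAlarmCommand(alarm)` on a CloudWatchClient. A runtime-crash filter can be built the same way from the crash strings in the table.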
For Lambda functions that already exist, we enumerate their log groups with DescribeLogGroupsCommand and register a Metric Filter and Metric Alarm for each one.
import { CloudWatchLogsClient, DescribeLogGroupsCommand } from '@aws-sdk/client-cloudwatch-logs';
For newly created Lambda functions, we register the Metric Filter and Metric Alarm in response to the CreateLogGroup event.
CreateMetric:
  handler: src/create_metric/lambda.handler
  events:
    - eventBridge:
        pattern:
          source:
            - aws.logs
          detail-type:
            - AWS API Call via CloudTrail
          detail:
            eventSource:
              - logs.amazonaws.com
            eventName:
              - CreateLogGroup
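Inside the handler, the first step is to find out which log group was created. The sketch below shows one way this could be done, assuming the CloudTrail record carries the name in detail.requestParameters.logGroupName and that only log groups under the /aws/lambda/ prefix are of interest. The CreateLogGroupEvent type and the lambdaLogGroupFromEvent helper are hypothetical and are not the contents of src/create_metric/lambda.

```typescript
// Minimal shape of the EventBridge event matched by the pattern above.
// Only the fields this sketch reads are declared.
interface CreateLogGroupEvent {
  source?: string;
  detail?: {
    eventSource?: string;
    eventName?: string;
    requestParameters?: { logGroupName?: string };
  };
}

const LAMBDA_LOG_GROUP_PREFIX = "/aws/lambda/";

// Returns the new log group name if it belongs to a Lambda function, else null.
function lambdaLogGroupFromEvent(event: CreateLogGroupEvent): string | null {
  const detail = event.detail;
  if (!detail || detail.eventName !== "CreateLogGroup") return null;
  const name = detail.requestParameters
    ? detail.requestParameters.logGroupName
    : undefined;
  // Skip log groups created by other services (for example API Gateway or ECS).
  if (!name || name.indexOf(LAMBDA_LOG_GROUP_PREFIX) !== 0) return null;
  return name;
}
```

When the function returns a name, the handler can register the Metric Filter and Metric Alarm for that log group. When it returns null, the event can be ignored.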
Closing
We now have a system that monitors AWS Lambda timeouts and runtime crashes and alerts us when they occur.
It will serve as a foundation for improving service stability and responding quickly when incidents happen.
We will keep working to solve the various problems that can arise when operating services on AWS Lambda.
We hope this post is helpful to other developers who run AWS Lambda.
Thank you.