fromm's OpenTelemetry Adoption Journey

yeon
  • #monitoring
  • #observability
  • #OpenTelemetry
  • #SigNoz

Hello. I'm Kim Yeontae (김연태), a backend developer working on fromm at Knowmerce (노머스).

The limits of CloudWatch as felt by the backend team, and the move to OpenTelemetry

fromm's backend consists of nest.js applications running on ECS and Lambda, and we had been monitoring it with AWS CloudWatch. CloudWatch served us well: metrics and logs are collected at deployment time with no setup effort, and there is no extra infrastructure such as exporters or collectors to build and maintain. But as the app grew and our needs shifted from simple monitoring to real observability, we started to run into the following limits of CloudWatch.

Lack of custom metrics

The Lambda function-level metrics that CloudWatch provides, such as API Count and Latency, let you monitor the overall state of a function. To measure Count, Error, and Latency per API path, however, you have to build custom metrics yourself. That meant configuring each custom metric one by one on top of logs, and paying extra for them as well.

No distributed tracing support

As the app grew and more services were added, internal calls between services multiplied. We needed to follow a single request end to end and see where a problem occurred, what it was, and how far its impact reached.

Limits of metric resolution

CloudWatch metrics come at a Standard Resolution of 1 minute by default, and the resolution drops further as the queried period gets longer. When we analyzed fromm's traffic pattern, we found that more than 50% of the incoming traffic counted in a 1-minute window arrives within the first 5 to 10 seconds. We therefore needed High Resolution metrics at 1-, 5-, and 10-second granularity.
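
To make the resolution problem concrete, here is a small sketch that buckets the same request timestamps at 60-second and at 5-second resolution. The traffic numbers are made up for illustration (they are not fromm's real data), and `bucketCounts` is a hypothetical helper, not part of any monitoring library.

```typescript
// Count events per fixed-size time bucket.
// `timestampsSec` are event times in seconds; `resolutionSec` is the bucket width.
function bucketCounts(timestampsSec: number[], resolutionSec: number): Map<number, number> {
    const counts = new Map<number, number>();
    for (const t of timestampsSec) {
        const bucketStart = Math.floor(t / resolutionSec) * resolutionSec;
        counts.set(bucketStart, (counts.get(bucketStart) ?? 0) + 1);
    }
    return counts;
}

// Made-up traffic for one minute: 600 requests in the first 5 seconds,
// then 400 more spread evenly over the remaining 55 seconds.
const timestamps: number[] = [];
for (let i = 0; i < 600; i++) timestamps.push((i / 600) * 5);
for (let i = 0; i < 400; i++) timestamps.push(5 + (i / 400) * 55);

const perMinute = bucketCounts(timestamps, 60);
const perFiveSeconds = bucketCounts(timestamps, 5);

// 1-minute resolution: a single data point of 1000 requests,
// which reads as an average of roughly 16.7 requests per second.
const averageRps = (perMinute.get(0) ?? 0) / 60;

// 5-second resolution: the first bucket alone holds 600 requests,
// i.e. a peak of 120 requests per second that the 1-minute view hides.
const peakRps = Math.max(...Array.from(perFiveSeconds.values())) / 5;
```

With these assumed numbers, the 1-minute view suggests a load about seven times lower than the actual peak, which is the kind of gap that matters when sizing capacity for entry bursts.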

Log delay

With CloudWatch Logs Insights, application logs take about 1 to 2 minutes to be ingested. Live Tail does show logs in real time, but with many Lambda functions on top of our ECS services, the number of log groups had grown too, and watching all of them with Live Tail during an incident was impractical. We needed to be able to search for the right logs immediately, whenever we needed them.

For these reasons we decided to move away from CloudWatch and secure observability another way. We evaluated commercial solutions such as Datadog, Splunk, and Whatap, as well as the LGTM stack and OpenTelemetry. Our criteria for choosing among them were as follows.

  1. Collect the metrics we need, such as per-API Latency, Count, and Saturation, at high resolution.
  2. Support distributed tracing of events that span multiple services.
  3. Collect logs in near real time and make them easy to search.
  4. Provide the same monitoring on Lambda as on ECS.
  5. Not require excessive time or cost to operate.

Datadog and Splunk are powerful solutions that meet most of these criteria, but adopting them would have cost us an additional amount comparable to our current infrastructure bill. Whatap offers good features for ECS and RDS at a relatively low price, but it could not monitor Lambda, so we judged it a poor fit for us. As for LGTM, we judged that its operational difficulty was too high and its scale too large for a backend team to adopt and run directly without a DevOps team. We therefore concluded that collecting metrics, traces, and logs with OpenTelemetry and visualizing them with self-hosted SigNoz would cover everything we had in mind.

What is OpenTelemetry?

OpenTelemetry is a CNCF project formed by merging the OpenTracing and OpenCensus projects, created to solve the lack of a standard way to instrument code and export telemetry data to an observability backend. Telemetry data instrumented through OpenTelemetry (traces, metrics, and logs) is exported over OTLP, a standardized data protocol, and is not tied to any vendor. Because many observability backends can be used, including Prometheus, Grafana, Jaeger, Datadog, and SigNoz, we could later swap out only SigNoz for another backend if our needs grow. That was a big draw for us.

01. OpenTelemetry Architecture

In OpenTelemetry's architecture, telemetry data from the WAS (web application server) is generated through an agent, the API, or the SDK and sent to a Collector, which processes the data and forwards it to an external backend.

02. OpenTelemetry Collector

Looking more closely, the OpenTelemetry Collector is made up of Receivers, Processors, and Exporters. Each component plays the following role.

Receivers

  • Support push-based protocols such as OTLP, Jaeger, and Zipkin, where the application sends data to the collector
    • OTLP can be used over either HTTP or gRPC
  • Support pull-based protocols such as Prometheus, where the collector scrapes the data itself

Processors

  • Batch : groups data for efficient transmission
  • Filter : removes irrelevant data
  • Normalization : converts and standardizes data formats
  • Attributes : annotates data with metadata
  • Sampling : controls volume by sampling data
  • Memory Limiter : limits resource usage and prevents overload

Exporters

  • Send data to backend systems
  • Provide retry logic when sending fails
  • Load-balance across multiple backends

In addition, extensions are available to add capabilities to the Collector such as health checks, authentication, and authorization.
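
The receiver, processor, exporter flow can be pictured with a toy model. This is only a sketch of the idea in plain TypeScript: the types and functions below (`TelemetryItem`, `filterProcessor`, `attributesProcessor`, `batchProcessor`, `runPipeline`) are hypothetical names for illustration and are not the Collector's actual API. The real Collector is a separate Go binary driven by YAML configuration.

```typescript
// A telemetry item as it might arrive at a receiver (shape is illustrative only).
interface TelemetryItem {
    name: string;
    attributes: Record<string, string>;
}

// A processor transforms the batches flowing through the pipeline.
type Processor = (batches: TelemetryItem[][]) => TelemetryItem[][];
// An exporter sends one batch to a backend.
type Exporter = (batch: TelemetryItem[]) => void;

// Filter: drop items that are not worth sending.
const filterProcessor = (keep: (item: TelemetryItem) => boolean): Processor =>
    batches => batches.map(batch => batch.filter(keep)).filter(batch => batch.length > 0);

// Attributes: annotate every item with extra metadata.
const attributesProcessor = (extra: Record<string, string>): Processor =>
    batches => batches.map(batch =>
        batch.map(item => ({ ...item, attributes: { ...item.attributes, ...extra } })));

// Batch: regroup items into batches of at most `maxSize` to reduce export calls.
const batchProcessor = (maxSize: number): Processor => batches => {
    const all = batches.reduce<TelemetryItem[]>((acc, batch) => acc.concat(batch), []);
    const regrouped: TelemetryItem[][] = [];
    for (let i = 0; i < all.length; i += maxSize) {
        regrouped.push(all.slice(i, i + maxSize));
    }
    return regrouped;
};

// Run received items through the processors in order, then export each batch.
function runPipeline(received: TelemetryItem[], processors: Processor[], exporter: Exporter): void {
    const batches = processors.reduce<TelemetryItem[][]>((acc, processor) => processor(acc), [received]);
    batches.forEach(batch => exporter(batch));
}

// Usage: four received items, health checks filtered out, the rest tagged and batched by two.
const exportedBatches: TelemetryItem[][] = [];
runPipeline(
    [
        { name: 'GET /health', attributes: {} },
        { name: 'GET /users', attributes: {} },
        { name: 'POST /posts', attributes: {} },
        { name: 'GET /posts', attributes: {} },
    ],
    [
        filterProcessor(item => item.name !== 'GET /health'),
        attributesProcessor({ 'deployment.environment': 'dev' }),
        batchProcessor(2),
    ],
    batch => { exportedBatches.push(batch); },
);
// exportedBatches now holds two batches: [GET /users, POST /posts] and [GET /posts].
```

The order of the processor list matters in this model: filtering before batching means the dropped items never take up space in a batch.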

The Collector's receivers, processors, and exporters are configured through an OpenTelemetry Collector YAML file like the following.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:

exporters:
  otlp:
    endpoint: otelcol:4317

extensions:
  health_check:
  pprof:
  zpages:

service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
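
One detail worth keeping in mind with this config: defining a component under `receivers`, `processors`, or `exporters` does not enable it by itself. It only takes effect once a pipeline under `service.pipelines` references it. The sketch below mirrors the YAML as a plain TypeScript object and checks that wiring. `CollectorConfig` and `checkPipelines` are hypothetical names for illustration, and the `memory_limiter` entry is added here only to demonstrate an unused component. The real Collector performs its own validation at startup.

```typescript
interface CollectorConfig {
    receivers: Record<string, unknown>;
    processors: Record<string, unknown>;
    exporters: Record<string, unknown>;
    service: {
        pipelines: Record<string, { receivers: string[]; processors: string[]; exporters: string[] }>;
    };
}

type ComponentKind = 'receivers' | 'processors' | 'exporters';

// Report pipeline references to undefined components, and components no pipeline uses.
function checkPipelines(config: CollectorConfig): { undefinedRefs: string[]; unused: string[] } {
    const kinds: ComponentKind[] = ['receivers', 'processors', 'exporters'];
    const undefinedRefs: string[] = [];
    const used = new Set<string>();

    for (const [pipelineName, pipeline] of Object.entries(config.service.pipelines)) {
        for (const kind of kinds) {
            for (const id of pipeline[kind]) {
                used.add(`${kind}/${id}`);
                if (!(id in config[kind])) undefinedRefs.push(`${pipelineName}: ${kind}/${id}`);
            }
        }
    }

    const unused: string[] = [];
    for (const kind of kinds) {
        for (const id of Object.keys(config[kind])) {
            if (!used.has(`${kind}/${id}`)) unused.push(`${kind}/${id}`);
        }
    }
    return { undefinedRefs, unused };
}

// Same shape as the YAML above, plus one processor that no pipeline references.
const sharedPipeline = { receivers: ['otlp'], processors: ['batch'], exporters: ['otlp'] };
const collectorConfig: CollectorConfig = {
    receivers: { otlp: {} },
    processors: { batch: {}, memory_limiter: {} },
    exporters: { otlp: { endpoint: 'otelcol:4317' } },
    service: { pipelines: { traces: sharedPipeline, metrics: sharedPipeline, logs: sharedPipeline } },
};

const wiring = checkPipelines(collectorConfig);
// wiring.undefinedRefs is empty; wiring.unused is ['processors/memory_limiter'].
```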

Using the OpenTelemetry client in nest.js

The SigNoz setup we chose bundles a frontend that visualizes telemetry data, ClickHouse as the database that stores it, and an OpenTelemetry Collector, with ZooKeeper handling coordination. Thanks to this, we could get started easily with little more than OpenTelemetry client configuration, without spending much effort on the backend. Here we look at how to configure and use the OpenTelemetry client in a nest.js application.

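Setting up OpenTelemetry with automatic instrumentation

First, let's use the auto instrumentation that OpenTelemetry provides to collect basic telemetry with minimal code. Note that the tracer.ts shown in step 2 wires up only a trace exporter. Exporting metrics and logs generally requires configuring the corresponding exporters on the SDK as well.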

1. Install the OpenTelemetry packages

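npm install --save @opentelemetry/api
npm install --save @opentelemetry/sdk-node
npm install --save @opentelemetry/auto-instrumentations-node
npm install --save @opentelemetry/exporter-trace-otlp-http
npm install --save @opentelemetry/resources
npm install --save @opentelemetry/semantic-conventions
npm install --save dotenv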

2. Set up the tracer.ts file

import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import * as OpenTelemetry from '@opentelemetry/sdk-node';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import dotenv from 'dotenv';

// Load environment variables (e.g. `stage`) from .env
dotenv.config();

// For Debug
// import { diag, DiagConsoleLogger, DiagLogLevel } from '@opentelemetry/api';
// diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.ALL);

const exporterOptions = {
    // OTLP/HTTP trace endpoint of the Collector, e.g. http://<collector-host>:4318/v1/traces
    url: 'OTEL Collector URL',
};

const traceExporter = new OTLPTraceExporter(exporterOptions);

const sdk = new OpenTelemetry.NodeSDK({
    traceExporter,
    instrumentations: [
        getNodeAutoInstrumentations({
            // Disable the fs instrumentation
            '@opentelemetry/instrumentation-fs': {
                enabled: false
            }
        })
    ],
    // `Resource` and `SemanticResourceAttributes` follow the 1.x package APIs;
    // newer major versions may expose different helpers.
    resource: new Resource({
        [SemanticResourceAttributes.SERVICE_NAME]: 'Service Name',
        [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.stage
    })
});

// Shut the SDK down cleanly when the process is asked to terminate
process.on('SIGTERM', () => {
    sdk.shutdown()
        .then(() => console.log('Tracing terminated'))
        .catch(error => console.log('Error terminating tracing', error))
        .finally(() => process.exit(0));
});

export default sdk;

3. Start the tracer in main.ts

// eslint-disable-next-line import/order
import tracer from './tracer';

import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';

async function bootstrap() {
    // Start the tracer before anything else
    await tracer.start();
    
    const app = await NestFactory.create(AppModule);
    
    // nest.js app config
    // ...
    
    await app.listen(3000);
}

bootstrap();

Note that the tracer must be imported at the very top of the application's main file (e.g. main.ts). To keep the linter from reordering it, we use eslint-disable-next-line import/order so that the import/order rule ignores that line.
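
Why the import order matters can be seen with a simplified model. Roughly speaking, auto instrumentation works by wrapping library functions as the libraries are loaded, so code that grabbed a reference to the original function before the wrapping happened bypasses the wrapper. The sketch below imitates that with a plain object. `fakeHttp`, `instrument`, and `recordedSpans` are hypothetical stand-ins for illustration, not OpenTelemetry APIs, and the real mechanism is more involved.

```typescript
// Stand-in for a library module such as `http` (hypothetical).
const fakeHttp = {
    request(path: string): string {
        return `response for ${path}`;
    },
};

const recordedSpans: string[] = [];

// Stand-in for instrumentation: replace the function with a wrapper that records a span.
function instrument(): void {
    const original = fakeHttp.request;
    fakeHttp.request = (path: string): string => {
        recordedSpans.push(`span: ${path}`);
        return original(path);
    };
}

// Case 1: application code captures the function BEFORE instrumentation runs.
const earlyRequest = fakeHttp.request;

instrument();

// Case 2: application code captures the function AFTER instrumentation runs.
const lateRequest = fakeHttp.request;

earlyRequest('/early'); // bypasses the wrapper, so no span is recorded
lateRequest('/late');   // goes through the wrapper, so a span is recorded
// recordedSpans is now ['span: /late']
```

Importing the tracer first corresponds to case 2: the instrumentation is in place before the rest of the application loads the libraries it wraps.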

Collecting custom metrics

Next, let's look at how to collect custom metrics through OpenTelemetry.

1. Create a MetricsService

import { Injectable } from '@nestjs/common';
import { metrics, Counter, Histogram, Meter, ObservableGauge } from '@opentelemetry/api';

@Injectable()
export class MetricsService {
    private readonly meter: Meter;
    private readonly requestCounter: Counter;
    private readonly requestDurationHistogram: Histogram;
    private readonly activeUsersGauge: ObservableGauge;
    // Latest active user count, reported each time the gauge is observed
    private activeUsers = 0;

    constructor() {
        this.meter = metrics.getMeter('nestjs-application');

        // Counter metric example
        this.requestCounter = this.meter.createCounter('app.request.count', {
            description: 'API Count'
        });

        // Histogram metric example
        this.requestDurationHistogram = this.meter.createHistogram('app.request.duration', {
            description: 'latency (ms)',
            unit: 'ms'
        });

        // Gauge metric example
        this.activeUsersGauge = this.meter.createObservableGauge('app.active_users', {
            description: 'Active Users Gauge'
        });

        // Register the callback that reports the active user count
        this.activeUsersGauge.addCallback(observableResult => {
            observableResult.observe(this.activeUsers);
        });
    }

    // Increment the API count
    incrementRequestCount(endpoint: string, method: string, statusCode: number) {
        this.requestCounter.add(1, {
            endpoint,
            method,
            statusCode: statusCode.toString()
        });
    }

    // Record the API latency
    recordRequestDuration(endpoint: string, method: string, durationMs: number) {
        this.requestDurationHistogram.record(durationMs, {
            endpoint,
            method
        });
    }

    // Update the number of active users
    updateActiveUsers(count: number) {
        this.activeUsers = count;
    }
}

2. Create a MetricsMiddleware

import { Injectable, NestMiddleware } from '@nestjs/common';
import { NextFunction, Request, Response } from 'express';
import { MetricsService } from './metrics.service';

// @Injectable() is needed so that Nest can inject MetricsService into the middleware
@Injectable()
export class MetricsMiddleware implements NestMiddleware {
    constructor(private readonly metricsService: MetricsService) {}

    use(req: Request, res: Response, next: NextFunction) {
        const startTime = Date.now();
        const endpoint = req.path;
        const method = req.method;

        // Record the metrics once the response has been sent
        res.on('finish', () => {
            const duration = Date.now() - startTime;
            const statusCode = res.statusCode;

            this.metricsService.incrementRequestCount(endpoint, method, statusCode);
            this.metricsService.recordRequestDuration(endpoint, method, duration);
        });

        next();
    }
}

3. Register the service and middleware in app.module

import { MiddlewareConsumer, Module, NestModule } from '@nestjs/common';
import { MetricsService } from './metrics.service';
import { MetricsMiddleware } from './metrics.middleware';
import { AppController } from './app.controller';

@Module({
    imports: [],
    controllers: [AppController],
    providers: [MetricsService],
})
export class AppModule implements NestModule {
    // Apply the metrics middleware to every route
    configure(consumer: MiddlewareConsumer) {
        consumer.apply(MetricsMiddleware).forRoutes('*');
    }
}

This way, API Count, API Latency, and Active User Count can be measured for every API call. You can also add further custom metrics to MetricsService and update them at the controller or service level, which lets you collect a metric for specific APIs only.
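
One thing to watch with per-path metrics: the middleware uses `req.path` as the `endpoint` attribute, so paths that embed identifiers (for example `/users/123`) would each become their own time series. A minimal sketch of one way to keep that in check is to replace ID-like segments with a placeholder before recording. `normalizeEndpoint` is a hypothetical helper, and the two patterns it matches (numeric IDs and UUIDs) are assumptions that would need adjusting to the real routes.

```typescript
// Replace path segments that look like identifiers with a placeholder so that
// `/users/123` and `/users/456` share one `endpoint` attribute value.
function normalizeEndpoint(path: string): string {
    const numericId = /^\d+$/;
    const uuid = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
    return path
        .split('/')
        .map(segment => (numericId.test(segment) || uuid.test(segment) ? ':id' : segment))
        .join('/');
}

const firstUserPosts = normalizeEndpoint('/users/123/posts');  // '/users/:id/posts'
const secondUserPosts = normalizeEndpoint('/users/456/posts'); // '/users/:id/posts'
const healthCheck = normalizeEndpoint('/health');              // '/health'
```

In the middleware this would amount to passing `normalizeEndpoint(req.path)` instead of `req.path` as the `endpoint` value.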

Closing

In this post we walked through the fromm backend team's journey from CloudWatch to OpenTelemetry in pursuit of observability. In the next post, we will look at how to adopt OpenTelemetry on Lambda, which needs a slightly different setup from typical container environments such as ECS and K8s.

Thank you to everyone who read to the end 🙇‍♂️
