Tempo · Distributed tracing

Tempo is Grafana's distributed tracing backend; it receives, stores, and serves traces. It runs in the tempo namespace as a single-replica Deployment (grafana/tempo:3.0.2), scheduled onto nodes labeled workload: infra.

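Everything on this page comes from the manifests under applications/observability/tempo/ (deployment, configmap, service, ingress, servicemonitor, prometheusrule). Tempo currently runs 3.0.2; the upgrade from 2.10.7 was an architectural rewrite, covered in the section on the 3.0 rewrite below.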

What Tempo does

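Of the three pillars of observability (logs / metrics / traces), Tempo handles traces. It receives traces from applications (a trace consists of a number of spans and describes the full call chain of one request across services), persists them to disk per trace, and serves lookups by trace ID. Grafana connects to it as a data source and renders flame graphs and service dependencies in the UI.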

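Traces are not isolated from the other two pillars: Tempo's built-in metrics-generator derives RED metrics and a service topology from incoming spans in real time and remote-writes them to Prometheus, so traces in turn feed metrics. See the metrics-generator section.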

Receiver ports

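distributor.receivers enables both the OTLP and Jaeger receiver protocols; an application picks whichever fits and reports over it. The Service (ClusterIP tempo) exposes these ports unchanged inside the cluster: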

Name	Description
OTLP gRPC :4317	Native OpenTelemetry protocol over gRPC. Preferred for newly onboarded services.
OTLP HTTP :4318	Native OpenTelemetry protocol over HTTP.
Jaeger gRPC :14250	gRPC ingest port compatible with Jaeger clients.
Jaeger thrift_http :14268	Thrift-over-HTTP ingest port compatible with Jaeger clients.
HTTP API :3200	Tempo's own query / health check / metrics port (server.http_listen_port). Both the Grafana data source and the Ingress connect here.
gRPC :9095	Tempo's internal gRPC server (server.grpc_listen_port).
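As an illustration of the OTLP gRPC row, a workload in one of the allowed namespaces could point its OpenTelemetry SDK at the Service through the standard OTEL_* environment variables. This is a sketch only: the container name and image are hypothetical, and the hostname tempo.tempo.svc.cluster.local is derived from the Service name and namespace stated on this page rather than copied from a manifest.

```yaml
# Fragment of a Deployment pod template for a hypothetical workload.
containers:
  - name: my-service                    # hypothetical
    image: example/my-service:latest    # hypothetical
    env:
      # OTLP over gRPC goes to :4317 on the tempo Service.
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://tempo.tempo.svc.cluster.local:4317
      - name: OTEL_EXPORTER_OTLP_PROTOCOL
        value: grpc
      # Service name attached to every span this workload emits.
      - name: OTEL_SERVICE_NAME
        value: my-service
```

For OTLP over HTTP the same variables would presumably use port 4318 with the protocol set to http/protobuf.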

The application namespaces platform / app / game (plus argo-workflows) are explicitly allowed by NetworkPolicy to push traces in.
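A policy with this effect might look roughly like the sketch below. The actual manifest is not reproduced on this page, so the policy name, the pod label, and the choice of selecting namespaces through the built-in kubernetes.io/metadata.name label are all assumptions.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-trace-ingest          # hypothetical name
  namespace: tempo
spec:
  podSelector:
    matchLabels:
      app: tempo                    # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Match the four namespaces by the label Kubernetes sets automatically.
        - namespaceSelector:
            matchExpressions:
              - key: kubernetes.io/metadata.name
                operator: In
                values: [platform, app, game, argo-workflows]
      ports:
        # The four ingest ports from the receiver table.
        - protocol: TCP
          port: 4317
        - protocol: TCP
          port: 4318
        - protocol: TCP
          port: 14250
        - protocol: TCP
          port: 14268
```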

Storage and retention

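The backend is backend: local, so trace blocks are written straight to local disk with no object storage involved. Underneath is a 50Gi local-path PVC (tempo-data, ReadWriteOnce) mounted at /var/tempo. The block format is vParquet4. Retention is 30 days (720h).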

storage:
  trace:
    backend: local
    wal:
      path: /var/tempo/wal
    local:
      path: /var/tempo/blocks

Because the container runs as non-root (uid/gid 10001), the Deployment uses a busybox initContainer to run chown -R 10001:10001 /var/tempo before handing the volume to the main container.
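The ownership fix described above could be sketched as the following pod spec fragment. Only the busybox image, the chown command, the uid/gid, the PVC name, and the mount path come from this page; the container and volume names, and placing runAsUser at the pod level, are assumptions.

```yaml
# Fragment of the Deployment's pod spec (sketch, not the actual manifest).
spec:
  securityContext:
    runAsUser: 10001                # assumed to be set at pod level
    runAsGroup: 10001
  initContainers:
    - name: fix-permissions         # assumed name
      image: busybox
      command: ["sh", "-c", "chown -R 10001:10001 /var/tempo"]
      securityContext:
        runAsUser: 0                # chown needs root; overrides the pod-level uid
      volumeMounts:
        - name: data                # assumed volume name
          mountPath: /var/tempo
  containers:
    - name: tempo                   # assumed name
      image: grafana/tempo:3.0.2
      volumeMounts:
        - name: data
          mountPath: /var/tempo
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: tempo-data
```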

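Where the 720h retention is set matters: in 3.0, block deletion is actually governed by backend_worker.compaction.block_retention, whose default is 336h (14 days). Without an explicit value, blocks would be silently trimmed at 14 days. The config pins both backend_scheduler and backend_worker to 720h; the former only feeds compaction size estimation and does not delete anything.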
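Expanded from the dotted path above, the setting that governs deletion would sit in the Tempo config roughly as follows. The backend_scheduler counterpart is left as a comment because its exact key path is not given on this page.

```yaml
backend_worker:
  compaction:
    # 30 days x 24 h = 720h. Left unset, the 336h (14-day) default applies.
    block_retention: 720h
# backend_scheduler carries a matching 720h value (size estimation only);
# its key path is not stated on this page, so it is omitted here.
```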

metrics-generator

metrics-generator derives two kinds of metrics from incoming spans and remote-writes them to Prometheus:

span-metrics: RED metrics per span (request count / error count / latency), e.g. traces_spanmetrics_calls_total.
service-graphs: call relationships between services, used to draw the service dependency topology.

overrides.defaults.metrics_generator.processors enables [service-graphs, span-metrics], and the registry adds the external label source: tempo. The remote-write target is the in-cluster Prometheus (send_exemplars: true, so metrics can link back to the corresponding trace):

remote_write:
  - url: http://kube-prometheus-prometheus.prometheus.svc.cluster.local:9090/api/v1/write
    send_exemplars: true
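Put together, the pieces named above would nest roughly as shown below. The layout follows the Tempo 2.x configuration reference and is assumed to carry over to 3.0; the storage.path value is a placeholder, not taken from the configmap.

```yaml
metrics_generator:
  registry:
    external_labels:
      source: tempo                 # external label named in the text
  storage:
    path: /var/tempo/generator/wal  # placeholder path (assumption)
    remote_write:
      - url: http://kube-prometheus-prometheus.prometheus.svc.cluster.local:9090/api/v1/write
        send_exemplars: true        # lets metrics link back to traces

overrides:
  defaults:
    metrics_generator:
      # Both processors must be listed here to be active.
      processors: [service-graphs, span-metrics]
```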

This remote-write path is a hard-to-spot failure point in Tempo and is now backstopped by alerts; see the monitoring and alerting section.

The 3.0 architectural rewrite

Going from 2.10.7 to 3.0.2 is not an ordinary tag bump but a rework of Tempo's internal component model. The config has to change with it, or behavior shifts silently:

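Aspect	2.x	3.0
Write path	Standalone `ingester` component	Removed; replaced by `live_store`
Compaction / retention	Standalone `compactor` component	Removed; replaced by the in-process `backend_scheduler` + `backend_worker`
Block retention setting	`compactor.block_retention`	`backend_worker.compaction.block_retention` (the field that actually governs deletion)
`max_block_duration` default	5m	Lowered to 30s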

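The config makes three matching adjustments: (1) live_store.wal.path points to /var/tempo/live-store/traces; (2) live_store.max_block_duration is pinned back to 5m explicitly (offsetting the new 30s default); (3) retention moves to backend_worker.compaction.block_retention: 720h (with backend_scheduler kept in sync).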
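The first two adjustments, expanded from their dotted paths, would look roughly like this in the Tempo config. It is a sketch of just these keys, not a copy of the configmap.

```yaml
live_store:
  wal:
    path: /var/tempo/live-store/traces
  # 3.0 lowers the default to 30s; pin it back to the 2.x value.
  max_block_duration: 5m
```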

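The deployment is still the full-featured monolithic target (distributor / live_store / querier / generator / scheduler / worker all run in one process), with a single replica, and is not split into microservices mode.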

Deployment and dependencies

Name	Description
Image	grafana/tempo:3.0.2; the tag is managed by argocd-image-updater (do not pin the version by hand).
Replicas / scheduling	replicas: 1, RollingUpdate, nodeSelector workload: infra.
Resources	requests 300m CPU / 1Gi memory, limits 1000m CPU / 2Gi memory. The namespace also has LimitRange defaults and a ResourceQuota cap.
Storage	PVC tempo-data, 50Gi, storageClass local-path, ReadWriteOnce, mounted at /var/tempo.
External entry point	Ingress tempo.yldm.tech (ingressClass traefik), with a letsencrypt-prod certificate issued by cert-manager, exposed through Cloudflare Tunnel, backed by the Service's :3200.
Remote-write dependency	Remote-writes span-metrics / service-graph to Prometheus :9090 in the prometheus namespace (the NetworkPolicy allow-prometheus-egress permits this egress).
No Secret	There is no ExternalSecret in this directory: local storage plus in-cluster remote write needs no credentials.
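For orientation, the external entry point described above could be expressed as an Ingress along these lines. The host, ingress class, issuer name, Service name, and port come from this page; the TLS secret name and the use of a ClusterIssuer (rather than a namespaced Issuer) are assumptions, and the Cloudflare Tunnel side is not shown.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tempo
  namespace: tempo
  annotations:
    # Assumes letsencrypt-prod is a ClusterIssuer.
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  tls:
    - hosts: [tempo.yldm.tech]
      secretName: tempo-tls         # hypothetical secret name
  rules:
    - host: tempo.yldm.tech
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: tempo
                port:
                  number: 3200      # Tempo HTTP API
```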

Tempo is one part of the cluster's observability stack, used together with Prometheus (metrics) and Grafana (the unified query UI).

Monitoring and alerting

ServiceMonitor tempo scrapes /metrics on :3200 (30s interval, 10s timeout). PrometheusRule tempo contains two alerts, both watching the metrics-generator remote-write path:

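TempoMetricsGeneratorRemoteWriteStalled: on 3.0.2, the generator wipes and recreates its WAL directory on every start, and the remote_write watcher can latch onto the empty directory before the first segment is written, after which it stalls permanently. Samples keep being appended (the rate of prometheus_agent_samples_appended_total is > 0), but the number of samples sent by remote write (prometheus_remote_storage_samples_total) stays at 0. Traces themselves are unaffected and kubectl get pod shows nothing wrong; only the divergence between these two metrics exposes it. Fires after 15m.
TempoSpanMetricsAbsent: a backstop from the consumer side. The traces_spanmetrics_calls_total series has disappeared from Prometheus entirely. Fires after 30m.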

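The remedy for both alerts is to restart the tempo pod (kubectl -n tempo delete pod <pod>). After the restart, the rebuilt WAL already has a segment by the time remote_write initializes, so the watcher can catch up normally.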
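The scrape settings and the two alert conditions in this section could be written as Prometheus Operator resources roughly like the sketch below. The resource names, intervals, metric names, and hold durations come from this page; the label selector, the port name, the group name, the 5m rate window, the namespace matcher, and the exact PromQL are assumptions, since the real expressions are not reproduced here.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tempo
  namespace: tempo
spec:
  selector:
    matchLabels:
      app: tempo                    # assumed Service label
  endpoints:
    - port: http                    # assumed name of the :3200 Service port
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: tempo
  namespace: tempo
spec:
  groups:
    - name: tempo-metrics-generator # assumed group name
      rules:
        # Samples are being appended, yet none leave via remote write.
        - alert: TempoMetricsGeneratorRemoteWriteStalled
          expr: |
            sum(rate(prometheus_agent_samples_appended_total{namespace="tempo"}[5m])) > 0
            and
            sum(rate(prometheus_remote_storage_samples_total{namespace="tempo"}[5m])) == 0
          for: 15m
        # Consumer-side backstop: the span-metrics series is gone entirely.
        - alert: TempoSpanMetricsAbsent
          expr: absent(traces_spanmetrics_calls_total)
          for: 30m
```

Both sides of the first expression are aggregated with sum() so that they share an empty label set and the and operator can match them.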