Observability¶

Naftiko Skipper provides built-in observability for every capability that declares a type: control expose. No code changes are required — the operator wires everything automatically.

The Control Port¶

Add a type: control expose to your capability spec to activate observability:

capability:
  exposes:
    - type: rest
      port: 3001
      namespace: my-api
      # ...

    - type: control
      address: "0.0.0.0"
      port: 9090
      observability:
        enabled: true
        metrics:
          local:
            enabled: true
        traces:
          sampling: 1.0          # 1.0 = 100%, 0.1 = 10%
          propagation: w3c
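
With propagation: w3c, trace context travels between services in the W3C Trace Context `traceparent` header. As a rough illustration of that header's layout, the sketch below builds and parses a version-00 header and reads its sampled flag. The function names are hypothetical and this is not Skipper's own code.

```python
import re
import secrets

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(sampled: bool) -> str:
    """Build a version-00 traceparent header with random IDs (hypothetical helper)."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"  # bit 0 of the flags byte is "sampled"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> dict:
    """Split a version-00 traceparent header into its fields."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"not a version-00 traceparent: {header!r}")
    trace_id, span_id, flags = match.groups()
    # The spec treats all-zero trace and parent IDs as invalid.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        raise ValueError("all-zero trace-id or parent-id is invalid")
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }
```

A header ending in -01 marks a request that an upstream service chose to sample; -00 marks one it did not.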

The control port exposes:

| Endpoint | Description |
|---|---|
| `/metrics` | Prometheus text format, RED metrics |
| `/health/live` | Liveness probe |
| `/health/ready` | Readiness probe |
| `/status` | Runtime capability status |
| `/traces` | Recent trace ring buffer (local dev) |
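
For a quick check of these endpoints from a script, something like the following sketch may help. It assumes the control port is reachable on localhost:9090 (for example through kubectl port-forward); the helper names are hypothetical, and the response bodies are not inspected because their format is not specified here.

```python
import urllib.error
import urllib.request

# Paths listed for the control port in this section.
CONTROL_PATHS = ("/metrics", "/health/live", "/health/ready", "/status", "/traces")


def control_urls(host: str = "localhost", port: int = 9090) -> dict:
    """Map each control-port path to its full URL."""
    return {path: f"http://{host}:{port}{path}" for path in CONTROL_PATHS}


def probe(url: str, timeout: float = 2.0, opener=urllib.request.urlopen) -> int:
    """Return the HTTP status code for url, or 0 if the port is unreachable."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except OSError:
        # Connection refused, timeout, DNS failure (URLError is an OSError).
        return 0
```

Calling `probe(url)` for each entry of `control_urls()` gives a one-line-per-endpoint health summary; the `opener` parameter exists only so the function can be exercised without a live pod.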

What the Operator Does Automatically¶

When a type: control expose is present, Skipper:

- Adds a named control port to the Service and to the Deployment container spec.
- Writes Prometheus annotations on the pod template:

      prometheus.io/scrape: "true"
      prometheus.io/port: "9090"
      prometheus.io/path: "/metrics"

- Creates a ServiceMonitor for Prometheus Operator (if its CRDs are installed).
- Injects OTEL env vars from the operator's own configuration.
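
To make the annotation step concrete, the sketch below derives the three Prometheus annotations from a capability spec expressed as a plain dict. It only illustrates the mapping; the function name and the dict shape are assumptions, not the operator's actual implementation.

```python
def prometheus_annotations(capability_spec: dict) -> dict:
    """Return scrape annotations for the first `type: control` expose, or {} if none."""
    for expose in capability_spec.get("exposes", []):
        if expose.get("type") == "control":
            return {
                "prometheus.io/scrape": "true",
                # Annotation values must be strings, so the port is stringified.
                "prometheus.io/port": str(expose["port"]),
                "prometheus.io/path": "/metrics",
            }
    # No control expose declared: nothing to annotate.
    return {}
```

With the example spec from the start of this page (a rest expose on 3001 and a control expose on 9090), the function returns the annotations listed above with port "9090".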

Available Metrics¶

The ikanos engine exports RED (rate, errors, duration) metrics through the Prometheus scrape endpoint:

| Metric | Type | Labels |
|---|---|---|
| `ikanos_capability_active` | Gauge | `ikanos_capability` |
| `ikanos_request_total` | Counter | `ikanos_adapter_type`, `ikanos_operation_id`, `status` |
| `ikanos_request_duration_seconds` | Histogram | `ikanos_adapter_type`, `ikanos_operation_id`, `status` |
| `ikanos_request_errors` | Counter | `ikanos_adapter_type`, `ikanos_operation_id`, `error_type` |
| `ikanos_step_duration_seconds` | Histogram | `step_type`, `naftiko_namespace` |
| `ikanos_http_client_total` | Counter | `server_address`, `http_response_status_code` |
| `ikanos_http_client_duration_seconds` | Histogram | `server_address` |
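
As an illustration of how the request counters combine into an error ratio, the sketch below parses Prometheus text-format output and divides errors by total requests per operation. The sample values are made up, the helper names are hypothetical, and it assumes the sample names on the wire match the metric names listed here exactly.

```python
import re
from collections import defaultdict

# name{labels} value, with the label block optional.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# key="value" pairs inside the label block (escaped quotes tolerated).
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Made-up scrape output, for illustration only.
SAMPLE = '''\
# TYPE ikanos_request_total counter
ikanos_request_total{ikanos_adapter_type="rest",ikanos_operation_id="get-user",status="200"} 90
ikanos_request_total{ikanos_adapter_type="rest",ikanos_operation_id="get-user",status="500"} 10
# TYPE ikanos_request_errors counter
ikanos_request_errors{ikanos_adapter_type="rest",ikanos_operation_id="get-user"} 10
'''


def parse_samples(text: str):
    """Yield (metric_name, labels, value) per sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            yield name, dict(_LABEL.findall(raw_labels or "")), float(value)


def error_ratio_by_operation(text: str) -> dict:
    """Return errors / total requests for each ikanos_operation_id."""
    requests = defaultdict(float)
    errors = defaultdict(float)
    for name, labels, value in parse_samples(text):
        operation = labels.get("ikanos_operation_id", "unknown")
        if name == "ikanos_request_total":
            requests[operation] += value
        elif name == "ikanos_request_errors":
            errors[operation] += value
    return {op: errors[op] / total for op, total in requests.items() if total > 0}
```

In a real setup this ratio would normally come from a PromQL query over rates rather than raw counter values; the script is only meant to show which metrics feed the calculation.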

Prometheus & Grafana Setup¶

Install kube-prometheus-stack¶

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set kubeControllerManager.enabled=false \
  --set kubeScheduler.enabled=false \
  --set kubeProxy.enabled=false \
  --set kubeEtcd.enabled=false \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
  --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false

RBAC for ServiceMonitor¶

The operator needs permission to manage ServiceMonitor resources:

kubectl patch clusterrole naftiko-skipper --type=json -p='[
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": ["monitoring.coreos.com"],
      "resources": ["servicemonitors"],
      "verbs": ["get","list","watch","create","update","patch","delete"]
    }
  }
]'

The Helm chart's rbac.yaml already includes this rule. Apply the patch only if you installed the operator manually.

Access Prometheus and Grafana¶

# Expose via NodePort
kubectl patch svc kube-prometheus-stack-prometheus -n monitoring \
  --type='json' \
  -p='[{"op":"replace","path":"/spec/type","value":"NodePort"},{"op":"add","path":"/spec/ports/0/nodePort","value":30090}]'

kubectl patch svc kube-prometheus-stack-grafana -n monitoring \
  --type='json' \
  -p='[{"op":"replace","path":"/spec/type","value":"NodePort"},{"op":"add","path":"/spec/ports/0/nodePort","value":30030}]'

# Get Grafana password
kubectl get secret -n monitoring kube-prometheus-stack-grafana \
  -o jsonpath="{.data.admin-password}" | base64 -d ; echo

Prometheus: http://<node-ip>:30090
Grafana: http://<node-ip>:30030 (user admin, password from the command above)

Import the Naftiko Dashboard¶

GRAFANA_PASSWORD=$(kubectl get secret -n monitoring kube-prometheus-stack-grafana \
  -o jsonpath="{.data.admin-password}" | base64 -d)

curl -s -X POST "http://<node-ip>:30030/api/dashboards/import" \
  -H "Content-Type: application/json" \
  -u "admin:${GRAFANA_PASSWORD}" \
  -d "{
    \"dashboard\": $(cat config/observability/dashboards/grafana-naftiko.json),
    \"overwrite\": true,
    \"inputs\": [{
      \"name\": \"DS_PROMETHEUS\",
      \"type\": \"datasource\",
      \"pluginId\": \"prometheus\",
      \"value\": \"Prometheus\"
    }],
    \"folderId\": 0
  }"

The dashboard shows: Active Capabilities, Request Rate, Error Rate, P99 Latency, Request Duration (p50/p95/p99), Step Duration, HTTP Client metrics.
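
The nested quoting in the curl command is easy to get wrong, so the same import can be sketched in Python. The payload fields and the /api/dashboards/import path mirror the command above; the function names are hypothetical, and the Grafana base URL and password are supplied by the caller.

```python
import base64
import json
import urllib.request


def build_import_payload(dashboard: dict, datasource: str = "Prometheus") -> dict:
    """Wrap a dashboard JSON document in the import payload used above."""
    return {
        "dashboard": dashboard,
        "overwrite": True,
        "inputs": [{
            "name": "DS_PROMETHEUS",
            "type": "datasource",
            "pluginId": "prometheus",
            "value": datasource,
        }],
        "folderId": 0,
    }


def build_import_request(base_url: str, password: str, dashboard: dict,
                         user: str = "admin") -> urllib.request.Request:
    """Prepare (but do not send) the POST request with HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/dashboards/import",
        data=json.dumps(build_import_payload(dashboard)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

To send it, load config/observability/dashboards/grafana-naftiko.json with `json.load` and pass the resulting request to `urllib.request.urlopen`.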

Verify Prometheus is Scraping¶

Check that the ServiceMonitor was created:

kubectl get servicemonitor <capability-name> -n default

Check the Prometheus target is UP:

curl -s "http://<node-ip>:30090/api/v1/targets" | \
  python3 -c "
import sys, json
for t in json.load(sys.stdin)['data']['activeTargets']:
    if '<capability-name>' in str(t['labels']):
        print(t['labels'].get('job'), '->', t['health'])
"

Datadog Integration¶

Install the Datadog Agent with OTLP receiver¶

kubectl create secret generic datadog-secret \
  --from-literal=api-key=<YOUR_DD_API_KEY> \
  -n monitoring

cat > datadog-values.yaml << 'EOF'
datadog:
  apiKeyExistingSecret: datadog-secret
  site: datadoghq.eu         # change to datadoghq.com for US
  clusterName: my-cluster
  kubelet:
    tlsVerify: false
  otlp:
    receiver:
      protocols:
        http:
          enabled: true
        grpc:
          enabled: true
  apm:
    portEnabled: true
  systemProbe:
    enabled: false
  processAgent:
    enabled: false
  env:
    - name: DD_HOSTNAME
      value: my-cluster-node

agents:
  useHostNetwork: false
EOF

helm repo add datadog https://helm.datadoghq.com
helm install datadog-agent datadog/datadog \
  --namespace monitoring -f datadog-values.yaml

Configure the Operator to Push to Datadog¶

Set these env vars on the Naftiko Skipper operator Deployment:

kubectl set env deployment/naftiko-skipper \
  NAFTIKO_OTEL_ENDPOINT=http://datadog-agent.monitoring.svc.cluster.local:4318 \
  NAFTIKO_OTEL_PROTOCOL=http/protobuf \
  NAFTIKO_OTEL_SAMPLING_RATE=1.0 \
  -n naftiko-system

Every capability pod will automatically receive:

- OTEL_SERVICE_NAME=naftiko-{capability-name}
- OTEL_EXPORTER_OTLP_ENDPOINT=http://datadog-agent.monitoring.svc.cluster.local:4318
- OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
- OTEL_TRACES_SAMPLER_ARG=1.0

Force Reconcile¶

After updating the operator env vars, reconcile your capabilities to pick up the new OTEL configuration:

kubectl annotate capability <name> \
  reconcile-at=$(date +%s) --overwrite -n default

View in Datadog APM¶

Navigate to APM → Services in the Datadog UI. Your capability appears under its OTEL_SERVICE_NAME, naftiko-{capability-name}, with request rate, latency percentiles, and error rate.

Operator-Level OTEL Configuration¶

All OTEL env vars are configured once on the operator — every capability in the cluster inherits them automatically:

| Operator env var | Injected as | Default |
|---|---|---|
| `NAFTIKO_OTEL_ENDPOINT` | `OTEL_EXPORTER_OTLP_ENDPOINT` | not set |
| `NAFTIKO_OTEL_PROTOCOL` | `OTEL_EXPORTER_OTLP_PROTOCOL` | not set |
| `NAFTIKO_OTEL_HEADERS` | `OTEL_EXPORTER_OTLP_HEADERS` | not set |
| `NAFTIKO_OTEL_SAMPLING_RATE` | `OTEL_TRACES_SAMPLER_ARG` | not set |

OTEL_SERVICE_NAME is always set to naftiko-{capability-name} — no configuration needed.
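
The inheritance rule can be sketched as a small mapping function. This is an illustration of the table, not the operator's code: the function name is hypothetical, and skipping unset or empty operator variables is an assumption based on the "not set" defaults.

```python
import os

# Operator-level variable -> variable injected into each capability pod.
OTEL_ENV_MAP = {
    "NAFTIKO_OTEL_ENDPOINT": "OTEL_EXPORTER_OTLP_ENDPOINT",
    "NAFTIKO_OTEL_PROTOCOL": "OTEL_EXPORTER_OTLP_PROTOCOL",
    "NAFTIKO_OTEL_HEADERS": "OTEL_EXPORTER_OTLP_HEADERS",
    "NAFTIKO_OTEL_SAMPLING_RATE": "OTEL_TRACES_SAMPLER_ARG",
}


def capability_otel_env(capability_name: str, operator_env=None) -> dict:
    """Compute the OTEL env vars a capability pod would receive."""
    source = os.environ if operator_env is None else operator_env
    # The service name is always derived from the capability name.
    env = {"OTEL_SERVICE_NAME": f"naftiko-{capability_name}"}
    for operator_var, injected_var in OTEL_ENV_MAP.items():
        value = source.get(operator_var)
        if value:  # assumption: unset or empty operator variables are not injected
            env[injected_var] = value
    return env
```

Passing the three variables from the Datadog example yields the same four pod variables listed in that section, with OTEL_EXPORTER_OTLP_HEADERS absent because NAFTIKO_OTEL_HEADERS was never set.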

Quick Reference¶

# Check control port metrics directly
kubectl port-forward svc/<capability> 9090:9090 -n default &
sleep 2   # give the port-forward a moment to start listening
curl -s http://localhost:9090/metrics | grep ikanos

# Check ServiceMonitor
kubectl get servicemonitor <capability> -n default

# Force reconcile after spec change
kubectl annotate capability <capability> \
  reconcile-at=$(date +%s) --overwrite -n default

# Check Prometheus has the target UP
kubectl port-forward -n monitoring \
  svc/kube-prometheus-stack-prometheus 9091:9090 &
# then open http://localhost:9091/targets