Observability

Naftiko Skipper provides built-in observability for every capability that declares a type: control expose. No code changes are required: the operator adds the control port, the Prometheus scrape annotations, the ServiceMonitor, and the OTEL environment variables automatically.


The Control Port

Add a type: control expose to your capability spec to activate observability:

capability:
  exposes:
    - type: rest
      port: 3001
      namespace: my-api
      # ...

    - type: control
      address: "0.0.0.0"
      port: 9090
      observability:
        enabled: true
        metrics:
          local:
            enabled: true
        traces:
          sampling: 1.0          # 1.0 = 100%, 0.1 = 10%
          propagation: w3c
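
To make the propagation setting more concrete, here is a minimal sketch of parsing a traceparent header, assuming that propagation: w3c refers to the W3C Trace Context format. The function name parse_traceparent is hypothetical and is not part of Skipper; the sketch only handles the version 00 layout and is not the engine's actual implementation.

```python
import re

# W3C Trace Context (version 00 layout): version-traceid-parentid-flags,
# all lowercase hex. Newer versions may append fields; this sketch rejects them.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header):
    """Return the fields of a traceparent header, or None if it is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid per the specification.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        # Bit 0 of the flags byte marks the trace as sampled.
        "sampled": bool(int(flags, 16) & 0x01),
    }


# Example header taken from the W3C Trace Context specification.
example = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
```

An upstream caller that sends this header with the sampled flag set would expect the capability's spans to join the same trace.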

The control port exposes:

Endpoint         Description
/metrics         Prometheus text format (RED metrics)
/health/live     Liveness probe
/health/ready    Readiness probe
/status          Runtime capability status
/traces          Recent trace ring buffer (local dev)

What the Operator Does Automatically

When a type: control expose is present, Skipper:

  1. Adds a named control port to the Service and Deployment container spec
  2. Writes Prometheus pod annotations on the pod template:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
    
  3. Creates a ServiceMonitor for Prometheus Operator (if CRDs are installed)
  4. Injects OTEL env vars from the operator's own configuration
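
Step 2 above can be sketched as a small function that derives the pod annotations from the control expose. This is only an illustration of the mapping described in the list, not the operator's actual code; the helper names find_control_expose and prometheus_annotations are hypothetical.

```python
def find_control_expose(capability):
    """Return the first expose of type 'control', or None if there is none."""
    for expose in capability.get("exposes", []):
        if expose.get("type") == "control":
            return expose
    return None


def prometheus_annotations(control_expose, path="/metrics"):
    """Build the scrape annotations for a control expose.

    Annotation values must be strings, so the port is converted.
    """
    return {
        "prometheus.io/scrape": "true",
        "prometheus.io/port": str(control_expose["port"]),
        "prometheus.io/path": path,
    }


# Capability spec mirroring the example in this page, reduced to the
# fields the sketch needs.
capability = {
    "exposes": [
        {"type": "rest", "port": 3001, "namespace": "my-api"},
        {"type": "control", "address": "0.0.0.0", "port": 9090},
    ]
}

control = find_control_expose(capability)
annotations = prometheus_annotations(control) if control else {}
```

A capability without a control expose yields no annotations, which matches the rule that observability is only activated when the expose is declared.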

Available Metrics

The ikanos engine exports RED (rate, errors, duration) metrics via the Prometheus scrape endpoint:

Metric                                 Type        Labels
ikanos_capability_active               Gauge       ikanos_capability
ikanos_request_total                   Counter     ikanos_adapter_type, ikanos_operation_id, status
ikanos_request_duration_seconds        Histogram   ikanos_adapter_type, ikanos_operation_id, status
ikanos_request_errors                  Counter     ikanos_adapter_type, ikanos_operation_id, error.type
ikanos_step_duration_seconds           Histogram   step_type, naftiko_namespace
ikanos_http_client_total               Counter     server_address, http_response_status_code
ikanos_http_client_duration_seconds    Histogram   server_address
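
As a rough illustration of how these counters combine, the sketch below sums series from a /metrics response by metric name and derives an overall error ratio. The sample text is made up for the example (the operation ID, status values, and numbers are hypothetical, and the labels are trimmed), and the parser assumes lines carry no trailing timestamp.

```python
def sum_by_metric(text):
    """Sum all series of a Prometheus text exposition by metric name."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The sample value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def error_ratio(totals):
    """Errors divided by requests, or 0.0 when no requests were recorded."""
    requests = totals.get("ikanos_request_total", 0.0)
    if requests == 0:
        return 0.0
    return totals.get("ikanos_request_errors", 0.0) / requests


# Made-up sample; real output has more series and labels.
SAMPLE = """\
# TYPE ikanos_request_total counter
ikanos_request_total{ikanos_adapter_type="rest",ikanos_operation_id="get-item",status="200"} 90
ikanos_request_total{ikanos_adapter_type="rest",ikanos_operation_id="get-item",status="500"} 10
# TYPE ikanos_request_errors counter
ikanos_request_errors{ikanos_adapter_type="rest",ikanos_operation_id="get-item"} 10
"""

totals = sum_by_metric(SAMPLE)
ratio = error_ratio(totals)
```

In practice the same ratio is usually computed in Prometheus over a time window rather than from raw counter totals; the sketch only shows which metrics feed it.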

Prometheus & Grafana Setup

Install kube-prometheus-stack

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set kubeControllerManager.enabled=false \
  --set kubeScheduler.enabled=false \
  --set kubeProxy.enabled=false \
  --set kubeEtcd.enabled=false \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
  --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false

RBAC for ServiceMonitor

The operator needs permission to manage ServiceMonitor resources:

kubectl patch clusterrole naftiko-skipper --type=json -p='[
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": ["monitoring.coreos.com"],
      "resources": ["servicemonitors"],
      "verbs": ["get","list","watch","create","update","patch","delete"]
    }
  }
]'

This rule is already included in the Helm chart's rbac.yaml. The patch is only needed if you installed the operator manually.

Access Prometheus and Grafana

# Expose via NodePort
kubectl patch svc kube-prometheus-stack-prometheus -n monitoring \
  --type='json' \
  -p='[{"op":"replace","path":"/spec/type","value":"NodePort"},{"op":"add","path":"/spec/ports/0/nodePort","value":30090}]'

kubectl patch svc kube-prometheus-stack-grafana -n monitoring \
  --type='json' \
  -p='[{"op":"replace","path":"/spec/type","value":"NodePort"},{"op":"add","path":"/spec/ports/0/nodePort","value":30030}]'

# Get Grafana password
kubectl get secret -n monitoring kube-prometheus-stack-grafana \
  -o jsonpath="{.data.admin-password}" | base64 -d ; echo

  • Prometheus: http://<node-ip>:30090
  • Grafana: http://<node-ip>:30030 (user admin, password from the command above)

Import the Naftiko Dashboard

GRAFANA_PASSWORD=$(kubectl get secret -n monitoring kube-prometheus-stack-grafana \
  -o jsonpath="{.data.admin-password}" | base64 -d)

curl -s -X POST http://<node-ip>:30030/api/dashboards/import \
  -H "Content-Type: application/json" \
  -u "admin:${GRAFANA_PASSWORD}" \
  -d "{
    \"dashboard\": $(cat config/observability/dashboards/grafana-naftiko.json),
    \"overwrite\": true,
    \"inputs\": [{
      \"name\": \"DS_PROMETHEUS\",
      \"type\": \"datasource\",
      \"pluginId\": \"prometheus\",
      \"value\": \"Prometheus\"
    }],
    \"folderId\": 0
  }"

The dashboard shows: Active Capabilities, Request Rate, Error Rate, P99 Latency, Request Duration (p50/p95/p99), Step Duration, HTTP Client metrics.
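
The import request body is easier to get right when it is built as data rather than as an escaped shell string. The following is a minimal sketch of that payload; the helper name build_import_payload is hypothetical, the field values are copied from the curl command above, and the placeholder dashboard stands in for the contents of grafana-naftiko.json.

```python
import json


def build_import_payload(dashboard, datasource="Prometheus"):
    """Wrap a dashboard definition in the body sent to /api/dashboards/import."""
    return {
        "dashboard": dashboard,
        "overwrite": True,
        "inputs": [
            {
                "name": "DS_PROMETHEUS",
                "type": "datasource",
                "pluginId": "prometheus",
                "value": datasource,
            }
        ],
        "folderId": 0,
    }


# Placeholder dashboard; in practice, load the real file with json.load().
payload = build_import_payload({"title": "placeholder"})
body = json.dumps(payload)
```

Writing body to a file such as payload.json lets curl send it with -d @payload.json, which avoids the nested quoting.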


Verify Prometheus is Scraping

Check that the ServiceMonitor was created:

kubectl get servicemonitor <capability-name> -n default

Check the Prometheus target is UP:

curl -s "http://<node-ip>:30090/api/v1/targets" | \
  python3 -c "
import sys, json
for t in json.load(sys.stdin)['data']['activeTargets']:
    if '<capability-name>' in str(t['labels']):
        print(t['labels'].get('job'), '->', t['health'])
"

Datadog Integration

Install the Datadog Agent with OTLP receiver

kubectl create secret generic datadog-secret \
  --from-literal=api-key=<YOUR_DD_API_KEY> \
  -n monitoring

cat > datadog-values.yaml << 'EOF'
datadog:
  apiKeyExistingSecret: datadog-secret
  site: datadoghq.eu         # change to datadoghq.com for US
  clusterName: my-cluster
  kubelet:
    tlsVerify: false
  otlp:
    receiver:
      protocols:
        http:
          enabled: true
        grpc:
          enabled: true
  apm:
    portEnabled: true
  systemProbe:
    enabled: false
  processAgent:
    enabled: false
  env:
    - name: DD_HOSTNAME
      value: my-cluster-node

agents:
  useHostNetwork: false
EOF

helm repo add datadog https://helm.datadoghq.com
helm install datadog-agent datadog/datadog \
  --namespace monitoring -f datadog-values.yaml

Configure the Operator to Push to Datadog

Set these env vars on the Naftiko Skipper operator Deployment:

kubectl set env deployment/naftiko-skipper \
  NAFTIKO_OTEL_ENDPOINT=http://datadog-agent.monitoring.svc.cluster.local:4318 \
  NAFTIKO_OTEL_PROTOCOL=http/protobuf \
  NAFTIKO_OTEL_SAMPLING_RATE=1.0 \
  -n naftiko-system

Every capability pod will automatically receive:

  • OTEL_SERVICE_NAME=naftiko-{capability-name}
  • OTEL_EXPORTER_OTLP_ENDPOINT=http://datadog-agent.monitoring.svc.cluster.local:4318
  • OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
  • OTEL_TRACES_SAMPLER_ARG=1.0

Force Reconcile

After updating the operator env vars, reconcile your capabilities to pick up the new OTEL configuration:

kubectl annotate capability <name> \
  reconcile-at=$(date +%s) --overwrite -n default

View in Datadog APM

Navigate to APM → Services in the Datadog UI. Your capability will appear under its OTEL_SERVICE_NAME, naftiko-{capability-name}, with request rate, latency percentiles, and error rate.


Operator-Level OTEL Configuration

All OTEL env vars are configured once on the operator — every capability in the cluster inherits them automatically:

Env var                       Injected as                     Default
NAFTIKO_OTEL_ENDPOINT         OTEL_EXPORTER_OTLP_ENDPOINT     not set
NAFTIKO_OTEL_PROTOCOL         OTEL_EXPORTER_OTLP_PROTOCOL     not set
NAFTIKO_OTEL_HEADERS          OTEL_EXPORTER_OTLP_HEADERS      not set
NAFTIKO_OTEL_SAMPLING_RATE    OTEL_TRACES_SAMPLER_ARG         not set

OTEL_SERVICE_NAME is always set to naftiko-{capability-name} — no configuration needed.
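
The mapping in the table can be sketched as a function from the operator's environment to the variables a capability pod receives. This illustrates the documented behaviour only and is not the operator's code; the name injected_env is hypothetical, and treating an empty value the same as an unset one is an assumption.

```python
# Operator env var -> env var injected into capability pods (from the table).
OTEL_ENV_MAPPING = {
    "NAFTIKO_OTEL_ENDPOINT": "OTEL_EXPORTER_OTLP_ENDPOINT",
    "NAFTIKO_OTEL_PROTOCOL": "OTEL_EXPORTER_OTLP_PROTOCOL",
    "NAFTIKO_OTEL_HEADERS": "OTEL_EXPORTER_OTLP_HEADERS",
    "NAFTIKO_OTEL_SAMPLING_RATE": "OTEL_TRACES_SAMPLER_ARG",
}


def injected_env(operator_env, capability_name):
    """Return the OTEL env vars a capability pod would receive."""
    # The service name is always set, regardless of operator configuration.
    env = {"OTEL_SERVICE_NAME": f"naftiko-{capability_name}"}
    for source, target in OTEL_ENV_MAPPING.items():
        value = operator_env.get(source)
        if value:  # assumption: unset or empty operator vars are not injected
            env[target] = value
    return env


# Operator settings from the Datadog example on this page.
operator_env = {
    "NAFTIKO_OTEL_ENDPOINT": "http://datadog-agent.monitoring.svc.cluster.local:4318",
    "NAFTIKO_OTEL_PROTOCOL": "http/protobuf",
    "NAFTIKO_OTEL_SAMPLING_RATE": "1.0",
}

pod_env = injected_env(operator_env, "my-api")
```

With no operator variables set, only OTEL_SERVICE_NAME is present, which matches the "not set" defaults in the table.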


Quick Reference

# Check control port metrics directly
kubectl port-forward svc/<capability> 9090:9090 -n default &
curl http://localhost:9090/metrics | grep ikanos

# Check ServiceMonitor
kubectl get servicemonitor <capability> -n default

# Force reconcile after spec change
kubectl annotate capability <capability> \
  reconcile-at=$(date +%s) --overwrite -n default

# Check Prometheus has the target UP
kubectl port-forward -n monitoring \
  svc/kube-prometheus-stack-prometheus 9091:9090 &
# then open http://localhost:9091/targets