Observability¶
Naftiko Skipper provides built-in observability for every capability that
declares a type: control expose. No code changes are required: the operator
wires the control port, Prometheus scraping, and OTEL configuration automatically.
The Control Port¶
Add a type: control expose to your capability spec to activate observability:
capability:
  exposes:
    - type: rest
      port: 3001
      namespace: my-api
      # ...
    - type: control
      address: "0.0.0.0"
      port: 9090
      observability:
        enabled: true
        metrics:
          local:
            enabled: true
        traces:
          sampling: 1.0   # 1.0 = 100%, 0.1 = 10%
          propagation: w3c
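With propagation: w3c, trace context presumably travels between services in the traceparent HTTP header defined by the W3C Trace Context specification. As a rough illustration only (this is not Naftiko's implementation), the sketch below parses the version-00 layout of that header. The helper name parse_traceparent is hypothetical, and the sample value in the usage note is the example from the W3C specification.

```python
import re
from typing import NamedTuple, Optional

# Version-00 layout: version "-" trace-id "-" parent-id "-" trace-flags,
# all lowercase hexadecimal.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero identifiers are invalid per the specification.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # Bit 0 of the flags byte is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

For example, parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01") yields a record whose sampled field is True, which is what an upstream service sampling at 1.0 would send.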
The control port exposes:
| Endpoint | Description |
|---|---|
| `/metrics` | Prometheus text format (RED metrics) |
| `/health/live` | Liveness probe |
| `/health/ready` | Readiness probe |
| `/status` | Runtime capability status |
| `/traces` | Recent trace ring buffer (local dev) |
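To check these endpoints in one pass, a small script along the following lines may help. It is a sketch that assumes the control port is reachable at a base URL such as http://localhost:9090 (for example through kubectl port-forward). The function names control_urls and probe are hypothetical, and the response bodies are not inspected because their format is not described here.

```python
from typing import Dict, List, Union
from urllib.error import HTTPError
from urllib.request import urlopen

# Paths listed in the control port table above.
CONTROL_PATHS = ["/metrics", "/health/live", "/health/ready", "/status", "/traces"]


def control_urls(base: str) -> List[str]:
    """Build the full URL of every control endpoint for a base URL."""
    return [base.rstrip("/") + path for path in CONTROL_PATHS]


def probe(base: str, timeout: float = 2.0) -> Dict[str, Union[int, str]]:
    """Request each endpoint and record its HTTP status or the failure."""
    results: Dict[str, Union[int, str]] = {}
    for url in control_urls(base):
        try:
            with urlopen(url, timeout=timeout) as response:
                results[url] = response.status
        except HTTPError as exc:
            # The server answered with a non-2xx status (e.g. not ready yet).
            results[url] = exc.code
        except OSError as exc:
            # Connection refused, DNS failure, timeout, and similar.
            results[url] = f"unreachable: {exc}"
    return results
```

Calling probe("http://localhost:9090") returns a mapping from each URL to its status code, or to an "unreachable" message when the port is not forwarded.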
What the Operator Does Automatically¶
When a type: control expose is present, Skipper:
- Adds a named `control` port to the Service and Deployment container spec
- Writes Prometheus pod annotations on the pod template
- Creates a `ServiceMonitor` for Prometheus Operator (if CRDs are installed)
- Injects OTEL env vars from the operator's own configuration
Available Metrics¶
The ikanos engine exports RED metrics via the Prometheus scrape endpoint:
| Metric | Type | Labels |
|---|---|---|
| `ikanos_capability_active` | Gauge | `ikanos_capability` |
| `ikanos_request_total` | Counter | `ikanos_adapter_type`, `ikanos_operation_id`, `status` |
| `ikanos_request_duration_seconds` | Histogram | `ikanos_adapter_type`, `ikanos_operation_id`, `status` |
| `ikanos_request_errors` | Counter | `ikanos_adapter_type`, `ikanos_operation_id`, `error.type` |
| `ikanos_step_duration_seconds` | Histogram | `step_type`, `naftiko_namespace` |
| `ikanos_http_client_total` | Counter | `server_address`, `http_response_status_code` |
| `ikanos_http_client_duration_seconds` | Histogram | `server_address` |
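These metrics map onto the usual RED queries (rate, errors, duration). The sketch below builds PromQL strings from the metric names in the table. It assumes the names appear in Prometheus exactly as listed (some exporters append suffixes such as _total to counters) and that the histogram exposes the standard _bucket series. The helper red_queries is hypothetical.

```python
from typing import Dict, Optional


def red_queries(operation_id: Optional[str] = None, window: str = "5m") -> Dict[str, str]:
    """Return PromQL for request rate, error ratio, and p99 latency.

    operation_id, when given, restricts the queries to one
    ikanos_operation_id label value; window is the rate() range.
    """
    selector = f'{{ikanos_operation_id="{operation_id}"}}' if operation_id else ""
    total = f"sum(rate(ikanos_request_total{selector}[{window}]))"
    errors = f"sum(rate(ikanos_request_errors{selector}[{window}]))"
    # Standard Prometheus histograms expose cumulative buckets as <name>_bucket.
    buckets = (
        "sum by (le) "
        f"(rate(ikanos_request_duration_seconds_bucket{selector}[{window}]))"
    )
    return {
        "rate": total,
        "error_ratio": f"{errors} / {total}",
        "p99_latency": f"histogram_quantile(0.99, {buckets})",
    }
```

The resulting strings can be pasted into the Prometheus expression browser or a Grafana panel; adjust the metric names if your scrape output differs.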
Prometheus & Grafana Setup¶
Install kube-prometheus-stack¶
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace \
--set kubeControllerManager.enabled=false \
--set kubeScheduler.enabled=false \
--set kubeProxy.enabled=false \
--set kubeEtcd.enabled=false \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
--set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false
RBAC for ServiceMonitor¶
The operator needs permission to manage ServiceMonitor resources:
kubectl patch clusterrole naftiko-skipper --type=json -p='[
{
"op": "add",
"path": "/rules/-",
"value": {
"apiGroups": ["monitoring.coreos.com"],
"resources": ["servicemonitors"],
"verbs": ["get","list","watch","create","update","patch","delete"]
}
}
]'
The Helm chart's rbac.yaml already includes this rule, so the patch is only
needed if you applied the operator manifests manually.
Access Prometheus and Grafana¶
# Expose via NodePort
kubectl patch svc kube-prometheus-stack-prometheus -n monitoring \
--type='json' \
-p='[{"op":"replace","path":"/spec/type","value":"NodePort"},{"op":"add","path":"/spec/ports/0/nodePort","value":30090}]'
kubectl patch svc kube-prometheus-stack-grafana -n monitoring \
--type='json' \
-p='[{"op":"replace","path":"/spec/type","value":"NodePort"},{"op":"add","path":"/spec/ports/0/nodePort","value":30030}]'
# Get Grafana password
kubectl get secret -n monitoring kube-prometheus-stack-grafana \
-o jsonpath="{.data.admin-password}" | base64 -d ; echo
- Prometheus: http://<node-ip>:30090
- Grafana: http://<node-ip>:30030 (admin / password above)
Import the Naftiko Dashboard¶
GRAFANA_PASSWORD=$(kubectl get secret -n monitoring kube-prometheus-stack-grafana \
-o jsonpath="{.data.admin-password}" | base64 -d)
curl -s -X POST http://<node-ip>:30030/api/dashboards/import \
-H "Content-Type: application/json" \
-u "admin:${GRAFANA_PASSWORD}" \
-d "{
\"dashboard\": $(cat config/observability/dashboards/grafana-naftiko.json),
\"overwrite\": true,
\"inputs\": [{
\"name\": \"DS_PROMETHEUS\",
\"type\": \"datasource\",
\"pluginId\": \"prometheus\",
\"value\": \"Prometheus\"
}],
\"folderId\": 0
}"
The dashboard shows: Active Capabilities, Request Rate, Error Rate, P99 Latency, Request Duration (p50/p95/p99), Step Duration, HTTP Client metrics.
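The import request above embeds the dashboard file into a hand-escaped JSON string, which is easy to break. As an alternative sketch, the same payload can be assembled with a JSON library. The function name build_import_payload is hypothetical; the field values mirror the curl example.

```python
import json


def build_import_payload(dashboard: dict, datasource: str = "Prometheus") -> str:
    """Serialize the body for POST /api/dashboards/import.

    dashboard is the parsed content of the dashboard JSON file;
    datasource is the name of the Prometheus data source in Grafana.
    """
    payload = {
        "dashboard": dashboard,
        "overwrite": True,
        "inputs": [
            {
                "name": "DS_PROMETHEUS",
                "type": "datasource",
                "pluginId": "prometheus",
                "value": datasource,
            }
        ],
        "folderId": 0,
    }
    return json.dumps(payload)
```

To use it, load config/observability/dashboards/grafana-naftiko.json with json.load, pass the result to build_import_payload, and send the returned string as the request body with the same Content-Type header and credentials as the curl command.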
Verify Prometheus is Scraping¶
Check that the ServiceMonitor was created:
kubectl get servicemonitor <capability> -n default
Check the Prometheus target is UP:
curl -s "http://<node-ip>:30090/api/v1/targets" | \
python3 -c "
import sys, json
for t in json.load(sys.stdin)['data']['activeTargets']:
    if '<capability-name>' in str(t['labels']):
        print(t['labels'].get('job'), '->', t['health'])
"
Datadog Integration¶
Install the Datadog Agent with OTLP receiver¶
kubectl create secret generic datadog-secret \
--from-literal=api-key=<YOUR_DD_API_KEY> \
-n monitoring
cat > datadog-values.yaml << 'EOF'
datadog:
  apiKeyExistingSecret: datadog-secret
  site: datadoghq.eu   # change to datadoghq.com for US
  clusterName: my-cluster
  kubelet:
    tlsVerify: false
  otlp:
    receiver:
      protocols:
        http:
          enabled: true
        grpc:
          enabled: true
  apm:
    portEnabled: true
  systemProbe:
    enabled: false
  processAgent:
    enabled: false
  env:
    - name: DD_HOSTNAME
      value: my-cluster-node
agents:
  useHostNetwork: false
EOF
helm repo add datadog https://helm.datadoghq.com
helm install datadog-agent datadog/datadog \
--namespace monitoring -f datadog-values.yaml
Configure the Operator to Push to Datadog¶
Set these env vars on the Naftiko Skipper operator Deployment:
kubectl set env deployment/naftiko-skipper \
NAFTIKO_OTEL_ENDPOINT=http://datadog-agent.monitoring.svc.cluster.local:4318 \
NAFTIKO_OTEL_PROTOCOL=http/protobuf \
NAFTIKO_OTEL_SAMPLING_RATE=1.0 \
-n naftiko-system
Every capability pod will automatically receive:
- OTEL_SERVICE_NAME=naftiko-{capability-name}
- OTEL_EXPORTER_OTLP_ENDPOINT=http://datadog-agent.monitoring.svc.cluster.local:4318
- OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
- OTEL_TRACES_SAMPLER_ARG=1.0
Force Reconcile¶
After updating the operator env vars, reconcile your capabilities so they pick up the new OTEL configuration:
kubectl annotate capability <capability> \
  reconcile-at=$(date +%s) --overwrite -n default
View in Datadog APM¶
Navigate to APM → Services in the Datadog UI. Your capability will appear as
ikanos-{capability-label} with request rate, latency percentiles, and error rate.
Operator-Level OTEL Configuration¶
All OTEL env vars are configured once on the operator — every capability in the cluster inherits them automatically:
| Env var | Injected as | Default |
|---|---|---|
| `NAFTIKO_OTEL_ENDPOINT` | `OTEL_EXPORTER_OTLP_ENDPOINT` | not set |
| `NAFTIKO_OTEL_PROTOCOL` | `OTEL_EXPORTER_OTLP_PROTOCOL` | not set |
| `NAFTIKO_OTEL_HEADERS` | `OTEL_EXPORTER_OTLP_HEADERS` | not set |
| `NAFTIKO_OTEL_SAMPLING_RATE` | `OTEL_TRACES_SAMPLER_ARG` | not set |
OTEL_SERVICE_NAME is always set to naftiko-{capability-name} — no
configuration needed.
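The mapping in the table can be pictured as a simple rename from operator variables to pod variables. The sketch below models it in Python for illustration; it is not the operator's code. The function injected_env is hypothetical, and it assumes that operator variables left unset are simply not injected.

```python
from typing import Dict, Mapping

# Operator env var -> env var injected into each capability pod.
OPERATOR_TO_POD = {
    "NAFTIKO_OTEL_ENDPOINT": "OTEL_EXPORTER_OTLP_ENDPOINT",
    "NAFTIKO_OTEL_PROTOCOL": "OTEL_EXPORTER_OTLP_PROTOCOL",
    "NAFTIKO_OTEL_HEADERS": "OTEL_EXPORTER_OTLP_HEADERS",
    "NAFTIKO_OTEL_SAMPLING_RATE": "OTEL_TRACES_SAMPLER_ARG",
}


def injected_env(operator_env: Mapping[str, str], capability_name: str) -> Dict[str, str]:
    """Compute the OTEL env vars a capability pod would receive."""
    # OTEL_SERVICE_NAME is always derived from the capability name.
    env = {"OTEL_SERVICE_NAME": f"naftiko-{capability_name}"}
    for source, target in OPERATOR_TO_POD.items():
        value = operator_env.get(source)
        if value:  # assumption: unset operator vars are skipped
            env[target] = value
    return env
```

With only NAFTIKO_OTEL_ENDPOINT and NAFTIKO_OTEL_SAMPLING_RATE set on the operator, a capability named orders would receive OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, and OTEL_TRACES_SAMPLER_ARG, and nothing else.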
Quick Reference¶
# Check control port metrics directly
kubectl port-forward svc/<capability> 9090:9090 -n default &
curl http://localhost:9090/metrics | grep ikanos
# Check ServiceMonitor
kubectl get servicemonitor <capability> -n default
# Force reconcile after spec change
kubectl annotate capability <capability> \
reconcile-at=$(date +%s) --overwrite -n default
# Check Prometheus has the target UP
kubectl port-forward -n monitoring \
svc/kube-prometheus-stack-prometheus 9091:9090 &
# then open http://localhost:9091/targets
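When grep is not enough, the /metrics output can be summarized with a few lines of Python. The sketch below sums sample values per metric name from Prometheus text format. It is a simplified reader rather than a full parser, the function sum_samples is hypothetical, and SAMPLE is made-up data shaped like the metrics documented above.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# HELP ikanos_request_total Total requests
# TYPE ikanos_request_total counter
ikanos_request_total{ikanos_adapter_type="rest",ikanos_operation_id="getUser",status="200"} 40
ikanos_request_total{ikanos_adapter_type="rest",ikanos_operation_id="getUser",status="500"} 2
ikanos_capability_active{ikanos_capability="my-api"} 1
process_cpu_seconds_total 12.5
"""


def sum_samples(exposition: str, prefix: str = "ikanos_") -> Dict[str, float]:
    """Sum sample values per metric name, keeping names that start with prefix."""
    totals: Dict[str, float] = defaultdict(float)
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # name{labels} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :].split()
        else:
            # name value [timestamp]
            name, *rest = line.split()
        if name.startswith(prefix) and rest:
            totals[name] += float(rest[0])
    return dict(totals)
```

Feeding it the body returned by curl http://localhost:9090/metrics gives one total per ikanos metric, which is a quick way to confirm that counters are moving between two scrapes.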