Troubleshooting: Zero Trust Identity for Workloads¶
Common Issues and Solutions¶
Issue 1: Pods Stuck in "ImagePullBackOff"¶
Symptoms:
kubectl get pods -n web-app-1
# NAME READY STATUS RESTARTS AGE
# web-app-pod-1-xxx-yyy 0/3 ImagePullBackOff 0 2m
Cause: Hopr registry credentials are incorrect or missing.
Evaluation: `describe` the failing pod to see its events (use the pod names from the `kubectl get pods -A` output).
Check the Events: section of the output. It should look something like this:
Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  7m33s                   default-scheduler  Successfully assigned web-app-2/web-app-pod-2-7cc7d6b75-64r4t to k3d-wosp-cluster-server-0
  Normal   Pulled     7m33s                   kubelet            Container image "serial-app-wosp-node:latest" already present on machine
  Normal   Created    7m33s                   kubelet            Created container web-app
  Normal   Started    7m33s                   kubelet            Started container web-app
  Normal   Pulled     7m33s                   kubelet            Container image "repo.hoprapi.com/hopr/web-retriever:v0.3.0" already present on machine
  Normal   Created    7m33s                   kubelet            Created container web-retriever
  Normal   Started    7m33s                   kubelet            Started container web-retriever
  Normal   Pulled     6m44s (x4 over 7m33s)   kubelet            Container image "repo.hoprapi.com/hopr/xtra-wasm-filter:v2.0.3" already present on machine
  Normal   Created    6m44s (x4 over 7m33s)   kubelet            Created container xtra-wasm
  Normal   Started    6m44s (x4 over 7m33s)   kubelet            Started container xtra-wasm
  Warning  BackOff    2m23s (x24 over 7m29s)  kubelet            Back-off restarting failed container xtra-wasm in pod web-app-pod-2-7cc7d6b75-64r4t_web-app-2(7986d534-f378-4cf0-a06c-2f3d41188e93)
Solution:
- Verify your Hopr credentials in pod-X-vars.yaml:
repo_username: "YOUR_ACTUAL_USERNAME" # Not placeholder
repo_password: "YOUR_ACTUAL_PASSWORD" # Not placeholder
repo_email: "YOUR_ACTUAL_EMAIL" # Not placeholder
- Check the secret was created correctly:
Be sure to use the correct namespace
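A quick way to confirm the secret exists and holds the credentials you expect. This is a sketch: the secret name `hopr-regcred` is an assumption (check your blueprint for the actual name used by the deployment's `imagePullSecrets`).

```shell
# List secrets in the pod's namespace (use web-app-2 for Pod-2)
kubectl get secrets -n web-app-1

# Decode the stored docker config to verify the credentials that were
# actually applied. NOTE: "hopr-regcred" is an assumed name; substitute
# the registry secret name from your generated deployment.
kubectl get secret hopr-regcred -n web-app-1 \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
```

If the decoded output still shows placeholder values, the vars file was not updated before the manifest was generated.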
- Delete and recreate the deployment with correct credentials:
kubectl delete -f deployments/pod-1-deployment.yaml
# Update credentials in pod-1-vars.yaml
ytt -f manifests/pod-1-vars.yaml -f manifests/hopr-p2p.templ.yaml -f manifests/base-deployment.yaml > deployments/pod-1-deployment.yaml
kubectl apply -f deployments/pod-1-deployment.yaml
Issue 2: Pods Stuck at 2/3 or 1/3 Running¶
Symptoms:
kubectl get pods -n web-app-1
# NAME READY STATUS RESTARTS AGE
# web-app-pod-1-xxx-yyy 2/3 Running 0 5m
Cause: One container is failing to start.
Solution:
- Check which container is failing:
kubectl describe pod -n web-app-1 -l app=web-app-pod-1
# Look for "Waiting" or "CrashLoopBackOff" status
- Check logs of the failing container:
# If xtra-wasm is failing:
kubectl logs -n web-app-1 -l app=web-app-pod-1 -c xtra-wasm
# If web-app is failing:
kubectl logs -n web-app-1 -l app=web-app-pod-1 -c web-app
# If web-retriever is failing:
kubectl logs -n web-app-1 -l app=web-app-pod-1 -c web-retriever
- Common causes:
  - xtra-wasm failing: invalid Hopr license credentials
  - web-app failing: Docker image not imported to k3d
  - web-retriever failing: ConfigMap issue
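Each of the causes above can be confirmed with a one-line check. The resource names below follow the commands used elsewhere in this guide; adjust them if your environment differs.

```shell
# xtra-wasm: confirm the Hopr license secret exists
kubectl get secret hopr-license -n web-app-1

# web-app: confirm the application image is present inside the k3d node
docker exec k3d-wosp-cluster-server-0 crictl images | grep serial-app

# web-retriever: confirm the Envoy ConfigMap exists
kubectl get configmap hopr-envoyconfig -n web-app-1
```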
Issue 3: ConfigMap Not Found¶
Symptoms:
Cause: YTT file order or namespace creation issue.
Solution:
- Verify namespace was created:
- Check ConfigMaps exist:
- If missing, redeploy:
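The three steps above can be sketched as follows, using the resource and file names from Issue 1 (repeat with `web-app-2` and the Pod-2 files as needed):

```shell
# 1. Verify the namespace was created
kubectl get namespace web-app-1

# 2. Check that the ConfigMaps exist (expect hopr-envoyconfig among them)
kubectl get configmaps -n web-app-1

# 3. If missing, redeploy from the generated manifest
kubectl apply -f deployments/pod-1-deployment.yaml
```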
Issue 4: "Namespace not found" Error on First Deploy¶
Symptoms:
Error from server (NotFound): error when creating "pod-1-deployment.yaml": namespaces "web-app-1" not found
Cause: Kubernetes tries to create resources before the namespace exists. This happens when the order of the individual YAML files in the pod directory differs from that of the delivered blueprint.
Solution: Apply the deployment files a second time:
kubectl apply -f pod-1/ -f pod-2/
# First run creates namespace, may show errors for other resources
kubectl apply -f pod-1/ -f pod-2/
# Second run creates remaining resources successfully
Issue 5: Application Shows UTF-8 Decode Errors¶
Symptoms:
Cause: This is expected behavior when the Hopr proxy returns encrypted data instead of JSON. However, if Pod-2 isn't receiving anything, there's a routing problem.
Solution:
- Check if Pod-2 is receiving batons:
- If Pod-2 shows no activity, check the remote_endpoints configuration:
# Verify Pod-1's Envoy config points to Pod-2
kubectl get configmap hopr-envoyconfig -n web-app-1 -o yaml | grep -A 10 "remote_service"
- Should show:
- If incorrect, update remote_endpoints in pod-1-vars.yaml and regenerate.
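A sketch of step 1, checking whether Pod-2 is receiving batons. The container name `web-app` and the log format follow the examples elsewhere in this guide; your application's log lines may differ.

```shell
# Follow Pod-2's application logs and watch for incoming batons
kubectl logs -n web-app-2 -l app=web-app-pod-2 -c web-app -f

# Healthy output includes lines such as:
# ➡️ CLIENT (web-app-pod-2): Forwarding baton #N. Trail: [...]
```

No such lines appearing after a minute or two suggests the routing problem described above.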
Issue 6: Baton Trail Keeps Growing (Infinite Loop)¶
Symptoms:
➡️ CLIENT (web-app-pod-2): Forwarding baton #1. Trail: ['web-app-pod-2', 'web-app-pod-2', 'web-app-pod-2', ...]
Cause: Pod is sending messages to itself instead of the remote pod.
Solution:
- Check remote_endpoints in vars file:
# pod-1-vars.yaml should point to POD-2:
remote_endpoints:
- web-app-pod-2-ingress.web-app-2.svc.cluster.local:18000
# pod-2-vars.yaml should point to POD-1:
remote_endpoints:
- web-app-pod-1-ingress.web-app-1.svc.cluster.local:18000
- Verify each pod points to the OTHER pod, not itself.
- Regenerate and redeploy with the correct configuration.
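Regenerating and redeploying follows the same ytt pipeline shown in Issue 1. A sketch for Pod-1 (repeat with the Pod-2 files and the `web-app-2` namespace):

```shell
# Regenerate the deployment from the corrected vars file
ytt -f manifests/pod-1-vars.yaml -f manifests/hopr-p2p.templ.yaml \
    -f manifests/base-deployment.yaml > deployments/pod-1-deployment.yaml

# Reapply the manifest
kubectl apply -f deployments/pod-1-deployment.yaml

# Restart the pods so they pick up the new configuration
kubectl delete pods -n web-app-1 --all
```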
Issue 7: Pod Shows "ErrImageNeverPull"¶
Symptoms:
Failed to pull image "serial-app-wosp-node:latest": rpc error: code = Unknown desc = image not found
Cause: Application Docker image not imported to k3d cluster.
Solution:
# Build the image
cd app
docker build -t serial-app-wosp-node:latest .
# Import to k3d
k3d image import serial-app-wosp-node:latest -c wosp-cluster
# Verify import
docker exec k3d-wosp-cluster-server-0 crictl images | grep serial-app
# Delete and recreate pods
kubectl delete pods -n web-app-1 --all
kubectl delete pods -n web-app-2 --all
Diagnostic Commands Reference¶
# === Pod Diagnostics ===
# View all resources in a namespace
kubectl get all -n web-app-1
# Describe a pod (shows events and detailed status)
kubectl describe pod -n web-app-1 -l app=web-app-pod-1
# Get logs from all containers in a pod
kubectl logs -n web-app-1 -l app=web-app-pod-1 --all-containers=true
# Get previous container logs (for crashed containers)
kubectl logs -n web-app-1 -l app=web-app-pod-1 -c xtra-wasm --previous
# === Service and Network Diagnostics ===
# Check service endpoints
kubectl get endpoints -n web-app-1
# Describe a service
kubectl describe svc web-app-pod-1-ingress -n web-app-1
# Test DNS from within a pod
kubectl exec -n web-app-1 $(kubectl get pods -n web-app-1 -l app=web-app-pod-1 -o jsonpath='{.items[0].metadata.name}') -c web-app -- \
nslookup web-app-pod-2-ingress.web-app-2.svc.cluster.local
# === Configuration Diagnostics ===
# View ConfigMap contents
kubectl get configmap hopr-envoyconfig -n web-app-1 -o yaml
# View Secret (base64 encoded)
kubectl get secret hopr-license -n web-app-1 -o yaml
# === Event Diagnostics ===
# View cluster events (sorted by time)
kubectl get events -n web-app-1 --sort-by='.lastTimestamp'
# View recent events for a specific pod
kubectl get events -n web-app-1 --field-selector involvedObject.kind=Pod
# === Interactive Debugging ===
# Execute commands inside a container
kubectl exec -it -n web-app-1 $(kubectl get pods -n web-app-1 -l app=web-app-pod-1 -o jsonpath='{.items[0].metadata.name}') -c web-app -- /bin/bash
# Port-forward to access a service locally
kubectl port-forward -n web-app-1 svc/web-app-pod-1-ingress 18000:18000
Getting Help¶
If you're still stuck after trying these solutions:
- Gather diagnostic information:
# Create a diagnostic report
kubectl get all -n web-app-1 > diagnostics.txt
kubectl get all -n web-app-2 >> diagnostics.txt
kubectl describe pod -n web-app-1 -l app=web-app-pod-1 >> diagnostics.txt
kubectl describe pod -n web-app-2 -l app=web-app-pod-2 >> diagnostics.txt
kubectl logs -n web-app-1 -l app=web-app-pod-1 --all-containers=true >> diagnostics.txt
kubectl logs -n web-app-2 -l app=web-app-pod-2 --all-containers=true >> diagnostics.txt
- Check k3d version:
k3d version