Troubleshooting: Zero Trust Identity for Workloads

Common Issues and Solutions

Issue 1: Pods Stuck in "ImagePullBackOff"

Symptoms:

kubectl get pods -n web-app-1
# NAME                              READY   STATUS             RESTARTS   AGE
# web-app-pod-1-xxx-yyy             0/3     ImagePullBackOff   0          2m

Cause: Hopr registry credentials are incorrect or missing.

Evaluation: Describe the failing pod to see its events (use the names from the kubectl get pods -A command):

kubectl describe pod <full_pod_name> -n <namespace>

Check the Events: section of the terminal response. It should look something like this:

Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  7m33s                   default-scheduler  Successfully assigned web-app-2/web-app-pod-2-7cc7d6b75-64r4t to k3d-wosp-cluster-server-0
  Normal   Pulled     7m33s                   kubelet            Container image "serial-app-wosp-node:latest" already present on machine
  Normal   Created    7m33s                   kubelet            Created container web-app
  Normal   Started    7m33s                   kubelet            Started container web-app
  Normal   Pulled     7m33s                   kubelet            Container image "repo.hoprapi.com/hopr/web-retriever:v0.3.0" already present on machine
  Normal   Created    7m33s                   kubelet            Created container web-retriever
  Normal   Started    7m33s                   kubelet            Started container web-retriever
  Normal   Pulled     6m44s (x4 over 7m33s)   kubelet            Container image "repo.hoprapi.com/hopr/xtra-wasm-filter:v2.0.3" already present on machine
  Normal   Created    6m44s (x4 over 7m33s)   kubelet            Created container xtra-wasm
  Normal   Started    6m44s (x4 over 7m33s)   kubelet            Started container xtra-wasm
  Warning  BackOff    2m23s (x24 over 7m29s)  kubelet            Back-off restarting failed container xtra-wasm in pod web-app-pod-2-7cc7d6b75-64r4t_web-app-2(7986d534-f378-4cf0-a06c-2f3d41188e93)

Solution:

  1. Verify your Hopr credentials in pod-X-vars.yaml:
repo_username: "YOUR_ACTUAL_USERNAME"  # Not placeholder
repo_password: "YOUR_ACTUAL_PASSWORD"  # Not placeholder
repo_email: "YOUR_ACTUAL_EMAIL"        # Not placeholder
  2. Check that the secret was created correctly (be sure to use the correct namespace):
kubectl get secret hopr-registrycreds -n web-app-1 -o yaml
  3. Delete and recreate the deployment with the correct credentials:
kubectl delete -f deployments/pod-1-deployment.yaml
# Update credentials in pod-1-vars.yaml
ytt -f manifests/pod-1-vars.yaml -f manifests/hopr-p2p.templ.yaml -f manifests/base-deployment.yaml > deployments/pod-1-deployment.yaml
kubectl apply -f deployments/pod-1-deployment.yaml
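If the secret exists but you are not sure its contents match your vars file, you can decode it: Kubernetes stores registry secrets base64-encoded. A minimal sketch, assuming hopr-registrycreds is a kubernetes.io/dockerconfigjson secret; the decode round-trip is demonstrated locally with sample data:

```shell
# On the cluster, the real decode would be (assumption: dockerconfigjson secret):
#   kubectl get secret hopr-registrycreds -n web-app-1 \
#     -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
#
# The same base64 round-trip, shown locally with a sample payload:
payload='{"auths":{"repo.hoprapi.com":{"username":"alice"}}}'
encoded=$(printf '%s' "$payload" | base64)   # how Kubernetes stores the value
printf '%s' "$encoded" | base64 -d           # decoded: should show your real credentials
echo
```

If the decoded JSON still shows placeholder values, the secret was generated before you edited the vars file and must be recreated.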

Issue 2: Pods Stuck at 2/3 or 1/3 Running

Symptoms:

kubectl get pods -n web-app-1
# NAME                              READY   STATUS    RESTARTS   AGE
# web-app-pod-1-xxx-yyy             2/3     Running   0          5m

Cause: One container is failing to start.

Solution:

  1. Check which container is failing:
kubectl describe pod -n web-app-1 -l app=web-app-pod-1
# Look for "Waiting" or "CrashLoopBackOff" status
  2. Check the logs of the failing container:
# If xtra-wasm is failing:
kubectl logs -n web-app-1 -l app=web-app-pod-1 -c xtra-wasm

# If web-app is failing:
kubectl logs -n web-app-1 -l app=web-app-pod-1 -c web-app

# If web-retriever is failing:
kubectl logs -n web-app-1 -l app=web-app-pod-1 -c web-retriever
  3. Common causes:
  - xtra-wasm failing: invalid Hopr license credentials
  - web-app failing: Docker image not imported to k3d
  - web-retriever failing: ConfigMap issue
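When scripting this check, the READY column value (e.g. 2/3) can be parsed directly. A small sketch; the kubectl line in the comment is one way to fetch the value, and the label selector is taken from the examples above:

```shell
# On a cluster you might capture the READY value with something like:
#   ready=$(kubectl get pods -n web-app-1 -l app=web-app-pod-1 --no-headers | awk '{print $2}')
# Parsing it, shown here with a sample value:
ready="2/3"
up=${ready%/*}      # containers currently ready
total=${ready#*/}   # containers expected in the pod
if [ "$up" -lt "$total" ]; then
  echo "$((total - up)) container(s) not ready - check each container's logs"
fi
```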

Issue 3: ConfigMap Not Found

Symptoms:

MountVolume.SetUp failed for volume "envoy-config" : configmap "hopr-envoyconfig" not found

Cause: YTT file order or namespace creation issue.

Solution:

  1. Verify the namespace was created:
kubectl get namespace web-app-1
  2. Check that the ConfigMaps exist:
kubectl get configmap -n web-app-1
# Should show: hopr-envoyconfig, wr-config
  3. If missing, redeploy:
kubectl apply -f deployments/pod-1-deployment.yaml
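The check in step 2 can be scripted: list what exists in the namespace and compare it against the two ConfigMaps the deployment expects. A sketch with sample data standing in for the cluster response:

```shell
# On a cluster the present list would come from:
#   present=$(kubectl get configmap -n web-app-1 -o name | sed 's|^configmap/||')
# Sample standing in for the cluster response (hopr-envoyconfig is missing):
present="wr-config"

# Report any expected ConfigMap not found in the present list:
for cm in hopr-envoyconfig wr-config; do
  printf '%s\n' "$present" | grep -qx "$cm" || echo "missing ConfigMap: $cm"
done
```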

Issue 4: "Namespace not found" Error on First Deploy

Symptoms:

Error from server (NotFound): error when creating "pod-1-deployment.yaml": namespaces "web-app-1" not found

Cause: Kubernetes tries to create resources before the namespace exists. This happens when the order of the individual YAML files in the pod directory differs from that of the delivered blueprint.

Solution: Apply the deployment files a second time:

kubectl apply -f pod-1/ -f pod-2/
# First run creates namespace, may show errors for other resources
kubectl apply -f pod-1/ -f pod-2/
# Second run creates remaining resources successfully
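The apply-twice pattern above can be wrapped in a tiny helper: run the command and, if it fails, run it exactly once more. A sketch; flaky_apply is a hypothetical stand-in that fails on its first call, and real usage would be apply_twice kubectl apply -f pod-1/ -f pod-2/:

```shell
# Run a command; if the first pass fails (e.g. namespace not yet created),
# run it once more so the remaining resources land in the new namespace.
apply_twice() {
  "$@" || "$@"
}

# Hypothetical stand-in for kubectl apply: fails on its first invocation only.
marker=$(mktemp -u)
flaky_apply() {
  if [ ! -e "$marker" ]; then
    touch "$marker"
    return 1          # first run: namespace created, dependent resources fail
  fi
  echo "all resources created"
}
apply_twice flaky_apply
rm -f "$marker"
```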

Issue 5: Application Shows UTF-8 Decode Errors

Symptoms:

❌ CLIENT (web-app-pod-1): Failed to forward baton #1: 'utf-8' codec can't decode byte...

Cause: This is expected behavior when the Hopr proxy returns encrypted data instead of JSON. However, if Pod-2 isn't receiving anything, there's a routing problem.

Solution:

  1. Check whether Pod-2 is receiving batons:
kubectl logs -n web-app-2 -l app=web-app-pod-2 -c web-app --tail=20
  2. If Pod-2 shows no activity, check the remote_endpoints configuration:
# Verify Pod-1's Envoy config points to Pod-2
kubectl get configmap hopr-envoyconfig -n web-app-1 -o yaml | grep -A 10 "remote_service"
  3. It should show:
address: web-app-pod-2-ingress.web-app-2.svc.cluster.local
port_value: 18000
  4. If the address is incorrect, update remote_endpoints in pod-1-vars.yaml and regenerate the deployment.
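The remote_endpoints entry in the vars file packs the address and port_value fields into a single address:port string. A quick shell split to sanity-check an entry against the Envoy config (parameter expansion only, no cluster needed):

```shell
# Split a remote_endpoints entry into the address and port_value it expands to:
endpoint="web-app-pod-2-ingress.web-app-2.svc.cluster.local:18000"
address=${endpoint%:*}    # strip the trailing :port
port=${endpoint##*:}      # keep only what follows the last colon
echo "address: $address"
echo "port_value: $port"
```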

Issue 6: Baton Trail Keeps Growing (Infinite Loop)

Symptoms:

➡️  CLIENT (web-app-pod-2): Forwarding baton #1. Trail: ['web-app-pod-2', 'web-app-pod-2', 'web-app-pod-2', ...]

Cause: Pod is sending messages to itself instead of the remote pod.

Solution:

  1. Check remote_endpoints in the vars files:
# pod-1-vars.yaml should point to POD-2:
remote_endpoints:
  - web-app-pod-2-ingress.web-app-2.svc.cluster.local:18000

# pod-2-vars.yaml should point to POD-1:
remote_endpoints:
  - web-app-pod-1-ingress.web-app-1.svc.cluster.local:18000
  2. Verify each pod points to the OTHER pod, not itself.
  3. Regenerate and redeploy with the correct configuration.
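The self-reference check can be automated with a small function that flags any endpoint whose hostname begins with the pod's own ingress service name. A sketch; check_self_reference is a hypothetical helper based on the naming convention shown above:

```shell
# Flag an endpoint that routes a pod back to its own ingress service.
check_self_reference() {
  pod="$1"
  endpoint="$2"
  case "$endpoint" in
    "$pod"-ingress.*) echo "LOOP: $pod routes to itself" ;;
    *)                echo "OK: $pod -> $endpoint" ;;
  esac
}

# The misconfiguration behind the growing trail:
check_self_reference web-app-pod-2 web-app-pod-2-ingress.web-app-2.svc.cluster.local:18000
# The correct configuration:
check_self_reference web-app-pod-2 web-app-pod-1-ingress.web-app-1.svc.cluster.local:18000
```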

Issue 7: Pod Shows "ErrImageNeverPull"

Symptoms:

Failed to pull image "serial-app-wosp-node:latest": rpc error: code = Unknown desc = image not found

Cause: Application Docker image not imported to k3d cluster.

Solution:

# Build the image
cd app
docker build -t serial-app-wosp-node:latest .

# Import to k3d
k3d image import serial-app-wosp-node:latest -c wosp-cluster

# Verify import
docker exec k3d-wosp-cluster-server-0 crictl images | grep serial-app

# Delete and recreate pods
kubectl delete pods -n web-app-1 --all
kubectl delete pods -n web-app-2 --all

Diagnostic Commands Reference

# === Pod Diagnostics ===
# View all resources in a namespace
kubectl get all -n web-app-1

# Describe a pod (shows events and detailed status)
kubectl describe pod -n web-app-1 -l app=web-app-pod-1

# Get logs from all containers in a pod
kubectl logs -n web-app-1 -l app=web-app-pod-1 --all-containers=true

# Get previous container logs (for crashed containers)
kubectl logs -n web-app-1 -l app=web-app-pod-1 -c xtra-wasm --previous

# === Service and Network Diagnostics ===
# Check service endpoints
kubectl get endpoints -n web-app-1

# Describe a service
kubectl describe svc web-app-pod-1-ingress -n web-app-1

# Test DNS from within a pod
kubectl exec -n web-app-1 $(kubectl get pods -n web-app-1 -l app=web-app-pod-1 -o jsonpath='{.items[0].metadata.name}') -c web-app -- \
  nslookup web-app-pod-2-ingress.web-app-2.svc.cluster.local

# === Configuration Diagnostics ===
# View ConfigMap contents
kubectl get configmap hopr-envoyconfig -n web-app-1 -o yaml

# View Secret (base64 encoded)
kubectl get secret hopr-license -n web-app-1 -o yaml

# === Event Diagnostics ===
# View cluster events (sorted by time)
kubectl get events -n web-app-1 --sort-by='.lastTimestamp'

# View recent events for a specific pod
kubectl get events -n web-app-1 --field-selector involvedObject.kind=Pod

# === Interactive Debugging ===
# Execute commands inside a container
kubectl exec -it -n web-app-1 $(kubectl get pods -n web-app-1 -l app=web-app-pod-1 -o jsonpath='{.items[0].metadata.name}') -c web-app -- /bin/bash

# Port-forward to access a service locally
kubectl port-forward -n web-app-1 svc/web-app-pod-1-ingress 18000:18000

Getting Help

If you're still stuck after trying these solutions:

  1. Gather diagnostic information:
# Create a diagnostic report
kubectl get all -n web-app-1 > diagnostics.txt
kubectl get all -n web-app-2 >> diagnostics.txt
kubectl describe pod -n web-app-1 -l app=web-app-pod-1 >> diagnostics.txt
kubectl describe pod -n web-app-2 -l app=web-app-pod-2 >> diagnostics.txt
kubectl logs -n web-app-1 -l app=web-app-pod-1 --all-containers=true >> diagnostics.txt
kubectl logs -n web-app-2 -l app=web-app-pod-2 --all-containers=true >> diagnostics.txt
  2. Check your k3d version: k3d version