Troubleshooting

Kubox is a complex system. We strive to keep it as simple as possible, but several things can still go wrong. Below are common issues our customers have encountered:

Control Logging Verbosity

The default logging level is info. To change it, set the LOG_LEVEL environment variable in the .env file located in the working directory. For example:

  LOG_LEVEL="debug"

The available logging levels, in increasing order of verbosity, are: panic, fatal, error, warn, info, debug, and trace.
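The setting above can also be scripted. The following is a minimal sketch, not part of Kubox: set_log_level is a hypothetical helper that validates the level against the list above and then writes a single LOG_LEVEL line to the .env file, replacing any existing one. It assumes the .env file holds one KEY=value entry per line.

```shell
# Hypothetical helper (not a Kubox command): set LOG_LEVEL in a .env file.
# Usage: set_log_level <level> [path-to-env-file]
set_log_level() {
  level="${1:-}"
  env_file="${2:-.env}"

  # Accept only the levels listed in this guide.
  case "$level" in
    panic|fatal|error|warn|info|debug|trace) ;;
    *) echo "unknown log level: $level" >&2; return 1 ;;
  esac

  touch "$env_file"
  # Keep every line except an existing LOG_LEVEL entry, then append the new one.
  # grep exits non-zero when it prints nothing, hence the "|| true".
  grep -v '^LOG_LEVEL=' "$env_file" > "$env_file.tmp" || true
  printf 'LOG_LEVEL="%s"\n' "$level" >> "$env_file.tmp"
  mv "$env_file.tmp" "$env_file"
}
```

For example, running set_log_level debug in the working directory leaves the other entries in .env untouched and sets LOG_LEVEL="debug".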

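Verifying GPU Nodes

To check that a pod can access a GPU, run a temporary pod that executes nvidia-smi under the nvidia runtime class. The pod is removed when the command exits: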

kubectl run \
  nvidia-test \
  --restart=Never \
  -ti --rm \
  --image nvcr.io/nvidia/cuda:12.5.0-base-ubuntu22.04 \
  --overrides '{"spec": {"runtimeClassName": "nvidia"}}' \
  nvidia-smi
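
If the test pod stays pending, it can help to confirm that the cluster actually has GPU nodes with allocatable GPUs. The sketch below makes two assumptions: GPU nodes carry the nodetype=gpu label used by the pod manifest in this section, and the NVIDIA device plugin advertises GPUs under the nvidia.com/gpu resource name. list_gpu_nodes is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: list GPU nodes and the number of GPUs each one advertises.
# Assumes the nodetype=gpu node label and the nvidia.com/gpu resource name.
list_gpu_nodes() {
  kubectl get nodes -l nodetype=gpu \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}
```

A node that shows <none> in the GPU column is not advertising any GPU to the scheduler.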

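For a longer-lived test, the following manifest starts a pod named ray-tty on a GPU node (selected by the nodetype: gpu label and tolerating the gpunode=true:NoExecute taint) and keeps it running so that you can open a shell in it:

apiVersion: v1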
kind: Pod
metadata:
  name: ray-tty
  labels:
    app: ray
spec:
  runtimeClassName: nvidia
  nodeSelector:
    nodetype: "gpu"
  tolerations:
    - key: "gpunode"
      operator: "Equal"
      value: "true"
      effect: "NoExecute"
  containers:
    - name: ray-container
      image: rayproject/ray:latest-py311-cu123
      command: ["/bin/sh"]
      args: ["-c", "while true; do sleep 30; done"]
      tty: true
      stdin: true
  restartPolicy: Never
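
One way to use the manifest above is sketched below. It assumes the manifest was saved as ray-tty.yaml (a hypothetical file name) and that nvidia-smi is available inside the container, which is normally the case when the nvidia runtime class injects the driver utilities. gpu_pod_check is a hypothetical helper name.

```shell
# Hypothetical helper: create the ray-tty pod, wait for it, and query the GPU.
# Usage: gpu_pod_check [manifest-file]   (default: ray-tty.yaml, an assumed name)
gpu_pod_check() {
  manifest="${1:-ray-tty.yaml}"
  # Create the pod, wait until it reports Ready, then run nvidia-smi inside it.
  kubectl apply -f "$manifest" &&
    kubectl wait --for=condition=Ready pod/ray-tty --timeout=300s &&
    kubectl exec ray-tty -- nvidia-smi
}
```

To explore the container interactively, kubectl exec -it ray-tty -- /bin/sh opens a shell in it. When finished, kubectl delete pod ray-tty removes the pod.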

Failed Delete

If a delete fails, delete the remaining AWS resources manually.
