Deployment troubleshooting

This page covers common problems with the EPFL Docker deployment. Start with the user guide in Using GitHelp on the EPFL lab server and the maintenance checks in Maintaining the EPFL deployment.

The browser cannot reach the site

Use the full URL, including port 1312 and the final slash:

http://gpu217.rcp.epfl.ch:1312/githelp/

Connect to the EPFL VPN when outside the EPFL network. If the service is still unreachable, a maintainer should run:

docker compose ps
docker compose logs --tail=100 githelp

The page is blank

Confirm that the URL ends with /, then perform a hard refresh:

http://gpu217.rcp.epfl.ch:1312/githelp/

Check that the GitHelp and Traefik containers share the traefik network and that the labels in docker-compose.yml are attached to the running container.

curl localhost:8501 fails on the host

This is expected for the server Compose file because port 8501 is not published on the host. Traefik reaches it through the Docker network.

Test inside the container instead:

docker exec -it githelp curl http://localhost:8501/_stcore/health

GitHelp is healthy but Traefik does not route it

Inspect the labels and networks:

docker inspect githelp --format '{{json .Config.Labels}}'
docker inspect githelp --format '{{json .NetworkSettings.Networks}}'
docker inspect root-traefik-1 --format '{{json .NetworkSettings.Networks}}'

Both services must share the external traefik network. The configured router matches /githelp, strips that prefix, and forwards the remaining path to Streamlit running at / inside the container.

A local project path does not exist

Paths entered in Streamlit are evaluated inside the GitHelp container. Use:

/app/data/repositories/<repository_folder>

not the corresponding host path under /home/githelp/GitHelp/.

MMORE indexing reports FAISS AVX warnings

Messages such as these are not necessarily fatal:

Could not load library with AVX512 support
Could not load library with AVX2 support
Successfully loaded faiss

If FAISS eventually reports a successful load, inspect the later exception or the Streamlit build output for the actual failure.

Transformers or tokenizer incompatibility

GitHelp pins Transformers below version 5 because the current MMORE sparse-model path depends on Transformers 4 APIs:

transformers>=4.51.0,<5

Confirm the installed version inside the container:

docker exec -it githelp python -c "import transformers; print(transformers.__version__)"

Rebuild the Docker image without cache if its dependency layer is stale.

PyTorch is too old

The Dockerfile uses:

FROM pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime

After changing the base image or dependencies, rebuild cleanly:

docker compose down
docker compose build --no-cache
docker compose up -d

CUDA is unavailable in the container

Check PyTorch and the selected devices:

docker exec -it githelp python -c "import torch; print(torch.version.cuda); print(torch.cuda.is_available()); print(torch.cuda.device_count())"

The server Compose file requests GPU access and currently sets NVIDIA_VISIBLE_DEVICES=1. A device count of one can therefore be correct even when the host has several GPUs. If CUDA is unavailable, verify the NVIDIA Container Toolkit and Docker GPU configuration on the host.

The first answer is slow

The local Qwen model is loaded on the first LLM request and then cached by Streamlit. Hugging Face files are also cached in the persistent githelp_hf_cache volume. Check GPU activity with:

watch -n 1 nvidia-smi

The selected MMORE backend returns unexpected sources

Open Latest answer sources and diagnostics and check the reported mode:

native_index
corpus_fallback

corpus_fallback means native retrieval failed and GitHelp used lexical retrieval over the exported MMORE corpus. If the mode is native_index, verify that the shared index was most recently built from the selected project’s mmore_corpus.jsonl.

The application currently uses mmore_docs. An index manually built under a different collection name will not be queried by the default retrieval config.

The MMORE index is incomplete

A failed build can leave incomplete local metadata. Rebuild through the Streamlit Build MMORE index action or use the manual command documented in Maintaining the EPFL deployment. The index wrapper resets the local Milvus Lite database before building, so confirm that replacing the currently indexed project is intended.

Validated deployment characteristics

The repository deployment configuration is designed for:

  • Docker Compose;

  • Traefik routing under /githelp;

  • CUDA-enabled PyTorch 2.6 with CUDA 12.4;

  • persistent project data and Hugging Face caches;

  • Tesla V100-class server GPUs;

  • MMORE index building and Streamlit access from the EPFL network or VPN.

Runtime availability should still be verified with the health, log, and GPU commands above after each deployment update.