# Roadmap GitHelp is a functional repository RAG application with local, Docker, and EPFL GPU-server workflows. This page distinguishes implemented behavior from the remaining multi-project and evaluation work. ## Implemented - Markdown and reStructuredText loading; - Python docstring extraction with `ast`; - signature extraction; - YAML config loading; - repository structure summary; - unified `DocumentRecord` format; - JSONL corpus generation; - project-specific corpus generation under `data/projects/`; - persisted Streamlit app state under `data/app_state.json`; - MMORE-compatible export; - MMORE indexing wrapper; - MMORE retrieval adapter; - local simple retriever for debugging and dynamic project corpora; - source-grounded prompt construction; - local Qwen LLM provider through Hugging Face Transformers; - LLM-based answer generation; - cached LLM provider in Streamlit; - optional extractive answering path; - project profiles for project-specific query expansion, filtering, reranking, and direct answers; - MMORE project profile with deterministic Milvus parameter answers; - public GitHub repository cloning into local GitHelp-managed folders; - command-line GitHub preparation for the simple backend; - retrieval evaluation script for benchmark question sets; - expected-source checks for retrieval evaluation; - Streamlit interface for project setup, corpus building, question answering, and source inspection; - Streamlit actions for building the corpus, exporting it to MMORE format, and building or rebuilding the native MMORE index; - conversational Streamlit layout with lightweight follow-up resolution; - tests for corpus building, retrieval, prompting, project state, project builder, and project profiles; - GitHub Actions workflow for running tests; - Sphinx documentation deployment through GitHub Pages; - CUDA-enabled Docker packaging and EPFL server deployment through Docker Compose and Traefik. ## Current limitations - GitHub repository loading supports public repositories through local `git clone`. - Existing GitHub clones are reused as-is and are not automatically updated. - Building a corpus does not automatically rebuild the MMORE index. - The `simple` backend is useful for newly built corpora, but it is not a semantic retriever. - Native MMORE retrieval depends on an existing MMORE index and compatible local dependencies; failures use a lexical corpus fallback. - Project corpora are isolated, but the active profile and native `mmore_docs` collection are still global. - LLM quality depends on the selected local model. - Project profiles are currently lightweight heuristics, not a general evaluation-based reranking system. - Code extraction is Python-specific and indexes documented APIs rather than complete implementation bodies or dependency graphs. - The evaluation set is preliminary and only partially annotated. ## Future ideas - dependency graph between modules and symbols; - test and example extraction; - richer code-aware retrieval; - support for private GitHub repositories with authentication; - optional external LLM providers; - automated index freshness checks; - evaluation-driven reranking.