Roadmap
GitHelp is a functional repository RAG application with local, Docker, and EPFL GPU-server workflows. This page distinguishes implemented behavior from the remaining multi-project and evaluation work.
Implemented
Markdown and reStructuredText loading;
Python docstring extraction with ast;
signature extraction;
YAML config loading;
repository structure summary;
unified DocumentRecord format;
JSONL corpus generation;
project-specific corpus generation under data/projects/;
persisted Streamlit app state under data/app_state.json;
MMORE-compatible export;
MMORE indexing wrapper;
MMORE retrieval adapter;
local simple retriever for debugging and dynamic project corpora;
source-grounded prompt construction;
local Qwen LLM provider through Hugging Face Transformers;
LLM-based answer generation;
cached LLM provider in Streamlit;
optional extractive answering path;
project profiles for project-specific query expansion, filtering, reranking, and direct answers;
MMORE project profile with deterministic Milvus parameter answers;
public GitHub repository cloning into local GitHelp-managed folders;
command-line GitHub preparation for the simple backend;
retrieval evaluation script for benchmark question sets;
expected-source checks for retrieval evaluation;
Streamlit interface for project setup, corpus building, question answering, and source inspection;
Streamlit actions for building the corpus, exporting it to MMORE format, and building or rebuilding the native MMORE index;
conversational Streamlit layout with lightweight follow-up resolution;
tests for corpus building, retrieval, prompting, project state, project builder, and project profiles;
GitHub Actions workflow for running tests;
Sphinx documentation deployment through GitHub Pages;
CUDA-enabled Docker packaging and EPFL server deployment through Docker Compose and Traefik.
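The docstring and signature extraction items in the list above can be illustrated with Python's standard ast module. This is a minimal sketch, not GitHelp's actual extractor: the function name extract_documented_apis and the dictionary keys are hypothetical, and the real DocumentRecord fields are not shown on this page. It requires Python 3.9 or later for ast.unparse.

```python
import ast


def extract_documented_apis(source: str) -> list[dict]:
    """Collect name, signature, and docstring for documented functions and classes.

    Hypothetical helper for illustration; undocumented definitions are skipped,
    mirroring the idea of indexing documented APIs rather than full bodies.
    """
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        docstring = ast.get_docstring(node)
        if docstring is None:
            continue  # no docstring, nothing to index
        if isinstance(node, ast.ClassDef):
            signature = f"class {node.name}"
        else:
            # ast.unparse renders the argument list back to source text
            signature = f"def {node.name}({ast.unparse(node.args)})"
        records.append(
            {"name": node.name, "signature": signature, "docstring": docstring}
        )
    return records


example_source = '''
def add(a, b):
    """Add two numbers."""
    return a + b

def helper(x):
    return x
'''

records = extract_documented_apis(example_source)
```

Only the documented function add is returned; helper has no docstring and is skipped. Function bodies are never stored, which matches the limitation that code extraction indexes documented APIs rather than implementations.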
Current limitations
GitHub repository loading supports public repositories through local git clone.
Existing GitHub clones are reused as-is and are not automatically updated.
Building a corpus does not automatically rebuild the MMORE index.
The simple backend is useful for newly built corpora, but it is not a semantic retriever.
Native MMORE retrieval depends on an existing MMORE index and compatible local dependencies; failures use a lexical corpus fallback.
Project corpora are isolated, but the active profile and native mmore_docs collection are still global.
LLM quality depends on the selected local model.
Project profiles are currently lightweight heuristics, not a general evaluation-based reranking system.
Code extraction is Python-specific and indexes documented APIs rather than complete implementation bodies or dependency graphs.
The evaluation set is preliminary and only partially annotated.
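To make the "not a semantic retriever" limitation concrete, the sketch below scores documents purely by term overlap with the query. It is an assumption about how a lexical retriever of this kind might behave, not the scoring used by GitHelp's simple backend; the names tokenize and lexical_search and the sample corpus are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def lexical_search(query: str, corpus: dict[str, str], top_k: int = 3) -> list[tuple[str, int]]:
    """Rank sources by how often the query terms occur in their text.

    Hypothetical scoring for illustration: plain term counts, no embeddings.
    """
    query_terms = set(tokenize(query))
    scored = []
    for source, text in corpus.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((source, score))
    # Highest score first; ties broken by source path for determinism
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


corpus = {
    "docs/install.md": "Install the package with pip.",
    "docs/index.md": "Build the index before retrieval.",
}

exact_hits = lexical_search("how to install", corpus)
paraphrase_hits = lexical_search("setup instructions", corpus)
```

The query that shares the word "install" with a document finds it, while the paraphrase "setup instructions" returns nothing because no terms overlap. A semantic retriever would be expected to bridge that gap.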
Future ideas
dependency graph between modules and symbols;
test and example extraction;
richer code-aware retrieval;
support for private GitHub repositories with authentication;
optional external LLM providers;
automated index freshness checks;
evaluation-driven reranking.