Retrieval
Retrieval is the step that finds relevant documents for a user question.
GitHelp currently supports two retrieval backends:
simple
mmore
Simple retriever
File:
src/githelp/retrieval/simple_retriever.py
The simple retriever is a local debugging and dynamic-project backend.
It:
reads a selected corpus.jsonl;
tokenizes the query and documents;
ranks documents with overlap-based heuristics;
adds boosts for exact symbols, signatures, modules, titles, and user-facing documentation.
It does not use embeddings or MMORE.
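The overlap-based ranking described above can be sketched as follows. This is a minimal illustration, not the contents of simple_retriever.py: the function names, the "text" and "title" fields, and the boost weight of 2.0 are assumptions chosen for the example, and only the title boost is shown.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    doc: dict
    score: float


def tokenize(text: str) -> set[str]:
    # Lowercase and keep identifier-like tokens (letters, digits, underscores).
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def load_corpus(path: str) -> list[dict]:
    # One JSON object per non-empty line, as in a .jsonl file.
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def rank_documents(query: str, docs: list[dict], top_k: int = 5) -> list[ScoredDoc]:
    query_tokens = tokenize(query)
    scored = []
    for doc in docs:
        # Base score: number of query tokens that also appear in the body.
        overlap = len(query_tokens & tokenize(doc.get("text", "")))
        # Hypothetical boost: each query token found in the title counts double.
        title_hits = len(query_tokens & tokenize(doc.get("title", "")))
        score = overlap + 2.0 * title_hits
        if score > 0:
            scored.append(ScoredDoc(doc, score))
    scored.sort(key=lambda item: item.score, reverse=True)
    return scored[:top_k]
```

Because the score depends only on token sets, the same query and corpus always produce the same ranking, which is what makes this kind of backend convenient for debugging.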
Command:
python scripts/debug_retrieval.py \
"How do I configure indexing?" \
--corpus-path data/projects/mmore/corpus.jsonl
MMORE retriever
Files:
src/githelp/retrieval/mmore_retriever.py
src/githelp/retrieval/mmore_native.py
src/githelp/retrieval/mmore_subprocess.py
src/githelp/retrieval/mmore_corpus.py
src/githelp/retrieval/mmore_result_mapping.py
The MMORE retriever:
runs native MMORE retrieval in an isolated subprocess;
loads a MMORE retriever from config inside that subprocess;
calls retriever.retrieve(...) when native retrieval is available;
parses GitHelp metadata from retrieved text;
converts raw MMORE results back into RetrievalResult objects;
falls back to lexical ranking over mmore_corpus.jsonl if the native process fails locally.
Retrieved sources are tagged with one of these metadata values:
native_index
corpus_fallback
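The fallback and tagging behavior could look roughly like the sketch below. It is an illustration under stated assumptions, not GitHelp's code: the function name retrieve_with_fallback, the two callables, and the metadata key retrieval_source are hypothetical. Only the tag values native_index and corpus_fallback come from the text above.

```python
import subprocess
from typing import Callable

Retriever = Callable[[str], list[dict]]


def retrieve_with_fallback(
    query: str,
    run_native: Retriever,
    run_fallback: Retriever,
) -> list[dict]:
    try:
        # Assumed to launch the isolated native process and return raw results.
        results = run_native(query)
        tag = "native_index"
    except (subprocess.SubprocessError, OSError, RuntimeError):
        # Native retrieval failed locally: rank lexically over the exported corpus.
        results = run_fallback(query)
        tag = "corpus_fallback"
    # "retrieval_source" is a hypothetical metadata key used for this example.
    return [{**result, "retrieval_source": tag} for result in results]
```

Tagging every result makes it possible to tell, after the fact, whether a source came from the native index or from the lexical fallback.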
Command example:
python scripts/prepare_answer.py \
"How do I configure indexing?" \
--backend mmore
Retriever factory
File:
src/githelp/retrieval/retriever_factory.py
This file gives the rest of GitHelp one entry point:
retrieve_documents(...)
It chooses the backend based on the requested backend name:
simple
mmore
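A single entry point that dispatches on a backend name can be sketched as below. The real signature of retrieve_documents(...) is not shown in this page, so the sketch uses a differently named function, and the backend parameter, the stub backends, and the error handling are all assumptions.

```python
from typing import Callable

Retriever = Callable[[str], list[dict]]


def _simple_backend(query: str) -> list[dict]:
    # Stand-in for the simple retriever.
    return [{"backend": "simple", "query": query}]


def _mmore_backend(query: str) -> list[dict]:
    # Stand-in for the MMORE retriever.
    return [{"backend": "mmore", "query": query}]


# Hypothetical registry mapping backend names to retriever callables.
_BACKENDS: dict[str, Retriever] = {
    "simple": _simple_backend,
    "mmore": _mmore_backend,
}


def retrieve_documents_sketch(query: str, backend: str = "simple") -> list[dict]:
    try:
        retriever = _BACKENDS[backend]
    except KeyError:
        expected = ", ".join(sorted(_BACKENDS))
        raise ValueError(f"Unknown backend {backend!r}; expected one of: {expected}") from None
    return retriever(query)
```

Keeping the dispatch in one place means callers never import a specific backend module directly.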
Project profiles
Retrieval results can be refined by a project profile.
Project profiles live in:
src/githelp/project_profiles/
They can:
expand queries;
filter irrelevant sources;
rerank retrieved results;
answer some structured questions directly.
This keeps the core GitHelp retrieval pipeline generic while allowing MMORE-specific improvements to remain isolated.
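One way to picture the four profile capabilities is as optional hooks around a generic retrieval call. The sketch below is hypothetical: the ProjectProfile protocol, its method names, and the refine helper are invented for illustration and may not match the interfaces in src/githelp/project_profiles/.

```python
from typing import Callable, Optional, Protocol


class ProjectProfile(Protocol):
    def expand_query(self, query: str) -> str: ...
    def filter_sources(self, results: list[dict]) -> list[dict]: ...
    def rerank(self, query: str, results: list[dict]) -> list[dict]: ...
    def answer_directly(self, query: str) -> Optional[str]: ...


class GenericProfile:
    """No-op profile: leaves queries and results unchanged."""

    def expand_query(self, query: str) -> str:
        return query

    def filter_sources(self, results: list[dict]) -> list[dict]:
        return results

    def rerank(self, query: str, results: list[dict]) -> list[dict]:
        return results

    def answer_directly(self, query: str) -> Optional[str]:
        return None


def refine(
    profile: ProjectProfile,
    query: str,
    retrieve: Callable[[str], list[dict]],
) -> tuple[Optional[str], list[dict]]:
    # Structured questions may be answered without any retrieval.
    direct = profile.answer_directly(query)
    if direct is not None:
        return direct, []
    expanded = profile.expand_query(query)
    results = profile.filter_sources(retrieve(expanded))
    return None, profile.rerank(expanded, results)
```

With this shape, a project-specific profile overrides only the hooks it needs, and the generic pipeline stays unaware of project details.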
Backend choice
For a project corpus built from the Streamlit interface, simple is useful for
quick deterministic checks:
backend simple
For the main MMORE workflow, use:
backend mmore
The mmore backend attempts native MMORE index retrieval first. If that native
process fails, GitHelp falls back to the exported mmore_corpus.jsonl next to
the selected project corpus so Streamlit can continue answering. Results tagged
corpus_fallback did not come from MMORE/Milvus vector search.
The high-level answering pipeline retrieves a wider candidate pool before
profile filtering and reranking. For code-, symbol-, or filename-oriented
questions, it may merge simple lexical candidates into the MMORE candidate
pool. The final source list can therefore contain evidence rescued by the
simple retriever even when the mmore backend was selected.
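The candidate merging described above can be illustrated with a small sketch. Both the heuristic for spotting code-oriented questions and the "id" field used for de-duplication are assumptions made for this example; the actual pipeline may decide and merge differently.

```python
import re

# Hypothetical heuristic: call syntax, snake_case names, or common file extensions.
_CODE_HINT = re.compile(r"[A-Za-z_]\w*\(|\w+_\w+|\w+\.(py|yaml|yml|json|md)\b")


def looks_code_oriented(question: str) -> bool:
    return bool(_CODE_HINT.search(question))


def merge_candidates(primary: list[dict], extra: list[dict], key: str = "id") -> list[dict]:
    # Keep primary (MMORE) candidates first, then append unseen lexical ones.
    seen = set()
    merged = []
    for doc in [*primary, *extra]:
        if doc[key] in seen:
            continue
        seen.add(doc[key])
        merged.append(doc)
    return merged
```

The merged pool would then go through profile filtering and reranking, so a lexical candidate only reaches the final source list if it survives those steps.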
The default app config selects the mmore project profile globally. Generated
project configs do not change this value automatically; select a generic or
custom app config when querying another project.