Architecture overview¶

GitHelp is organized around a simple idea: all sources are converted into the same internal document format before retrieval.

The initial use case is MMORE, but the core pipeline is designed to remain project-agnostic. Project-specific behavior is isolated in optional project profiles.

High-level flow¶

GitHelp schema

Main design choices¶

GitHelp separates the pipeline into clear blocks:

Block	Role
`loaders/`	Load source files that are already documentation-like.
`extractors/`	Extract documentation from source code.
`corpus/`	Combine all sources into one corpus.
`indexing/`	Export and index the corpus with MMORE.
`retrieval/`	Retrieve relevant documents.
`project_profiles/`	Hold optional project-specific query expansion, filtering, reranking, and direct answers.
`rag/`	Build prompts and generate answers.
`projects/`	Manage selected projects, generated project configs, and persisted app state.
`app/`	Streamlit user interface.

Why keep a GitHelp format?¶

GitHelp uses its own DocumentRecord format instead of exposing MMORE everywhere.

This keeps the project modular:

the corpus can be inspected before indexing;
the simple retriever can run without MMORE;
MMORE can be replaced or updated without rewriting loaders;
retrieved sources keep consistent metadata for citations;
Streamlit can work with project-specific corpora before MMORE indexing is available.

Simple backend vs MMORE backend¶

The simple backend reads a selected corpus.jsonl directly. It is useful for:

local development;
direct checks of newly built project corpora;
debugging retrieval quality;
avoiding MMORE indexing.

The mmore backend is the main MMORE workflow. It retrieves from an MMORE index when native retrieval succeeds, and it can fall back to the exported mmore_corpus.jsonl if the local native process fails. That fallback is lexical and does not use native MMORE/Milvus vector search.

The full MMORE workflow is:

Build corpus → export MMORE corpus → build MMORE index → use backend mmore

For code-, symbol-, and filename-oriented questions, the high-level answering pipeline may merge lexical candidates from the simple retriever with MMORE candidates before applying the project profile.