Architecture overview

GitHelp is organized around a simple idea: all sources are converted into the same internal document format before retrieval.

The initial use case is MMORE, but the core pipeline is designed to remain project-agnostic. Project-specific behavior is isolated in optional project profiles.

High-level flow

[Figure: GitHelp schema]

Main design choices

GitHelp separates the pipeline into clear blocks:

Block               Role
loaders/            Load source files that are already documentation-like.
extractors/         Extract documentation from source code.
corpus/             Combine all sources into one corpus.
indexing/           Export and index the corpus with MMORE.
retrieval/          Retrieve relevant documents.
project_profiles/   Hold optional project-specific query expansion, filtering, reranking, and direct answers.
rag/                Build prompts and generate answers.
projects/           Manage selected projects, generated project configs, and persisted app state.
app/                Streamlit user interface.

Why keep a GitHelp format?

GitHelp uses its own DocumentRecord format instead of exposing MMORE everywhere.

This keeps the project modular:

  • the corpus can be inspected before indexing;

  • the simple retriever can run without MMORE;

  • MMORE can be replaced or updated without rewriting loaders;

  • retrieved sources keep consistent metadata for citations;

  • Streamlit can work with project-specific corpora before MMORE indexing is available.
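To make the idea of a shared record format concrete, here is a minimal sketch of what a DocumentRecord could look like. The field names (doc_id, text, source_path, source_type, metadata) and both helper methods are assumptions chosen for illustration; the actual GitHelp schema may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DocumentRecord:
    """Illustrative record; field names are hypothetical, not GitHelp's real schema."""

    doc_id: str
    text: str
    source_path: str
    source_type: str  # e.g. "markdown" or "python_docstring" (assumed values)
    metadata: dict = field(default_factory=dict)

    def to_json_line(self) -> str:
        # One record per line, so a corpus file can be inspected with plain text tools.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json_line(cls, line: str) -> "DocumentRecord":
        return cls(**json.loads(line))


record = DocumentRecord(
    doc_id="readme-0",
    text="Install with pip.",
    source_path="README.md",
    source_type="markdown",
)
line = record.to_json_line()
restored = DocumentRecord.from_json_line(line)
```

Because every loader and extractor would emit the same shape, downstream code only needs to know this one type, whichever retrieval backend is used.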

Simple backend vs MMORE backend

The simple backend reads a selected corpus.jsonl directly. It is useful for:

  • local development;

  • direct checks of newly built project corpora;

  • debugging retrieval quality;

  • working without building an MMORE index.
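As a rough illustration of why this backend needs no index, the sketch below reads a JSONL corpus and ranks records by term overlap with the query. It is not GitHelp's actual retriever: the function names, the "doc_id" and "text" keys, and the scoring rule are all assumptions.

```python
import json
import re
import tempfile
from collections import Counter
from pathlib import Path


def tokenize(text):
    # Lowercase word-like tokens; underscores are kept so identifiers stay whole.
    return re.findall(r"[a-z0-9_]+", text.lower())


def load_corpus(path):
    # One JSON object per non-empty line.
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def lexical_search(records, query, top_k=3):
    # Score = total occurrences of the query's distinct terms in the record text.
    query_terms = set(tokenize(query))
    scored = []
    for record in records:
        counts = Counter(tokenize(record["text"]))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:top_k]]


# Demo on a throwaway corpus file (contents are made up for this example).
with tempfile.TemporaryDirectory() as tmp:
    corpus_path = Path(tmp) / "corpus.jsonl"
    rows = [
        {"doc_id": "a", "text": "Build the index with MMORE."},
        {"doc_id": "b", "text": "Loaders read Markdown files."},
    ]
    corpus_path.write_text(
        "".join(json.dumps(row) + "\n" for row in rows), encoding="utf-8"
    )
    hits = lexical_search(load_corpus(corpus_path), "How do loaders read files?")
```

A retriever this small is easy to step through, which is what makes it convenient for debugging retrieval quality on a freshly built corpus.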

The mmore backend drives the main MMORE workflow. It retrieves from an MMORE index when native retrieval succeeds, and falls back to the exported mmore_corpus.jsonl if the local native process fails. That fallback is purely lexical: it does not use native MMORE/Milvus vector search.
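The fallback behavior described here can be sketched as a small wrapper. This is an assumed shape, not the project's code: retrieve_with_fallback, the two retriever callables, and the returned mode labels are hypothetical, and a real implementation would likely catch narrower exception types.

```python
from typing import Callable, List, Tuple

# A retriever takes a query string and returns a list of hit dictionaries.
Retriever = Callable[[str], List[dict]]


def retrieve_with_fallback(
    query: str, native: Retriever, lexical: Retriever
) -> Tuple[List[dict], str]:
    """Try native (index-backed) retrieval first; use lexical search if it raises."""
    try:
        return native(query), "native"
    except Exception:
        # Broad catch for illustration only: any native failure triggers the fallback.
        return lexical(query), "lexical-fallback"


def broken_native(query: str) -> List[dict]:
    # Stand-in for a native process that is unavailable.
    raise RuntimeError("native retrieval process is not running")


def corpus_lexical(query: str) -> List[dict]:
    # Stand-in for a lexical search over the exported corpus file.
    return [{"doc_id": "guide", "text": "Matched by keyword."}]


results, mode = retrieve_with_fallback("how to index", broken_native, corpus_lexical)
```

Returning the mode alongside the results lets the caller show whether an answer came from vector search or from the lexical fallback.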

The full MMORE workflow is:

Build corpus → export MMORE corpus → build MMORE index → use the mmore backend

For code-, symbol-, and filename-oriented questions, the high-level answering pipeline may merge lexical candidates from the simple retriever with MMORE candidates before applying the project profile.
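One way to picture this merging step is the sketch below. Everything in it is an assumption made for illustration: the looks_code_oriented heuristic, deduplication by a "doc_id" key, and the profile being a plain callable are not taken from GitHelp's source.

```python
import re


def looks_code_oriented(query):
    # Hypothetical heuristic: snake_case names, file-like tokens, or call syntax.
    pattern = r"[A-Za-z0-9]+_[A-Za-z0-9_]+|\w+\.\w{1,4}\b|\w+\(\)"
    return bool(re.search(pattern, query))


def merge_candidates(mmore_hits, lexical_hits):
    # Keep MMORE hits first, then append lexical hits that are not already present.
    merged, seen = [], set()
    for hit in list(mmore_hits) + list(lexical_hits):
        if hit["doc_id"] not in seen:
            seen.add(hit["doc_id"])
            merged.append(hit)
    return merged


def gather_candidates(query, mmore_search, lexical_search, profile=None):
    candidates = mmore_search(query)
    if looks_code_oriented(query):
        candidates = merge_candidates(candidates, lexical_search(query))
    if profile is not None:
        # The optional project profile sees the merged list and may filter or rerank it.
        candidates = profile(query, candidates)
    return candidates


def fake_mmore(query):
    return [{"doc_id": "guide"}, {"doc_id": "api"}]


def fake_lexical(query):
    return [{"doc_id": "api"}, {"doc_id": "loader_py"}]


code_ids = [
    hit["doc_id"]
    for hit in gather_candidates("Where is load_corpus defined?", fake_mmore, fake_lexical)
]
prose_ids = [
    hit["doc_id"]
    for hit in gather_candidates("What is GitHelp?", fake_mmore, fake_lexical)
]
```

Applying the profile after the merge means project-specific filtering and reranking operate on one combined candidate list rather than on each backend's output separately.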