# Architecture overview

GitHelp is organized around a simple idea: all sources are converted into the
same internal document format before retrieval.

The initial use case is MMORE, but the core pipeline is designed to remain project-agnostic. Project-specific behavior is isolated in optional project profiles.

## High-level flow

![GitHelp schema](../_static/images/schema.png)


## Main design choices

GitHelp splits the pipeline into blocks, each with a single role:

| Block | Role |
|---|---|
| `loaders/` | Load source files that are already documentation-like. |
| `extractors/` | Extract documentation from source code. |
| `corpus/` | Combine all sources into one corpus. |
| `indexing/` | Export and index the corpus with MMORE. |
| `retrieval/` | Retrieve relevant documents. |
| `project_profiles/` | Hold optional project-specific query expansion, filtering, reranking, and direct answers. |
| `rag/` | Build prompts and generate answers. |
| `projects/` | Manage selected projects, generated project configs, and persisted app state. |
| `app/` | Streamlit user interface. |
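
To illustrate how an optional profile can stay separate from the core pipeline, here is a minimal sketch. The names `ProjectProfile`, `ExampleProfile`, and `answer_candidates`, as well as the hook signatures, are assumptions made for this example and are not taken from the GitHelp code.

```python
from typing import Callable, Optional


class ProjectProfile:
    """Hypothetical base profile: every hook is a no-op by default."""

    def expand_query(self, query: str) -> str:
        return query

    def filter(self, hits: list[dict]) -> list[dict]:
        return hits

    def rerank(self, query: str, hits: list[dict]) -> list[dict]:
        return hits

    def direct_answer(self, query: str) -> Optional[str]:
        return None


class ExampleProfile(ProjectProfile):
    """Hypothetical project-specific profile overriding two hooks."""

    def expand_query(self, query: str) -> str:
        # Illustrative rule only: add a project keyword to indexing questions.
        return f"{query} mmore" if "index" in query.lower() else query

    def filter(self, hits: list[dict]) -> list[dict]:
        # Illustrative rule only: drop records flagged as archived.
        return [hit for hit in hits if not hit.get("archived", False)]


def answer_candidates(
    query: str,
    retrieve: Callable[[str], list[dict]],
    profile: Optional[ProjectProfile] = None,
) -> tuple[Optional[str], list[dict]]:
    """Run retrieval with the profile hooks applied around it."""
    profile = profile or ProjectProfile()
    direct = profile.direct_answer(query)
    if direct is not None:
        return direct, []
    hits = retrieve(profile.expand_query(query))
    return None, profile.rerank(query, profile.filter(hits))
```

With no profile supplied, the default hooks leave the query and the hits unchanged, which is one way to keep the core pipeline project-agnostic.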

## Why keep a GitHelp format?

GitHelp uses its own `DocumentRecord` format instead of exposing MMORE data structures throughout the codebase.

This keeps the project modular:

- the corpus can be inspected before indexing;
- the simple retriever can run without MMORE;
- MMORE can be replaced or updated without rewriting loaders;
- retrieved sources keep consistent metadata for citations;
- Streamlit can work with project-specific corpora before MMORE indexing is available.
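
As a rough illustration of what such a record could look like, the sketch below defines a dataclass that round-trips through one JSON line. Only the name `DocumentRecord` comes from this page; every field name and both helper methods are assumptions, and the real class may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DocumentRecord:
    # All fields below are hypothetical; they only illustrate the idea of
    # carrying text plus citation metadata in one backend-neutral record.
    doc_id: str
    text: str
    source_path: str
    source_type: str  # e.g. "markdown" or "python_docstring"
    metadata: dict = field(default_factory=dict)

    def to_jsonl_line(self) -> str:
        """Serialize the record as a single JSON line."""
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_jsonl_line(cls, line: str) -> "DocumentRecord":
        """Rebuild a record from a line written by to_jsonl_line."""
        return cls(**json.loads(line))


record = DocumentRecord(
    doc_id="readme-0",
    text="Install the package with pip.",
    source_path="README.md",
    source_type="markdown",
    metadata={"title": "Installation"},
)
line = record.to_jsonl_line()
restored = DocumentRecord.from_jsonl_line(line)
```

Because a record of this kind is plain data, a corpus file can be opened and inspected with ordinary tools before any index is built.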

## Simple backend vs MMORE backend

The `simple` backend reads a selected `corpus.jsonl` directly. It is useful for:

- local development;
- direct checks of newly built project corpora;
- debugging retrieval quality;
- working without building an MMORE index.
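
A minimal sketch of what reading a `corpus.jsonl` directly might involve is shown below. The term-count scoring, the `text` field, and the function names `tokenize`, `load_corpus`, and `simple_retrieve` are assumptions for illustration; the actual simple retriever may score documents differently.

```python
import json
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def load_corpus(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def simple_retrieve(query: str, records: list[dict], top_k: int = 3) -> list[dict]:
    """Rank records by how often the query terms occur in their text."""
    query_terms = set(tokenize(query))
    scored = []
    for record in records:
        counts = Counter(tokenize(record["text"]))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, record))
    # Highest score first; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:top_k]]
```

Nothing here depends on MMORE, so a freshly built corpus can be queried as soon as the file exists.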

The `mmore` backend implements the main MMORE workflow. It retrieves from an MMORE
index when native retrieval succeeds, and it can fall back to the exported
`mmore_corpus.jsonl` if the local native process fails. That fallback is lexical
and does not use native MMORE/Milvus vector search.
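
The fallback behavior can be sketched as a small wrapper around two retrieval callables. The function name `retrieve_with_fallback`, the returned mode labels, and the broad `except Exception` are assumptions; the real backend may catch narrower errors and report the mode differently.

```python
from typing import Callable

Retriever = Callable[[str], list[dict]]


def retrieve_with_fallback(
    query: str,
    native_retrieve: Retriever,
    lexical_retrieve: Retriever,
) -> tuple[list[dict], str]:
    """Try native retrieval first; on failure, use the lexical fallback.

    Returns the hits together with a label saying which path produced them,
    so callers can tell that vector search was not used.
    """
    try:
        return native_retrieve(query), "native"
    except Exception:  # assumption: the real code may catch specific errors
        return lexical_retrieve(query), "lexical_fallback"
```

Returning the mode alongside the hits is a design choice of this sketch: it makes it visible when results came from the lexical path rather than from the index.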

The full MMORE workflow is:

```text
Build corpus → export MMORE corpus → build MMORE index → use the mmore backend
```

For code-, symbol-, and filename-oriented questions, the high-level answering
pipeline may merge lexical candidates from the simple retriever with MMORE
candidates before applying the project profile.
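
One possible shape for this merge is sketched below. The regular expression used to detect code-oriented questions, the `doc_id` key, and the choice to list MMORE candidates first are all assumptions made for the example, not a description of the actual heuristics.

```python
import re

# Hypothetical heuristic: backticks, file-like names (config.py),
# snake_case identifiers, or call syntax such as build().
CODE_HINT = re.compile(r"`|\w+\.\w{1,4}\b|\w+_\w+|\w+\(\)")


def looks_code_oriented(query: str) -> bool:
    """Guess whether a question is about code, symbols, or filenames."""
    return bool(CODE_HINT.search(query))


def merge_candidates(
    mmore_hits: list[dict],
    lexical_hits: list[dict],
    limit: int = 5,
) -> list[dict]:
    """Concatenate both candidate lists, keeping the first copy of each doc_id."""
    merged, seen = [], set()
    for hit in [*mmore_hits, *lexical_hits]:
        if hit["doc_id"] not in seen:
            seen.add(hit["doc_id"])
            merged.append(hit)
    return merged[:limit]


def gather_candidates(
    query: str,
    mmore_hits: list[dict],
    lexical_hits: list[dict],
) -> list[dict]:
    """Merge only for code-oriented questions; otherwise keep MMORE hits."""
    if looks_code_oriented(query):
        return merge_candidates(mmore_hits, lexical_hits)
    return mmore_hits
```

The merged list would then be handed to the project profile, which can still filter or rerank it.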