RAG and LLM layer

The RAG layer starts after retrieval.

Its job is to turn retrieved sources into a grounded answer.

Relevant files

src/githelp/rag/prompting.py
src/githelp/rag/extractive_answerer.py
src/githelp/rag/retrieval_query.py
src/githelp/rag/answering.py
src/githelp/rag/llm_provider.py
src/githelp/rag/llm_factory.py
src/githelp/rag/qwen_provider.py

Prompting

File:

src/githelp/rag/prompting.py

This module formats retrieved sources into a prompt.

The prompt instructs the LLM to:

answer only from the provided sources;
cite sources inline with [Source 1], [Source 2], etc.;
avoid inventing commands, paths, APIs, modules, or configuration keys;
avoid interpreting configuration values unless the sources explain them;
begin with a brief, direct context sentence;
use practical numbered steps for how-to questions;
group parameters by role instead of repeating one description pattern;
clearly separate supported facts, safe inferences, and missing evidence;
say when the sources are incomplete or insufficient.
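As a rough illustration of how numbered sources and instructions could be combined, here is a minimal sketch. It is not the code in prompting.py: the `Source` dataclass, its field names, the `RULES` tuple, and `build_prompt` are all hypothetical, and only a few of the rules listed above are included.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical retrieved record; the field names are assumptions."""
    path: str
    content: str


# A shortened, illustrative subset of the instructions listed above.
RULES = (
    "Answer only from the provided sources.",
    "Cite sources inline as [Source 1], [Source 2], etc.",
    "Do not invent commands, paths, APIs, modules, or configuration keys.",
    "Say so when the sources are incomplete or insufficient.",
)


def build_prompt(question: str, sources: list[Source]) -> str:
    """Number each source from 1 so the model can cite it inline."""
    source_blocks = [
        f"[Source {index}] {source.path}\n{source.content}"
        for index, source in enumerate(sources, start=1)
    ]
    parts = [
        "Instructions:",
        "\n".join(f"- {rule}" for rule in RULES),
        "",
        "Sources:",
        "\n\n".join(source_blocks),
        "",
        f"Question: {question}",
        "Answer:",
    ]
    return "\n".join(parts)
```

Numbering the sources in the prompt itself is what makes the inline [Source N] citations checkable against the retrieved records afterwards.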

Debug command:

python scripts/debug_prompting.py \
  "How do I configure indexing?" \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --config-path configs/app_config.yaml

LLM providers

GitHelp uses a provider interface:

src/githelp/rag/llm_provider.py

The active provider is selected from:

configs/app_config.yaml

Example:

llm:
  provider: qwen
  model_name: Qwen/Qwen3-4B
  max_new_tokens: 512
  temperature: 0.0
  enable_thinking: false

The Qwen provider uses Hugging Face Transformers. A dummy provider is also implemented for tests and pipeline debugging. No external or hosted LLM provider is currently included.
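The provider pattern can be sketched as follows. This is an assumption about the general shape, not the actual contents of llm_provider.py or llm_factory.py: the `LLMProvider` protocol, the `generate` method name, `DummyProvider`, and `create_provider` are hypothetical names. The Qwen branch is left out on purpose, since it would need to load model weights.

```python
from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def generate(self, prompt: str) -> str: ...


class DummyProvider:
    """Deterministic stand-in for tests and pipeline debugging."""

    def generate(self, prompt: str) -> str:
        return f"[dummy answer for a {len(prompt)}-character prompt]"


def create_provider(llm_config: dict) -> LLMProvider:
    """Pick a provider from the `llm` section of the config (sketch only)."""
    name = llm_config.get("provider", "dummy")
    if name == "dummy":
        return DummyProvider()
    # A real factory would also handle "qwen" here; omitted in this sketch.
    raise ValueError(f"Provider not covered by this sketch: {name!r}")
```

Keeping generation behind one small interface lets the rest of the pipeline run end to end with the dummy provider, without a GPU or a model download.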

High-level answering helpers

Conversation-aware retrieval query detection and rewriting live in:

src/githelp/rag/retrieval_query.py

This module keeps follow-up detection, ambiguity handling, and LLM-assisted query rewriting separate from retrieval and answer generation.

File:

src/githelp/rag/answering.py

This module exposes:

prepare_answer_prompt(...)
answer_question(...)
answer_question_with_llm(...)
answer_question_with_provider(...)

Current flow:

current question + recent chat
→ keep standalone questions unchanged, or rewrite a clear follow-up
→ ask for clarification when a follow-up has no single clear referent
→ project profile query expansion
→ retrieval
→ project profile filtering/reranking
→ optional project profile direct answer
→ prompt construction with the original question and lightweight recent context
→ LLM generation

The rewritten query is used only for retrieval. Recent chat is not appended to the retrieval query, and standalone questions are not forced into the previous topic. The final answer prompt receives at most six recent messages to resolve references, but instructs the model not to repeat earlier answers unless the user explicitly asks for a summary or rephrasing.
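The separation between the retrieval query and the answer prompt can be illustrated with a small sketch. The names `plan_turn`, `recent_context`, and the `rewrite` callable are hypothetical stand-ins, not the functions in retrieval_query.py or answering.py. The clarification branch and the project profile steps are omitted.

```python
from typing import Callable, Optional

MAX_RECENT_MESSAGES = 6  # the prompt receives at most six recent messages

# Hypothetical rewriter: returns a standalone query for a clear follow-up,
# or None when the question is already standalone.
Rewriter = Callable[[str, list[dict]], Optional[str]]


def recent_context(chat: list[dict], limit: int = MAX_RECENT_MESSAGES) -> list[dict]:
    """Keep only the last `limit` chat messages for the answer prompt."""
    return chat[-limit:] if limit > 0 else []


def plan_turn(question: str, chat: list[dict], rewrite: Rewriter) -> dict:
    """Decide what goes to retrieval and what goes to the prompt."""
    rewritten = rewrite(question, chat)
    return {
        # Only retrieval sees the rewritten query.
        "retrieval_query": rewritten if rewritten else question,
        # The prompt keeps the user's original wording.
        "prompt_question": question,
        "prompt_context": recent_context(chat),
    }
```

For example, with a toy rewriter that expands "And for Milvus?" into a full question, `retrieval_query` holds the expanded text, while `prompt_question` still holds "And for Milvus?".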

Direct answers from project profiles

Some structured questions are better answered deterministically than by an LLM.

For example, the MMORE profile can answer Milvus parameter questions directly. It scans the retrieved records for a fixed allowlist of known Milvus keys. This avoids returning unrelated fields such as model_name, top_k, or max_workers, but it is not a general YAML schema parser.
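An allowlist scan of this kind might look like the sketch below. The key names in `MILVUS_KEYS` and the function `extract_allowed_settings` are hypothetical; the actual allowlist lives in the MMORE profile and may differ. Like the approach described above, this is a line-based scan, not a YAML parser.

```python
# Hypothetical allowlist; the real profile defines its own set of keys.
MILVUS_KEYS = ("uri", "db_name", "collection_name")


def extract_allowed_settings(
    records: list[str], allowed: tuple[str, ...] = MILVUS_KEYS
) -> dict[str, str]:
    """Collect `key: value` lines whose key is on the allowlist.

    The first occurrence of each key wins; all other keys are ignored.
    """
    found: dict[str, str] = {}
    for text in records:
        for line in text.splitlines():
            # Split on the first colon only, so values such as URLs survive.
            key, separator, value = line.strip().partition(":")
            key = key.strip()
            if separator and key in allowed and key not in found:
                found[key] = value.strip()
    return found
```

Because only allowlisted keys are collected, a record that also contains model_name or top_k contributes nothing for those fields.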

Temporary extractive answerer

File:

src/githelp/rag/extractive_answerer.py

This answerer remains available for use when LLM generation is disabled.

It:

takes the top retrieved source;
returns its content;
has a small special case for signature questions.
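The core behaviour can be sketched in a few lines. This is an illustration only: `extractive_answer`, the `content` key, and the empty-result message are assumptions, and the special case for signature questions is not reproduced here.

```python
def extractive_answer(sources: list[dict]) -> str:
    """Return the content of the top source, assuming best-first ordering."""
    if not sources:
        # Hypothetical fallback text for an empty retrieval result.
        return "No sources were retrieved for this question."
    return sources[0]["content"]
```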

Command:

python scripts/answer_question.py \
  "How do I configure indexing?" \
  --backend simple

LLM answer generation

LLM generation can be enabled with:

python scripts/answer_question.py \
  "How do I configure indexing?" \
  --llm \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --config-path configs/app_config.yaml

The generated answer is expected to include inline source citations such as [Source 1].