Limitations

This page lists GitHelp's main current limitations and the design choices made to work around them.

Repository Loading

GitHelp currently supports public GitHub repositories only, which it clones locally with git clone. Private repositories are not handled yet because they require authentication and credential-management decisions.

Existing clones under data/repositories/ are reused as local folders. GitHelp does not automatically pull updates from the remote repository.
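The reuse rule can be sketched as a small decision function. This is a minimal illustration, not GitHelp's actual code: the helper name plan_checkout and the folder naming scheme (last URL path segment without ".git") are assumptions.

```python
from pathlib import Path
from urllib.parse import urlparse


def plan_checkout(repo_url: str, root: Path) -> tuple[str, Path]:
    """Decide whether a repository needs cloning or can be reused.

    Hypothetical helper: the folder naming scheme is an assumption.
    """
    # Use the last URL path segment as the folder name, dropping ".git".
    name = urlparse(repo_url).path.rstrip("/").split("/")[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    target = root / name
    # An existing clone is reused as-is; no fetch or pull is planned for it.
    if (target / ".git").is_dir():
        return "reuse", target
    return "clone", target
```

Because a "reuse" result never touches the remote, a folder cloned weeks ago keeps serving its old contents until it is refreshed by hand.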

Repository ingestion is optimized for Python projects. Documentation, YAML, and the repository tree can still be collected from other repositories, but API extraction currently understands Python syntax only.

Corpus and Index Freshness

Building a GitHelp corpus writes:

data/projects/<project_name>/corpus.jsonl

This does not automatically rebuild the MMORE index. For MMORE retrieval, the project corpus must also be exported and indexed:

corpus.jsonl -> mmore_corpus.jsonl -> MMORE index

The corpus and exported MMORE JSONL are project-specific. Native MMORE indexing currently uses one shared Milvus Lite database and the mmore_docs collection; rebuilding resets that database and replaces the previously indexed project.
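Since nothing rebuilds the export automatically, a quick staleness check can help before querying. The sketch below compares file modification times. It assumes that mmore_corpus.jsonl sits next to corpus.jsonl in the project folder, which the text above does not state, and the function name is hypothetical. It cannot tell which project the shared Milvus Lite database currently holds.

```python
from pathlib import Path


def stale_artifacts(project_dir: Path) -> list[str]:
    """Return the names of files that are missing or look out of date.

    Assumption: mmore_corpus.jsonl lives beside corpus.jsonl.
    """
    corpus = project_dir / "corpus.jsonl"
    exported = project_dir / "mmore_corpus.jsonl"
    # Without a corpus there is nothing to export or index yet.
    if not corpus.exists():
        return ["corpus.jsonl"]
    # An export older than the corpus predates the latest corpus build.
    if not exported.exists() or exported.stat().st_mtime < corpus.stat().st_mtime:
        return ["mmore_corpus.jsonl"]
    return []
```

An empty list only means the export is at least as new as the corpus; the index itself still has to be rebuilt for the same project.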

MMORE Native Retrieval

The mmore backend first attempts native MMORE retrieval in a child process. This keeps the Streamlit process alive if local native dependencies crash.

When the native process succeeds, retrieved sources are tagged with:

native_index

When the native process fails, GitHelp falls back to the exported mmore_corpus.jsonl and tags retrieved sources with:

corpus_fallback

The fallback still answers from the MMORE-formatted corpus, but it does not use the native Milvus vector search path. It ranks records with the simple lexical retriever.
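The child-process pattern with fallback might look roughly like this. It is a sketch under stated assumptions, not GitHelp's implementation: the function name, the "retrieval" key used for tagging, and the convention that the child prints a JSON list on stdout are all hypothetical.

```python
import json
import subprocess
import sys


def retrieve_with_fallback(
    child_code: str, fallback_records: list[dict], timeout: float = 60.0
) -> list[dict]:
    """Run retrieval in a child interpreter; fall back if it fails.

    Assumption: the child prints a JSON list of records on stdout.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", child_code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        # A crash in native code ends the child, not the calling process.
        if proc.returncode == 0:
            records = json.loads(proc.stdout)
            return [{**r, "retrieval": "native_index"} for r in records]
    except (subprocess.TimeoutExpired, json.JSONDecodeError):
        pass
    # Any failure path ends here: serve records from the exported corpus.
    return [{**r, "retrieval": "corpus_fallback"} for r in fallback_records]
```

On POSIX systems a child killed by a signal reports a negative return code, so a segfault in a native library takes the same fallback path as an ordinary non-zero exit.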

For some code-, symbol-, and filename-oriented questions, the high-level answering pipeline also merges simple lexical candidates with MMORE candidates. Selecting the MMORE backend therefore does not guarantee that every final source originated from the native index.
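To see why merged results can mix origins, consider one possible merge strategy. The text above does not say how GitHelp merges candidates, so the interleaving below, the "path" and "origin" keys, and the function name are purely illustrative assumptions.

```python
from itertools import chain, zip_longest


def merge_candidates(
    mmore: list[dict], lexical: list[dict], limit: int = 5
) -> list[dict]:
    """Interleave two ranked candidate lists, de-duplicating by path."""
    merged: list[dict] = []
    seen: set[str] = set()
    # Alternate between the lists so neither retriever dominates the result.
    for cand in chain.from_iterable(zip_longest(mmore, lexical)):
        # zip_longest pads the shorter list with None.
        if cand is None or cand["path"] in seen:
            continue
        seen.add(cand["path"])
        merged.append(cand)
        if len(merged) == limit:
            break
    return merged
```

With any strategy of this kind, a final source list can carry lexical candidates next to native ones, so the origin tag on each source is worth checking when debugging retrieval.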

Local Environment Sensitivity

MMORE, Milvus Lite, PyTorch, Transformers, and OpenMP native libraries can be sensitive to the Python and package versions installed locally.

Known examples:

  • MMORE sparse indexing currently requires Transformers 4.x, so GitHelp pins transformers>=4.51.0,<5.

  • Some macOS environments can hit an OpenMP runtime conflict while loading native ML libraries.

  • Python 3.14 is more likely to expose dependency compatibility issues than Python 3.11 or 3.12.
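Two of the known examples above can be checked at startup. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the version parser deliberately ignores pre-release suffixes such as "rc1", so it is looser than a full PEP 440 comparison.

```python
import sys
from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Parse leading numeric components, e.g. '4.51.3' -> (4, 51, 3)."""
    parts: list[int] = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad to three components so that '4.51' compares equal to '4.51.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def transformers_in_range(version: str) -> bool:
    """Check a version string against the pin transformers>=4.51.0,<5."""
    return (4, 51, 0) <= parse_version(version) < (5,)


def environment_warnings() -> list[str]:
    """Collect human-readable warnings about the local environment."""
    warnings: list[str] = []
    if sys.version_info >= (3, 14):
        warnings.append("Python 3.14+: dependency issues are more likely")
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        warnings.append("transformers is not installed")
    else:
        if not transformers_in_range(installed):
            warnings.append(f"transformers {installed} is outside >=4.51.0,<5")
    return warnings
```

Where the packaging library is available, its version and specifier classes handle pre-releases correctly and would be a sturdier choice than the hand-written parser.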

LLM Provider

The default LLM provider uses a local Qwen model through Hugging Face Transformers. Answer quality, latency, and memory usage depend on the selected model and local hardware.

The dummy provider remains available for tests and pipeline debugging without loading a model.

The repository does not currently include an external or hosted LLM provider.

Project Profiles

The active profile is selected globally in configs/app_config.yaml. The default is the MMORE-specific profile, and building a project-specific corpus does not update it automatically. When working on a different project, set project_profile: generic in that file, unless the project needs a dedicated profile.

Retrieval Quality

GitHelp combines several retrieval improvements:

  • project profiles;

  • query expansion;

  • filtering and reranking;

  • source-grounded prompts;

  • expected-source retrieval evaluation.

It is still not a complete code intelligence system. Future work could include dependency graphs, richer symbol indexing, extraction of tests and examples, and evaluation-driven reranking.

The current evaluation set contains ten MMORE questions, with expected-source annotations for only a small subset. It is useful as a regression check but is not a systematic benchmark of retrieval or answer quality.
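An expected-source check of this kind can be illustrated with a top-k hit rate that skips unannotated questions. The record layout ("expected_sources", "retrieved") and the function name are assumptions made for this sketch, not GitHelp's evaluation format.

```python
def expected_source_hit_rate(results: list[dict], k: int = 5) -> tuple[float, int]:
    """Return (hit rate, number of scored questions).

    A question counts as a hit when at least one expected source appears
    among its top-k retrieved sources.
    """
    hits = 0
    scored = 0
    for item in results:
        expected = set(item.get("expected_sources", []))
        # Questions without annotations cannot be scored either way.
        if not expected:
            continue
        scored += 1
        if expected & set(item["retrieved"][:k]):
            hits += 1
    rate = hits / scored if scored else 0.0
    return rate, scored
```

Reporting the number of scored questions alongside the rate matters here: with only a few annotated questions, a single change in retrieval can move the rate by a large step.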