[2026.6.24] v1.4.12 — A new LightRAG Server retrieval engine, a lightweight PyMuPDF4LLM parsing engine, and a FAISS vector backend that makes large knowledge-base retrieval dramatically faster.
I measured around 0.08 ms of overhead. LiteLLM's Python proxy added about 7 to 8 ms per request. However, an LLM call takes 500 ms to 30 seconds. A 7 ms delay is almost invisible compared to the model ...
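To put those figures in proportion, here is a minimal sketch that expresses each overhead as a share of total request time. The helper name `overhead_share` is hypothetical, and the inputs are just the numbers quoted above (0.08 ms, 7 to 8 ms, and calls of 500 ms to 30 seconds), not new measurements.

```python
def overhead_share(overhead_ms: float, call_ms: float) -> float:
    """Return proxy overhead as a percentage of total request time.

    Hypothetical helper for illustration; total time is assumed to be
    the model call time plus the proxy overhead.
    """
    return 100.0 * overhead_ms / (call_ms + overhead_ms)


if __name__ == "__main__":
    # Overheads and call durations taken from the figures quoted in the text.
    overheads_ms = [0.08, 7.0, 8.0]
    call_durations_ms = [500.0, 30_000.0]

    for overhead_ms in overheads_ms:
        for call_ms in call_durations_ms:
            share = overhead_share(overhead_ms, call_ms)
            print(f"{overhead_ms} ms overhead on a {call_ms:.0f} ms call: {share:.4f}%")
```

Under these assumptions, a 7 ms overhead is roughly 1.4% of a 500 ms call and roughly 0.02% of a 30-second call, which is consistent with the point that the delay is small next to the model call itself.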