DeepEval is a simple-to-use, open-source framework for evaluating large language model (LLM) systems. It is similar to Pytest but specialized for unit testing LLM apps. DeepEval incorporates ...
Note: Nightly builds include the latest features and bug fixes but may be less stable than official releases. They follow the version format X.Y.Z.devYYYYMMDDHHMM. We welcome contributions! Please ...
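As an illustration of the nightly version format X.Y.Z.devYYYYMMDDHHMM described above, here is a minimal sketch that separates the release numbers from the build timestamp. The function name `parse_nightly` and the sample version string are hypothetical and are not part of DeepEval; the sketch assumes the timestamp is always exactly 12 digits.

```python
import re
from datetime import datetime

# Pattern for X.Y.Z.devYYYYMMDDHHMM (assumption: timestamp is exactly 12 digits).
NIGHTLY_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.dev(\d{12})$")


def parse_nightly(version):
    """Return ((major, minor, patch), build_time), or None if not a nightly version.

    Hypothetical helper for illustration only; not a DeepEval API.
    """
    match = NIGHTLY_RE.match(version)
    if match is None:
        return None
    release = tuple(int(part) for part in match.group(1, 2, 3))
    stamp = match.group(4)
    try:
        # Slice the stamp into year, month, day, hour, minute.
        build_time = datetime(
            int(stamp[0:4]),
            int(stamp[4:6]),
            int(stamp[6:8]),
            int(stamp[8:10]),
            int(stamp[10:12]),
        )
    except ValueError:
        # Digits matched the pattern but do not form a real date/time.
        return None
    return release, build_time


# Made-up version string, used only to exercise the parser.
print(parse_nightly("1.2.3.dev202501311530"))
```

A plain release string such as "1.2.3" returns `None`, so a check like this could be used to tell nightly builds apart from official releases.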