Python Build Wheel with Static File

Compile Once, Run Offline: New AI Method Matches 32B Models With a 23MB File

Local AI inference at 32B-parameter quality, no cloud API required: University of Waterloo researchers released PAW on July 2 ...

GitHub

xllamacpp - a Python wrapper of llama.cpp

Because the intent is to provide a very thin wrapping layer that plays to the strengths of both the original C++ library and Python, the wrapping approach intentionally adopts the following guidelines: ...

GitHub

GitHub - facebookincubator/AITemplate: AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore ...

High performance: close to roofline fp16 TensorCore (NVIDIA GPU) / MatrixCore (AMD GPU) performance on major models, including ResNet, MaskRCNN, BERT, VisionTransformer, Stable Diffusion, etc. Unified ...
