Developer BeRo1985 has released PasLLM, a high-performance LLM inference engine written entirely in Object Pascal. Rather than wrapping llama.cpp or another external runtime, PasLLM implements model loading, quantization, and inference natively, so large language models can run directly inside Delphi and FreePascal applications.
The framework includes support for:
- Native LLM inference in Object Pascal
- Local and offline AI execution
- Llama, Qwen, Phi, Gemma, Mixtral, DeepSeek, and other model families
- Custom high-efficiency quantization formats
- Delphi and FreePascal support
- CLI and GUI applications
- VCL, FMX, and LCL examples
- Cross-platform deployment
PasLLM introduces its own optimized quantization formats, designed to reduce model size while preserving inference quality. The project supports a wide range of modern open-weight models and can be integrated directly into Object Pascal projects without requiring Python or external inference frameworks.
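To make the size-versus-quality trade-off concrete, here is a minimal sketch of generic block-wise 8-bit quantization. It is not PasLLM's actual format, whose details are not described here. The block length of 32, the float32 scale per block, and all function names are assumptions chosen purely for illustration, and Python is used only for brevity.

```python
BLOCK_SIZE = 32  # hypothetical block length, not taken from PasLLM


def quantize_block(values):
    """Map one block of floats to int8 codes plus a single shared scale."""
    max_abs = max(abs(v) for v in values)
    # Avoid division by zero for an all-zero block.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from the codes and the shared scale."""
    return [scale * c for c in codes]


def quantized_size_bytes(n_values, block_size=BLOCK_SIZE):
    """Storage estimate: 1 byte per value plus a 4-byte scale per block."""
    n_blocks = (n_values + block_size - 1) // block_size
    return n_values + 4 * n_blocks


# Deterministic sample weights in the range [-1, 1].
weights = [((i * 37) % 101 - 50) / 50.0 for i in range(BLOCK_SIZE)]

scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

float32_bytes = 4 * len(weights)
packed_bytes = quantized_size_bytes(len(weights))
print(f"float32: {float32_bytes} bytes, quantized: {packed_bytes} bytes")
print(f"largest round-trip error: {max_error:.6f}")
```

In this sketch each block of 32 weights shrinks from 128 bytes to 36, and the round-trip error per weight stays within half a quantization step. Real formats typically vary the bit width, block size, and scale encoding to push that trade-off further.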
Currently focused on CPU inference, PasLLM is aimed at developers who want complete control over local AI execution while staying entirely within the Delphi and FreePascal ecosystem. Future GPU acceleration is planned through the author’s PasVulkan framework.
Explore a pure Object Pascal approach to running modern LLMs locally.

