What is PostgresML?
PostgresML is an end-to-end machine learning platform that runs inside PostgreSQL. You train models and make predictions with SQL queries, without moving data out of your database.
Overview
PostgresML adds functions for training and deploying models directly within your PostgreSQL database, so models run next to the data they use. This reduces the need for external ML infrastructure and data pipelines.
Key Features
🤖 In-Database ML
- Train models directly on your data without ETL
- Use familiar SQL syntax for machine learning
- Real-time predictions with low latency
- Keeping data in the database reduces security exposure
🎯 Pre-trained Models
- Access to Hugging Face transformers
- State-of-the-art NLP models
- Computer vision models
- Time series forecasting models
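As a rough sketch of how pre-trained models might be reached from a client, the snippet below builds two parameterized SQL statements: one calling `pgml.transform` for text classification and one calling `pgml.embed` for embeddings. The model name `intfloat/e5-small-v2`, the sample sentence, and the `%s` placeholder style (as used by drivers such as psycopg) are illustrative assumptions. Running the statements requires a PostgreSQL connection with the extension installed, which is assumed here and not shown.

```python
# Sketch only: builds SQL for PostgresML's pre-trained model functions.
# The model name and input text are illustrative assumptions.


def transform_statement(task, inputs):
    """Return (sql, params) that runs a Hugging Face pipeline task on inputs."""
    # Explicit casts avoid ambiguity between overloaded function signatures.
    sql = "SELECT pgml.transform(task => %s::text, inputs => %s::text[]) AS result"
    return sql, (task, list(inputs))


def embed_statement(model_name, text):
    """Return (sql, params) that embeds one piece of text with model_name."""
    sql = "SELECT pgml.embed(%s::text, %s::text) AS embedding"
    return sql, (model_name, text)


if __name__ == "__main__":
    sample = "PostgresML keeps ML close to the data."
    for sql, params in (
        transform_statement("text-classification", [sample]),
        embed_statement("intfloat/e5-small-v2", sample),
    ):
        # With a DB-API cursor this would be: cursor.execute(sql, params)
        print(sql, params)
```

Keeping the SQL text separate from its parameters lets the driver handle quoting, which matters when the inputs are user-supplied text.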
⚡ Vector Operations
- Built-in vector similarity search
- Store and query embeddings efficiently
- Semantic search capabilities
- RAG (Retrieval Augmented Generation) support
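To make "vector similarity search" concrete, here is a minimal pure-Python sketch that ranks documents by cosine similarity to a query embedding. It is not how PostgresML implements search (in the database the ranking would run over a vector column, typically with an index); the three-dimensional vectors and document names are made-up toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the ids of the k documents whose embeddings are most similar to query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in documents.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


if __name__ == "__main__":
    # Toy embeddings; real ones come from an embedding model and have hundreds of dimensions.
    documents = {
        "refund policy": [0.9, 0.1, 0.0],
        "shipping times": [0.1, 0.9, 0.0],
        "return an item": [0.8, 0.2, 0.1],
    }
    print(top_k([1.0, 0.0, 0.0], documents))
```

Semantic search and RAG both reduce to this step: embed the query, then retrieve the stored items whose embeddings are closest to it.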
🔧 Multiple Algorithms
- Classification and regression
- Clustering and dimensionality reduction
- Natural language processing
- Time series analysis
How PostgresML Works
PostgresML extends PostgreSQL with machine learning capabilities through a native extension. The workflow is simple:
- Store your data in PostgreSQL tables
- Train models using SQL functions
- Make predictions with SQL queries
- Evaluate and deploy models in production
This approach provides several advantages:
- No data movement between systems
- Leverage PostgreSQL's ACID guarantees
- Use existing database security and permissions
- Scale with your database infrastructure
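The train and predict steps of the workflow above can be sketched as two SQL statements built from a client. This is a hedged illustration: the project name `fraud_demo`, the table `transactions`, the label column `is_fraud`, and the feature values are hypothetical, and the `%s` placeholders assume a DB-API driver such as psycopg. No database connection is opened here.

```python
# Sketch only: SQL statements for the train -> predict steps.
# Table, column, and project names below are hypothetical.


def train_statement(project, task, relation, label_column):
    """Return (sql, params) that trains a model on a table or view."""
    sql = (
        "SELECT * FROM pgml.train("
        "project_name => %s, task => %s, "
        "relation_name => %s, y_column_name => %s)"
    )
    return sql, (project, task, relation, label_column)


def predict_statement(project, features):
    """Return (sql, params) that scores one feature vector for a project."""
    # Casts pin the argument types so the intended overload is chosen.
    sql = "SELECT pgml.predict(%s::text, %s::real[]) AS prediction"
    return sql, (project, [float(x) for x in features])


if __name__ == "__main__":
    steps = (
        train_statement("fraud_demo", "classification", "transactions", "is_fraud"),
        predict_statement("fraud_demo", [120.5, 3, 0]),
    )
    for sql, params in steps:
        # With a DB-API cursor this would be: cursor.execute(sql, params)
        print(sql, params)
```

Because both steps are ordinary `SELECT` statements, they run under the same connection, permissions, and transaction handling as any other query.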
Use Cases
Real-Time Recommendations
- Product recommendations in e-commerce
- Content personalization
- User behavior prediction
Natural Language Processing
- Text classification and sentiment analysis
- Named entity recognition
- Question answering systems
- Semantic search
Fraud Detection
- Real-time transaction analysis
- Anomaly detection
- Risk scoring
Time Series Forecasting
- Sales forecasting
- Demand prediction
- Resource planning
Vector Search & RAG
- Semantic document search
- Chatbots with context
- Knowledge base queries
Architecture Components
PostgresML Extension
A PostgreSQL extension written in Rust that provides machine learning functions and operations directly in the database.
Dashboard
A web application for managing models, monitoring performance, and visualizing results.
SDK
Client libraries for Python, JavaScript, and other languages to interact with PostgresML programmatically.
Model Store
Built-in model registry for versioning and deploying trained models.
Comparison with Other Solutions
| Feature | PostgresML | External ML Services | Standalone ML Platforms |
|---|---|---|---|
| Data Movement | None | Required | Required |
| Latency | Very Low | Medium-High | Medium |
| Setup Complexity | Low | Medium | High |
| SQL Integration | Native | Via APIs | Via ETL |
| Cost | Database only | Additional services | Additional infrastructure |
Supported Algorithms
PostgresML supports a wide range of algorithms through scikit-learn, XGBoost, LightGBM, and more:
- Linear Models: Linear/Logistic Regression, Ridge, Lasso
- Tree-based: Random Forest, Gradient Boosting, XGBoost, LightGBM
- Neural Networks: MLP, Deep Learning via transformers
- Clustering: K-Means, DBSCAN, Hierarchical
- Dimensionality Reduction: PCA, t-SNE, UMAP
- NLP: Transformers from Hugging Face
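One way to compare algorithms is to train the same project once per algorithm and then deploy whichever scored best. The sketch below generates those statements. The algorithm names `linear`, `random_forest`, `xgboost`, and `lightgbm` are passed through the `algorithm` argument of `pgml.train`, while the project, table, and column names are hypothetical. Treat it as an outline under those assumptions and check the names against your installed version.

```python
# Sketch only: train one project with several algorithms, then deploy the best.
# Project, table, and column names are hypothetical.

CANDIDATE_ALGORITHMS = ("linear", "random_forest", "xgboost", "lightgbm")


def comparison_statements(project, task, relation, label_column,
                          algorithms=CANDIDATE_ALGORITHMS):
    """Return a list of (sql, params): one pgml.train per algorithm, then pgml.deploy."""
    train_sql = (
        "SELECT * FROM pgml.train("
        "project_name => %s, task => %s, relation_name => %s, "
        "y_column_name => %s, algorithm => %s)"
    )
    statements = [
        (train_sql, (project, task, relation, label_column, algorithm))
        for algorithm in algorithms
    ]
    # 'best_score' asks for the model with the best key metric to be deployed.
    deploy_sql = "SELECT * FROM pgml.deploy(project_name => %s, strategy => %s)"
    statements.append((deploy_sql, (project, "best_score")))
    return statements


if __name__ == "__main__":
    for sql, params in comparison_statements(
        "sales_forecast", "regression", "daily_sales", "units_sold"
    ):
        # With a DB-API cursor this would be: cursor.execute(sql, params)
        print(sql, params)
```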
Getting Started
Ready to try PostgresML? Continue with the Installation Guide to set up PostgresML on your PostgreSQL database.
Resources
- Homepage: https://postgresml.org
- Source Code: https://github.com/postgresml/postgresml
- Documentation: https://postgresml.org/docs
- Discord Community: Join here