Version v1.5 of the documentation is no longer actively maintained. The site that you are currently viewing is an archived snapshot. For up-to-date documentation, see the latest version.
Database Cache
Overview
The Database (DB) Cache is an alternative cache implementation in Porch designed for larger deployments. It stores both package metadata and repository data in a PostgreSQL database rather than in memory or Kubernetes Custom Resources. This implementation provides better scalability and persistence characteristics for production environments with high package counts.
Key characteristics:
- Alternative implementation: Used when explicitly configured with database connection details
- Database-backed storage: All repository, package, and package revision data stored in PostgreSQL
- External dependency: Requires PostgreSQL database instance
- Suitable for: Large deployments with thousands of packages and package revisions
- Better persistence: Survives Porch server restarts without re-fetching from Git
- Git interaction: Interacts with Git only during approval/publish and sync operations
Implementation Details
Storage Architecture
The DB Cache uses a database-centric storage model:
PostgreSQL Database
│
├─ repositories table
│ └─ Repository metadata
│
├─ packages table
│ └─ Package metadata
│
├─ package_revisions table
│ └─ Package revision metadata
│
└─ package_revision_resources table
└─ Package resources (KRM YAML)
Database Storage:
- Repository connections and metadata
- Package metadata (names, paths)
- Package revision metadata (lifecycle, tasks, upstream locks)
- Package resources (full KRM resource content)
- Timestamps and user tracking for all entities
No In-Memory Cache:
- All data retrieved from database on demand
- No in-memory caching of package revisions
- Database query performance critical for responsiveness
- Relies on PostgreSQL query optimization and indexing
Database Handler
The DB Cache uses a singleton database handler that manages the PostgreSQL connection:
Connection management:
- Single database connection shared across all repositories
- Connection opened during cache initialization
- Connection pooling handled by PostgreSQL driver
- Ping check on connection open to verify connectivity
- Connection closed when cache is shut down
Configuration:
- Driver: PostgreSQL driver (pgx)
- DataSource: Connection string with host, port, database, credentials
- Configured via CacheOptions at Porch server startup
Singleton pattern:
- One DBHandler instance per Porch server
- GetDB() returns the singleton instance
- OpenDB() creates the singleton if not already open
- CloseDB() closes connection and clears singleton
Repository Storage
Each repository is stored in the repositories table with metadata stored as JSON:
Storage approach:
- Repository metadata (namespace, name, spec) persisted in database
- Metadata stored as JSON for flexibility
- Timestamps track last update and user
- Deployment flag indicates repository type
Repository lifecycle:
- OpenRepository checks if repository exists in database
- If not found, creates external repository adapter
- Writes repository metadata to database
- Starts background sync manager
- CloseRepository stops sync and deletes all cached packages
- Removes repository row from database
Package and Package Revision Storage
The DB Cache uses a relational model with four tables:
Relational structure:
- Repositories → Packages (one-to-many)
- Packages → Package Revisions (one-to-many)
- Package Revisions → Resources (one-to-one)
- Foreign key relationships enforce referential integrity
Key storage characteristics:
- Composite primary keys (namespace, name) match Kubernetes naming
- Metadata and specs stored as JSON for flexibility
- Lifecycle state stored as dedicated column for efficient filtering
- Latest revision tracked with boolean flag for query performance
- Resources stored in separate table to reduce row size
Background Synchronization
The DB Cache includes a sync manager for each repository:
Sync process:
- Periodically triggers repository sync (configurable frequency)
- Fetches cached package revisions from database (Published and DeletionProposed only)
- Fetches external package revisions from Git
- Compares cached vs external package revisions
- Deletes package revisions only in cache (removed from Git)
- Caches package revisions only in external repo (new in Git)
- Updates Repository CR condition with sync status
Sync scope:
- Only syncs Published and DeletionProposed package revisions
- Draft and Proposed revisions excluded from sync
- Aligns with database-first approach (drafts don’t exist in Git)
- Reduces sync overhead by ignoring work-in-progress packages
Version tracking:
- Caches external repository version (Git commit SHA)
- Only re-fetches from Git if version changed
- Reuses last external package revision map if version unchanged
- Reduces Git operations during frequent syncs
Change detection:
- Compares package revision keys between cached and external
- Identifies: cached-only, both, external-only
- Cached-only: Deleted from Git, remove from database
- External-only: New in Git, write to database
- Both: Already synchronized, no action needed
Sync statistics:
- Tracks count of cached-only, both, external-only
- Logs sync duration and statistics
- Reports sync errors to Repository CR condition
Concurrency control:
- Per-repository mutex prevents simultaneous syncs
- TryLock pattern: fails fast if sync already in progress
- Prevents database contention and duplicate work
Latest Revision Tracking
The DB Cache tracks the latest package revision:
Identification logic:
- Latest revision determined during sync
- Only considers Published package revisions
- Highest revision number wins
- Stored as boolean flag in
package_revisions.latestcolumn
Database flag:
latest=TRUEset on latest revisionlatest=FALSEon all other revisions- Updated during sync when new revisions added
- Enables efficient queries for latest revisions
Query optimization:
- Can filter by
latest=TRUEin SQL WHERE clause - Avoids scanning all revisions to find latest
- Improves performance for latest revision queries
Key Design Decisions
Relational Database Model
Why a relational model:
- Enforces referential integrity through foreign keys
- Prevents orphaned packages or package revisions
- Enables efficient joins to retrieve related data
- Supports complex filtering at database level
Composite primary keys:
- All tables use (k8s_name_space, k8s_name) as primary key
- Matches Kubernetes resource naming convention
- Enables multi-tenancy with namespace isolation
JSON Storage Strategy
Why JSON for metadata:
- Flexible schema without database migrations
- Stores arbitrary Kubernetes metadata (labels, annotations, etc.)
- Simplifies storage of complex nested structures
- Trade-off: Less efficient queries on JSON fields
What’s stored as JSON:
- Repository, package, and package revision metadata
- Package specs and tasks
- Upstream locks (external package revision IDs)
- Full KRM resource content
Separate Resources Table
Why separate resources:
- Package resources can be large (multiple KRM YAML files)
- Reduces row size in package_revisions table
- Improves query performance when resources not needed
- Allows fetching metadata without loading full content
Latest Revision Flag
Why a boolean flag:
- Pre-computed during sync for performance
- Enables fast filtering:
WHERE latest=TRUE - Avoids scanning all revisions to find latest
- Trade-off: Must be maintained during updates
Query Patterns
SQL joins for related data:
- Single query retrieves package revision with repository context
- Joins avoid multiple round-trips to database
- Filtering at database level reduces data transfer
- Resources fetched separately only when needed
Storage Mechanism
Draft Package Handling
The DB Cache has a database-first approach to draft packages:
Draft lifecycle:
- CreatePackageRevisionDraft creates package revision in database only
- Draft packages stored entirely in PostgreSQL
- All draft modifications update database, not Git
- UpdatePackageRevision operations modify database records
- ClosePackageRevisionDraft saves to database without Git interaction
Git interaction pattern:
- Draft creation: No Git interaction (database only)
- Draft updates: No Git interaction (database only)
- Proposed → Published transition: Pushes to Git repository
- Background sync: Pulls published packages from Git
Implications:
- Draft work isolated in database until approval
- Git repository only contains approved/published packages
- Lower network latency for draft operations
- Git repository can be temporarily unavailable during draft work
- Drafts survive Porch server restarts (persisted in database)
Publish workflow:
- Draft created and modified in database
- Lifecycle transitions: Draft → Proposed (database only)
- Lifecycle transitions: Proposed → Published (triggers Git push)
- Package revision pushed to Git with assigned revision number
- External package revision ID stored in database
- Placeholder package revision created for latest tracking
Cache Invalidation
The DB Cache uses targeted deletion for cache invalidation:
Package deletion:
- Deletes specific package and all its revisions from database
- No cache flush required
- Database foreign key constraints ensure referential integrity
- Orphaned records automatically prevented by database
Implications:
- Efficient deletion without affecting other packages
- No need to rebuild cache after deletion
- Database handles cleanup automatically
- Minimal overhead for targeted deletion
Data Persistence
The DB Cache provides true persistence:
Porch server restart:
- All repository, package, and package revision data survives restart
- No need to re-fetch from Git on startup
- Repositories automatically reconnect on first access
- Background sync resumes after reconnection
Database backup:
- Standard PostgreSQL backup procedures apply
- Point-in-time recovery possible
- Disaster recovery through database restore
- No dependency on Git availability for recovery
Data consistency:
- Database transactions ensure atomic updates
- Foreign key constraints prevent orphaned records
- Referential integrity maintained automatically
- Rollback on error prevents partial updates
Memory Management
The DB Cache has minimal memory footprint:
Memory characteristics:
- No in-memory caching of package revisions
- Only active repository connections in memory
- Database connection pool managed by driver
- Memory usage independent of package count
Scalability:
- Suitable for thousands of repositories
- Tens of thousands of package revisions
- Limited only by database capacity
- Horizontal scaling via database replication
Trade-offs:
- Lower memory usage than CR Cache
- Higher latency due to database queries
- Requires external PostgreSQL instance
- Additional operational complexity
When to Use DB Cache
Use DB Cache when:
- Managing hundreds of repositories
- Thousands of package revisions per repository
- Memory constraints on Porch server
- Need for data persistence across restarts
- Existing PostgreSQL infrastructure available
- Backup and disaster recovery requirements
Use CR Cache when:
- Small to medium deployments
- Prefer Kubernetes-native storage
- No external database dependencies desired
- Lower operational complexity preferred
- etcd capacity sufficient for package metadata