Many Oracle APEX teams monitor too much and still miss what affects users most. Dashboard sprawl creates alert fatigue without reducing incident frequency. This shortlist helps DBA teams focus on signals that directly drive application behavior.
Core Database Metrics to Track
- Session concurrency: track active, waiting, and inactive session counts over time. Spikes in waiting sessions show up here before users report slowness.
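- Top wait events: monitor V$SESSION_WAIT (what sessions are waiting on right now) and V$SYSTEM_EVENT (cumulative totals since instance startup) for patterns. The events db file sequential read, log file sync, and enq: TX - row lock contention each point to a different root cause: single-block reads (typically index access), commit latency on redo writes, and row-level lock contention, respectively.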
- I/O pressure: track read and write IOPS and throughput against the storage tier's stated limits.
- SQL execution profile changes: compare P99 execution times for high-frequency queries before and after every deployment.
- Redo log generation rate: sudden increases indicate bulk operations, runaway processes, or misconfigured batch jobs.
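The before-and-after P99 comparison in the list above can be sketched in a few lines of Python. This is a minimal sketch rather than a definitive implementation: it assumes elapsed-time samples per SQL_ID have already been exported into plain dictionaries, and the function names, the nearest-rank percentile method, and the 1.5x regression ratio are illustrative choices, not values from any Oracle tool.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it. pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p99_regressions(before, after, max_ratio=1.5):
    """Compare per-statement P99 elapsed times across a deployment.

    before / after: dict mapping a SQL_ID to a list of elapsed times (ms).
    Returns {sql_id: (p99_before, p99_after)} for statements whose P99
    grew by more than max_ratio (an assumed threshold; tune per workload).
    """
    flagged = {}
    for sql_id, old_samples in before.items():
        new_samples = after.get(sql_id)
        if not old_samples or not new_samples:
            continue  # statement missing on one side, nothing to compare
        old_p99 = percentile(old_samples, 99)
        new_p99 = percentile(new_samples, 99)
        if old_p99 > 0 and new_p99 / old_p99 > max_ratio:
            flagged[sql_id] = (old_p99, new_p99)
    return flagged
```

Run it against the same set of high-frequency statements after each release. A statement that appears in the result is a candidate for plan review before users notice it.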
APEX and ORDS Runtime Signals
- Request latency percentiles: track P50, P95, and P99 by APEX application and page — averages mask tail latency that affects real users.
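- Error rates by module: compute error rates per page and process from the APEX application logs (for example, the APEX_WORKSPACE_ACTIVITY_LOG view), so a failing page stands out instead of disappearing into an application-wide average.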
- ORDS connection pool saturation: when the pool approaches its configured maximum during traffic peaks, requests queue and then time out.
- ORDS worker thread utilization: high worker thread usage under normal load usually points to a sizing problem in the ORDS tier, not a database problem.
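Pool saturation is easier to alert on when a single brief spike is separated from sustained pressure. The sketch below assumes pool usage is already being sampled at a fixed interval from whatever source your ORDS deployment exposes. The `PoolSample` type, the 0.9 threshold, and the three-sample window are hypothetical illustrations, not ORDS settings.

```python
from dataclasses import dataclass


@dataclass
class PoolSample:
    borrowed: int  # connections in use at sample time
    max_size: int  # configured pool maximum


def sustained_saturation(samples, threshold=0.9, min_consecutive=3):
    """Return True if pool utilization stays at or above `threshold`
    for at least `min_consecutive` samples in a row.

    Requiring a run of samples filters out one-off spikes that clear
    before any request has to queue.
    """
    run = 0
    for sample in samples:
        if sample.max_size > 0 and sample.borrowed / sample.max_size >= threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # utilization dropped, start counting again
    return False
```

With one sample per minute, the defaults fire after three consecutive minutes at 90% or more of the pool maximum. That is early enough to act before requests begin to time out.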
Operational Health Indicators
- Backup completion and restore validation: monitor completion status and periodically test restore procedures. A backup that has never been restored is unverified.
- Patch and certificate expiry windows: track expiry dates in a calendar, not a spreadsheet checked quarterly.
- Tablespace utilization growth rate: track trends, not just current utilization.
- Mean time to recovery (MTTR): for availability SLAs, improving MTTR does more than reducing alert volume.
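Tracking the tablespace growth trend rather than current utilization amounts to fitting a line through recent usage and projecting when it reaches capacity. The following is a minimal sketch, assuming one used-space sample per day collected oldest first. The function name and the choice of a least-squares fit are illustrative, and a straight line will be misleading for tablespaces that grow in bursts.

```python
def days_until_full(daily_used_gb, capacity_gb):
    """Project how many days remain until a tablespace reaches capacity.

    daily_used_gb: used space in GB, one sample per day, oldest first.
    capacity_gb:   maximum size the tablespace can reach.
    Returns None when the fitted trend is flat or shrinking.
    """
    n = len(daily_used_gb)
    if n < 2:
        raise ValueError("need at least two daily samples to fit a trend")

    # Ordinary least-squares slope, using the day index 0..n-1 as x.
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_used_gb))
    slope_gb_per_day = sxy / sxx

    if slope_gb_per_day <= 0:
        return None  # no growth, so no projected fill date

    remaining_gb = capacity_gb - daily_used_gb[-1]
    return max(0.0, remaining_gb / slope_gb_per_day)
```

Alerting on "fewer than N days until full" gives the team a consistent lead time. A fixed percentage threshold fires too late on fast-growing tablespaces and too early on large, slow-growing ones.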
Building Alerts That Drive Action
Every alert threshold should map directly to a runbook. If your team cannot describe the first two actions to take when an alert fires, the alert is not ready to fire. Define thresholds and runbooks together, so that neither ships without the other.
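One way to enforce the two-action rule is to keep the threshold and its runbook steps in the same definition and check them before an alert is enabled. The sketch below assumes alert rules are stored as simple records. The `AlertRule` type, its fields, and the example rule contents are hypothetical and not tied to any monitoring product.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    name: str
    threshold: str  # human-readable condition
    runbook_steps: list = field(default_factory=list)  # ordered first actions


def alerts_not_ready(rules, min_steps=2):
    """Return the names of rules that lack at least `min_steps`
    non-empty runbook actions and so should not be enabled yet."""
    not_ready = []
    for rule in rules:
        real_steps = [step for step in rule.runbook_steps if step.strip()]
        if len(real_steps) < min_steps:
            not_ready.append(rule.name)
    return not_ready


# Illustrative rules only; the thresholds and steps are made up.
RULES = [
    AlertRule(
        name="ords_pool_saturation",
        threshold="pool utilization >= 90% for 3 minutes",
        runbook_steps=[
            "Check active sessions for blocking locks",
            "Compare request rate against the last deployment",
        ],
    ),
    AlertRule(
        name="redo_rate_spike",
        threshold="redo generation 3x above baseline",
        runbook_steps=["Identify the top redo-generating session"],
    ),
]
```

Running `alerts_not_ready(RULES)` as a check in the repository that holds the alert definitions keeps a threshold from being enabled without its first actions written down.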
Good observability is not about having every metric. It is about knowing exactly what to look at when something goes wrong, and having the automation in place to tell you before users do.