Back to blog
devops

Threads Monitoring: MySQL Deep Observability for Cloud SQL

CloudSQL for MySQL gives you CPU graphs and connection counts. When something goes wrong you need to know which exact query ran for 3 seconds at 14:07:43, which user triggered it, and which threads were competing for locks. Threads Monitoring captures every query from performance_schema as it executes, with no agents and no schema changes, and produces two self-contained HTML reports for immediate investigation.

Jiří Šůla
Jiří Šůla
Database Engineer
April 23, 20268 min read
Threads Monitoring: MySQL Deep Observability for Cloud SQL

Why Threads Monitoring?

CloudSQL for MySQL gives you CPU graphs and connection counts — sampled, basic, not enough. When something goes wrong you need to know which exact query ran for 3 seconds at 14:07:43, which user triggered it, whether it used an index, and how many threads were competing for locks at that moment. Threads Monitoring captures every query from performance_schema as it executes — no agents, no schema changes, no server-side installation — and produces two self-contained HTML reports for immediate investigation. Works out of the box with GCP Cloud SQL.


The Capture — The Most Critical Part

Before any report exists, Threads Monitoring has to continuously scrape the database — polling every active query, every thread, every few seconds — without disturbing the production workload. This is the hardest part to get right.

Why this matters:

  • A poorly written capture tool can become its own source of load — extra connections, heavy queries, lock pressure
  • On a busy production instance, even a small overhead compounds fast
  • The capture must be precise enough to miss nothing, yet light enough to be invisible

Why Rust:

  • Zero garbage collector pauses — polling intervals stay tight and predictable
  • Memory-efficient by design — no hidden allocations between polls
  • Compiled, fast, and deterministic under load

How Threads Monitoring behaves on your server:

  • Single persistent read-only connection — not a pool
  • Queries only performance_schema in-memory tables — no I/O cost on the server
  • Session variables set to suppress slow-query logging and binary log entries
  • Adaptive polling — backs off automatically if the server is under pressure

The result: the tool runs comfortably alongside a 78K-QPS production instance without registering as noise.


Timeline — Graphical View of Database Activity

Every database thread gets a horizontal swimlane. Every query execution appears as a dot or bar at the exact time it ran — colour-coded by statement type. Zoom from 30 seconds to the full capture length.

What you can spot immediately:

  • A thread stalling while all others move freely → lock contention
  • A burst of queries at a specific timestamp → application event or job trigger
  • One thread doing nothing for 10 seconds then a flood of work → queue buildup
  • Queries that look fast individually but fire in tight loops → N+1 problem

Navigation:

  • Zoom presets: 30s / 1m / 2m / 5m / 15m / 1h / All
  • Minimap for global density overview
  • Shift+drag any window → instant breakdown of event count, type mix, total wait time
  • Filter by statement type or user

Analytics Report

Aggregated insight across the entire capture — which queries cost the most, who causes the most load, where indexes are missing.

What you can answer:

  • Which 5 query patterns consume 80% of total server time?
  • Which users run the heaviest workloads?
  • Which queries do full table scans?
  • What does the latency distribution look like — is the p99 an outlier or a pattern?
  • Are there lock wait storms? Disk temp table spills?

What's inside:

SectionQuestion It Answers
Overview cardsScale of the capture at a glance
Top queries by server timeWhere the money is being spent
Most frequent queriesWhat's hammering the server
Full scans without indexEasiest optimization wins
Lock contentionWhat's blocking what
Latency histogram + percentilesp50 / p75 / p90 / p95 / p99
User × Statement type matrixWho does what, in heat-map form
Slowest individual executionsWorst single outliers
Temp table + heavy row scan analysisHidden cost patterns
Query rate over timeLoad shape across the capture window

What It Is Good For

  • Incident investigation — capture during a slowdown, replay the timeline, find the exact moment and query
  • Query optimization — identify which patterns cost the most in aggregate, not just which ones look slow in isolation
  • Missing index discovery — full-scan detection flags queries that bypass indexes entirely
  • User behavior analysis — understand who runs what and at what volume
  • Lock diagnosis — see which threads wait and which threads hold
  • Capacity planning — sustained query rate, latency percentiles, DML volume in one report

What It Is Not

  • Not an APM agent or always-on monitor
  • Not a slow query log parser (it sees fast queries too — the ones that add up)
  • Not a general-purpose dashboarding tool
  • Does not modify your database in any way

Part 2 — Technical Dive

Pipeline

Threads Monitoring — Functional Process & Outputs

Four phases end-to-end: session preparation (GCP proxy, MySQL connection, health checks) → continuous monitoring cycle (self-adjusting poll loop, deduplication, cross-session digest cache) → efficient in-memory storage (string interning, data compaction) → cleanup and reporting (atomic DB dump, timeline HTML, Parquet export, analytics HTML).

Each poll: fetch new events from events_statements_history_long → deduplicate → batch-resolve digests → write to SQLite. On stop: build indexes, atomic dump to disk, generate both HTML outputs, export Parquet.

GCP Cloud SQL: set gcp_instance in config and the tool spawns and manages cloud-sql-proxy automatically.

Timestamps: derived from MySQL's internal nanosecond timer, calibrated once at startup against wall clock (RTT-corrected). Accurate regardless of poll jitter.

Threads Monitoring capture running in terminal

Capture running against a busy production instance. Each line is one poll — format: #<poll> <new>/<buf> new/buf <ms>ms [tc=<threads> q=<queries> ded=<deduped> dp=<digests_pending> sq=<saturated>] <RSS>. Example: #997: 1148/1148 new/buf 849ms [tc=64 q=478 ded=2 dp=151 sq=1] 53.3MB — poll 997 found the ring buffer completely full (1148/1148, sq=1 = saturated), took 849 ms, saw 64 active threads (tc), captured 478 new query events (q), skipped 2 already seen (ded), has 151 digest texts still pending resolution (dp), current process memory 53.3 MB. After Ctrl+C: indexes built, in-memory DB atomically dumped to disk, both HTML reports and Parquet generated automatically.


Notable Implementation Choices

  • Single connection — not a pool; one persistent read-only connection keeps server footprint near zero
  • Server-side filteringTIMER_END > last_high_water_mark transfers only new rows; optional TIMER_WAIT threshold drops sub-ms noise before it crosses the wire
  • In-memory SQLite — events accumulate in RAM, indexes built after capture ends, flushed atomically via VACUUM INTO
  • String interning — repeated values (user, host, digest) stored as Arc<str>; zero allocation after warmup
  • Cross-session digest cache<instance>.digests.cache persists query text across runs; digest resolution drops to ~0 ms after the first capture
  • Self-contained HTML — no server, no CDN, no build step; string-deduplicated JSON payload, events stored as byte arrays (~20× smaller than JS objects)

Output

Threads Monitoring timeline view

Timeline report from a real capture: 121,569 events, 1,092 threads, 114.6 seconds. Each row is one database thread (ID on the left). Dots are individual query executions, colour-coded by type — the filter bar lets you toggle SELECT, INSERT, UPDATE, DELETE, BEGIN, COMMIT, ROLLBACK, SHOW, SET, CALL, OTHER on/off, or narrow to a specific user. The tooltip on a hovered query shows: thread ID, start time (Prague timezone), duration (230 ms here), rows examined (836,040 — a near-full scan), rows sent (2), and the SQL digest. The minimap above the swimlanes shows global event density across the whole capture.

Threads Monitoring analytics dashboard

Analytics dashboard for the same session (31.8 h of data, 3.11 MB Parquet). The tab bar navigates 17 sections — abbreviations expanded: Latency Dist = Latency Distribution, No-Index = queries with no index used, Idx Health = index health by user, Parquet Meta = Parquet file metadata. Overview cards show the full session at a glance: 121,569 events, 1,092 threads, 44 unique users, 1,212 distinct query patterns, 53,765 commits, 16 rollbacks. Statement Type Breakdown (bottom left) shows call counts, time share, and avg/max latency per type. Latency Distribution (right) is bucketed by magnitude — here 92.8% of queries completed under 10 ms, with a small tail stretching to 10+ seconds.


CLI

threads2 capture    # Capture live (Ctrl+C to stop)
threads2 dry-run    # Test connection, show performance_schema stats
threads2 report     # Generate timeline HTML from a saved session
threads2 analyze    # Generate analytics HTML from Parquet
threads2 list       # Show all captured sessions

Configuration

mysql:
  gcp_instance: "project:region:instance"
  user: "monitoring_user"
  password_file: "/path/to/password.txt"
  compress: true

capture:
  interval_ms: 2000
  min_timer_wait_us: 1000   # drop sub-1ms events server-side (0 = capture all)
  exclude_self: true

storage:
  in_memory: true
  export_parquet: true
  keep_db: false
  long_query_threshold_ms: 300

Output files (auto-named from instance):

  • <instance>.html — interactive timeline
  • <instance>.parquet — compressed, queryable via DuckDB / pandas / etc.
  • <instance>-analytics.html — analytics dashboard
  • <instance>.digests.cache — cross-session digest cache

Tech Stack

ComponentTechnology
LanguageRust — async via Tokio
MySQL drivermysql_async — binary protocol, pure-Rust TLS
StorageSQLite (bundled) → Parquet via DuckDB (bundled)
OutputSelf-contained HTML + ZSTD Parquet
#mysql#cloud-sql#observability#performance#rust#gcp