pt-mongodb-stalk

NAME

pt-mongodb-stalk - Collect forensic data about MongoDB when problems occur.

SYNOPSIS

Usage

pt-mongodb-stalk [OPTIONS]

DESCRIPTION

pt-mongodb-stalk watches a MongoDB server for a trigger condition and collects diagnostic data when that trigger occurs. It follows the same basic operating model as pt-stalk, but uses MongoDB administration commands instead of MySQL commands.

The default trigger watches serverStatus.connections.current. You can also watch currentOp, host CPU usage, host memory usage, queued writers, and replica set replication lag.

OPTIONS

--ask-pass

Prompt for a password when connecting to MongoDB.

--authenticationDatabase

type: string; default: admin

Authentication database for the MongoDB shell connection.

--collect

default: yes; negatable: yes

Collect diagnostic data when the trigger occurs.

--config

type: string

Read this comma-separated list of config files. If specified, this must be the first option on the command line.

--cycles

type: int; default: 5

How many consecutive times the trigger value must be greater than --threshold before --collect is triggered.
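The consecutive-check rule can be sketched in bash as follows. This is an illustration, not the tool's source. It assumes the counter resets to zero whenever a check does not exceed the threshold. It also assumes the comparison is strictly greater-than on a float threshold, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of --cycles: read one trigger value per check from stdin and print the
# check number at which the value has exceeded THRESHOLD on CYCLES consecutive
# checks. Assumes a non-exceeding check resets the counter (hypothetical names).

exceeds() {
  # Strict greater-than; awk handles float values and thresholds.
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

first_trigger_check() {
  local threshold=$1 cycles=$2
  local check=0 consecutive=0 value
  while read -r value; do
    check=$((check + 1))
    if exceeds "$value" "$threshold"; then
      consecutive=$((consecutive + 1))
    else
      consecutive=0
    fi
    if (( consecutive >= cycles )); then
      echo "$check"
      return 0
    fi
  done
  return 1   # the trigger never fired
}
```

For example, `printf '%s\n' 60 210 220 190 205 230 240 | first_trigger_check 200 3` prints 7. The dip to 190 at check 4 resets the count, so three consecutive values above 200 are only seen at checks 5 to 7.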

--daemonize

Daemonize the tool and log output to --log.

--dest

type: string; default: /var/lib/pt-mongodb-stalk

Where to save diagnostic data.

--disk-bytes-free

type: size; default: 100M

Do not collect if the disk has less than this much free space.

--disk-pct-free

type: int; default: 5

Do not collect if the disk has less than this percent free space.
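The two disk guards can be sketched together as follows. Collection proceeds only if free space is at least --disk-bytes-free and at least --disk-pct-free percent of the filesystem. The sketch assumes size suffixes are 1024-based (100M = 104857600 bytes), as in other Percona Toolkit bash tools, and that sizes are integers. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the --disk-bytes-free / --disk-pct-free guard (hypothetical names).
# Assumes 1024-based suffixes and integer sizes.

size_to_bytes() {
  local size=$1 num unit
  num=${size%[kKmMgGtT]}        # strip one trailing unit letter, if any
  unit=${size#"$num"}
  case $unit in
    "")  echo "$num" ;;
    k|K) echo $(( num * 1024 )) ;;
    m|M) echo $(( num * 1024 ** 2 )) ;;
    g|G) echo $(( num * 1024 ** 3 )) ;;
    t|T) echo $(( num * 1024 ** 4 )) ;;
  esac
}

# disk_ok AVAIL_BYTES TOTAL_BYTES MIN_SIZE MIN_PCT: succeeds if collection may run.
disk_ok() {
  local avail=$1 total=$2 min_bytes min_pct=$4
  min_bytes=$(size_to_bytes "$3")
  (( avail >= min_bytes )) || return 1
  (( avail * 100 >= total * min_pct )) || return 1
}

# Hypothetical helper: print "AVAIL TOTAL" in bytes for the filesystem holding DIR
# (GNU df).
disk_space() {
  df -P -B1 "$1" | awk 'NR == 2 { print $4, $2 }'
}
```

A possible use: `read -r avail total < <(disk_space /var/lib/pt-mongodb-stalk) && disk_ok "$avail" "$total" 100M 5`.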

--function

type: string; default: status

Trigger source. Valid built-in values are status, currentop, cpu, memory, writewait, and repllag.

With status, --variable is a dot-separated path inside serverStatus. Example:

--function status --variable connections.current --threshold 200
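A dot-separated --variable path maps naturally onto nested field access in serverStatus. The sketch below builds a mongosh expression for such a path and evaluates it. The same path style covers the writewait value, globalLock.currentQueue.writers. It is an illustration only. The function names are hypothetical, and it assumes mongosh is installed and connects with its default settings. Large counters may print as Long("...") rather than a bare number.

```bash
#!/usr/bin/env bash
# Sketch: read a dot-separated serverStatus path with mongosh (hypothetical names).

# Build e.g. db.serverStatus()["connections"]["current"] from connections.current.
serverstatus_expr() {
  local expr='db.serverStatus()' key
  local IFS=.                      # split the path on dots
  for key in $1; do
    expr+="[\"$key\"]"
  done
  printf '%s\n' "$expr"
}

# Evaluate the expression; assumes mongosh connects with default settings.
serverstatus_value() {
  mongosh --quiet --eval "$(serverstatus_expr "$1")"
}
```

For example, `serverstatus_value connections.current` prints the current connection count when run against a reachable server.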

With currentop, --variable is a dot-separated field path inside each currentOp.inprog document and --match is a regex. The trigger value is the number of matching operations. Example:

--function currentop --variable command.aggregate --match '^orders$' --threshold 10
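The currentop trigger value is a count of in-progress operations whose field matches --match. The sketch below splits that into two steps. First, a hypothetical mongosh helper prints the field value for each currentOp.inprog document. Second, a counter tallies the matches. It assumes mongosh is installed and connects with default settings. It also uses grep -E (POSIX extended regexes), which may not be the regex dialect the tool itself applies to --match.

```bash
#!/usr/bin/env bash
# Sketch of the currentop trigger value (hypothetical names, illustration only).

# Print the value at a dot-separated PATH for each currentOp.inprog document
# that has one, one value per line. Assumes mongosh with default settings.
currentop_values() {
  mongosh --quiet --eval "
    const keys = '$1'.split('.');
    db.currentOp().inprog.forEach(op => {
      const v = keys.reduce((o, k) => (o == null ? undefined : o[k]), op);
      if (v != null) print(String(v));
    });"
}

# Count stdin lines matching REGEX; prints 0 when nothing matches.
count_matching() {
  grep -cE -- "$1" || true
}
```

Used together, `currentop_values command.aggregate | count_matching '^orders$'` approximates the value compared against --threshold in the example above.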

With cpu, the trigger value is host CPU busy percentage from /proc/stat. With memory, the trigger value is host memory used percentage from /proc/meminfo. mem is accepted as an alias for memory. Example:

--function cpu --threshold 85
--function memory --threshold 90
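The cpu and memory values come from two Linux files, and one plausible way to compute them is sketched below. This is not necessarily the tool's exact formula. The sketch assumes CPU busy percentage is the share of non-idle jiffies between two /proc/stat samples, counting idle and iowait as not busy. It assumes memory used percentage is (MemTotal - MemAvailable) / MemTotal, where MemAvailable exists on Linux 3.14 and later. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of host CPU and memory percentages (hypothetical names and formulas).

# Arguments: two "cpu ..." lines from /proc/stat taken some time apart.
cpu_busy_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x); split(b, y)
    # Fields 2-9: user nice system idle iowait irq softirq steal.
    for (i = 2; i <= 9; i++) { t1 += x[i]; t2 += y[i] }
    idle1 = x[5] + x[6]; idle2 = y[5] + y[6]   # idle + iowait
    dt = t2 - t1
    if (dt <= 0) { print 0; exit }
    printf "%.1f\n", (dt - (idle2 - idle1)) * 100 / dt
  }'
}

# Take two samples one second apart on a live Linux host.
sample_cpu() {
  local a b
  a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
  cpu_busy_pct "$a" "$b"
}

# Reads /proc/meminfo-formatted text on stdin.
mem_used_pct() {
  awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
       END { if (t > 0) printf "%.1f\n", (t - a) * 100 / t }'
}
```

On a live host, `sample_cpu` and `mem_used_pct < /proc/meminfo` print values comparable to the --threshold settings above.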

With writewait, the trigger value is serverStatus.globalLock.currentQueue.writers. Example:

--function writewait --threshold 5

With repllag, the trigger value is the maximum replica set member lag behind the primary, in seconds, from replSetGetStatus. replicationlag and waitForReplicationLag are accepted as aliases. Example:

--function repllag --threshold 30
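The maximum lag behind the primary can be sketched from the members array of replSetGetStatus (rs.status() in mongosh). The sketch is an illustration, not the tool's source. It assumes only members in SECONDARY state are considered, so arbiters and unreachable members, whose optimes can be missing or stale, are skipped. It also assumes the result is 0 when every secondary is caught up. The helper names are hypothetical, and mongosh is assumed to connect with default settings.

```bash
#!/usr/bin/env bash
# Sketch of the repllag trigger value (hypothetical names, illustration only).

# Print "STATE OPTIME_EPOCH_SECONDS" for each member that reports an optime.
member_optimes() {
  mongosh --quiet --eval '
    rs.status().members.forEach(m => {
      if (m.optimeDate) print(m.stateStr + " " + Math.floor(m.optimeDate.getTime() / 1000));
    });'
}

# Read "STATE SECONDS" lines on stdin; print the largest SECONDARY lag behind
# the PRIMARY, or fail if no primary is present.
max_repl_lag() {
  awk '{ state[NR] = $1; t[NR] = $2; if ($1 == "PRIMARY") p = $2 }
       END {
         if (p == "") exit 1
         max = 0
         for (i = 1; i <= NR; i++)
           if (state[i] == "SECONDARY" && p - t[i] > max) max = p - t[i]
         print max
       }'
}
```

Against a live replica set, `member_optimes | max_repl_lag` prints a lag in seconds comparable to --threshold 30.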

You can also specify the path to a file that defines a trg_plugin function; that function then serves as the trigger source.
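A trigger file might look like the following. It assumes, as with pt-stalk, that the tool sources the file and calls trg_plugin with no arguments. It also assumes the single number trg_plugin prints is compared against --threshold. The MONGOD_LOG variable and log path are hypothetical, and the "Slow query" message assumes the JSON log format of MongoDB 4.4 and later.

```bash
# Hypothetical --function plugin file: trigger on recent slow queries.
# Assumes the tool sources this file and compares trg_plugin's output
# against --threshold.

# Hypothetical variable: path to the mongod log (MongoDB 4.4+ JSON format).
MONGOD_LOG=${MONGOD_LOG:-/var/log/mongodb/mongod.log}

trg_plugin() {
  # Count "Slow query" entries among the last 1000 log lines; prints 0 if the
  # log is missing or contains no matches.
  tail -n 1000 "$MONGOD_LOG" 2>/dev/null | grep -c '"msg":"Slow query"' || true
}
```

Saved as, say, /etc/pt-mongodb-stalk/slow-queries.sh (a hypothetical path), it would be used with `pt-mongodb-stalk --function /etc/pt-mongodb-stalk/slow-queries.sh --threshold 20`.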

--help

Print help and exit.

--host

short form: -h; type: string; default: localhost

Host to connect to.

--interval

type: int; default: 1

How often to check the trigger, in seconds.

--iterations

type: int

How many times to collect diagnostic data before exiting.

--log

type: string; default: /var/log/pt-mongodb-stalk.log

Print all output to this file when daemonized.

--match

type: string

Regex matched against the --variable field of each currentOp.inprog document. Used with --function currentop.

--password

short form: -p; type: string

Password to use when connecting.

--pid

type: string; default: /var/run/pt-mongodb-stalk.pid

Create the given PID file.

--plugin

type: string

Load a plugin that defines any of the standard before_* or after_* hooks.
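A minimal plugin might look like the sketch below. It assumes pt-mongodb-stalk sources the file and calls hooks named like pt-stalk's, here before_collect and after_collect. Check the tool's source for the hooks it actually calls and the arguments it passes. The COLLECT_MARKER variable and path are hypothetical.

```bash
# Hypothetical --plugin file. Assumes pt-stalk-style hook names.
COLLECT_MARKER=${COLLECT_MARKER:-/tmp/pt-mongodb-stalk.collecting}  # hypothetical

before_collect() {
  # Record when a collection started so other monitoring can notice it.
  date +%s > "$COLLECT_MARKER"
}

after_collect() {
  # Remove the marker once the collection has finished.
  rm -f "$COLLECT_MARKER"
}
```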

--port

short form: -P; type: int; default: 27017

Port number to use for connection.

--prefix

type: string

Filename prefix for diagnostic samples.

--retention-count

type: int; default: 0

Keep data for the last N runs.

--retention-time

type: int; default: 30

Number of days to retain collected samples.
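Age-based and count-based retention can be sketched as follows. The sketch assumes samples use the YYYY_MM_DD_HH_MM_SS-name prefix shown under OUTPUT and that one prefix corresponds to one run. It also assumes a count of 0 disables count-based pruning, and that the fixed files (heartbeat, log, trigger) are never pruned. It requires GNU find and head, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of --retention-time and --retention-count pruning (hypothetical names).

SAMPLE_GLOB='[0-9][0-9][0-9][0-9]_*-*'   # timestamped sample files only

# Delete sample files last modified more than DAYS days ago (find -mtime +DAYS).
prune_by_age() {
  local dest=$1 days=$2
  find "$dest" -maxdepth 1 -type f -name "$SAMPLE_GLOB" -mtime +"$days" -delete
}

# Keep only the newest KEEP timestamp prefixes; KEEP=0 disables pruning.
prune_by_count() {
  local dest=$1 keep=$2 ts
  (( keep > 0 )) || return 0
  find "$dest" -maxdepth 1 -type f -name "$SAMPLE_GLOB" -printf '%f\n' \
    | cut -d- -f1 | sort -u | head -n -"$keep" \
    | while read -r ts; do rm -f -- "$dest/$ts"-*; done
}
```

Because the timestamps are zero-padded, a plain lexical sort orders them chronologically, which is why `sort -u | head -n -KEEP` selects the oldest runs.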

--run-time

type: int; default: 30

How long, in seconds, interval collectors run when the trigger occurs.

--sleep

type: int; default: 300

How long to sleep, in seconds, after collecting.

--sleep-collect

type: int; default: 1

Polling interval for interval collectors, in seconds.

--stalk

default: yes; negatable: yes

Watch the server and wait for the trigger to occur. Specify --no-stalk to collect immediately.

--threshold

type: float; default: 100

Collection is triggered when the trigger value for --function is greater than this value.

--tls

negatable: yes

Enable TLS for the MongoDB shell connection.

--sslCAFile

type: string

Path to the TLS CA file.

--sslPEMKeyFile

type: string

Path to the TLS client certificate and key file.

--uri

type: string

Full MongoDB URI to connect with.

--user

short form: -u; type: string

User for login.

--variable

type: string; default: connections.current

Variable to watch inside serverStatus or currentOp. This option is ignored by cpu, memory, writewait, and repllag.

--verbose

type: int; default: 3

Print level of information:

1  errors
2  matching triggers and collection info
3  non-matching triggers

--version

Print version and exit.

EXAMPLES

Run in stalking mode and collect twice when the trigger is met:

pt-mongodb-stalk \
  --host localhost --port 30001 --user admin --password admin --authenticationDatabase admin \
  --function status --variable connections.current --threshold 50 --cycles 3 --interval 1 --iterations 2 \
  --dest /tmp/pt-mongodb-stalk

Run immediately without stalking and collect one short sample:

pt-mongodb-stalk \
  --host localhost --port 30004 --user admin --password admin --authenticationDatabase admin \
  --no-stalk --iterations 1 --run-time 6 --sleep-collect 1 \
  --dest /tmp/pt-mongodb-stalk

Run immediately without stalking and collect multiple short runs:

pt-mongodb-stalk \
  --host localhost --port 30000 --user admin --password admin --authenticationDatabase admin \
  --no-stalk --iterations 3 --run-time 6 --sleep-collect 1 --sleep 1 \
  --dest /tmp/pt-mongodb-stalk

Run immediately without stalking and collect fewer, more widely spaced samples:

pt-mongodb-stalk \
  --host localhost --port 27000 --user admin --password admin --authenticationDatabase admin \
  --no-stalk --iterations 1 --run-time 10 --sleep-collect 2 \
  --dest /tmp/pt-mongodb-stalk

Run in stalking mode using currentOp matches instead of a serverStatus metric:

pt-mongodb-stalk \
  --host localhost --port 30001 --user admin --password admin --authenticationDatabase admin \
  --function currentop --variable command.aggregate --match '^orders$' --threshold 10 --cycles 2 --interval 1 --iterations 1 \
  --dest /tmp/pt-mongodb-stalk

Run in stalking mode when host CPU is above 85 percent:

pt-mongodb-stalk \
  --host localhost --port 30001 --user admin --password admin --authenticationDatabase admin \
  --function cpu --threshold 85 --cycles 3 --interval 1 --iterations 1 \
  --dest /tmp/pt-mongodb-stalk

Run in stalking mode when host memory is above 90 percent:

pt-mongodb-stalk \
  --host localhost --port 30001 --user admin --password admin --authenticationDatabase admin \
  --function memory --threshold 90 --cycles 3 --interval 1 --iterations 1 \
  --dest /tmp/pt-mongodb-stalk

Run in stalking mode when replica set lag is above 30 seconds:

pt-mongodb-stalk \
  --host localhost --port 30001 --user admin --password admin --authenticationDatabase admin \
  --function repllag --threshold 30 --cycles 2 --interval 1 --iterations 1 \
  --dest /tmp/pt-mongodb-stalk

OUTPUT

When the trigger condition is met for the configured number of consecutive cycles, the tool collects into --dest. Snapshot commands run once per collection iteration and are stored as timestamped files. For example:

2026_04_24_10_00_01-serverStatus.json
2026_04_24_10_00_01-currentOp.json
2026_04_24_10_00_01-ps.txt

Interval commands run once per collection iteration using --sleep-collect as their polling interval and a count derived from --run-time. For example, --run-time 5 --sleep-collect 1 runs commands like vmstat 1 5 and stores the result in one timestamped file.
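The relationship between --run-time, --sleep-collect, and the sample count can be sketched as follows. It reproduces the vmstat 1 5 example above, and it assumes the count is --run-time divided by --sleep-collect. This is consistent with the EXAMPLES section, where --run-time 10 --sleep-collect 2 yields fewer samples than --run-time 6 --sleep-collect 1. Integer division, the minimum of one sample, and the exact flags passed to each tool are assumptions, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of interval-collector sample counts (hypothetical names).

interval_count() {
  local run_time=$1 sleep_collect=$2 count
  count=$(( run_time / sleep_collect ))   # integer division (assumption)
  (( count >= 1 )) || count=1             # always take at least one sample
  echo "$count"
}

# Print example command lines for RUN_TIME and SLEEP_COLLECT.
interval_commands() {
  local delay=$2 count
  count=$(interval_count "$1" "$2")
  echo "vmstat $delay $count"
  echo "mpstat $delay $count"
  echo "iostat $delay $count"
  echo "top -b -d $delay -n $count"
}
```

With `interval_commands 10 2`, each tool samples every 2 seconds, 5 times, covering the 10-second window.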

The collection window is capped by --run-time. After a collection finishes, the tool waits --sleep seconds before the next trigger check or collection iteration. Collections do not overlap.

The collector also writes these fixed files in the destination directory:

heartbeat
log
trigger

COLLECTED DATA

MongoDB data collected once per collection iteration:

serverStatus
currentOp

MongoDB interval tools collected once per collection iteration:

mongostat
mongotop

MongoDB JSON output from shell commands is cleaned before writing: ok, $clusterTime, and operationTime are removed.

System data collected once per collection iteration, depending on tool availability:

ps faux
pidstat -d
pidstat -u
pidstat -urdwt for mongod or mongos

System interval tools collected once per collection iteration, depending on tool availability:

vmstat
iostat
mpstat
top

The process-specific pidstat file is named pidstat_mongod.txt or pidstat_mongos.txt according to the detected MongoDB process type. Topology is detected internally for this purpose, but no topology summary file is written.
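One way to pick the process-specific file name is to look at running process names. This is only a sketch; the tool detects the process type internally, possibly by other means. It assumes mongod takes precedence when both mongod and mongos are running on the host, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: choose pidstat_mongod.txt or pidstat_mongos.txt (hypothetical names).

# Reads process names on stdin (e.g. from `ps -eo comm=`); prints mongod or
# mongos, or fails if neither is present. mongod wins if both are running.
mongo_process_type() {
  awk '$1 == "mongod" { d = 1 } $1 == "mongos" { s = 1 }
       END { if (d) print "mongod"; else if (s) print "mongos"; else exit 1 }'
}

pidstat_file() {
  local kind
  kind=$(ps -eo comm= | mongo_process_type) || return 1
  echo "pidstat_${kind}.txt"
}
```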

If /var/log/messages exists, the tool also copies it to messages.out.

OUTPUT CLEANUP

At the end of a run, zero-byte .err files are removed and non-empty .err files are kept. The collector enforces disk-space safety checks, but uses internal temporary files instead of writing disk-space snapshots into --dest.
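The .err cleanup described above can be sketched with a single find command. The sketch assumes GNU find (for -delete) and that the .err files sit directly in --dest. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: delete zero-byte .err files in DEST and keep non-empty ones.
cleanup_err_files() {
  find "$1" -maxdepth 1 -type f -name '*.err' -size 0 -delete
}
```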

NOTES

This tool is intended for Linux systems. A separate summary-style tool should collect broader one-time server and MongoDB metadata; pt-mongodb-stalk is focused on runtime sampling around a trigger event.

This program is copyright 2011-2026 Percona LLC and/or its affiliates.

THIS PROGRAM IS PROVIDED “AS IS” AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.