MySQL
1. How to troubleshoot a MySQL query that has suddenly become slow
Check the query itself
- Review whether the SQL itself can be optimized: use appropriate indexes, avoid full table scans, etc. EXPLAIN shows whether a query actually uses an index (see the sketch after this list).
- Complex JOIN operations: Confirm whether the query joins multiple large tables or performs unnecessary complex operations.
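A minimal sketch of checking a query plan with EXPLAIN via PyMySQL; the host, credentials, and the orders table are placeholder assumptions.

```python
import pymysql  # assumption: PyMySQL is installed; connection details are placeholders

conn = pymysql.connect(host="127.0.0.1", user="root", password="secret", database="shop")
with conn.cursor() as cur:
    # In the plan output, type=ALL means a full table scan and key=NULL means
    # no index was chosen -- both are common reasons a query suddenly slows down.
    cur.execute("EXPLAIN SELECT * FROM orders WHERE customer_id = %s", (42,))
    for row in cur.fetchall():
        print(row)
conn.close()
```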
Check database performance
Slow query log: Enable and check the slow query log (slow_query_log). Statements that run longer than long_query_time are recorded there, so review which queries have recently been slow (see the sketch below).
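A minimal sketch of turning the slow query log on at runtime, assuming PyMySQL and an account with privileges to set global variables; the 1-second threshold is only an example.

```python
import pymysql  # assumption: PyMySQL is installed; credentials are placeholders

conn = pymysql.connect(host="127.0.0.1", user="root", password="secret")
with conn.cursor() as cur:
    cur.execute("SET GLOBAL slow_query_log = 'ON'")
    cur.execute("SET GLOBAL long_query_time = 1")  # log statements slower than 1s
    cur.execute("SHOW VARIABLES LIKE 'slow_query_log_file'")
    print(cur.fetchone())  # path of the log file to inspect
conn.close()
```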
Check system load
System resources: Check for bottlenecks in CPU, memory, disk IO, and network. Use tools like top or htop to monitor.
Disk IO: Check whether disk reads/writes have become a bottleneck. Use tools like iostat or vmstat to view disk IO status.
Check lock and concurrency issues
Lock status: Check for lock waits or deadlocks. Statements such as SHOW FULL PROCESSLIST and SHOW ENGINE INNODB STATUS expose the current lock state (see the sketch below). If many threads are stuck in a locked/waiting state, analyze where the locks come from and resolve the contention.
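A minimal sketch of inspecting lock waits, assuming MySQL 8.0+ (where lock-wait details live in performance_schema) and PyMySQL; credentials are placeholders.

```python
import pymysql  # assumption: PyMySQL is installed; credentials are placeholders

conn = pymysql.connect(host="127.0.0.1", user="root", password="secret")
with conn.cursor() as cur:
    # Threads stuck in a lock-waiting state show up in the process list.
    cur.execute("SHOW FULL PROCESSLIST")
    for row in cur.fetchall():
        print(row)
    # MySQL 8.0+: shows which transaction is blocking which.
    cur.execute("SELECT * FROM performance_schema.data_lock_waits")
    print(cur.fetchall())
conn.close()
```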
Network latency
If connecting to MySQL remotely, confirm that the network connection is stable and that latency is not high.
Redis
1. Hotkey and large key issues
Hotkey issues
- Hot key: A key that is accessed far more frequently than others; the concentrated traffic can degrade performance.
Impact:
- CPU load concentrated on a single Redis node
- Uneven QPS and cluster load imbalance
- Latency spikes and request timeouts
- May trigger node failover
Solutions
- Business-layer sharding/bucketing: Split requests for a hot key across multiple keys to raise concurrency (see the sketch after this list).
- Add a cache layer: Local cache + Redis as a two-level cache, reducing direct hits on Redis.
- Multi-key splitting to spread read pressure: Break a hot large key into multiple sub-keys.
- Hotspot protection at the Redis proxy layer: The proxy can detect hot keys and apply degradation.
- Client-side local throttling + degradation.
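A minimal sketch of business-layer sharding with redis-py, assuming the hot value is a counter that can be split; key names and the shard count are illustrative.

```python
import random

import redis  # assumption: redis-py is installed

r = redis.Redis(host="127.0.0.1", port=6379)
SHARDS = 8  # spread one hot key across 8 sub-keys

def incr_hot_counter(name: str) -> None:
    # Each write lands on a random shard, so no single key takes all the traffic.
    r.incr(f"{name}:shard:{random.randrange(SHARDS)}")

def read_hot_counter(name: str) -> int:
    # Reads aggregate all shards; cache this result locally to cut Redis QPS further.
    return sum(int(r.get(f"{name}:shard:{i}") or 0) for i in range(SHARDS))

incr_hot_counter("page:home:views")
print(read_hot_counter("page:home:views"))
```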
Large key issues
- Large key: A key whose value is very large and occupies a large amount of memory.
Impact:
- Slow slot migration in cluster mode (CLUSTER ADDSLOTS / MIGRATE)
- May block the Redis main thread
- Blocking on deletion (DEL on a large collection is O(N); UNLINK frees it asynchronously)
- Saturated network bandwidth
- Increase in read/write latency
- Slow failover recovery
Solutions
- Business layer splitting: Split large keys into multiple small keys, with each small key storing only a portion of the data.
- Paginated reads: Do not use HGETALL/SMEMBERS or full-range ZRANGE; use the SCAN family (HSCAN/SSCAN/ZSCAN) or segmented retrieval instead (see the sketch after this list).
- Limit value size up front: Enforce key/value size limits on the service side.
- Avoid storing logs, messages, and long texts in Redis.
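A minimal sketch of paginated reads with redis-py's hscan_iter, as an alternative to HGETALL on a big hash; the key name and batch size are illustrative.

```python
import redis  # assumption: redis-py is installed

r = redis.Redis(host="127.0.0.1", port=6379)

# hscan_iter pages through the hash in small batches, so no single call
# blocks the Redis main thread the way HGETALL on a huge key can.
for field, value in r.hscan_iter("user:profile:bulk", count=500):
    pass  # process one field/value pair at a time
```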
How to find large keys
- Redis's SCAN command, checking each key's footprint with MEMORY USAGE (see the sketch after this list)
- redis-cli -h 127.0.0.1 -p 6379 --bigkeys
- The open-source redis-rdb-tools: analyze RDB files offline to find large keys.
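A minimal sketch of the SCAN-based approach with redis-py, using MEMORY USAGE (Redis 4.0+) to flag oversized keys; the 1 MB threshold is an example.

```python
import redis  # assumption: redis-py is installed

r = redis.Redis(host="127.0.0.1", port=6379)
THRESHOLD = 1024 * 1024  # flag keys over 1 MB

for key in r.scan_iter(count=1000):  # SCAN is incremental, safe on a live instance
    size = r.memory_usage(key) or 0
    if size > THRESHOLD:
        print(key, size)
```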
Kafka
1. How to handle message backlog
Causes of message backlog
- Producers produce a large number of messages to a Topic in a short period of time, and consumers cannot consume them in time.
- Consumers have insufficient consumption capacity (low consumer concurrency, long message processing time), leading to consumption efficiency lower than production efficiency.
- Consumer faults (crashes, network problems, etc.) that stop messages from being consumed.
- Unreasonable Topic partition counts, or new partitions added with no consumers assigned to them.
- Frequent consumer-group rebalancing, which reduces consumption efficiency.
Solutions
Given these causes, a message backlog can be handled from three sides: the consumer side, the producer side, and the service side.
- Consumer side
- Increase the number of consumers according to actual business needs, keeping the partition count an integer multiple of the consumer count; ideally the two are equal, since consumers beyond the partition count sit idle (see the first sketch after this list).
- Speed up consumers by optimizing processing logic (reduce heavy computation, third-party API calls, and database reads) to cut per-message handling time.
- Increase the number of messages pulled per poll, so that consumption throughput (messages per batch / batch processing time) is at least the production rate.
- Producer side
- When producing, add a random suffix to the message key so that messages distribute evenly across partitions (see the second sketch after this list).
- Service side
- Set a reasonable partition count for the Topic, and increase it when doing so does not hurt business processing (see the third sketch after this list).
- When a backlog builds up on the service side, apply circuit breaking to producers or divert their messages to another Topic.
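First, a minimal sketch of consumer-side tuning with kafka-python; the topic, group, and batch settings are illustrative, and handle_message stands in for the real processing logic. Run one such process per partition (up to the partition count) to raise parallelism.

```python
from kafka import KafkaConsumer  # assumption: kafka-python is installed

def handle_message(payload: bytes) -> None:
    pass  # placeholder: keep this fast -- no slow RPCs or database reads inline

consumer = KafkaConsumer(
    "orders",                      # illustrative topic name
    bootstrap_servers="localhost:9092",
    group_id="order-workers",
    max_poll_records=1000,         # pull bigger batches per poll
    max_poll_interval_ms=300_000,  # allow enough time to process a full batch
)
for message in consumer:
    handle_message(message.value)
```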
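Second, a minimal sketch of producer-side key salting with kafka-python; the topic and key scheme are illustrative. Note that salting trades away per-key ordering, so use it only where ordering does not matter.

```python
import random

from kafka import KafkaProducer  # assumption: kafka-python is installed

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# A fixed key always hashes to the same partition; a random suffix spreads
# the messages across all partitions instead.
salted_key = f"order-123:{random.randrange(8)}".encode()
producer.send("orders", key=salted_key, value=b"payload")
producer.flush()
```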
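Third, a minimal sketch of growing a Topic's partition count with kafka-python's admin client; the topic name and count are illustrative. Partition counts can only increase, and keyed messages will map to different partitions afterwards.

```python
from kafka.admin import KafkaAdminClient, NewPartitions  # assumption: kafka-python

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
# Raise the partition count of "orders" to 16 (must exceed the current count).
admin.create_partitions({"orders": NewPartitions(total_count=16)})
admin.close()
```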
