- Identify the questions to answer, e.g., which services are slow?
- Identify what information is needed to answer the identified questions, e.g., the latency of service requests.
- Identify what needs to be logged to get the needed information, e.g., arrival and departure times of requests and responses.
- Identify how to get the information, e.g., log the arrival/departure of requests/responses at service endpoints.
- Identify the techniques and technologies to log the needed information, e.g., the Log4j logging library with Spring AOP to inject logging statements.
- Identify the techniques and technologies to process and monitor logs, e.g., offline vs. streaming vs. online processing.
- Identify the techniques and technologies to present the information to the ops team (i.e., build the dashboard).
The key is to identify the questions of interest, the required information, and the enabling techniques, in that order.
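The steps above can be sketched end to end in plain Java. This is a minimal, hypothetical sketch: so that it runs without dependencies, it swaps in a JDK dynamic proxy (`java.lang.reflect.Proxy`) for Spring AOP and `java.util.logging` for Log4j, and `OrderService`, `RequestLog`, `EndpointLogging`, and `LatencyDemo` are illustrative names, not part of any library. The proxy logs the arrival and departure time of each request at a service endpoint, and `meanLatencyByService` processes those entries to answer "which services are slow?".

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// One logged request: the service it hit, and when it arrived and departed (System.nanoTime()).
record RequestLog(String service, long arrivalNanos, long departureNanos) {
    long latencyNanos() {
        return departureNanos - arrivalNanos;
    }
}

// Hypothetical service endpoint used for illustration.
interface OrderService {
    String placeOrder(String item);
}

final class EndpointLogging {
    private static final Logger LOG = Logger.getLogger(EndpointLogging.class.getName());

    // In-memory sink for the sketch; a real system would ship entries to a log pipeline.
    static final List<RequestLog> ENTRIES = Collections.synchronizedList(new ArrayList<>());

    // Wraps `target` in a JDK dynamic proxy that records arrival/departure times around
    // every interface call, without editing the service itself (the idea behind AOP advice).
    static <T> T instrument(Class<T> iface, T target, String serviceName) {
        InvocationHandler handler = (proxy, method, callArgs) -> {
            long arrival = System.nanoTime();
            try {
                return method.invoke(target, callArgs);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the service's own exception, not the reflection wrapper
            } finally {
                long departure = System.nanoTime();
                ENTRIES.add(new RequestLog(serviceName, arrival, departure));
                LOG.info(() -> serviceName + "." + method.getName() + " took " + (departure - arrival) + " ns");
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    // Processing step: mean latency per service, in nanoseconds.
    static Map<String, Double> meanLatencyByService(List<RequestLog> entries) {
        Map<String, long[]> sums = new HashMap<>(); // service -> {total nanos, count}
        for (RequestLog entry : entries) {
            long[] acc = sums.computeIfAbsent(entry.service(), k -> new long[2]);
            acc[0] += entry.latencyNanos();
            acc[1]++;
        }
        Map<String, Double> means = new HashMap<>();
        sums.forEach((service, acc) -> means.put(service, (double) acc[0] / acc[1]));
        return means;
    }
}

class LatencyDemo {
    public static void main(String[] args) {
        OrderService fast = EndpointLogging.instrument(OrderService.class, item -> "ok:" + item, "fast-orders");
        OrderService slow = EndpointLogging.instrument(OrderService.class, item -> {
            try {
                Thread.sleep(20); // simulate a slow dependency
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
            return "ok:" + item;
        }, "slow-orders");
        for (int i = 0; i < 3; i++) {
            fast.placeOrder("book");
            slow.placeOrder("book");
        }
        // Copy the synchronized list before processing it.
        EndpointLogging.meanLatencyByService(new ArrayList<>(EndpointLogging.ENTRIES))
                .forEach((svc, ns) -> System.out.printf("%s: %.2f ms%n", svc, ns / 1e6));
    }
}
```

Keeping the timing in a proxy rather than in each service method mirrors the AOP approach: if the question changes (for example, maximum instead of mean latency), only the processing step changes, not the services.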
A few things to keep in mind:
- We cannot log everything that matters because we often don't know everything that matters. Moreover, what matters often evolves as the system evolves. So, be prepared to accommodate changes, and make choices that are flexible and robust.
- We (almost always) cannot log everything, because the volume of data and the amount of noise will often overwhelm both the monitoring system and the ops team. Log what is needed, and add more as required.
- Agility in the development and deployment process is key to a long-lasting and useful logging and monitoring system. The points above rely on iteration, which reinforces the need to be agile.
- In logging/monitoring, quick trumps slow and utility trumps quick.
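Logging only what is needed, while leaving room to include more as required, can be illustrated by making verbosity adjustable at run time. This is a hypothetical sketch using `java.util.logging` levels rather than Log4j: the always-on `INFO` line carries the minimum, and the `FINE` detail is emitted only after `setVerbose(true)` is called. `VerbosityDemo`, `CapturingHandler`, and the logger name are illustrative; wiring `setVerbose` to a configuration reload or an admin endpoint is an assumption, not something the points above prescribe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Keeps published records in memory so the effect of a level change is easy to observe.
final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
        if (isLoggable(record)) {
            records.add(record);
        }
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
}

final class VerbosityDemo {
    // Hypothetical logger name; held in a static field so it is not garbage-collected.
    static final Logger LOG = Logger.getLogger("demo.orders");

    static void handleRequest(String id, long latencyNanos) {
        // Always on: the minimum needed to answer today's question.
        LOG.info(() -> "request " + id + " handled");
        // Extra detail: only emitted once verbosity is raised.
        LOG.fine(() -> "request " + id + " latency=" + latencyNanos + "ns");
    }

    // Hypothetical switch; a real service might drive it from a config reload or admin endpoint.
    static void setVerbose(boolean verbose) {
        LOG.setLevel(verbose ? Level.FINE : Level.INFO);
    }

    public static void main(String[] args) {
        CapturingHandler handler = new CapturingHandler();
        LOG.setUseParentHandlers(false); // skip the default console handler, which drops records below INFO
        LOG.addHandler(handler);

        setVerbose(false);
        handleRequest("r1", 1_200_000); // only the INFO line is recorded
        setVerbose(true);
        handleRequest("r2", 950_000);   // INFO and FINE lines are recorded

        handler.records.forEach(r -> System.out.println(r.getLevel() + " " + r.getMessage()));
    }
}
```

Because the extra detail is already in the code but switched off, "include more as required" becomes a configuration change rather than a redeployment, which keeps the iteration loop short.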