good observability

observability = the ability to understand the internal state of a complex system based solely on its external outputs (telemetry). in practice: the ability to ask arbitrary questions about your production systems without having to write new code
broadly speaking, there are three flavors of observability data:
- (1) logs: time-stamped, discrete records of application events
- (2) traces: a record of a single request’s path across multiple services. a trace is decomposed into spans, one span per service/operation, and shows the sequence of things that happened (service A called service B, which called service C, which queried the database)
- (3) metrics: numeric measurements aggregated over time intervals
  - counter - a number that only goes up (total errors, total requests served). rates such as error rate are derived from counters, not stored as counters
  - gauge - a number that can go up and down (current cpu usage, active connections right now)
  - histogram - distribution of values across buckets (of the last 10k requests, 8k took 0-100ms, the rest took 500+ ms)
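
to make the three metric types concrete, here is a minimal sketch in plain python. the class names (`Counter`, `Gauge`, `Histogram`) and the bucket bounds are made up for illustration and are not any real library's api; real metrics clients add labels, thread safety, and export.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Counter:
    """Hypothetical counter: the value can only increase."""
    value: int = 0

    def inc(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a counter only goes up")
        self.value += n


@dataclass
class Gauge:
    """Hypothetical gauge: the value can be set to anything, up or down."""
    value: float = 0.0

    def set(self, v: float) -> None:
        self.value = v


@dataclass
class Histogram:
    """Hypothetical histogram: counts observations per bucket.

    bounds are ascending upper bounds (inclusive); one extra overflow
    bucket at the end catches everything above the last bound.
    """
    bounds: list
    counts: list = field(init=False)

    def __post_init__(self) -> None:
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, v: float) -> None:
        # bisect_left returns the index of the first bound >= v,
        # or len(bounds) if v is above every bound (overflow bucket)
        self.counts[bisect.bisect_left(self.bounds, v)] += 1


requests_total = Counter()
errors_total = Counter()
active_connections = Gauge()
latency_ms = Histogram(bounds=[100, 500])  # buckets: <=100, <=500, >500

# simulate four requests as (latency in ms, whether it failed)
for latency, failed in [(50, False), (100, False), (250, False), (600, True)]:
    requests_total.inc()
    if failed:
        errors_total.inc()
    latency_ms.observe(latency)

active_connections.set(12)
active_connections.set(7)  # a gauge is allowed to go back down

# error rate is derived from two counters rather than stored directly
error_rate = errors_total.value / requests_total.value
```

the point of the sketch: a counter and a gauge each hold one number, while a histogram holds one count per bucket, which is why it can answer "how many requests were slower than 500ms" and an average cannot.
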
- traditional monitoring vs. observability
  - traditional: predict what might break and write a dashboard/alert for it. not robust to unexpected failures that you never thought to instrument
  - observability: emit rich, high-dimensionality events from your code at all times (let’s say every request records 50+ attributes). then when something weird happens, you query that data interactively, so you don’t have to anticipate the exact failure paths
    - high cardinality: fields with many distinct values (e.g. user id, request id)
    - high dimensionality: many fields per event
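
a rough sketch of the "emit wide events, query later" idea, in plain python. everything here is hypothetical: the field names, the sample data, and the `slow_fraction_by` helper are invented for illustration, and a real setup would send events to an event store instead of keeping them in a list.

```python
from collections import defaultdict

# one wide event per request. real events would carry many more fields;
# user_id and request_id are the high-cardinality ones here.
events = [
    {"request_id": "r1", "user_id": "u1", "endpoint": "/checkout", "build": "a1", "status": 200, "duration_ms": 40},
    {"request_id": "r2", "user_id": "u2", "endpoint": "/checkout", "build": "a2", "status": 500, "duration_ms": 900},
    {"request_id": "r3", "user_id": "u3", "endpoint": "/search", "build": "a1", "status": 200, "duration_ms": 35},
    {"request_id": "r4", "user_id": "u2", "endpoint": "/search", "build": "a2", "status": 200, "duration_ms": 850},
    {"request_id": "r5", "user_id": "u4", "endpoint": "/checkout", "build": "a1", "status": 200, "duration_ms": 60},
    {"request_id": "r6", "user_id": "u5", "endpoint": "/checkout", "build": "a2", "status": 200, "duration_ms": 45},
]


def slow_fraction_by(events, key, threshold_ms):
    """Group events by any field and return the fraction slower than threshold_ms."""
    total = defaultdict(int)
    slow = defaultdict(int)
    for event in events:
        group = event.get(key)
        total[group] += 1
        if event["duration_ms"] > threshold_ms:
            slow[group] += 1
    return {group: slow[group] / total[group] for group in total}


# the same raw events answer questions nobody planned for in advance:
by_build = slow_fraction_by(events, "build", 500)    # is one build slow?
by_user = slow_fraction_by(events, "user_id", 500)   # or one specific user?
```

with this sample data, grouping by `build` makes build `a2` look suspicious, but grouping by `user_id` shows every slow request belongs to `u2`. a pre-built dashboard keyed on build alone would not have shown that, which is the case for keeping high-cardinality fields on the raw events.
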