Apache Kafka is a distributed streaming platform. What exactly does that mean?

A streaming platform has three key capabilities:

- Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
- Store streams of records in a fault-tolerant, durable way.
- Process streams of records as they occur.

Kafka is generally used for two broad classes of applications:

- Building real-time streaming data pipelines that reliably get data between systems or applications.
- Building real-time streaming applications that transform or react to the streams of data.

To understand how Kafka does these things, let’s dive in and explore Kafka’s capabilities from the bottom up.

First, a few concepts:

- Kafka is run as a cluster on one or more servers that can span multiple datacenters.
- The Kafka cluster stores streams of records in categories called topics.
- Each record consists of a key, a value, and a timestamp.
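
To make the concepts above concrete, here is a minimal in-memory sketch of records grouped into topics. It is only an illustration and does not use Kafka or any Kafka client library: the names `Record`, `ToyCluster`, `publish`, and `read` are hypothetical, and the sketch ignores distribution, fault tolerance, and durability entirely.

```python
from collections import defaultdict
from dataclasses import dataclass
import time


@dataclass(frozen=True)
class Record:
    """One record: a key, a value, and a timestamp (hypothetical type)."""
    key: str
    value: str
    timestamp_ms: int


class ToyCluster:
    """Toy stand-in for a cluster: maps topic names to append-only record lists."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, key, value, timestamp_ms=None):
        # Stamp the record with the current time unless a timestamp is given.
        if timestamp_ms is None:
            timestamp_ms = int(time.time() * 1000)
        record = Record(key, value, timestamp_ms)
        self._topics[topic].append(record)
        return record

    def read(self, topic, start=0):
        # Return a copy of the stored records from position `start` onward.
        # Reading does not remove anything, so records can be read repeatedly.
        return list(self._topics.get(topic, [])[start:])


cluster = ToyCluster()
cluster.publish("page-views", "user-1", "/home", timestamp_ms=1000)
cluster.publish("page-views", "user-2", "/cart", timestamp_ms=2000)
records = cluster.read("page-views")
```

In this sketch, reading a topic leaves its records in place, which loosely mirrors the "store" capability: the same stream can be read again later or by another reader.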