Apache Kafka is a distributed event-streaming platform for publishing and processing events in real time. Topics are ordered, append-only logs of events; each topic is split into partitions, which provide scalability (ordering is guaranteed only within a partition), while replication of those partitions provides fault tolerance.
Producers write messages to topics, and consumers read messages from them. Because reading does not remove a message from the log, multiple consumers can process the same stream independently.
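A minimal KafkaJS sketch of both sides; the broker address, topic name, and message contents are placeholders for illustration:

```typescript
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'demo-app', brokers: ['localhost:9092'] });

async function main() {
  // Producer side: write one message to the 'events' topic.
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: 'events',
    messages: [{ key: 'room-1', value: 'hello kafka' }],
  });
  await producer.disconnect();

  // Consumer side: read messages from the same topic.
  const consumer = kafka.consumer({ groupId: 'demo-group' });
  await consumer.connect();
  await consumer.subscribe({ topics: ['events'], fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log(`${topic}[${partition}] ${message.key}: ${message.value}`);
    },
  });
}

main().catch(console.error);
```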
Through consumer groups, Kafka combines the queuing and publish-subscribe models: within a group, each message is delivered to exactly one consumer (queuing), while each group receives its own complete copy of the stream (publish-subscribe).
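In KafkaJS, this behavior falls out of the groupId a consumer is created with (the group and client names below are hypothetical):

```typescript
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'demo-app', brokers: ['localhost:9092'] });

// Queue semantics: these two consumers share the 'billing' group, so the
// topic's partitions are divided between them and each message is handled
// by exactly one of the pair.
const billingA = kafka.consumer({ groupId: 'billing' });
const billingB = kafka.consumer({ groupId: 'billing' });

// Publish-subscribe semantics: 'audit' is a separate group, so it receives
// its own complete copy of every message on the same topic.
const audit = kafka.consumer({ groupId: 'audit' });
```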
Messages are key/value pairs, serialized to bytes before being sent to Kafka. Kafka stores messages on disk for a configurable retention period, providing persistence and reliability and letting consumers replay past data.
Kafka ensures data durability and fault tolerance by replicating each partition across multiple brokers. The primary copy of a partition is the leader replica; the additional copies are follower replicas. Producers write to the leader, which replicates the updates to the followers; if the leader's broker fails, a follower is promoted in its place.
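A sketch of creating such a topic with the KafkaJS admin client; the topic name, partition count, replication factor, and retention value are illustrative choices, not recommendations:

```typescript
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'admin-app', brokers: ['localhost:9092'] });

async function createTopic() {
  const admin = kafka.admin();
  await admin.connect();
  await admin.createTopics({
    topics: [
      {
        topic: 'readings',       // hypothetical topic name
        numPartitions: 6,        // parallelism: up to 6 consumers per group
        replicationFactor: 3,    // 1 leader + 2 follower replicas per partition
        configEntries: [
          // keep messages for 7 days (retention is configurable per topic)
          { name: 'retention.ms', value: '604800000' },
        ],
      },
    ],
  });
  await admin.disconnect();
}

createTopic().catch(console.error);
```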
Serialization converts data from its original format into a byte array for transmission and storage; deserialization reverses the process on the consumer side. Kafka brokers treat payloads as opaque bytes, so producers and consumers must agree on a format to communicate efficiently.
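As a sketch, a JSON codec for the temperature readings used in the example later in this article might look like the following; the TemperatureReading shape is an assumption:

```typescript
// Kafka stores only bytes, so the application converts objects to Buffers
// before sending and back after reading.
interface TemperatureReading {
  roomId: string;
  celsius: number;
  takenAt: string; // ISO-8601 timestamp
}

const serialize = (reading: TemperatureReading): Buffer =>
  Buffer.from(JSON.stringify(reading), 'utf8');

const deserialize = (value: Buffer): TemperatureReading =>
  JSON.parse(value.toString('utf8')) as TemperatureReading;
```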
Compression reduces the size of messages before they are stored or transmitted, saving disk space, cutting network bandwidth consumption, and improving overall throughput.
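In KafkaJS, compression is set per send call. GZIP is supported out of the box; Snappy, LZ4, and ZSTD require external codec packages:

```typescript
import { Kafka, CompressionTypes } from 'kafkajs';

const kafka = new Kafka({ clientId: 'demo-app', brokers: ['localhost:9092'] });

async function sendCompressed() {
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: 'readings',
    compression: CompressionTypes.GZIP, // batch is compressed on the wire
    messages: [{ key: 'room-1', value: JSON.stringify({ celsius: 21.5 }) }],
  });
  await producer.disconnect();
}

sendCompressed().catch(console.error);
```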
Apache Kafka suits a range of use cases, such as real-time analytics, event sourcing, log aggregation, data pipelines, and processing sensor data from IoT devices in real time.
Optimizing Kafka's performance means fine-tuning its components to balance throughput against latency; the main levers are partition management, producer configuration, and consumer configuration.
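A sketch of what such tuning can look like in KafkaJS; the specific values are illustrative starting points, not recommendations:

```typescript
import { Kafka, CompressionTypes } from 'kafkajs';

const kafka = new Kafka({ clientId: 'tuned-app', brokers: ['localhost:9092'] });

// Producer: trade a little latency for safer, cheaper writes. KafkaJS's
// idempotent mode requires at most one in-flight request per connection.
const producer = kafka.producer({
  idempotent: true,
  maxInFlightRequests: 1,
  allowAutoTopicCreation: false, // rely on explicitly created, well-sized topics
});

// Consumer: fetch bigger batches per partition and process partitions
// concurrently to raise throughput.
const consumer = kafka.consumer({
  groupId: 'tuned-group',
  maxBytesPerPartition: 2 * 1024 * 1024, // default is 1 MiB
});

async function run() {
  await producer.connect();
  await consumer.connect();
  await consumer.subscribe({ topics: ['readings'] });
  await consumer.run({
    partitionsConsumedConcurrently: 3, // parallelism across partitions
    eachMessage: async ({ message }) => {
      /* handle message */
    },
  });
  // Pair batching with compression to cut bytes on the wire.
  await producer.send({
    topic: 'readings',
    compression: CompressionTypes.GZIP,
    messages: [{ key: 'room-1', value: '21.5' }],
  });
}

run().catch(console.error);
```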
As a worked example, an application is built with NestJS and KafkaJS that records the temperature in a room and transmits the readings through Kafka. Producer, Consumer, IProducer, and IConsumer classes encapsulate the specifics of KafkaJS's producer and consumer implementations.
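The example's exact classes are not reproduced here; as a minimal sketch, the IProducer interface and a KafkaJS-backed implementation might look like the following (the class name, method names, and constructor parameters are assumptions):

```typescript
import { Logger } from '@nestjs/common';
import { Kafka, Message, Producer as KafkaJsProducer } from 'kafkajs';

// Hypothetical abstraction: the rest of the application depends on this
// interface rather than on KafkaJS directly.
export interface IProducer {
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  produce(message: Message): Promise<void>;
}

export class KafkajsProducer implements IProducer {
  private readonly producer: KafkaJsProducer;
  private readonly logger = new Logger(KafkajsProducer.name);

  constructor(private readonly topic: string, broker: string) {
    const kafka = new Kafka({ brokers: [broker] });
    this.producer = kafka.producer();
  }

  async connect(): Promise<void> {
    try {
      await this.producer.connect();
    } catch (err) {
      this.logger.error('Failed to connect to Kafka.', err);
      throw err;
    }
  }

  async produce(message: Message): Promise<void> {
    // e.g. { key: 'room-1', value: '21.5' } for a temperature reading
    await this.producer.send({ topic: this.topic, messages: [message] });
  }

  async disconnect(): Promise<void> {
    await this.producer.disconnect();
  }
}
```

An IConsumer counterpart would mirror this shape, wrapping KafkaJS's subscribe and run calls behind a single consume method.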