AI, Gen AI and Cloud

	Apache Kafka	Amazon Kinesis
Developed/Hosted By	LinkedIn	Amazon
Software	Open-Source	Proprietary
SDK Support	Kafka SDK supports Java	AWS SDK supports Android, Java, Go, .NET
Configuration & Features	More control over configuration and better performance	Only the retention period (days) and the number of shards can be configured
Data Stored In	Kafka Partition	Kinesis Shard
Reliability	Replication factor can be configured	Kinesis writes synchronously to 3 different machines/data-centers
Performance	Kafka wins	Slower, because Kinesis writes each message synchronously to 3 different machines
Configuration Store	Apache ZooKeeper	Amazon DynamoDB
Setup	Weeks	A couple of hours
Data Retention	Configurable	7 days at most
Log Compaction	Supported	Not supported; records are kept for at most 7 days
Processing Events	Can exceed thousands of events/sec	At most thousands of events/sec
Checkpointing	Offsets stored in special topic	DynamoDB
Ordering	Partition level	Shard level
Human Costs	Requires staff to install and manage the clusters, and to plan for requirements such as high availability, durability, and recovery	Fully managed; you pay for what you use
Producer Throughput	Kafka wins	Kinesis is a bit slower than Kafka
Incident Risk/Maintenance	Higher with Kafka	Amazon takes care of it
Ordered sequence of immutable data records	Kafka Topic	Kinesis Stream
Each record has a unique number called	Offset number	Sequence number
Stream Processing	Kafka Streams	Kinesis Analytics
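
The rows on partitions/shards, offsets/sequence numbers, ordering, and checkpointing describe one shared model, which a small in-memory sketch may make easier to see. This is a toy illustration, not the Kafka or Kinesis client API: the class names `PartitionedLog` and `CheckpointingConsumer` are hypothetical, CRC32 is only a stand-in for the real key hashing (Kafka's default partitioner uses murmur2, Kinesis uses MD5 of the partition key), and a plain dictionary stands in for the checkpoint store (a special topic in Kafka, DynamoDB in Kinesis).

```python
import zlib
from collections import defaultdict


class PartitionedLog:
    """Toy model of a Kafka topic / Kinesis stream (hypothetical class)."""

    def __init__(self, num_partitions):
        # Each partition (shard) is an append-only list of records.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 modulo partition count, used only for illustration.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        # Records with the same key always land in the same partition,
        # so ordering is guaranteed per partition, not across the whole log.
        p = self.partition_for(key)
        self.partitions[p].append(value)
        # The offset (sequence number) is the record's position in its partition.
        return p, len(self.partitions[p]) - 1


class CheckpointingConsumer:
    """Toy consumer that remembers the next offset to read per partition."""

    def __init__(self, log):
        self.log = log
        self.checkpoints = defaultdict(int)  # partition -> next offset

    def poll(self, partition):
        start = self.checkpoints[partition]
        records = self.log.partitions[partition][start:]
        # Advance the checkpoint so the same records are not returned again.
        self.checkpoints[partition] = start + len(records)
        return records


log = PartitionedLog(num_partitions=3)
for i in range(3):
    log.append("user-a", f"a{i}")
    log.append("user-b", f"b{i}")

consumer = CheckpointingConsumer(log)
for p in range(3):
    print(p, consumer.poll(p))
```

Whichever partition a key hashes to, its records come back in the order they were appended, and a second `poll` on the same partition returns nothing until new records arrive.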

	Hadoop	Spark
Batch	MapReduce, Hive, Pig	Spark Core
Realtime	Apache Storm (Nimbus, Supervisor, Topology (Spout & Bolt))	Spark Streaming
Machine learning	Mahout	Spark MLlib
Graph Data Processing	Giraph	Spark GraphX
Interactive Queries	Hive	Spark SQL
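
For the batch row, the map, shuffle, and reduce steps that MapReduce is named after can be sketched in plain Python. This is only an illustration of the programming model on a single machine; it uses neither the Hadoop nor the Spark API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: collapse each key's list of values into a single count.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))


print(word_count(["spark and hadoop", "spark streaming"]))
```

In a real cluster the map and reduce steps run in parallel on many machines and the shuffle moves data between them over the network; the sketch keeps only the data flow.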

Friday, August 30, 2019