Friday, August 30, 2019

Hadoop vs Spark tech stack options

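| Use case | Hadoop ecosystem | Spark |
|---|---|---|
| Batch processing | MapReduce, Hive, Pig | Spark Core |
| Real-time (stream) processing | Apache Storm (Nimbus, Supervisor, topology of spouts and bolts) | Spark Streaming |
| Machine learning | Mahout | Spark MLlib |
| Graph data processing | Giraph | Spark GraphX |
| Interactive queries | Hive | Spark SQL |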
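For the batch row, both MapReduce and Spark Core follow the same basic pattern: map records to key/value pairs, group the pairs by key, then reduce each group. The sketch below is a minimal, single-process illustration of that model in plain Python. It does not use Hadoop or Spark APIs. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map step: emit a (word, 1) pair for every whitespace-separated token.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle step: group all values by key, as the framework does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce step: sum the counts collected for each key.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    # Chain the three phases over all input lines.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["Spark Core runs batch jobs", "MapReduce runs batch jobs on disk"]
    print(sorted(word_count(docs).items()))
```

In Spark Core, the same job roughly corresponds to `flatMap`, `map` and `reduceByKey` over an RDD. The practical difference is that Spark can chain many such steps in one job and cache intermediate datasets in memory, whereas chained Hadoop MapReduce jobs write their results to HDFS between jobs.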

