Spark was created as a processing framework for Hadoop that’s both faster and easier to use than the traditional MapReduce framework, and it’s catching on fast among folks writing big data applications.
Spark’s popularity rests on a few factors: it supports numerous programming languages (all of which offer a simpler programming model than raw MapReduce), and it speeds up data analysis both in memory and on disk. It also allows iterative queries over existing datasets, which, along with its speed, makes it well suited to machine learning workloads. There are a number of workload-specific implementations on top of Spark, too, including Shark for interactive SQL queries, SparkR for statistical…
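To see why iterative, in-memory access matters for machine learning, here is a toy sketch in plain Python (not Spark code; the `load_dataset` helper is hypothetical). It contrasts re-reading a dataset on every pass of an iterative algorithm with loading it once and keeping it in memory, which is the pattern Spark enables via `cache()`/`persist()` on an RDD.

```python
# Toy illustration, not Spark itself: a counter stands in for expensive disk I/O.
load_count = 0

def load_dataset():
    """Hypothetical stand-in for an expensive read from disk."""
    global load_count
    load_count += 1
    return list(range(1_000))

# Without caching: each pass of an iterative algorithm re-reads the source,
# the way classic MapReduce jobs rematerialize data between stages.
load_count = 0
for _ in range(3):
    data = load_dataset()
    total = sum(data)          # stand-in for one iteration of real work
uncached_loads = load_count    # 3 loads for 3 iterations

# With caching (what Spark's rdd.cache() / rdd.persist() enable):
# load once, keep the dataset in memory, and iterate over it repeatedly.
load_count = 0
cached = load_dataset()
for _ in range(3):
    total = sum(cached)
cached_loads = load_count      # 1 load for 3 iterations
```

The win compounds with iteration count: a gradient-descent-style algorithm making dozens of passes pays the I/O cost once rather than on every pass.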