Big Data Processing with Apache Spark
Apache Spark is an open-source big-data processing framework built around speed, ease of use, and sophisticated analytics.
Spark has several advantages over other big-data technologies such as Hadoop MapReduce and Storm. It provides a comprehensive, unified framework for managing big-data processing requirements across datasets that are diverse in nature (text data, graph data, etc.) and that arrive in different ways (batch as well as real-time streaming data).
Spark enables applications in Hadoop clusters to run up to a hundred times faster in memory, and up to ten times faster even when running on disk.
In this mini-book, the reader will learn about the Apache Spark framework and will develop Spark programs for use cases in big-data analysis. The book covers all the libraries that are part of the Spark ecosystem: Spark Core, Spark SQL, Spark Streaming, Spark MLlib, and Spark GraphX.
Table of Contents:
- Part 1: Overview
- Part 2: Spark SQL
- Part 3: Spark Streaming
- Part 4: Spark Machine Learning
- Part 5: spark.ml Data Pipelines
- Part 6: Graph Data Analytics with Spark GraphX
- Part 7: Emerging Trends in Data Science