Course: Apache Spark Advanced Topics

Provider: OEM Office Elearning Menu


OEM Office Elearning Menu

P.J. Oudweg 4

Course content

Spark Core
start the course
recall what is included in the Spark Stack
define lazy evaluation as it relates to Spark
recall that an RDD is an interface consisting of a set of partitions, a list of dependencies, and a function to compute each partition
pre-partition an RDD for performance
store RDDs in serialized form
perform numeric operations on RDDs
create custom accumulators
use broadcast functionality for optimization
pipe to external applications
adjust garbage collection settings
perform batch import on a Spark cluster
determine memory consumption
tune data structures to reduce memory consumption
use Spark's different shuffle operations to minimize memory usage of reduce tasks
set the levels of parallelism for each operation
create DataFrames
interoperate with RDDs
describe the generic load and save functions
read and write Parquet files
use JSON Dataset as a DataFrame
read and write data in Hive tables
read and write data using JDBC
run the Thrift JDBC/ODBC server
show the different ways to tune Spark for better performance
Spark Streaming
start the course
describe what a DStream is
recall how TCP socket input streams are ingested
describe how file input streams are read
recall how Akka Actor input streams are received
describe how Kafka input streams are consumed
recall how Flume input streams are ingested
set up Kinesis input streams
configure Twitter input streams
implement custom input streams
describe receiver reliability
use the updateStateByKey operation
perform transform operations
perform window operations
perform join operations
use output operations on Streams
use DataFrame and SQL operations on streaming data
use learning algorithms with MLlib
persist stream data in memory
enable and configure checkpointing
deploy applications
monitor applications
reduce batch processing times
set the right batch interval
tune memory usage
describe fault tolerance semantics
perform transformations on DStreams
MLlib, GraphX, and R
start the course
describe data types
recall the basic statistics
describe linear SVMs
perform logistic regression
use naive Bayes
create decision trees
use collaborative filtering with ALS
perform clustering with K-means
perform clustering with LDA
perform analysis with frequent pattern mining
describe the property graph
describe the graph operators
perform analytics with neighborhood aggregation
perform messaging with Pregel API
build graphs
describe vertex and edge RDDs
optimize representation through partitioning
measure vertices with PageRank
install SparkR
run SparkR
use existing R packages
expose RDDs as distributed lists
convert existing RDDs into DataFrames
read and write Parquet files
run SparkR on a cluster
use the algorithms and utilities in MLlib

Admission requirements: what do you need?

No specific prior knowledge is required.

Course duration

11 hours


Award Winning E-learning

Locations / training venues

All of the Netherlands, E-learning, Online

General information about the course

Order this unique e-learning course Apache Spark Advanced Topics online: 1 year of 24/7 access to rich interactive videos, audio, and progress tracking through reports and tests.

Duration: 11 hours
Language: English
Certificate of participation: Yes
Online access: 365 days
Progress tracking: Yes
Award Winning E-learning: Yes
Suitable for mobile: Yes

Copyright 2009-2022 Particuliereopleidingen.nl