Course: Data Science Essentials

Provider: OEM Office Elearning Menu


OEM Office Elearning Menu

P.J. Oudweg 4

Course content

start the course
define data science and what it is to be a data scientist
describe the data wrangling aspect of data science
describe the big data aspect of data science
describe the machine learning aspect of data science
Implementing Data Science

use common data science terminology
recognize ways to communicate results of your data science
recall the steps in data science analysis
compare various tools and software libraries used for data science
Practice: Exploring Data Science

Exercise: Explore Your Data Science Needs
Data Gathering
Data Extraction

start the course
describe problems and software tools associated with data gathering
use curl to gather data from the Web
use in2csv to convert spreadsheet data to CSV format
use agate to extract data from spreadsheets
use agate to extract tabular data from DBF files
extract data from particular tags in an HTML document

distinguish between metadata and data
work with metadata in HTTP Headers
work with Linux log files
work with metadata in email headers
Remote Data

perform a secure shell connection to a remote server
copy remote data using a secure copy
synchronize data from a remote server
Practice: Curl and HTML

download an HTML file and explore table data
Data Filtering
Introduction to Data Filtering

start the course
identify common filtering techniques and tools
extract date elements from common date formats
parse content types in HTTP headers
use csvcut to filter CSV data
use sed to replace values in a text data stream
drop duplicate records from data
extract headers from a JPEG image
use pdfgrep to extract data from searchable PDF files
detect invalid or impossible data combinations
parse robots.txt from a website to decide what should and shouldn't be crawled or indexed
Practice: Filtering Dates

drop records from a CSV file based on date range
Data Transformation
File Format Conversions

start the course
convert CSV data to JSON format
convert XML data to JSON format
create SQL inserts from CSV data
extract CSV data from SQL
change delimiters in a CSV file from commas to tabs
Data Conversions

convert basic date formats to standard ISO 8601 format
convert numeric formats within a CSV document
round floating point decimals to two places within a CSV document
Optical Character Recognition

use optical character recognition (OCR) to extract text from a JPEG image
use optical character recognition (OCR) to extract text from a PDF document
Practice: Converting Dates

read various date formats and convert them to standards-compliant ISO 8601 format
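
As a rough illustration of the date-conversion practice above, the following Python sketch tries a handful of input formats and emits ISO 8601 dates. The function name `to_iso8601` and the list `CANDIDATE_FORMATS` are assumptions made for this example and are not taken from the course material. Month names are parsed according to the active locale, which is assumed to be English here.

```python
from datetime import datetime

# Hypothetical list of accepted input formats; extend it for the data at hand.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%d %b %Y"]


def to_iso8601(text):
    """Return the date in `text` as YYYY-MM-DD, or None if no format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None
```

Note that a value such as 03/04/2021 is ambiguous; this sketch reads it day-first because of the order of the format list.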
Data Exploration
Introduction to Data Exploration

start the course
use csvgrep to search CSV data
use csvstat to explore values in CSV data
use csvsql to query CSV data like a SQL database
use gnuplot to quickly plot data on the command line
use wc to count words, characters, and lines within a text file
explore a subdirectory tree from the command line
use natural language processing to count word frequencies in a text document
take random samples from a list of records
find the top rows by value and percent in a data set
find repeated records in a data set
identify outliers using standard deviation
Practice: Exploring Word Frequencies

perform a word frequency count on a classic book from Project Gutenberg
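
A minimal sketch of such a word frequency count in Python, using only the standard library. The function name `word_frequencies` and the simple tokenization rule (lowercase letters and apostrophes) are assumptions for illustration; the course itself may use different tools.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=10):
    """Return the `top_n` most common words in `text` as (word, count) pairs."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)
```

To apply it to a book, read the downloaded plain-text file into a string and pass that string to `word_frequencies`.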
Data Integration
Introduction to Data Integration

start the course
use csvjoin to concatenate CSV data
use the cat command to concatenate separate logs into a single file
sort lines in a text file
merge separate XML files into a single schema
aggregate data from a CSV file into a table of summarized values
normalize data from unstructured sources
denormalize data from a structured source
use pivot tables to cross tabulate data
insert missing values in a data set
Practice: Joining CSV Data

use csvjoin to merge two compatible CSV documents into one
Data Analysis Concepts
Data Science Math

start the course
perform basic math operations required by data scientists
perform basic vector math operations required by data scientists
perform basic matrix math operations required by data scientists
perform a matrix decomposition
Data Analysis Concepts

identify different forms of data
describe probability in terms of events and sample space size
describe basic properties of outcomes
apply probability rules in calculation
identify common continuous probability distributions
identify common discrete probability distributions
apply Bayes' theorem and describe how it is used in email spam algorithms
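
To make the spam-filter use of Bayes' theorem concrete, the sketch below computes the probability that a message is spam given that it contains a particular word. The function name `posterior_spam` and its parameters are hypothetical and only illustrate the formula P(spam | word) = P(word | spam) P(spam) / P(word); real filters combine many words.

```python
def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam):
    """Return P(spam | word) from the two likelihoods and the spam prior."""
    p_ham = 1.0 - p_spam
    # Total probability of seeing the word in any message.
    evidence = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / evidence
```

For example, if a word appears in 50% of spam, in 10% of legitimate mail, and 20% of all mail is spam, the posterior is 0.10 / 0.18, or about 0.56.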
Estimates and Measures

apply random sampling to A/B tests
identify and describe various statistical measures
describe the difference between an unbiased and biased estimator
describe sampling distributions and recognize the central limit theorem
define confidence intervals and work with margins of error
carry out hypothesis tests and work with p-values
apply the chi-square test for categorical values
Practice: Identifying Data

identify the given data set descriptions by their types
Data Classification and Machine Learning
Machine Learning Introduction

start the course
identify problems in which supervised learning techniques apply
identify problems in which unsupervised learning techniques apply
apply linear regression to machine learning problems
identify predictors in machine learning
Regression and Classification

apply logistic regression to machine learning problems
describe the use of dummy variables
use naive Bayes classification techniques
work with decision trees

describe K-means clustering
define cluster validation
define principal component analysis
Errors and Validation

describe machine learning errors
describe underfitting
describe overfitting
apply k-fold cross-validation
describe feed-forward and back-propagation in neural networks
describe SVMs and their use
Practice: Choosing a Method

choose the appropriate machine learning method for the given example problems
Data Communication and Visualization
Introduction to Data Communication

start the course
choose appropriate visualization techniques
describe the difference between correlation and causation
define Simpson's paradox
communicate data science results informally
communicate data science results formally
implement strategies for effective data communication

use scatter plots
use line graphs
use bar charts
use histograms
use box plots
create a network visualization
create a bubble plot
create an interactive plot
Practice: Creating a Scatter Plot

find a data set that is well represented by a scatter plot and plot it

Admission requirements: what do you need?

No specific prior knowledge is required.

Course duration

15 hours

Locations / training venues

Throughout the Netherlands, e-learning, online

General information about the course

Order the e-learning course Data Science Essentials online: one year of 24/7 access to rich interactive videos, narration, progress tracking through reports, and a test per chapter to check your knowledge right away.

Duration: 15 hours
Language: English
Certificate of participation: Yes
Online access: 365 days
Progress tracking: Yes
Award Winning E-learning: Yes
Mobile-friendly: Yes

Copyright 2009-2022 Particuliereopleidingen.nl