Course Outline
Introduction to Data Analysis and Big Data
- What Makes Big Data "Big"?
- Velocity, Volume, Variety, Veracity (VVVV)
- Limits to Traditional Data Processing
- Distributed Processing
- Statistical Analysis
- Types of Machine Learning Analysis
- Data Visualization
Big Data Roles and Responsibilities
- Administrators
- Developers
- Data Analysts
Languages Used for Data Analysis
- R Language
- Why R for Data Analysis?
- Data manipulation, calculation and graphical display
- Python
- Why Python for Data Analysis?
- Manipulating, processing, cleaning, and crunching data
Approaches to Data Analysis
- Statistical Analysis
- Time Series analysis
- Forecasting with Correlation and Regression models
- Inferential Statistics (estimating)
- Descriptive Statistics in Big Data sets (e.g. calculating mean)
- Machine Learning
- Supervised vs unsupervised learning
- Classification and clustering
- Estimating cost of specific methods
- Filtering
- Natural Language Processing
- Processing text
- Understanding the meaning of the text
- Automatic text generation
- Sentiment analysis / topic analysis
- Computer Vision
- Acquiring, processing, analyzing, and understanding images
- Reconstructing, interpreting and understanding 3D scenes
- Using image data to make decisions
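As a small illustration of the descriptive-statistics item above (calculating a mean over a large data set), here is a minimal sketch in Python. It assumes the data arrives in chunks that each fit in memory, so only a running sum and count are kept. The function name `chunked_mean` is hypothetical and not part of the course material.

```python
from typing import Iterable, Sequence


def chunked_mean(chunks: Iterable[Sequence[float]]) -> float:
    """Return the mean of all values across the chunks.

    Only a running total and a running count are stored, so the full
    data set never has to be loaded at once.
    """
    total = 0.0
    count = 0
    for chunk in chunks:
        total += sum(chunk)   # partial sum for this chunk
        count += len(chunk)   # number of values seen so far
    if count == 0:
        raise ValueError("no values supplied")
    return total / count


if __name__ == "__main__":
    # Two chunks standing in for two blocks of a larger file.
    print(chunked_mean([[1.0, 2.0, 3.0], [4.0, 5.0]]))  # 3.0
```

The same sum-and-count idea is what lets a mean be computed in a distributed setting: each worker returns its partial sum and count, and the results are combined at the end.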
Big Data Infrastructure
- Data Storage
- Relational databases (SQL)
- MySQL
- Postgres
- Oracle
- Non-relational databases (NoSQL)
- Cassandra
- MongoDB
- Neo4j
- Understanding the nuances
- Hierarchical databases
- Object-oriented databases
- Document-oriented databases
- Graph-oriented databases
- Other
- Distributed Processing
- Hadoop
- HDFS as a distributed filesystem
- MapReduce for distributed processing
- Spark
- All-in-one in-memory cluster computing framework for large-scale data processing
- Structured streaming
- Spark SQL
- Machine Learning libraries: MLlib
- Graph processing with GraphX
- Scalability
- Public cloud
- AWS, Google, Aliyun, etc.
- Private cloud
- OpenStack, Cloud Foundry, etc.
- Auto-scalability
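To make the "MapReduce for distributed processing" item above more concrete, the following is a single-machine sketch of the map, shuffle, and reduce phases applied to a word count. It is plain Python, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and in a real cluster each phase would run in parallel across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data tools"]))
```

Because the map step looks at one line at a time and the reduce step looks at one key at a time, both can be split across machines without changing the result.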
Choosing the Right Solution for the Problem
The Future of Big Data
Summary and Conclusion
Prerequisites
- A general understanding of math.
- A general understanding of programming.
- A general understanding of databases.
Audience
- Developers / programmers
- IT consultants