Big Data Analytics
BDA theory and reference
Syllabus
| Module | Detailed Content | Hrs | CO |
|---|---|---|---|
| 1 | Introduction to Big Data and Hadoop Ecosystem: Big Data characteristics and Types of Big Data, Traditional vs. Big Data business approach, Case Study of Big Data Solutions, Core Hadoop Components; Hadoop Ecosystem | 05 | CO1 |
| 2 | MapReduce Framework and Algorithms for Big Data Processing: MapReduce: The Map Tasks, Grouping by Key, The Reduce Tasks, Combiners, Details of MapReduce Execution (📍Uploaded Material Covers content till here), Coping with Node Failures. Algorithms Using MapReduce: Matrix-Vector Multiplication by MapReduce, Relational-Algebra Operations, Computing Selections by MapReduce, Computing Projections by MapReduce, Union, Intersection, and Difference by MapReduce, Hadoop Limitations. | 09 | CO2 |
| 3 | NoSQL: Introduction to NoSQL, NoSQL Business Drivers. NoSQL Data Architecture Patterns: Key-value stores, Graph stores, Column family (Bigtable) stores, Document stores, Variations of NoSQL architectural patterns, NoSQL Case Study, NoSQL solution for big data, Understanding the types of big data problems; Analyzing big data with a shared-nothing architecture. | 07 | CO3 |
| 4 | Mining Data Streams: A Data-Stream-Management System, Examples of Stream Sources, Stream Queries, Issues in Stream Processing. Sampling Data Techniques in a Stream. Filtering Streams: Bloom Filter with Analysis. Counting Distinct Elements in a Stream: Flajolet-Martin Algorithm. Counting Ones in a Window: The Cost of Exact Counts, The Datar-Gionis-Indyk-Motwani Algorithm, Query Answering in the DGIM Algorithm, Decaying Windows. | 08 | CO4 |
| 5 | Handling Larger Datasets: Frequent Itemset Mining: Algorithm of Park, Chen, and Yu (PCY). Finding Similar Items: Applications of Near-Neighbor Search, Jaccard Similarity of Sets, Similarity of Documents. Distance Measures: Definition of a Distance Measure, Euclidean Distances, Jaccard Distance, Cosine Distance, Edit Distance, Hamming Distance. Clustering: CURE Algorithm. | 08 | CO5 |
| 6 | Big Data Models to Social Networking: A Model for Recommendation Systems, Content-Based Recommendations, Collaborative Filtering, Case Study: Similar Product Recommendation. Social Networks as Graphs, Clustering of Social-Network Graphs, Direct Discovery of Communities in a social graph. | 08 | CO6 |
Theory Notes
The notes are provided on the MyDY Portal. Open MyDY to view them.
Below are some extra notes not provided in MyDY. The faculty shared them on the official WhatsApp group.
Module 1
Introduction to Big Data & Hadoop
Module 1 • Complete Module 1
Module 2
Hadoop HDFS & MapReduce
Module 2 • The Map Tasks, Grouping by Key, The Reduce Tasks, Combiners, Details of MapReduce Execution
Question Banks
Internal Assessment 1
Module 1 • Module 2 (MSE question bank + weekly question bank, excluding questions from Module 3 onwards)
Mid-Semester Exam
Refer to the Internal Assessment 1 Question Bank for Q1-17. This question bank contains Q18-28.
BDA IA2 Resources
Topic-wise OCR resources for IA2 (Modules 4-5 focus).
End-Semester Exam Notes
Module 1 • Module 2 • Module 3 • Module 4 • Module 5 • Module 6