ESE begins 27 April 2026.

Big Data Analytics

BDA theory and reference

Syllabus

Module 1: Introduction to Big Data and Hadoop Ecosystem (05 hrs, CO1)
Big Data characteristics and Types of Big Data, Traditional vs. Big Data business approach, Case Study of Big Data Solutions, Core Hadoop Components; Hadoop Ecosystem.

Module 2: MapReduce Framework and Algorithms for Big Data Processing (09 hrs, CO2)
MapReduce: The Map Tasks, Grouping by Key, The Reduce Tasks, Combiners, Details of MapReduce Execution (📍 uploaded material covers content up to here), Coping with Node Failures. Algorithms Using MapReduce: Matrix-Vector Multiplication by MapReduce, Relational-Algebra Operations, Computing Selections by MapReduce, Computing Projections by MapReduce, Union, Intersection, and Difference by MapReduce, Hadoop Limitations.

Module 3: NoSQL (07 hrs, CO3)
Introduction to NoSQL, NoSQL Business Drivers. NoSQL Data Architecture Patterns: Key-value stores, Graph stores, Column family (Bigtable) stores, Document stores, Variations of NoSQL architectural patterns, NoSQL Case Study, NoSQL solution for big data, Understanding the types of big data problems; Analyzing big data with a shared-nothing architecture.

Module 4: Mining Data Streams (08 hrs, CO4)
A Data-Stream-Management System, Examples of Stream Sources, Stream Queries, Issues in Stream Processing. Sampling Data techniques in a Stream. Filtering Streams: Bloom Filter with Analysis. Counting Distinct Elements in a Stream: Flajolet-Martin Algorithm. Counting Ones in a Window: The Cost of Exact Counts, The Datar-Gionis-Indyk-Motwani Algorithm, Query Answering in the DGIM Algorithm, Decaying Windows.

Module 5: Handling Larger Datasets (08 hrs, CO5)
Frequent Itemset Mining: Algorithm of Park, Chen, and Yu (PCY). Finding Similar Items: Applications of Near-Neighbor Search, Jaccard Similarity of Sets, Similarity of Documents. Distance Measures: Definition of a Distance Measure, Euclidean Distances, Jaccard Distance, Cosine Distance, Edit Distance, Hamming Distance. Clustering: CURE Algorithm.

Module 6: Big Data Models to Social Networking (08 hrs, CO6)
A Model for Recommendation Systems, Content-Based Recommendations, Collaborative Filtering, Case Study: Similar Product Recommendation. Social Networks as Graphs, Clustering of Social-Network Graphs, Direct Discovery of Communities in a social graph.

Theory Notes

The notes are provided on the MyDY portal; open them there to view.


Below are some additional notes that are not available on MyDY. The faculty shared these on the official WhatsApp group.


Module 1

Introduction to Big Data & Hadoop

Module 1 • Complete Module 1

Curated Notes

Module 2

Hadoop HDFS & MapReduce

Module 2 • The Map Tasks, Grouping by Key, The Reduce Tasks, Combiners, Details of MapReduce Execution


Question Banks

Internal Assessment 1

Module 1 • Module 2 (MSE question bank + weekly question bank, excluding questions from Module 3 onwards)

Mid-Semester Exam

Refer to the Internal Assessment 1 question bank for Q1-17. This bank contains Q18-28.

BDA IA2 Resources

Topic-wise OCR resources for IA2 (focusing on Modules 4 and 5).

End-Semester Exam Notes

Module 1 • Module 2 • Module 3 • Module 4 • Module 5 • Module 6

