# Data Warehouse and Mining

*A data warehouse, as the name suggests, brings together data from many databases so that it can be analyzed as a whole. Distributed databases store a database at multiple computer sites to improve data access and processing. Once the data is available across these databases, algorithms are applied to retrieve and analyze it. Data mining is the process of analyzing data and summarizing it into useful information, so it plays a vital role in accessing, analyzing, and processing the data held in the warehouse.*

# Video Lectures

### APRIORI ALGORITHM

*The Apriori algorithm mines frequent itemsets for Boolean association rules. It uses a “bottom-up” approach: frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. Because every subset of a frequent itemset must itself be frequent, any candidate with an infrequent subset can be pruned before the data is scanned.*
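
A minimal Python sketch of the level-wise procedure, assuming transactions are sets of item names and that `min_support` is an absolute count rather than a fraction. The function name `apriori`, its helper, and the sample baskets are illustrative, not taken from the lecture.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for every frequent itemset.

    Assumption: min_support is an absolute transaction count, not a fraction.
    """
    transactions = [frozenset(t) for t in transactions]

    def frequent_only(candidates):
        # Scan the data once and keep candidates that meet the support threshold
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Frequent 1-itemsets
    frequent = frequent_only({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Candidate generation: extend frequent (k-1)-itemsets by one item
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = frequent_only(candidates)
        result.update(frequent)
        k += 1
    return result


transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
freq = apriori(transactions, min_support=3)
for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a threshold of 3, four single items and four pairs are frequent. The only 3-itemset that survives pruning, {bread, milk, diaper}, appears in just two baskets, so the search stops there.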

### NAIVE BAYES CLASSIFIER

*Bayesian classifiers are statistical classifiers. They predict class-membership probabilities, such as the probability that a given tuple belongs to a particular class. The classifier starts from the prior probability of each category, given no information about an item, and categorization produces a posterior probability distribution over the possible categories given a description of the item. Naive Bayes further assumes that an item's attributes are conditionally independent given its class.*
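
A minimal sketch of a categorical Naive Bayes classifier in Python. It estimates the priors and the per-attribute likelihoods by counting, with no smoothing (an assumption; an unseen attribute value then drives that class's posterior to zero). The helper names `train_naive_bayes` and `posterior` and the toy weather-style data are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class priors and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[cls][attribute_index][value] -> count
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, cls in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[cls][j][value] += 1
    return class_counts, value_counts


def posterior(x, class_counts, value_counts):
    """Return normalized P(class | x), assuming attributes are independent given the class."""
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total  # prior P(class)
        for j, value in enumerate(x):
            p *= value_counts[cls][j][value] / n_cls  # likelihood P(x_j | class)
        scores[cls] = p
    z = sum(scores.values())
    return {cls: s / z for cls, s in scores.items()} if z else scores


rows = [("sunny", "false"), ("sunny", "true"), ("overcast", "false"),
        ("rainy", "false"), ("rainy", "true"), ("overcast", "true")]
labels = ["no", "no", "yes", "yes", "no", "yes"]
model = train_naive_bayes(rows, labels)
post = posterior(("rainy", "false"), *model)
print(max(post, key=post.get), post)  # predicted class: "yes"
```

For the query ("rainy", "false"), both priors are 1/2; the likelihood product is 1/3 × 2/3 for "yes" and 1/3 × 1/3 for "no", so after normalization P(yes) = 2/3 and P(no) = 1/3.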

### HIERARCHICAL AGGLOMERATIVE CLUSTERING [HAC] SINGLE LINK

### HIERARCHICAL AGGLOMERATIVE CLUSTERING [HAC] AVERAGE LINK

*HAC starts with each item in its own cluster and iteratively merges the two closest clusters until all the items belong to one cluster, which makes it a bottom-up approach. The sequence of merges is drawn as a dendrogram. With average link, the distance between two clusters is the average distance over all pairs of items, one taken from each cluster.*
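
A naive Python sketch of HAC with average link, assuming points are coordinate tuples compared by Euclidean distance. It rescans every pair of clusters at each step, so it is O(n³) and only suited to small examples; the function names `average_link` and `hac` are hypothetical.

```python
import math
from itertools import combinations


def average_link(a, b, points):
    """Mean Euclidean distance over all pairs with one point from each cluster."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def hac(points, linkage=average_link):
    """Naive agglomerative clustering; returns the merge history bottom-up."""
    clusters = [frozenset([i]) for i in range(len(points))]  # each point starts alone
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters under the chosen linkage
        a, b = min(combinations(clusters, 2), key=lambda ab: linkage(ab[0], ab[1], points))
        d = linkage(a, b, points)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a), sorted(b), d))
    return merges


points = [(1,), (2,), (6,), (9,), (20,)]
merges = hac(points)
for left, right, height in merges:
    print(left, right, height)  # merge heights: 1.0, 3.0, 6.0, 15.5
```

The merge heights are the levels at which the branches join in the dendrogram. For example, the third merge joins {1, 2} and {6, 9} at 6.0, the mean of the four cross-cluster distances 5, 8, 4 and 7.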

### HIERARCHICAL AGGLOMERATIVE CLUSTERING [HAC] COMPLETE LINK

*HAC starts with each item in its own cluster and iteratively merges the two closest clusters until all the items belong to one cluster, which makes it a bottom-up approach. The sequence of merges is drawn as a dendrogram. With complete link, the distance between two clusters is the largest distance between any item in one cluster and any item in the other.*
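
A sketch of complete-link HAC that keeps a table of cluster-to-cluster distances. After each merge, the distance from the new cluster to every remaining cluster is set to the larger of the two old distances, which is exactly the complete-link rule. Points are assumed to be coordinate tuples compared by Euclidean distance; the function name `complete_link_hac` is hypothetical.

```python
import math


def complete_link_hac(points):
    """Complete-link HAC using a distance table updated with max()."""
    clusters = {i: (i,) for i in range(len(points))}  # cluster id -> member indices
    dist = {(i, j): math.dist(points[i], points[j])
            for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Closest pair of clusters in the current table
        (i, j), d = min(dist.items(), key=lambda kv: kv[1])
        merged = clusters.pop(i) + clusters.pop(j)
        # Complete link: new distance is the max of the two old distances
        for k in clusters:
            dik = dist[(min(i, k), max(i, k))]
            djk = dist[(min(j, k), max(j, k))]
            dist[(k, next_id)] = max(dik, djk)
        # Drop table entries that refer to the two clusters just merged
        dist = {key: v for key, v in dist.items() if i not in key and j not in key}
        clusters[next_id] = merged
        merges.append((sorted(merged), d))
        next_id += 1
    return merges


points = [(1,), (2,), (6,), (9,), (20,)]
merges = complete_link_hac(points)
for members, height in merges:
    print(members, height)  # merge heights: 1.0, 3.0, 8.0, 19.0
```

In the third merge, {1, 2} and {6, 9} join at 8.0, the largest of the cross-cluster distances 5, 8, 4 and 7. Average link would use their mean, 6.0, so complete link tends to produce taller dendrograms and more compact clusters.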

### K-MEANS CLUSTERING

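*K-means is an exploratory data analysis technique that groups objects into k clusters using a non-hierarchical method. Each object is assigned to the nearest centroid, with distance measured as Euclidean distance, and each centroid is then recomputed as the mean of the objects assigned to it. The two steps repeat until the groups stop changing.*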
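
A minimal Python sketch of the assign-then-update loop, assuming points are equal-length coordinate tuples. As a simple deterministic choice it uses the first `k` points as the initial centroids; practical implementations usually choose them randomly or with k-means++. The function name `kmeans` and the sample points are illustrative.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means with Euclidean distance; returns (centroids, clusters)."""
    centroids = list(points[:k])  # assumption: first k points as initial centroids
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its old centroid)
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (1.0, 0.5), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

The loop stops as soon as an update leaves every centroid unchanged, which happens once no object switches groups.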

### K-MEANS CLUSTERING [SINGLE DATASET]

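*K-means is an exploratory data analysis technique that groups objects into k clusters using a non-hierarchical method. Each object is assigned to the nearest centroid, with distance measured as Euclidean distance, and each centroid is then recomputed as the mean of the objects assigned to it. The two steps repeat until the groups stop changing.*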
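
A hand-traceable sketch on one small one-dimensional dataset, assuming k = 2 and the first two values as the initial centroids (the data and the initialization are illustrative choices, not taken from the lecture). In one dimension the Euclidean distance reduces to |v - c|, so each recorded iteration can be checked on paper. The function name `kmeans_1d` is hypothetical.

```python
def kmeans_1d(values, k, max_iter=100):
    """1-D k-means that records every iteration for checking a hand calculation."""
    centroids = [float(v) for v in values[:k]]  # assumption: first k values as initial centroids
    history = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Euclidean distance in 1-D is |v - c|; ties go to the lower-numbered centroid
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        history.append((list(centroids), [list(c) for c in clusters]))
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # no centroid moved, so the grouping is final
            break
        centroids = new
    return centroids, clusters, history


values = [2, 4, 10, 12, 3, 20, 30, 11, 25]
centroids, clusters, history = kmeans_1d(values, k=2)
for step, (cents, groups) in enumerate(history, start=1):
    print(step, cents, groups)
print(centroids)  # [7.0, 25.0]
```

The centroids move from (2, 4) through (2.5, 16), (3, 18) and (4.75, 19.6) to (7, 25), where the groups {2, 4, 10, 12, 3, 11} and {20, 30, 25} no longer change.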