Cluster Analysis in Research

June 05, 2025

Cluster Analysis is a statistical technique used in research to group objects, individuals, or variables into clusters (groups) that are internally homogeneous (similar) and externally heterogeneous (different from other groups). It is widely used in disciplines such as marketing, finance, social science, healthcare, and education for pattern discovery and segmentation.

Definition

Cluster analysis is an unsupervised learning method that identifies natural groupings within a dataset based on the similarity or distance between data points.
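
To make "distance between data points" concrete, here is a minimal sketch that computes Euclidean distances between three invented investors. The data, the variable names, and the choice of Euclidean distance are illustrative assumptions; other measures (Manhattan, cosine, Gower) may suit a given study better.

```python
import math

# Hypothetical data points: (age in years, monthly investment in thousands).
points = {
    "A": (25.0, 2.0),
    "B": (27.0, 2.5),
    "C": (60.0, 10.0),
}

def euclidean(p, q):
    """Straight-line distance between two equal-length numeric tuples."""
    return math.dist(p, q)

# Pairwise distance matrix stored as a nested dict.
distances = {
    a: {b: euclidean(pa, pb) for b, pb in points.items()}
    for a, pa in points.items()
}

# The nearest neighbour of A is the point a clustering method would
# most readily group with it.
closest_to_a = min(
    (name for name in points if name != "A"),
    key=lambda name: distances["A"][name],
)
print(closest_to_a)
```

Points with a small mutual distance are candidates for the same cluster; points far apart are candidates for different ones.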

Purpose of Cluster Analysis

· To discover patterns in data without prior labeling

· To segment populations, such as:

o Investors based on risk preference

o Customers based on buying behavior

o Students based on learning styles

· To reduce data complexity for modeling or visualization

· To support targeted decision-making

Types of Cluster Analysis

Method	Description	Example Use Case
Hierarchical Clustering	Builds a tree (dendrogram) of clusters	Classifying mutual funds by risk level
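K-Means Clustering	Partitions data into a pre-specified number (k) of clusters, assigning each point to its nearest cluster centroid	Grouping investors by portfolio preference
DBSCAN	Density-based; finds irregularly shaped clusters and flags isolated points as noise	Detecting fraud clusters in transactions
Two-Step Clustering	Pre-clusters records into many small sub-clusters, then merges them hierarchically; handles large datasets and mixed variable types	Survey respondent segmentation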
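
As an illustration of how one of these methods works internally, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The function name k_means, its parameters, and the sample data are assumptions made for this example. In practice a library implementation such as scikit-learn's KMeans would normally be used, since it adds better initialisation and multiple restarts.

```python
import math
import random

def k_means(points, k, init=None, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and a centroid update."""
    rng = random.Random(seed)
    # Start from the given centroids, or from k randomly chosen data points.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids

# Hypothetical two-dimensional data with two visibly separate groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = k_means(data, k=2)
print(labels)
```

Because the starting centroids are random, cluster numbers are arbitrary and different seeds can give different partitions on less clearly separated data.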

Steps in Cluster Analysis

1. Define objectives (e.g., group retail investors by behavior)

2. Select variables/features (e.g., age, income, risk appetite, investment type)

3. Standardize data (especially when variables are measured in different units or scales)

4. Choose clustering method (hierarchical, k-means, etc.)

5. Determine number of clusters (using elbow method, silhouette score, etc.)

6. Run clustering algorithm

7. Interpret and validate clusters

8. Label and profile clusters for decision-making
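
Step 3 in the list above can be sketched as a z-score transformation, which puts every numeric variable on a comparable scale before distances are computed. The helper name standardize and the sample values are assumptions for illustration; scikit-learn's StandardScaler performs the same rescaling.

```python
import statistics

def standardize(column):
    """Rescale a numeric column to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)  # population standard deviation
    if sd == 0:
        return [0.0 for _ in column]  # a constant column cannot separate anything
    return [(x - mean) / sd for x in column]

# Hypothetical investor features measured on very different scales.
age = [25, 35, 45, 55]
income = [30_000, 50_000, 70_000, 90_000]

age_z = standardize(age)
income_z = standardize(income)
print(age_z)
print(income_z)
```

Without this step, income would dominate any Euclidean distance simply because its raw numbers are larger, not because it matters more.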

Example in Finance Research

Research Topic: “Segmenting Retail Investors Based on Investment Behavior”

Variables Used	Type
Age	Numeric
Monthly Investment Amount	Numeric
Preferred Asset Type	Categorical
Risk Appetite	Ordinal
Digital Platform Use	Binary
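
Because these variables mix numeric, categorical, ordinal, and binary types, they usually have to be encoded as numbers before a distance-based method can use them. The sketch below shows one common approach. The category levels, field names, and the encode_investor helper are hypothetical and are not taken from the study.

```python
# Hypothetical category levels; a real study would take these from its questionnaire.
ASSET_TYPES = ["equity", "debt", "gold"]
RISK_LEVELS = {"low": 0, "medium": 1, "high": 2}

def encode_investor(record):
    """Turn one mixed-type record into a flat list of numbers."""
    features = [float(record["age"]), float(record["monthly_investment"])]
    # Categorical: one-hot encode so that no artificial order is implied.
    features += [1.0 if record["asset_type"] == a else 0.0 for a in ASSET_TYPES]
    # Ordinal: map to ranks so that low < medium < high is preserved.
    features.append(float(RISK_LEVELS[record["risk_appetite"]]))
    # Binary: 1 for digital platform users, 0 otherwise.
    features.append(1.0 if record["uses_digital_platform"] else 0.0)
    return features

investor = {
    "age": 29,
    "monthly_investment": 5000,
    "asset_type": "equity",
    "risk_appetite": "high",
    "uses_digital_platform": True,
}
print(encode_investor(investor))
```

The numeric columns would still need standardizing afterwards. Alternatively, a method designed for mixed data (for example, Two-Step clustering or a Gower-distance approach) avoids some of this manual encoding.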

Result: 3 Clusters Identified

Cluster	Description
1	Young, tech-savvy, high-risk investors
2	Middle-aged, moderate investors
3	Retired, conservative, fixed-income focused
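
Descriptions like those above are typically written by profiling each cluster, that is, by summarising its members' characteristics. A minimal sketch follows, assuming cluster labels have already been assigned. The rows and labels are invented for illustration and are not the study's data.

```python
from collections import defaultdict
from statistics import fmean

def profile_clusters(rows, labels):
    """Return the size and per-feature means of each cluster."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {
            "size": len(members),
            "means": [fmean(col) for col in zip(*members)],
        }
        for label, members in sorted(grouped.items())
    }

# Hypothetical (age, monthly investment) rows with made-up cluster labels.
rows = [(24, 8000), (28, 9000), (45, 5000), (47, 6000), (66, 2000), (70, 3000)]
labels = [1, 1, 2, 2, 3, 3]

profiles = profile_clusters(rows, labels)
for label, summary in profiles.items():
    print(label, summary)
```

Comparing the means across clusters is what lets a researcher attach labels such as "young" or "retired" to otherwise anonymous cluster numbers.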

Tools and Software

· SPSS (Two-Step Cluster)

· R (cluster, factoextra packages)

· Python (scikit-learn, seaborn, matplotlib)

· Excel (basic distance matrices; dendrograms typically require an add-in)

· Tableau or Power BI (for visualization)

Advantages

· No need for predefined categories

· Reveals hidden structures in data

· Supports market segmentation, persona creation, and custom targeting

Limitations

· Sensitive to scaling and outliers

· Results depend on method and parameter choice

· No “correct” number of clusters; requires judgment and validation
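
One common aid to the judgment involved in validation is the silhouette score, which compares how close each point is to its own cluster with how close it is to the nearest other cluster. The sketch below is a plain-Python version for illustration. The function name mean_silhouette and the sample data are assumptions; scikit-learn's silhouette_score offers an equivalent calculation.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient; values near 1 suggest well-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical data: two tight groups, scored under two candidate labelings.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
good = mean_silhouette(data, [0, 0, 0, 1, 1, 1])
bad = mean_silhouette(data, [0, 1, 0, 1, 0, 1])
print(good, bad)
```

Running the same calculation for several values of k and comparing the scores gives a rough guide to a reasonable number of clusters, although it does not replace substantive interpretation.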

Applications in Different Fields

Field	Example of Cluster Use
Finance	Grouping investors, classifying firms by performance
Marketing	Customer segmentation, product recommendation
Healthcare	Patient categorization based on symptoms or demographics
Education	Grouping students by learning style or engagement level
