DISCA - Deep Iterative Subtomogram Clustering and Averaging

Unsupervised structure discovery in Cryo-ET data using deep learning.

Overview

DISCA discovers macromolecular structures in cryo-electron tomography (cryo-ET) data without requiring ground-truth labels. It uses an expectation-maximization (EM) style iterative approach that combines:

3D CNN Feature Extractor: Learns discriminative features from subtomogram volumes
YOPO Clustering: You Only Propagate Once, an efficient clustering scheme that needs a single forward pass per iteration
GMM/K-means: Soft or hard cluster assignments

Project Structure

DISCA-discovery/
├── config/
│   └── config.yaml          # Hyperparameters and settings
├── scripts/
│   ├── train.py             # Training script
│   ├── evaluate.py          # Evaluation script
│   └── preprocess_tomograms.py  # Tomogram preprocessing
├── src/
│   ├── data/
│   │   └── subtomogram_loader.py  # Data loading
│   ├── models/
│   │   ├── feature_extractor.py   # 3D CNN encoder
│   │   └── clustering.py          # YOPO clustering
│   ├── training/
│   │   └── disca_trainer.py       # Training loop
│   └── utils/
│       ├── metrics.py             # Evaluation metrics
│       └── visualization.py       # Plotting utilities
├── requirements.txt
├── COLAB_GUIDE.md           # Google Colab instructions
└── README.md

Quick Start

Installation

pip install -r requirements.txt

Training

python scripts/train.py --config config/config.yaml --data_dir data/subtomograms

Evaluation

python scripts/evaluate.py --checkpoint outputs/checkpoints/best_model.pth --visualize

Workflow

1. Preprocess Tomogram

Extract subtomograms from a tomogram using particle picking:

python scripts/preprocess_tomograms.py \
    --input tomogram.mrc \
    --output data/subtomograms \
    --box-size 32 \
    --particle-pick \
    --threshold 1.5

Or use a sliding window:

python scripts/preprocess_tomograms.py \
    --input tomogram.mrc \
    --output data/subtomograms \
    --box-size 32 \
    --stride 16
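To make the `--box-size` and `--stride` options concrete, the sketch below enumerates the corner coordinates of the boxes a sliding window would visit. The function name `sliding_window_origins` and the 64x64x64 volume are hypothetical, and the sketch assumes partial boxes at the edges are dropped; `preprocess_tomograms.py` may handle edges differently.

```python
from itertools import product


def sliding_window_origins(shape, box_size, stride):
    """Return the (z, y, x) corner of every box that fits fully inside the volume."""
    # One range of valid start positions per axis; empty if the axis is too short.
    axes = [range(0, dim - box_size + 1, stride) for dim in shape]
    return list(product(*axes))


# Hypothetical 64^3 tomogram with the box size and stride used above.
origins = sliding_window_origins(shape=(64, 64, 64), box_size=32, stride=16)
print(len(origins))  # 27 boxes: start positions 0, 16, 32 along each axis
```

With a stride of half the box size, neighbouring boxes overlap by 50%, so a particle that is cut by one box boundary is more likely to sit fully inside an adjacent box.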

2. Train Model

python scripts/train.py \
    --config config/config.yaml \
    --data_dir data/subtomograms \
    --num_epochs 30 \
    --batch_size 32

3. Evaluate Results

python scripts/evaluate.py \
    --checkpoint outputs/checkpoints/best_model.pth \
    --data_dir data/subtomograms \
    --visualize \
    --output_dir outputs/evaluation

Configuration

Key parameters in config/config.yaml:

| Parameter | Description | Default |
|---|---|---|
| `model.feature_dim` | Feature embedding dimension | 128 |
| `clustering.num_clusters` | Number of clusters to discover | 10 |
| `clustering.method` | Clustering algorithm (gmm/kmeans) | gmm |
| `training.learning_rate` | Learning rate | 0.0001 |
| `training.warmup_epochs` | Epochs before clustering starts | 3 |
| `training.batch_size` | Batch size | 32 |
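The dotted parameter names suggest nested sections in the YAML file. As a rough illustration, assuming that `model.feature_dim` means a `feature_dim` key under a `model` section, the sketch below expands the defaults from the table into that nested form. The helper `nest` is hypothetical and not part of the repository.

```python
def nest(flat):
    """Turn {'a.b': 1} into {'a': {'b': 1}}."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Defaults copied from the table above.
DEFAULTS = {
    "model.feature_dim": 128,
    "clustering.num_clusters": 10,
    "clustering.method": "gmm",
    "training.learning_rate": 0.0001,
    "training.warmup_epochs": 3,
    "training.batch_size": 32,
}

config = nest(DEFAULTS)
print(config["clustering"])  # {'num_clusters': 10, 'method': 'gmm'}
```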

Training Details

EM-Style Iteration

Each epoch consists of:

E-step: Update cluster assignments using GMM/K-means
M-step: Update feature extractor with clustering loss
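The two steps can be illustrated with a toy example. This is a minimal sketch, not the code in `disca_trainer.py`: it assumes hard K-means style assignments for the E-step and uses mean squared distance to the assigned centroid as a stand-in for the clustering loss. All function names and the 2D feature values are made up for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def e_step(features, centroids):
    """Hard (K-means style) assignment: index of the nearest centroid per feature."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(f, centroids[k]))
        for f in features
    ]


def clustering_loss(features, centroids, assignments):
    """Mean squared distance of each feature to its assigned centroid.

    In an M-step, this is the kind of quantity the feature extractor would be
    trained to reduce. Here it is only evaluated, not optimized.
    """
    total = sum(
        squared_distance(f, centroids[k]) for f, k in zip(features, assignments)
    )
    return total / len(features)


# Toy 2D features forming two obvious groups.
features = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.0]]
centroids = [[0.0, 0.0], [1.0, 1.0]]

assignments = e_step(features, centroids)
loss = clustering_loss(features, centroids, assignments)
print(assignments)  # [0, 0, 1, 1]
```

With `clustering.method` set to `gmm`, the E-step would produce soft responsibilities instead of the hard indices shown here.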

YOPO Principle

Single forward pass per iteration
Features reused for both clustering and loss computation
Roughly 50% faster than traditional deep clustering, which runs separate forward passes for clustering and for loss computation
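The feature-reuse idea can be shown with a toy encoder that counts its forward passes. Everything here is a stand-in chosen for illustration: `CountingEncoder` replaces the 3D CNN with a mean over voxel values, the threshold rule replaces GMM/K-means, and the fixed centers 0.0 and 1.0 are arbitrary.

```python
class CountingEncoder:
    """Toy encoder that records how many forward passes were run."""

    def __init__(self):
        self.forward_calls = 0

    def __call__(self, batch):
        self.forward_calls += 1
        # Stand-in for a 3D CNN: the "feature" is just the mean voxel value.
        return [sum(volume) / len(volume) for volume in batch]


def single_pass_iteration(encoder, batch, threshold=0.5):
    # The only forward pass in this iteration.
    features = encoder(batch)
    # Clustering reuses the cached features (toy rule: threshold into 2 clusters).
    assignments = [int(f > threshold) for f in features]
    # The loss reuses the same features instead of re-encoding the batch.
    centers = {0: 0.0, 1: 1.0}
    loss = sum(
        (f - centers[k]) ** 2 for f, k in zip(features, assignments)
    ) / len(features)
    return assignments, loss


encoder = CountingEncoder()
batch = [[0.0, 0.2], [1.0, 0.8]]  # two flattened toy "subtomograms"
assignments, loss = single_pass_iteration(encoder, batch)
print(encoder.forward_calls)  # 1
```

A two-pass scheme would call the encoder once to get features for clustering and again to compute the loss, which is the extra work a single-pass design avoids.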

Stability Features

Warmup phase to establish feature diversity
L2 feature normalization
Variance and separation regularization
Gradient clipping
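Two of these techniques are simple enough to show directly. The sketch below implements L2 normalization and norm-based gradient clipping on plain Python lists. It is illustrative only; in a PyTorch project the same operations are typically done with `torch.nn.functional.normalize` and `torch.nn.utils.clip_grad_norm_`, and whether the trainer uses exactly those calls is an assumption.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a feature vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


def clip_by_norm(gradient, max_norm):
    """Rescale a gradient so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)  # already within bounds, return a copy unchanged
    scale = max_norm / norm
    return [g * scale for g in gradient]


unit = l2_normalize([3.0, 4.0])            # norm 5 -> [0.6, 0.8]
clipped = clip_by_norm([3.0, 4.0], 1.0)    # norm 5 is rescaled to norm 1
untouched = clip_by_norm([0.3, 0.4], 1.0)  # norm 0.5 is left as is
```

Normalizing features puts every embedding on the unit sphere, so distances between features depend on direction rather than magnitude, while clipping bounds the size of any single update step.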

Metrics

Unsupervised (no ground truth needed)

Silhouette Score: Cluster cohesion vs separation (-1 to 1, higher is better)
Davies-Bouldin Index: Average similarity between each cluster and its most similar cluster (lower is better)
Calinski-Harabasz Index: Ratio of between-cluster to within-cluster dispersion (higher is better)
Cluster Balance: Distribution uniformity (0 to 1)
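The README does not define Cluster Balance beyond its 0 to 1 range. One common way to get such a score is the entropy of the cluster sizes divided by its maximum, sketched below. Treat this as an assumed definition; `metrics.py` may compute it differently.

```python
import math
from collections import Counter


def cluster_balance(assignments, num_clusters):
    """Normalized entropy of cluster sizes: 1.0 = perfectly uniform, 0.0 = one cluster."""
    if num_clusters < 2:
        return 0.0
    counts = Counter(assignments)
    total = len(assignments)
    # Empty clusters contribute nothing to the entropy, so they lower the score.
    entropy = sum((c / total) * math.log(total / c) for c in counts.values())
    return entropy / math.log(num_clusters)


even = cluster_balance([0, 0, 1, 1], num_clusters=2)        # two equal clusters
collapsed = cluster_balance([0, 0, 0, 0], num_clusters=2)   # everything in one cluster
skewed = cluster_balance([0, 0, 0, 1], num_clusters=2)      # somewhere in between
```

A score near 0 is a useful warning sign for cluster collapse, where the model assigns almost every subtomogram to the same cluster.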

Supervised (if labels available)

ARI: Adjusted Rand Index
NMI: Normalized Mutual Information
Purity: Cluster purity score
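Purity is the easiest of the three to compute by hand: each cluster is credited with its most frequent true label, and the score is the fraction of samples covered by those majority labels. The sketch below uses made-up labels and is not taken from `metrics.py`.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Fraction of samples that carry the majority label of their cluster."""
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        by_cluster[cluster][label] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical example: cluster 0 is mostly ribosomes, cluster 1 is all proteasomes.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["ribosome", "ribosome", "proteasome",
          "proteasome", "proteasome", "proteasome"]
score = purity(clusters, labels)  # (2 + 3) / 6
```

Purity rises trivially as the number of clusters grows, which is one reason to report it alongside ARI and NMI. Both of those are available in scikit-learn as `adjusted_rand_score` and `normalized_mutual_info_score`.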

Google Colab

See COLAB_GUIDE.md for step-by-step instructions on running DISCA on Google Colab with a free GPU.
