(Honestly this was one of the most difficult lectures I’ve ever heard, so my notes on this were VERY messy)
Big biomedical data = lots of observations and features for each observation
- Patient data : units of observations are different patients
- Features : what you’re measuring
- fMRI measures a proxy for the activity of brain regions (voxels)
Single-cell data (scRNA-seq)
- Breakthrough in biology: you can now look at individual cells (not a slide of tissue or a vial of blood)
- See all genes and the transcriptome
- Single-cell resolution lets you measure the chromatin or epigenetic state of every cell
Variance is for one variable and covariance is for two variables
This kind of data requires sophisticated types of analysis : machine learning
- Process of identifying patterns in data (a paradigm where a computational algorithm automatically identifies patterns in data)
Two kinds of machine learning
- Supervised learning
- Already have labels for data points
- Google/Facebook has thousands of images, some annotated as animals or not. Train the computer algorithm to predict the labels you already have on your training data
Show a picture of a strawberry; train the model on the features and the label of the strawberry.
- Unsupervised learning
- No labels
- Biomedical data (no labels for cells; no one labels them, so we don't have annotated data)
- Often done by simplifying the data
- Learning an embedding/representation of the data from high → low dimensions (low-dimensional representations are easier to interpret)
An image contains all kinds of hues and thousands of pieces of information (the picture on display) → the model learns a simple representation (embedding values)
Things closer to cherries have closer embedding values
Linear regression is supervised learning because you have the x and y values (fitting a label)
Clustering is unsupervised
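A minimal sketch of the two paradigms (not from the lecture; scikit-learn on made-up toy data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Supervised: we already have labels y, so we fit a model that predicts them
X = rng.normal(size=(100, 1))                          # one feature
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)    # labels we already have
reg = LinearRegression().fit(X, y)
print("learned slope ~3:", reg.coef_)

# Unsupervised: no labels, just look for structure (clusters) in the data
Z = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])      # two blobs, no labels
labels = KMeans(n_clusters=2, n_init=10).fit_predict(Z)
print("cluster assignments:", labels[:5], labels[-5:])
```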
Data Matrices and Representations
- Different representations of data will help you do different things (for unsupervised data)
Single Cell Data
- Each cell is a vector of measurements
- Whole data is a matrix with many observations (cells) and features (proteins, genes)
Q: Are the values/coordinates in unsupervised ML given in terms of the center of the object, since it's just one value for both x and y? (A: it doesn't matter; it depends on the neural network, and you can transform the representation in many ways)
Once you think of data points as vectors, you can measure how similar data points are to each other by taking the Euclidean distance between vectors
A distance can be anything: just a function that is symmetric (distance from a to b equals from b to a), non-negative (you can't walk negative distances), and follows the triangle inequality
Distances allow you to represent data as a graph (which consists of vertices/nodes and edges (connections between vertices))
Similarity matrix/affinity matrix (the opposite of a distance matrix)
- Just pass the distances through a function
- The higher the distance, the lower the function value
- These functions are called kernel functions
- You can flip the magnitude (small distance = high similarity/affinity)
- If the distance is 0, then the affinity is maximal
- Affinity matrices are useful objects, but to get to them you usually compute distance matrices first (sketch below)
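A small sketch of the distance → affinity step in numpy (the sigma value here is an arbitrary choice):

```python
import numpy as np

def affinity_matrix(X, sigma=1.0):
    """Pairwise Euclidean distances, then a Gaussian kernel that turns
    distances into affinities (distance 0 -> affinity 1)."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared distance matrix
    D2 = np.maximum(D2, 0)                         # guard against tiny negatives
    return np.exp(-D2 / (2 * sigma**2))            # higher distance -> lower affinity

X = np.random.default_rng(0).normal(size=(5, 3))   # 5 cells x 3 features
A = affinity_matrix(X, sigma=1.0)
print(A.shape, A[0, 0])                            # (5, 5) 1.0: self-affinity is maximal
```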
Affinity matrix still mimics shape of data
Swiss roll on left (very coiled)
- These are cells; cells lie in a subspace, not spread through the whole space
- Cells might be transitioning along the pathway of the swiss roll (the intrinsic shape of the data; you can get the intrinsic shape by walking across the graph, but it's harder if it's high-dimensional)
From the distance matrix alone, some points look similar to each other even though there is no transition between them (the transition happens across the density of the data)
Affinity matrices nicely follow the data fold if you construct them correctly
Healthy patients or patients getting sicker
Take their distance matrix and convert it to an affinity matrix; you start to see that the diagonal and near-diagonal entries are the most similar
Affinity is inversely proportional to distance (a way of representing data)
Why represent data as a graph?
- Graphs are easy to cluster
more edges within clusters than between them (look for places to cut the graph where there are not many edges)
- Look for clusters
Paths through data graphs can represent progression trajectories (cuts of the graph can represent clusters); see the sketch below
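A sketch of representing data as a graph and clustering by cutting it (scikit-learn; the neighbor count and data are illustrative):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, size=(50, 2)),
               rng.normal(3, 0.3, size=(50, 2))])    # two groups of points

# Nearest-neighbor graph: vertices = points, edges = links to the 5 closest points
G = kneighbors_graph(X, n_neighbors=5, mode="connectivity")
A = 0.5 * (G + G.T).toarray()                        # symmetrize into an affinity matrix

# Spectral clustering cuts the graph where there are few edges between groups
labels = SpectralClustering(n_clusters=2, affinity="precomputed").fit_predict(A)
print(labels[:5], labels[-5:])
```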
Does increasing the number of dimensions of the data increase the accuracy with which it can be read? → Yes, but it poses challenges (hence dimensionality reduction)
Neural networks reduce dimensions itself
Thinking about high dimensional data
- Why we measure more features: gives us distinctions that we didn’t have
Data set of different grapes that were turned into wines (cultivars of wines)
- Look at different cultivars of wine grapes and their features (e.g., how alcoholic they are)
- To combine all three cultivars, look at them jointly and cluster them (clustering looks for separations, and this data isn't separated)
But add a third feature (color intensity) and they become more differentiated
Now you can cluster the data (= more information, more accuracy about what you're looking at)
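If this is the classic wine-cultivar dataset (an assumption), the "more features separate the clusters better" point can be sketched like this; the feature indices are taken from the dataset docs:

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

wine = load_wine()                       # 3 cultivars, 13 features per wine
X, y = wine.data, wine.target

def cluster_score(feature_idx):
    Xs = StandardScaler().fit_transform(X[:, feature_idx])
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(Xs)
    return adjusted_rand_score(y, labels)   # agreement with the true cultivars

# alcohol alone vs. alcohol + flavanoids + color intensity
print("1 feature :", cluster_score([0]))
print("3 features:", cluster_score([0, 6, 9]))
```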
The more features you look at = the more relationships between features
However, we can only see in 3d, not 20k D
- We can only see in 2/3 dimensions, but with 50 dimensions, we can’t put it in our heads
Solution: dimensionality reduction (don't throw away features, but put the data into new dimensions that preserve the high-dimensional info as much as possible)
Left: how do we reduce dimensions?
- Project it to a line (single dimension)
- Use the red line, but when you project all the data onto the line, you don't have info on the other dimensions
- If you reconstruct the data based on direction D, you do a more accurate job of reconstructing the data (you wouldn't know the variance off that line, but it fits the data best)
- Captures maximum variance
- Data varies MOST in the direction of D
- Retains most variance
- Want to preserve the direction of maximal variance = gives rise to the dimensionality reduction algorithm PCA
Principal components analysis (PCA)
- Have new axes in the data (new features, which replace the old features)
- The new features have the property that the first new feature is the most informative (you can drop the later ones, but keep the first few)
How to find PCA?
- Covariance matrix = take every data point and subtract off the mean of each column; then take another column and subtract off its mean
- If one of your features is far away from its mean → the other feature is also far from its mean (covariance)
- Look at the expected value of the product of these deviations
Feature-by-feature matrix (how much covariance in each column)
- Matrix of covariances between all features
- Off diagonal: covariance between different features
- Diagonal: variance of each feature with itself
Matrices to store data, matrices to store distances between data points, and now matrices to store covariances between features
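A tiny numpy check of the covariance construction described above (data is made up):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(200, 4))   # 200 observations x 4 features

Xc = X - X.mean(axis=0)              # subtract off the mean of each column
C = Xc.T @ Xc / (X.shape[0] - 1)     # expected value of products of deviations

print(C.shape)                                    # (4, 4): a feature-by-feature matrix
print(np.allclose(C, np.cov(X, rowvar=False)))    # matches numpy's covariance
```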
Can also use matrices to store transformations (a matrix can be applied to a vector to transform the vector = a linear transformation)
A matrix can rotate a vector (a vector is a line with some magnitude and some spatial direction)
Or scale it (grow it)
Matrix-vector notation
Whatever data point you have can be described as vector from origin
Eigenvectors = characteristic vectors: not rotated, only stretched
- Rotation matrices rotate lines in only one direction
- Other matrices can move different lines in different directions
They tell you the directions in which the matrix pulls the most and doesn't rotate, and if you list all of them, you fully characterize the matrix
Take the covariance matrix, find its eigenvectors, and find the directions in which the covariance matrix stretches the most without rotating = the directions of maximum variance
- Matrices can be transformations, and transformations have important vectors called eigenvectors (only stretch, no rotation)
Eigenvectors of covariance (as transformation)
Gaussian ball → stretched in a direction that mimics the actual data
- Rotated and pulled that way
So take a unit Gaussian ball and apply the covariance matrix; it recreates the structure of the data → the eigenvector with the highest eigenvalue is the direction the matrix stretches the most
- Maximum-eigenvalue eigenvector: PC1, then PC2
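A minimal PCA sketch following this recipe (eigenvectors of the covariance matrix, sorted by eigenvalue; random stand-in data):

```python
import numpy as np

def pca(X, n_components=2):
    Xc = X - X.mean(axis=0)                      # center the data
    C = np.cov(Xc, rowvar=False)                 # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)         # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first = PC1
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                       # project onto PC1, PC2, ...

X = np.random.default_rng(0).normal(size=(100, 5))
print(pca(X, n_components=2).shape)              # (100, 2)
```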
Non-linear structure
PCA is not perfect since it can't handle non-linear structure, where it won't preserve the variance
Non-linear dimensionality reduction
- can be coiled/snaky (doesn’t have to be straight lines)
- To find non-linear dimensions = have to understand and describe shape of data
- Non-linear dimensions tell the shape of the data (the idea that data has shape: the data "manifold" assumption, i.e., that data has a smooth shape)
- Manifold : you have a shape where locally, you can model it as a smooth plane
Have assumption that data is coiled in space.
Manifold learning techniques are aimed at uncovering the lower-dimensional space that is coiled into the high-dimensional space where you measured the data
Differentiation in biology follows a manifold space
Measure cells at different stages of differentiation: the shape of the space (Waddington's landscape)
You don't have the true manifold, but you have data sampled from it → helps successful visualization
- How to find manifold?
- Re-model the data as a graph, an affinity graph (local connections)
- Nearest-neighbor graph (threshold)
Affinity matrix: take the distance matrix and compute affinities using a kernel function
The sigma parameter is what you can play with (for correctly creating affinities from the distance matrix)
Sigma: the standard deviation of the Gaussian
Make it too wide: it ruins the point of the affinity matrix (it ends up behaving like the distance matrix)
Once you create the affinity matrix, you can use it in place of the covariance matrix → you're not preserving covariance, you're preserving affinity, but you can take its eigenvectors and use the first few eigenvectors
- Laplacian Eigenmap method
- Diffusion map
These all go by the name of kernel PCA (use a kernel function to create an affinity matrix and use its eigenvectors) to get a visualization
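A rough sketch of the "use the affinity matrix in place of the covariance matrix" idea (a simplified Laplacian-eigenmap-style embedding; the normalization details differ between the named methods):

```python
import numpy as np

def affinity_embedding(X, sigma=1.0, n_components=2):
    # Gaussian-kernel affinity matrix from pairwise squared distances
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    A = np.exp(-D2 / (2 * sigma**2))

    # Symmetrically normalized (graph-Laplacian-style) operator
    d = A.sum(axis=1)
    M = A / np.sqrt(d[:, None] * d[None, :])

    # Eigenvectors in place of covariance eigenvectors; skip the trivial first one
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[1:n_components + 1]]

X = np.random.default_rng(0).normal(size=(50, 10))
print(affinity_embedding(X).shape)   # (50, 2)
```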
Diffusion maps:
Data → distance matrix → put it through a kernel to get the affinity matrix; now local relationships are preserved, but you can do more:
- Values have an ordering from 1 to 0; you can preserve the eigenvectors
A roadway between data points: you're able to randomly walk from data point to data point
- If you have infinitely many points, this mimics heat diffusion
Distance Matrix
Once you take the diffusion matrix and power it, you can find global connections across the manifold
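A sketch of the powering step: row-normalize affinities into random-walk probabilities, then take t steps (simplified; real diffusion maps then use the eigenvectors of this operator):

```python
import numpy as np

def diffusion_probabilities(A, t=8):
    """A: affinity matrix. Returns t-step random-walk probabilities."""
    P = A / A.sum(axis=1, keepdims=True)    # row-normalize: one-step walk probabilities
    return np.linalg.matrix_power(P, t)     # powering finds global connections across the manifold

# toy affinity matrix for 4 points on a line (neighbors only)
A = np.array([[1., .5, 0., 0.],
              [.5, 1., .5, 0.],
              [0., .5, 1., .5],
              [0., 0., .5, 1.]])
Pt = diffusion_probabilities(A, t=8)
print(Pt[0])   # after 8 steps, point 0 has some probability of reaching point 3
```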
PCA, diffusion maps, and kernel PCA change the axes of the data but don't directly reduce it to two dimensions (by using the first two to visualize, you're just leaving out the rest); the progression may be spread over the first 10-20 components, but to strictly visualize the data you want a method that gives you two dimensions
They are happy to change the axes into new axes that are ordered by importance
How to get down to two dimensions and visualize as much as possible?
- tSNE/UMAP: don't use eigenvectors; the goal is to look at the neighbors of the data in the high-dimensional space (using the affinity matrix)
Use the normalized affinity matrix to find neighbor probabilities
Neighbor with high probability is between the red and blue.
Take a low-dimensional space with random points and then find the placement where the low-dimensional and high-dimensional neighborhoods match as much as possible
tSNE is good at preserving the cluster separations of the data, but not good at preserving trajectories, continuity, and distances/global placement, because it looks locally and preserves the nearest neighbors at every point
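A minimal tSNE call (scikit-learn; perplexity is a neighborhood-size knob and the value here is arbitrary):

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(0).normal(size=(200, 50))   # stand-in for cells x genes

# tSNE matches high-dimensional neighbor probabilities with a low-dimensional layout,
# so local clusters are preserved but global distances are not trustworthy
emb = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(X)
print(emb.shape)   # (200, 2)
```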
PHATE: (based on the same representation as diffusion maps, but diffusion maps put each progression in a different dimension)
Start with t-step diffusion probabilities between every pair of points
Take the affinity matrix, normalize it, then take a random walk, then get the random-walk probabilities
Characterize each data point by its t-step random-walk probabilities to the other data points
Compare one probability distribution with another: a divergence
This new kind of distance is embedded into 2 dimensions with MDS (a distance-preservation method)
The new distances in the low dimension contain the info that was in the diffusion probability matrix
The divergences are squeezed into two dimensions: a high degree of structure preservation
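Using the phate Python package (an assumption that this is the intended tool; parameter values are illustrative), the workflow looks roughly like:

```python
import numpy as np
import phate   # https://github.com/KrishnaswamyLab/PHATE

X = np.random.default_rng(0).normal(size=(500, 100))   # stand-in for cells x genes

# PHATE: affinities -> t-step diffusion probabilities -> divergences -> MDS to 2D
phate_op = phate.PHATE(n_components=2, knn=5)
Y = phate_op.fit_transform(X)
print(Y.shape)   # (500, 2)
```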
Start with human embryonic stem cells differentiating into embryoid bodies → lineages: neuronal matter, blood cells, etc.
Collect and measure cells over 27 days; each dot is a single cell
- Put all the cells together and let PHATE create the visualization
Slowly differentiating and branching out to different lineages
- PCA can find the axis of maximum variation and find PC1, but has no ability to distinguish subtle branches (non-linear, coiled dimensions)
tSNE only focuses on near neighbors, so when there's sparsity it shatters the data; it is focused on keeping each cluster of cells together (not globally coherent, but local clusters are okay)
You have to look at ~20 dimensions for diffusion maps
PHATE shows the time progression and the branching into different trajectories
Use PHATE to study cancer
Treated to induce metastasis, cells become able to swim through the body (they don't act like typical cancer cells) and then have to create a new tumor
Partially transitioned cells (very coiled) → cells differentiate to seed the secondary tumor → mammosphere culture
Cells that successfully transitioned are half epithelial; the one on the right is the final state
New neural network that can learn about dynamics of data
Since measurements are destructive, cells die
Can’t measure entire contents of cell and have it be alive and continue to persist
Using manifold models that follow shape of data to learn dynamics
You could look at the arc of the data progression, but it's useless if there are gaps in time
Want to connect between points
Transport a blob from here to there by optimal transport
Neural network: an ODE network learns a high-dimensional ordinary differential equation → used to construct the population flow
You can make a deep neural network that gives you a discrete flow, but with an ODE you can make it a continuous flow → paths of the flow show how each point flows into the next (penalize the paths to give an optimal transport)
Optimal transport (efficient; could mimic how cancer cells transition)
You can penalize them to do whatever you want, as long as the penalty is differentiable
Can re-animate cells to show how they transition
Can look at which genes are turned on and off during metastasis
Take trajectories and recreate the individual genes to see what is happening and where there is deviation between final states
You can go backwards to see what the gene trends are (e.g., proliferation)
Trajectory network: find the identity of the cells that originate the secondary tumor
Use it to find gene regulatory programs (gives a pseudotime for where each gene is in the process)
An autoencoder takes data, goes to a lower dimension, and tries to reconstruct the data back out (sketch below)
SAUCIE
Archetypal analysis detects continuous data/transition data
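A bare-bones autoencoder sketch in PyTorch for the "compress then reconstruct" idea above (sizes are arbitrary; SAUCIE adds regularizations not shown here):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_features=100, n_latent=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                     nn.Linear(32, n_latent))       # go to a lower dimension
        self.decoder = nn.Sequential(nn.Linear(n_latent, 32), nn.ReLU(),
                                     nn.Linear(32, n_features))     # reconstruct back out

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.randn(64, 100)                      # a batch of 64 "cells"
loss = nn.functional.mse_loss(model(x), x)    # reconstruction error to minimize
loss.backward()
```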
A residual net has 50-60 layers
⁃ Something called an Euler integrator → adding the derivative every time step
⁃ The depth determines the length of time you're simulating your differential equation for
⁃ If the neural network has infinite depth: it becomes continuous integration → the development of neural ODEs (see the sketch after this list)
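A sketch of the Euler-integrator view: each residual "layer" adds the derivative scaled by a step size, and shrinking the step toward zero is the continuous (neural ODE) limit. The derivative function here is a stand-in, not the lecture's network:

```python
import numpy as np

def derivative(x, t):
    """Stand-in for a learned dx/dt (in a neural ODE this would be a neural network)."""
    return -0.5 * x

def euler_integrate(x0, t0=0.0, t1=1.0, n_steps=10):
    """Like a residual net: each 'layer' adds the derivative scaled by the step size."""
    x, t = np.asarray(x0, dtype=float), t0
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):          # depth of the net = how long we simulate
        x = x + dt * derivative(x, t)
        t += dt
    return x

print(euler_integrate([1.0, 2.0], n_steps=10))   # approaches exp(-0.5) * x0 as n_steps grows
```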
Main Vocabulary
⁃ Mainly focused on biomedical data
⁃ Being able to collect data from human body
⁃ Used a lot of:
⁃ Machine learning
⁃ Data representations
⁃ High dimensional data
⁃ She's collecting a lot of data on the same thing (one cell can have a lot of data: how active it is, its direction)
⁃ How do we make big sets of data?
How to measure data of cells
- e.g., the brain
- fMRI (just tells how active a neuron is based on blood flow in the brain)
- Takes how much blood-flow activity there is in the brain and correlates it to your cells
The more information you have, the more dimensions —> more accurate data
How is the appropriate kernel determined/selected?
⁃ P-hacking: the p-value is how you show your research is significant
⁃ When researchers keep reanalyzing until they hack the p-value
Denoising:
⁃ Lots of information that means nothing (= noise)
⁃ In order to collect info you want, you have to denoise (decrease noise levels) either through manipulation of data or recalibration of machine
Lots of points on a graph (she just graphed it regularly) → distance matrix
(Euclidean method of analysis: how far are the dots from each other)
⁃ Looking at how much they vary
Affinity matrix: graphing how similar each of the dots are to each other
(Similar to covariance matrix)
Neural networks (artificial; created based on ideas of how the brain works, neurons active or not active; trying to replicate the brain, or use what we learn from the brain to make machines learn) vs. machine learning (deep learning; categorize things and learn from them)
More dimensions = more information
Question 1:
1. You could possibly lose important data / the importance of a specific dimension during dimensionality reduction
2. Being the one who's manipulating the data, you could keep manipulating until you get the result you want (to make it seem like it supports a certain conclusion: "p-hacking")
__________________________________________________________________________________
My personal opinion: Honestly, this was the most confusing lecture ever.
Not only did the professor explain all the concepts quite quickly, but also, I struggled with understanding the content of the lecture as I had never heard of machine learning.
Therefore, when I discussed with my “family” during family time, I was actually quite relieved to hear that my groupmates also had struggled to understand what the lecture was about, since the topic was quite a difficult one.
I was fascinated by the other students in the lecture who were very eager to ask questions during the QnA session, especially very professional questions, so I think it was a good chance for me to get motivated.
– Joanna Kim, July 9th, 2021, 3:07 AM KST –