(Honestly this was one of the most difficult lectures I’ve ever heard, so my notes on this were VERY messy)
Big biomedical data = lots of observations and features for each observation
- Patient data : units of observations are different patients
- Features : what you’re measuring
- fMRI measures a proxy for the activity of brain regions (voxels)
Single-cell data (scRNA-seq)
- Breakthrough in biology: you can now look at individual cells (not a slide of tissue or a vial of blood)
- See all genes and the transcriptome
- Single-cell resolution lets you measure the chromatin or epigenetic state of every cell
Variance is for one variable and covariance is for two variables
This kind of data requires sophisticated types of analysis : machine learning
- Process of identifying patterns in data (a paradigm where a computational algorithm automatically identifies patterns in data)
Two kinds of machine learning
- Supervised learning
- Already have labels for data points
- Google/Facebook has thousands of images, some annotated as animals or not. Train the computer algorithm to predict the labels you already have on your training data
Show a picture of a strawberry; train the model on the features and the label of the strawberry.
- Unsupervised learning
- No labels
- Biomedical data (no labels for cells; no one labels them, so we don't have annotated data)
- Often done by simplifying the data
- Learning an embedding/representation of the data from high → low dimensions (low-dimensional representations are easier to interpret)
An image contains all kinds of hues and thousands of pieces of information (the picture on display) → the model learns a simple representation (embedding values)
Things closer to cherries have closer embedding values
Linear regression is supervised learning because you have the x and y values (fitting a label)
Clustering is unsupervised
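A minimal sketch of the two paradigms (not from the lecture; scikit-learn on made-up toy data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Supervised: we already have labels y, so we fit a model that predicts them
X = rng.normal(size=(100, 1))                          # one feature
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)    # labels we already have
reg = LinearRegression().fit(X, y)
print("learned slope ~3:", reg.coef_)

# Unsupervised: no labels, just look for structure (clusters) in the data
Z = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])      # two blobs, no labels
labels = KMeans(n_clusters=2, n_init=10).fit_predict(Z)
print("cluster assignments:", labels[:5], labels[-5:])
```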
Data Matrices and Representations
- Different representations of data will help you do different things (for unsupervised data)
Single Cell Data
- Each cell is a vector of measurements
- Whole data is a matrix with many observations (cells) and features (proteins, genes)
Q: Are the values/coordinates in unsupervised ML given in terms of the center of the object, since it's just one value for both x and y? (A: it doesn't matter; it depends on the neural network, and you can transform the representation in many ways)
Once you think of data points as vectors, you can measure how similar data points are to each other by taking the Euclidean distance between vectors
A distance can be anything: just a function that is symmetric (distance from a to b equals from b to a), non-negative (you can't walk negative distances), and follows the triangle inequality
Distances allow you to represent data as a graph (which consists of vertices/nodes and edges (connections between vertices))
Similarity matrix/affinity matrix (the opposite of a distance matrix)
- Just pass the distances through a function
- The higher the distance, the lower the function value
- These functions are called kernel functions
- You can flip the magnitude (small distance = high similarity/affinity)
- If the distance is 0, then the affinity is maximal
- Affinity matrices are useful objects, but to get to them you usually compute distance matrices first (sketch below)
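A small sketch of the distance → affinity step in numpy (the sigma value here is an arbitrary choice):

```python
import numpy as np

def affinity_matrix(X, sigma=1.0):
    """Pairwise Euclidean distances, then a Gaussian kernel that turns
    distances into affinities (distance 0 -> affinity 1)."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared distance matrix
    D2 = np.maximum(D2, 0)                         # guard against tiny negatives
    return np.exp(-D2 / (2 * sigma**2))            # higher distance -> lower affinity

X = np.random.default_rng(0).normal(size=(5, 3))   # 5 cells x 3 features
A = affinity_matrix(X, sigma=1.0)
print(A.shape, A[0, 0])                            # (5, 5) 1.0: self-affinity is maximal
```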
Affinity matrix still mimics shape of data
Swiss roll on left (very coiled)
- These are cells; cells lie in a subspace, not spread through the whole space
- Cells might be transitioning along the pathway of the swiss roll (the intrinsic shape of the data; you can get the intrinsic shape by walking across the graph, but it's harder if it's high-dimensional)
From the distance matrix alone, some points look similar to each other even though there is no transition between them (the transition happens across the density of the data)
Affinity matrices nicely follow the data fold if you construct them correctly
Healthy patients or patients getting sicker
Take their distance matrix and convert it to an affinity matrix; you start to see that the diagonal and near-diagonal entries are the most similar
Affinity is inversely proportional to distance (a way of representing data)
Why represent data as a graph?
- Graphs are easy to cluster
more edges within clusters than between them (look for places to cut the graph where there are not many edges)
- Look for clusters
Paths through data graphs can represent progression trajectories (cuts of the graph can represent clusters); see the sketch below
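A sketch of representing data as a graph and clustering by cutting it (scikit-learn; the neighbor count and data are illustrative):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, size=(50, 2)),
               rng.normal(3, 0.3, size=(50, 2))])    # two groups of points

# Nearest-neighbor graph: vertices = points, edges = links to the 5 closest points
G = kneighbors_graph(X, n_neighbors=5, mode="connectivity")
A = 0.5 * (G + G.T).toarray()                        # symmetrize into an affinity matrix

# Spectral clustering cuts the graph where there are few edges between groups
labels = SpectralClustering(n_clusters=2, affinity="precomputed").fit_predict(A)
print(labels[:5], labels[-5:])
```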
Does increasing the number of dimensions of the data increase the accuracy with which it can be read? → Yes, but it poses challenges (hence dimensionality reduction)
Neural networks reduce dimensions itself
Thinking about high dimensional data
- Why we measure more features: gives us distinctions that we didn’t have
Data set of different grapes that were turned into wines (cultivars of wines)
- Look at different cultivars of wine grapes and their features (e.g., how alcoholic they are)
- To combine all three cultivars, look at them jointly and cluster them (clustering looks for separations, and this data isn't separated)
But add a third feature (color intensity) and they become more differentiated
Now you can cluster the data (= more information, more accuracy about what you're looking at)
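If this is the classic wine-cultivar dataset (an assumption), the "more features separate the clusters better" point can be sketched like this; the feature indices are taken from the dataset docs:

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

wine = load_wine()                       # 3 cultivars, 13 features per wine
X, y = wine.data, wine.target

def cluster_score(feature_idx):
    Xs = StandardScaler().fit_transform(X[:, feature_idx])
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(Xs)
    return adjusted_rand_score(y, labels)   # agreement with the true cultivars

# alcohol alone vs. alcohol + flavanoids + color intensity
print("1 feature :", cluster_score([0]))
print("3 features:", cluster_score([0, 6, 9]))
```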
The more features you look at = the more relationships between features
However, we can only see in 3d, not 20k D
- We can only see in 2/3 dimensions, but with 50 dimensions, we can’t put it in our heads
Solution: dimensionality reduction (don't throw away features, but put the data into new dimensions that preserve the high-dimensional info as much as possible)
Left: how do we reduce dimensions?
- Project it to a line (single dimension)
- Use the red line, but when you project all the data onto the line, you don't have info on the other dimensions
- If you reconstruct the data based on direction D, you do a more accurate job of reconstructing the data (you wouldn't know the variance off that line, but it fits the data best)
- Captures maximum variance
- Data varies MOST in the direction of D
- Retains most variance
- Want to preserve the direction of maximal variance = gives rise to the dimensionality reduction algorithm PCA
Principal components analysis (PCA)
- Have new axes in the data (new features, which replace the old features)
- The new features have the property that the first new feature is the most informative (you can drop the later ones, but keep the first few)
How to find PCA?
- Covariance matrix = take every data point and subtract off the mean of each column; then take another column and subtract off its mean
- If one of your features is far away from its mean → the other feature is also far from its mean (covariance)
- Look at the expected value of the product of these deviations
Feature-by-feature matrix (how much covariance in each column)
- Matrix of covariances between all features
- Off diagonal: covariance between different features
- Diagonal: variance of each feature with itself
Matrices to store data, matrices to store distances between data points, and now matrices to store covariances between features
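A tiny numpy check of the covariance construction described above (data is made up):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(200, 4))   # 200 observations x 4 features

Xc = X - X.mean(axis=0)              # subtract off the mean of each column
C = Xc.T @ Xc / (X.shape[0] - 1)     # expected value of products of deviations

print(C.shape)                                    # (4, 4): a feature-by-feature matrix
print(np.allclose(C, np.cov(X, rowvar=False)))    # matches numpy's covariance
```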
Can also use matrices to store transformations (a matrix can be applied to a vector to transform the vector = a linear transformation)
A matrix can rotate a vector (a vector is a line with some magnitude and some spatial direction)
Or scale it (grow it)
Matrix-vector notation
Whatever data point you have can be described as vector from origin
Eigenvectors = characteristic vectors: not rotated, only stretched
- Rotation matrices rotate lines in only one direction
- Other matrices can move different lines in different directions
They tell you the directions in which the matrix pulls the most and doesn't rotate, and if you list all of them, you fully characterize the matrix
Take the covariance matrix, find its eigenvectors, and find the directions in which the covariance matrix stretches the most without rotating = the directions of maximum variance
- Matrices can be transformations, and transformations have important vectors called eigenvectors (only stretch, no rotation)
Eigenvectors of covariance (as transformation)
Gaussian ball → stretched in a direction that mimics the actual data
- Rotated and pulled that way
So take a unit Gaussian ball and apply the covariance matrix; it recreates the structure of the data → the eigenvector with the highest eigenvalue is the direction the matrix stretches the most
- Maximum-eigenvalue eigenvector: PC1, then PC2
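A minimal PCA sketch following this recipe (eigenvectors of the covariance matrix, sorted by eigenvalue; random stand-in data):

```python
import numpy as np

def pca(X, n_components=2):
    Xc = X - X.mean(axis=0)                      # center the data
    C = np.cov(Xc, rowvar=False)                 # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)         # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first = PC1
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                       # project onto PC1, PC2, ...

X = np.random.default_rng(0).normal(size=(100, 5))
print(pca(X, n_components=2).shape)              # (100, 2)
```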
Non-linear structure
PCA is not perfect since it can't handle non-linear structure, where it won't preserve the variance
Non-linear dimensionality reduction
- can be coiled/snaky (doesn’t have to be straight lines)
- To find non-linear dimensions = have to understand and describe shape of data
- Non-linear dimensions tell the shape of the data (the idea that data has shape: the data "manifold" assumption, i.e., that data has a smooth shape)
- Manifold : you have a shape where locally, you can model it as a smooth plane
Have assumption that data is coiled in space.
Manifold learning techniques are aimed at uncovering the lower-dimensional space that is coiled into the high-dimensional space where you measured the data
Differentiation in biology follows a manifold space
Measure cells at different stages of differentiation: the shape of the space (Waddington's landscape)
You don't have the true manifold, but you have data sampled from it → helps successful visualization
- How to find manifold?
- Re-model the data as a graph, an affinity graph (local connections)
- Nearest-neighbor graph (threshold)
Affinity matrix: take the distance matrix and compute affinities using a kernel function
The sigma parameter is what you can play with (for correctly creating affinities from the distance matrix)
Sigma: the standard deviation of the Gaussian
Make it too wide: it ruins the point of the affinity matrix (it ends up behaving like the distance matrix)
Once you create the affinity matrix, you can use it in place of the covariance matrix → you're not preserving covariance, you're preserving affinity, but you can take its eigenvectors and use the first few eigenvectors
- Laplacian Eigenmap method
- Diffusion map
These all go by the name of kernel PCA (use a kernel function to create an affinity matrix and use its eigenvectors) to get a visualization
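A rough sketch of the "use the affinity matrix in place of the covariance matrix" idea (a simplified Laplacian-eigenmap-style embedding; the normalization details differ between the named methods):

```python
import numpy as np

def affinity_embedding(X, sigma=1.0, n_components=2):
    # Gaussian-kernel affinity matrix from pairwise squared distances
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    A = np.exp(-D2 / (2 * sigma**2))

    # Symmetrically normalized (graph-Laplacian-style) operator
    d = A.sum(axis=1)
    M = A / np.sqrt(d[:, None] * d[None, :])

    # Eigenvectors in place of covariance eigenvectors; skip the trivial first one
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[1:n_components + 1]]

X = np.random.default_rng(0).normal(size=(50, 10))
print(affinity_embedding(X).shape)   # (50, 2)
```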
Diffusion maps:
Data → distance matrix → put it through a kernel to get the affinity matrix; now local relationships are preserved, but you can do more:
- Values have an ordering from 1 to 0; you can preserve the eigenvectors
A roadway between data points: you're able to randomly walk from data point to data point
- If you have infinitely many points, this mimics heat diffusion
Distance Matrix
Once you take the diffusion matrix and power it, you can find global connections across the manifold
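A sketch of the powering step: row-normalize affinities into random-walk probabilities, then take t steps (simplified; real diffusion maps then use the eigenvectors of this operator):

```python
import numpy as np

def diffusion_probabilities(A, t=8):
    """A: affinity matrix. Returns t-step random-walk probabilities."""
    P = A / A.sum(axis=1, keepdims=True)    # row-normalize: one-step walk probabilities
    return np.linalg.matrix_power(P, t)     # powering finds global connections across the manifold

# toy affinity matrix for 4 points on a line (neighbors only)
A = np.array([[1., .5, 0., 0.],
              [.5, 1., .5, 0.],
              [0., .5, 1., .5],
              [0., 0., .5, 1.]])
Pt = diffusion_probabilities(A, t=8)
print(Pt[0])   # after 8 steps, point 0 has some probability of reaching point 3
```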
PCA, diffusion maps, and kernel PCA change the axes of the data but don't directly reduce it to two dimensions (by using the first two to visualize, you're just leaving out the rest); the progression may be spread over the first 10-20 components, but to strictly visualize the data you want a method that gives you two dimensions
They are happy to change the axes into new axes that are ordered by importance
How to get down to two dimensions and visualize as much as possible?
- tSNE/UMAP: don't use eigenvectors; the goal is to look at the neighbors of the data in the high-dimensional space (using the affinity matrix)
Use the normalized affinity matrix to find neighbor probabilities
Neighbor with high probability is between the red and blue.
Take a low-dimensional space with random points and then find the placement where the low-dimensional and high-dimensional neighborhoods match as much as possible
tSNE is good at preserving the cluster separations of the data, but not good at preserving trajectories, continuity, and distances/global placement, because it looks locally and preserves the nearest neighbors at every point
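A minimal tSNE call (scikit-learn; perplexity is a neighborhood-size knob and the value here is arbitrary):

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(0).normal(size=(200, 50))   # stand-in for cells x genes

# tSNE matches high-dimensional neighbor probabilities with a low-dimensional layout,
# so local clusters are preserved but global distances are not trustworthy
emb = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(X)
print(emb.shape)   # (200, 2)
```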
PHATE: (based on the same representation as diffusion maps, but diffusion maps put each progression in a different dimension)
Start with t-step diffusion probabilities between every pair of points
Take the affinity matrix, normalize it, then take a random walk, then get the random-walk probabilities
Characterize each data point by its t-step random-walk probabilities to the other data points
Compare one probability distribution with another: a divergence
This new kind of distance is embedded into 2 dimensions with MDS (a distance-preservation method)
The new distances in the low dimension contain the info that was in the diffusion probability matrix
The divergences are squeezed into two dimensions: a high degree of structure preservation
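Using the phate Python package (an assumption that this is the intended tool; parameter values are illustrative), the workflow looks roughly like:

```python
import numpy as np
import phate   # https://github.com/KrishnaswamyLab/PHATE

X = np.random.default_rng(0).normal(size=(500, 100))   # stand-in for cells x genes

# PHATE: affinities -> t-step diffusion probabilities -> divergences -> MDS to 2D
phate_op = phate.PHATE(n_components=2, knn=5)
Y = phate_op.fit_transform(X)
print(Y.shape)   # (500, 2)
```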
Start with human embryonic stem cells differentiating into embryoid bodies → lineages: neuronal matter, blood cells, etc.
Collect and measure cells over 27 days; each dot is a single cell
- Put all the cells together and let PHATE create the visualization
Slowly differentiating and branching out to different lineages
- PCA can find the axis of maximum variation and find PC1, but has no ability to distinguish subtle branches (non-linear, coiled dimensions)
tSNE only focuses on near neighbors, so when there's sparsity it shatters the data; it is focused on keeping each cluster of cells together (not globally coherent, but local clusters are okay)
You have to look at ~20 dimensions for diffusion maps
PHATE shows the time progression and the branching into different trajectories
Use PHATE to study cancer
Treated to induce metastasis, cells become able to swim through the body (they don't act like typical cancer cells) and then have to create a new tumor
Partially transitioned cells (very coiled) → cells differentiate to seed the secondary tumor → mammosphere culture
Cells that successfully transitioned are half epithelial; the one on the right is the final state
New neural network that can learn about dynamics of data
Since measurements are destructive, cells die
Can’t measure entire contents of cell and have it be alive and continue to persist
Using manifold models that follow shape of data to learn dynamics
You could look at the arc of the data progression, but it's useless if there are gaps in time
Want to connect between points
Transport a blob from here to there by optimal transport
Neural network: an ODE network learns a high-dimensional ordinary differential equation → used to construct the population flow
You can make a deep neural network that gives you a discrete flow, but with an ODE you can make it a continuous flow → paths of the flow show how each point flows into the next (penalize the paths to give an optimal transport)
Optimal transport (efficient; could mimic how cancer cells transition)
You can penalize them to do whatever you want, as long as the penalty is differentiable
Can re-animate cells to show how they transition
Can look at which genes are turned on and off during metastasis
Take trajectories and recreate the individual genes to see what is happening and where there is deviation between final states
You can go backwards to see what the gene trends are (e.g., proliferation)
Trajectory network: find the identity of the cells that originate the secondary tumor
Use it to find gene regulatory programs (gives a pseudotime for where each gene is in the process)
An autoencoder takes data, goes to a lower dimension, and tries to reconstruct the data back out (sketch below)
SAUCIE
Archetypal analysis detects continuous data/transition data
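A bare-bones autoencoder sketch in PyTorch for the "compress then reconstruct" idea above (sizes are arbitrary; SAUCIE adds regularizations not shown here):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_features=100, n_latent=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                     nn.Linear(32, n_latent))       # go to a lower dimension
        self.decoder = nn.Sequential(nn.Linear(n_latent, 32), nn.ReLU(),
                                     nn.Linear(32, n_features))     # reconstruct back out

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.randn(64, 100)                      # a batch of 64 "cells"
loss = nn.functional.mse_loss(model(x), x)    # reconstruction error to minimize
loss.backward()
```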
A residual net has 50-60 layers
⁃ Something called an Euler integrator → adding the derivative every time step
⁃ The depth determines the length of time you're simulating your differential equation for
⁃ If the neural network has infinite depth: it becomes continuous integration → the development of neural ODEs (see the sketch after this list)
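A sketch of the Euler-integrator view: each residual "layer" adds the derivative scaled by a step size, and shrinking the step toward zero is the continuous (neural ODE) limit. The derivative function here is a stand-in, not the lecture's network:

```python
import numpy as np

def derivative(x, t):
    """Stand-in for a learned dx/dt (in a neural ODE this would be a neural network)."""
    return -0.5 * x

def euler_integrate(x0, t0=0.0, t1=1.0, n_steps=10):
    """Like a residual net: each 'layer' adds the derivative scaled by the step size."""
    x, t = np.asarray(x0, dtype=float), t0
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):          # depth of the net = how long we simulate
        x = x + dt * derivative(x, t)
        t += dt
    return x

print(euler_integrate([1.0, 2.0], n_steps=10))   # approaches exp(-0.5) * x0 as n_steps grows
```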
Main Vocabulary
⁃ Mainly focused on biomedical data
⁃ Being able to collect data from human body
⁃ Used a lot of:
⁃ Machine learning
⁃ Data representations
⁃ High dimensional data
⁃ She's collecting a lot of data on the same thing (one cell can have a lot of data: how active it is, its direction)
⁃ How do we make big sets of data?
How to measure data of cells
- e.g., the brain
- fMRI (just tells how active a neuron is based on blood flow in the brain)
- Takes how much blood-flow activity there is in the brain and correlates it to your cells
The more information you have, the more dimensions —> more accurate data
How is the appropriate kernel determined/selected?
⁃ P-hacking: the p-value is how you show your research is significant
⁃ When researchers keep reanalyzing until they hack the p-value
Denoising:
⁃ Lots of information that means nothing (= noise)
⁃ In order to collect info you want, you have to denoise (decrease noise levels) either through manipulation of data or recalibration of machine
Lots of points on a graph (she just graphed it regularly) → distance matrix
(Euclidean method of analysis: how far are the dots from each other)
⁃ Looking at how much they vary
Affinity matrix: graphing how similar each of the dots are to each other
(Similar to covariance matrix)
Neural networks (artificial; created based on ideas of how the brain works, neurons active or not active; trying to replicate the brain, or use what we learn from the brain to make machines learn) vs. machine learning (deep learning; categorize things and learn from them)
More dimensions = more information
Question 1:
1. You could possibly lose important data / the importance of a specific dimension during dimensionality reduction
2. Being the one who's manipulating the data, you could keep manipulating until you get the result you want (to make it seem like it supports a certain conclusion: "p-hacking")
__________________________________________________________________________________
My personal opinion: Honestly, this was the most confusing lecture ever.
Not only did the professor explain all the concepts quite quickly, but also, I struggled with understanding the content of the lecture as I had never heard of machine learning.
Therefore, when I discussed with my “family” during family time, I was actually quite relieved to hear that my groupmates also had struggled to understand what the lecture was about, since the topic was quite a difficult one.
I was fascinated by the other students in the lecture who were very eager to ask questions during the QnA session, especially very professional questions, so I think it was a good chance for me to get motivated.
– Joanna Kim, July 9th, 2021, 3:07 AM KST –