[Seurat] Guided Clustering Tutorial: Single-Cell Data Analysis in RStudio

Hello, this is Dreaming Pharmacist (꿈꾸는 약사).

In this post we will work through the tutorial for single-cell analysis in R using the Seurat library.

Prerequisites: R, RStudio, and Rtools (Windows users only) installed

Setting up the Seurat object

# First, load the required libraries

library(dplyr)
library(Seurat)
library(patchwork)

# Load the PBMC dataset. Write the path with \\ or / as the separator. Note that the X in Read10X is a capital X

pbmc.data <- Read10X(data.dir="D:/My Documents/Downloads/pbmc3k_filtered_gene_bc_matrices/filtered_gene_bc_matrices/hg19/")

# Create a Seurat object from the raw count data

pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc

# Output
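To make the min.cells and min.features arguments more concrete, here is a minimal base R sketch on a made-up 4 x 4 count matrix. The matrix, the thresholds of 2, and the variable names are all assumptions for illustration. The order of the two filters (cells first, then genes) reflects my understanding of Seurat's behavior and is not taken from the tutorial itself.

```r
# Toy count matrix: rows are genes, columns are cells (hypothetical values)
counts <- matrix(
  c(5, 0, 2, 0,
    0, 0, 1, 0,
    3, 4, 0, 0,
    1, 2, 6, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4))
)

min_features <- 2  # keep cells with at least this many detected genes
min_cells <- 2     # keep genes detected in at least this many cells

# Step 1: drop cells that express too few genes
features_per_cell <- colSums(counts > 0)
filtered <- counts[, features_per_cell >= min_features, drop = FALSE]

# Step 2: drop genes that are detected in too few of the remaining cells
cells_per_gene <- rowSums(filtered > 0)
filtered <- filtered[cells_per_gene >= min_cells, , drop = FALSE]

print(dim(filtered))
print(filtered)
```

In the tutorial call above, the same idea is applied with min.cells = 3 and min.features = 200.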

Standard pre-processing workflow

# Seurat's pre-processing workflow for scRNA-seq data

# Quality control (QC) step that keeps only the cells worth analyzing

QC and selecting cells for further analysis

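# For each cell in pbmc, compute the percentage of counts that come from genes whose names start with "MT-" (mitochondrial genes) and store it as percent.mt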

pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
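As a rough illustration of what the percentage means, the following base R sketch computes the same kind of value by hand on a tiny made-up matrix. The gene names, counts, and variable names are assumptions. It only mirrors the idea (counts from "^MT-" genes divided by total counts, times 100) and is not Seurat's own code.

```r
# Toy count matrix: two mitochondrial genes and one nuclear gene (hypothetical values)
counts <- matrix(
  c(10,  0,  5,
     5,  2,  0,
    85, 98, 95),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MT-ND1", "MT-CO1", "CD3E"), c("cell1", "cell2", "cell3"))
)

# Genes whose names start with "MT-"
is_mt <- grepl("^MT-", rownames(counts))

# Percentage of each cell's counts that come from those genes
percent_mt <- colSums(counts[is_mt, , drop = FALSE]) / colSums(counts) * 100
print(percent_mt)
```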

# Visualize the QC metrics as violin plots

VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

# Output

# Plot the relationship between pairs of features with FeatureScatter

plot1 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot1 + plot2

# Output

# Keep only cells with more than 200 and fewer than 2,500 detected features and less than 5% mitochondrial counts, and reassign the filtered object to pbmc

pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
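The filter above is just a logical condition evaluated per cell. A small sketch with a made-up metadata table (the four cells and their values are assumptions) shows which cells would pass the same three thresholds:

```r
# Hypothetical per-cell QC metrics
meta <- data.frame(
  nFeature_RNA = c(150, 800, 3000, 1200),
  percent.mt   = c(2, 3, 1, 8),
  row.names    = paste0("cell", 1:4)
)

# Same thresholds as in the subset() call above
keep <- meta$nFeature_RNA > 200 &
  meta$nFeature_RNA < 2500 &
  meta$percent.mt < 5

kept_cells <- rownames(meta)[keep]
print(kept_cells)
```

Here cell1 has too few features, cell3 has too many, and cell4 has too high a mitochondrial percentage, so only cell2 remains.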

Normalizing the data

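# Use LogNormalize to normalize each cell's feature expression measurements by that cell's total expression, multiply by the scale factor (10,000), and log-transform the result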

pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
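The arithmetic behind "LogNormalize" can be written out in a few lines of base R. This is a sketch on a made-up 3 x 2 matrix, assuming the method is log1p(count / total count per cell * scale factor). The variable names are my own.

```r
# Toy count matrix: rows are genes, columns are cells (hypothetical values)
counts <- matrix(
  c(1, 3, 0,
    4, 2, 2),
  nrow = 3,
  dimnames = list(c("gene1", "gene2", "gene3"), c("cell1", "cell2"))
)

scale_factor <- 10000

# Divide each column (cell) by its total, multiply by the scale factor, then take log(1 + x)
totals <- colSums(counts)
normalized <- log1p(sweep(counts, 2, totals, "/") * scale_factor)
print(normalized)
```

A zero count stays zero after the transformation, and undoing the log with expm1() gives columns that each sum to the scale factor.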

Identification of highly variable features (feature selection)

# Identify the features (genes) whose expression varies strongly from cell to cell

# Returns 2,000 features per dataset

pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)

# Store the 10 most highly variable genes as top10

top10 <- head(VariableFeatures(pbmc), 10)

# plot variable features with and without labels

plot1 <- VariableFeaturePlot(pbmc)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
plot1 + plot2
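To get a feel for what "highly variable" means, here is a deliberately simplified sketch that ranks genes by their plain variance across cells. This is not the "vst" method used above, which also accounts for the relationship between mean and variance. It is only a stand-in on made-up data to show the ranking idea behind head(VariableFeatures(pbmc), 10).

```r
# Toy expression matrix: rows are genes, columns are cells (hypothetical values)
expr <- rbind(
  geneA = c(1, 1, 1, 1),
  geneB = c(0, 5, 0, 5),
  geneC = c(2, 3, 2, 3),
  geneD = c(0, 0, 10, 0)
)

# Variance of each gene across cells (simplified stand-in for vst)
gene_var <- apply(expr, 1, var)

# Names of the two most variable genes, most variable first
top2 <- head(names(sort(gene_var, decreasing = TRUE)), 2)
print(top2)
```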

Scaling the data

# Pre-processing step: apply a linear transformation before dimensional reduction

# ScaleData() shifts each gene's expression so that its mean across all cells is 0 and scales it so that its variance across all cells is 1

all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
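The centering and scaling described above can be reproduced on a small made-up matrix with base R's scale(). This is a sketch only. As far as I know, ScaleData() additionally clips very large scaled values (its scale.max argument), which is left out here.

```r
# Toy normalized expression matrix: rows are genes, columns are cells (hypothetical values)
expr <- rbind(
  geneA = c(1, 2, 3, 4),
  geneB = c(10, 10, 20, 40)
)

# scale() works on columns, so transpose to scale each gene, then transpose back
scaled <- t(scale(t(expr)))

print(round(rowMeans(scaled), 10))       # per-gene mean after scaling
print(round(apply(scaled, 1, var), 10))  # per-gene variance after scaling
```

After this step every gene has mean 0 and variance 1 across cells, so highly expressed genes no longer dominate the downstream analysis.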
