R,如何在数据框上循环执行anova?(R, how to do anova in a loop over dataframe?)

我想做2ways anova,并存储p值,并比tukey hsd,但我有一个初始表的问题。 并不总是我有完整的数据,所以并不总是有可能穿孔anova,我不知道如何做到这一点,所以我的脚本运行,而不是跳过不完整的数据,并进一步runns。 我的数据如下所示:

https://filebin.net/w5cfuwztae7yk747

在链接中有两个Accessions的例子,但在实际数据中有3013个种质,其中一些没有所有光照条件或所有基因型

67822 AT2G41680 f HL_f_Dejan58 1.240108e+06 HL AT2G41680 f 70136 AT2G41680 f HL_f_Dejan_61 3.384010e+06 HL AT2G41680 f 72450 AT2G41680 ntrc HL_ntrc_ Dejan_62 1.410768e+05 HL AT2G41680 ntrc 74764 AT2G41680 ntrc HL_ntrc_Dejan_66 5.642197e+00 HL AT2G41680 ntrc 77078 AT2G41680 ntrc HL_ntrc_Dejan65 3.921952e+05 HL AT2G41680 ntrc 78997 AT2G41680 WT LL_WT_Dejan_41 1.016001e+07 LL AT2G41680 WT 81433 AT2G41680 WT LL_WT_Dejan_43 9.320892e+06 LL AT2G41680 WT 83869 AT2G41680 WT LL_WT_Dejan_49 8.560308e+06 LL AT2G41680 WT

有4种基因型和4种光照条件我试图做这样的事情:

AOV<- data.frame() IDs<- unique(Dejan_all_new_norm$Accession) for (i in 1 : length(IDs)){ temp<-Dejan_all_new_norm[(Dejan_all_new_norm$Accession)==IDs[i],] aov2<-aov(value ~ genotype + Light + genotype:Light, data = temp) AOV <- rbind(as.character(unique(IDs[i])),aov2,AOV) }

所以我想要每个基因(加入)的子集,而不是ANOVA,但在此之后,我想做tukey有这样的事情:

$`genotype:Light` diff lwr upr p adj m:FL-f:FL -7324259.81 -16715470 2066950.5 0.3486778 ntrc:FL-f:FL 1662873.54 -7728337 11054083.9 0.9999998 WT:FL-f:FL -5219263.59 -13913835 3475307.7 0.7927417 f:HL-f:FL -4936680.12 -13871535 3998174.3 0.8796738 m:HL-f:FL -7389937.49 -16324792 1544916.9 0.2496858 ntrc:HL-f:FL -7122962.46 -16057817 1811891.9 0.3102106

我想工作在这个简单的循环上,这是我的例子,因为它似乎是最简单的方法。 我会感谢任何帮助!

I would like to do 2ways anova, and store the p value and than do tukey hsd, but i have a problem with the initial table. Not always I have full data, so not always it is possible to perfors anova, I dont know how to do this so my script runs, than skip the not full data and runns further. my data looks like this:

https://filebin.net/w5cfuwztae7yk747

in the link there is example with two Accessions, but in real data there is 3013 accessions and some of them dont have all light conditions or all genotypes

67822 AT2G41680 f HL_f_Dejan58 1.240108e+06 HL AT2G41680 f 70136 AT2G41680 f HL_f_Dejan_61 3.384010e+06 HL AT2G41680 f 72450 AT2G41680 ntrc HL_ntrc_ Dejan_62 1.410768e+05 HL AT2G41680 ntrc 74764 AT2G41680 ntrc HL_ntrc_Dejan_66 5.642197e+00 HL AT2G41680 ntrc 77078 AT2G41680 ntrc HL_ntrc_Dejan65 3.921952e+05 HL AT2G41680 ntrc 78997 AT2G41680 WT LL_WT_Dejan_41 1.016001e+07 LL AT2G41680 WT 81433 AT2G41680 WT LL_WT_Dejan_43 9.320892e+06 LL AT2G41680 WT 83869 AT2G41680 WT LL_WT_Dejan_49 8.560308e+06 LL AT2G41680 WT

there is 4 genotypes, and four light conditions I am trying to do something like this:

AOV<- data.frame() IDs<- unique(Dejan_all_new_norm$Accession) for (i in 1 : length(IDs)){ temp<-Dejan_all_new_norm[(Dejan_all_new_norm$Accession)==IDs[i],] aov2<-aov(value ~ genotype + Light + genotype:Light, data = temp) AOV <- rbind(as.character(unique(IDs[i])),aov2,AOV) }

so i want to subset each gene (Accession) and than do ANOVA, but after this i want do tukey to have something like this:

$`genotype:Light` diff lwr upr p adj m:FL-f:FL -7324259.81 -16715470 2066950.5 0.3486778 ntrc:FL-f:FL 1662873.54 -7728337 11054083.9 0.9999998 WT:FL-f:FL -5219263.59 -13913835 3475307.7 0.7927417 f:HL-f:FL -4936680.12 -13871535 3998174.3 0.8796738 m:HL-f:FL -7389937.49 -16324792 1544916.9 0.2496858 ntrc:HL-f:FL -7122962.46 -16057817 1811891.9 0.3102106

I would like to work on this simple loop that is my example, because it seems easiest way. I will appreciate any help!

最满意答案

这是你想要的:

library(tidyverse) library(broom) read_csv(file = "https://filebin.net/w5cfuwztae7yk747/two.csv") %>% group_by(Accession) %>% do(broom::tidy(TukeyHSD(aov(value ~ genotype + Light + genotype:Light, data = .)))) %>% ungroup

输出:

# A tibble: 264 x 7 Accession term comparison estimate conf.low conf.high adj.p.value <chr> <fctr> <chr> <dbl> <dbl> <dbl> <dbl> 1 AT2G41680 genotype m-f -1586182.59 -3616647.7 444282.5 1.708496e-01 2 AT2G41680 genotype ntrc-f -5705550.95 -7694992.3 -3716109.6 2.609223e-08 3 AT2G41680 genotype WT-f -1568375.95 -3557817.3 421065.4 1.647950e-01 4 AT2G41680 genotype ntrc-m -4119368.37 -6149833.5 -2088903.3 2.214399e-05 5 AT2G41680 genotype WT-m 17806.64 -2012658.5 2048271.8 9.999951e-01 6 AT2G41680 genotype WT-ntrc 4137175.00 2147733.6 6126616.4 1.464605e-05 7 AT2G41680 Light HL-FL -3854435.85 -5849789.4 -1859082.3 4.872013e-05 8 AT2G41680 Light LL-FL 1528123.46 -467230.1 3523477.0 1.844033e-01 9 AT2G41680 Light ML-FL -2821752.94 -4775345.6 -868160.3 2.283331e-03 10 AT2G41680 Light LL-HL 5382559.31 3311883.1 7453235.6 2.176770e-07 # ... with 254 more rows

Is this what you are looking for:

library(tidyverse) library(broom) read_csv(file = "https://filebin.net/w5cfuwztae7yk747/two.csv") %>% group_by(Accession) %>% do(broom::tidy(TukeyHSD(aov(value ~ genotype + Light + genotype:Light, data = .)))) %>% ungroup

Output:

# A tibble: 264 x 7 Accession term comparison estimate conf.low conf.high adj.p.value <chr> <fctr> <chr> <dbl> <dbl> <dbl> <dbl> 1 AT2G41680 genotype m-f -1586182.59 -3616647.7 444282.5 1.708496e-01 2 AT2G41680 genotype ntrc-f -5705550.95 -7694992.3 -3716109.6 2.609223e-08 3 AT2G41680 genotype WT-f -1568375.95 -3557817.3 421065.4 1.647950e-01 4 AT2G41680 genotype ntrc-m -4119368.37 -6149833.5 -2088903.3 2.214399e-05 5 AT2G41680 genotype WT-m 17806.64 -2012658.5 2048271.8 9.999951e-01 6 AT2G41680 genotype WT-ntrc 4137175.00 2147733.6 6126616.4 1.464605e-05 7 AT2G41680 Light HL-FL -3854435.85 -5849789.4 -1859082.3 4.872013e-05 8 AT2G41680 Light LL-FL 1528123.46 -467230.1 3523477.0 1.844033e-01 9 AT2G41680 Light ML-FL -2821752.94 -4775345.6 -868160.3 2.283331e-03 10 AT2G41680 Light LL-HL 5382559.31 3311883.1 7453235.6 2.176770e-07 # ... with 254 more rows

更多推荐