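[Basics of statistical computing] Handling frequency distributions and the mode.
In practice, even displaying a clean, easy-to-read frequency distribution table takes some effort.
Example implementation in the R statistical language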
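K <- 100   # number of repetitions
N <- 1000  # number of random points per repetition
pi.est <- c(NULL)
for (k in seq(1,K)) {
  x <- runif(N, min=-1, max=1)
  y <- runif(N, min=-1, max=1)
  # mean distance of the N points from the center (0,0)
  Data <- sum(sqrt(x*x+y*y))/N
  pi.est <- c(Data, pi.est)
}
hist(pi.est, breaks=50)
rug(pi.est)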
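The simulation above draws fresh random numbers on every run, so the histogram differs each time. As a minimal sketch, assuming repeatable runs are wanted, `set.seed` can be called before sampling. The helper name `mean_dist` and the seed value 42 are arbitrary choices for illustration and do not come from the original code.

```r
# Hypothetical helper: mean distance from the origin of N random points
# drawn uniformly from the square [-1,1] x [-1,1].
mean_dist <- function(N) {
  x <- runif(N, min = -1, max = 1)
  y <- runif(N, min = -1, max = 1)
  mean(sqrt(x^2 + y^2))
}

set.seed(42)                        # arbitrary seed, chosen for illustration
run1 <- replicate(100, mean_dist(1000))

set.seed(42)                        # same seed, so the same random stream
run2 <- replicate(100, mean_dist(1000))

identical(run1, run2)               # TRUE: both runs give the same 100 values
```

Fixing the seed is an alternative to pasting the sampled values into the script; either way the later steps operate on the same numbers.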
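First, let's look at the data used in the pi calculation: 100 values, each the mean distance sqrt(x^2+y^2) from the center (0,0) of 1000 points placed at random within xlim=(-1,1), ylim=(-1,1).
Example implementation in the R statistical language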
pi.est
# Save the data for reproducibility
TestData01<-c(0.7575710,0.7626254,0.7720857,0.7636143,0.7559374,0.7636892,0.7697000,0.7744773,0.7718450,0.7661308,0.7643701,0.7773141,0.7601370,0.7625076,0.7657443,0.7642925,0.7652136,0.7626871,0.7651091,0.7838558,0.7577962,0.7679088,0.7691799,0.7778908,0.7651222,0.7745353,0.7630110,0.7704386,0.7693323,0.7626189,0.7589299,0.7469987,0.7606626,0.7728721,0.7608035,0.7766716,0.7748260,0.7754264,0.7671104,0.7588898,0.7763099,0.7590662,0.7713618,0.7706218,0.7752811,0.7783880,0.7727876,0.7694160,0.7689929,0.7661827,0.7569213,0.7565509,0.7594678,0.7557077,0.7661841,0.7831111,0.7671762,0.7670367,0.7456229,0.7633547,0.7806836,0.7537715,0.7705932,0.7695142,0.7594232,0.7629710,0.7571859,0.7698099,0.7585800,0.7565154,0.7619804,0.7723939,0.7621805,0.7769550,0.7552302,0.7477231,0.7605412,0.7620618,0.7601785,0.7479306,0.7770422,0.7799119,0.7709682,0.7657947,0.7612600,0.7643751,0.7653982,0.7593435,0.7823729,0.7659631,0.7529565,0.7603398,0.7592395,0.7607167,0.7566061,0.7573243,0.7640064,0.7661753,0.7667829,0.7758546)
# Redisplay the histogram and rug plot
hist(TestData01, breaks=50)
rug(TestData01)
The road to displaying a frequency distribution table
# Tabulating the raw values directly gives a useless table: almost every value appears exactly once.
table(TestData01)
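① Try grouping the values with the round function
TestData02<-round(TestData01,digits = 4)
table(TestData02)
TestData02<-round(TestData01,digits = 3)
table(TestData02)
TestData02<-round(TestData01,digits = 2)
table(TestData02)
# As the output shows, each rounding step discards digits of the original data, so this is not a good approach.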
② Reuse the output data of the histogram.
h<-hist(TestData01)
h
$breaks
[1] 0.745 0.750 0.755 0.760 0.765 0.770 0.775 0.780 0.785

$counts
[1] 4 2 19 24 23 13 11 4

$density
[1] 8 4 38 48 46 26 22 8

$mids
[1] 0.7475 0.7525 0.7575 0.7625 0.7675 0.7725 0.7775 0.7825

$xname
[1] "TestData01"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

# breaks holds the values that delimit the classes, and counts holds the frequencies.
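Because `hist` returns this list, its components can be read directly. The following is a minimal sketch, assuming a small stand-in vector `x` (the first ten values of TestData01) so that the block runs on its own. Passing `plot = FALSE` returns the histogram object without drawing anything.

```r
# Stand-in data: the first ten values of TestData01
x <- c(0.7575710, 0.7626254, 0.7720857, 0.7636143, 0.7559374,
       0.7636892, 0.7697000, 0.7744773, 0.7718450, 0.7661308)

# plot = FALSE returns the histogram object without drawing a plot
h <- hist(x, plot = FALSE)

h$breaks   # class boundaries
h$counts   # number of observations in each class

# There is always one more boundary than there are classes,
# and the counts add up to the number of observations.
length(h$breaks) == length(h$counts) + 1
sum(h$counts) == length(x)
```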
h <- hist(TestData01, breaks=50)
n <- length(h$counts) # number of classes
class_names <- character(n) # holds the class labels
for(i in seq_len(n)) {
  class_names[i] <- paste(h$breaks[i], "~", h$breaks[i+1])
}
frequency_table <- data.frame(class=class_names, frequency=h$counts)
library(xtable)
print(xtable(frequency_table), type = "html")
| | class | frequency |
---|---|---|
1 | 0.745 ~ 0.746 | 1 |
2 | 0.746 ~ 0.747 | 1 |
3 | 0.747 ~ 0.748 | 2 |
4 | 0.748 ~ 0.749 | 0 |
5 | 0.749 ~ 0.75 | 0 |
6 | 0.75 ~ 0.751 | 0 |
7 | 0.751 ~ 0.752 | 0 |
8 | 0.752 ~ 0.753 | 1 |
9 | 0.753 ~ 0.754 | 1 |
10 | 0.754 ~ 0.755 | 0 |
11 | 0.755 ~ 0.756 | 3 |
12 | 0.756 ~ 0.757 | 4 |
13 | 0.757 ~ 0.758 | 4 |
14 | 0.758 ~ 0.759 | 3 |
15 | 0.759 ~ 0.76 | 5 |
16 | 0.76 ~ 0.761 | 7 |
17 | 0.761 ~ 0.762 | 2 |
18 | 0.762 ~ 0.763 | 7 |
19 | 0.763 ~ 0.764 | 4 |
20 | 0.764 ~ 0.765 | 4 |
21 | 0.765 ~ 0.766 | 7 |
22 | 0.766 ~ 0.767 | 5 |
23 | 0.767 ~ 0.768 | 4 |
24 | 0.768 ~ 0.769 | 1 |
25 | 0.769 ~ 0.77 | 6 |
26 | 0.77 ~ 0.771 | 4 |
27 | 0.771 ~ 0.772 | 2 |
28 | 0.772 ~ 0.773 | 4 |
29 | 0.773 ~ 0.774 | 0 |
30 | 0.774 ~ 0.775 | 3 |
31 | 0.775 ~ 0.776 | 3 |
32 | 0.776 ~ 0.777 | 3 |
33 | 0.777 ~ 0.778 | 3 |
34 | 0.778 ~ 0.779 | 1 |
35 | 0.779 ~ 0.78 | 1 |
36 | 0.78 ~ 0.781 | 1 |
37 | 0.781 ~ 0.782 | 0 |
38 | 0.782 ~ 0.783 | 1 |
39 | 0.783 ~ 0.784 | 2 |
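As an alternative sketch (not the approach used above), base R's `cut` and `table` can build the same kind of table without calling `hist`. The stand-in vector `x` (the first ten values of TestData01) and the break points below are assumptions chosen for illustration.

```r
# Stand-in data: the first ten values of TestData01
x <- c(0.7575710, 0.7626254, 0.7720857, 0.7636143, 0.7559374,
       0.7636892, 0.7697000, 0.7744773, 0.7718450, 0.7661308)

# Class boundaries of width 0.005 (an assumed choice that covers x)
breaks <- seq(0.755, 0.775, by = 0.005)

# cut assigns each value to an interval (a, b]; table counts per interval
classes <- cut(x, breaks = breaks)
frequency_table <- as.data.frame(table(classes))
names(frequency_table) <- c("class", "frequency")
frequency_table
```

Unlike rounding, this keeps the original values untouched and only attaches a class label to each one.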
With this, we are finally ready to work with the mode.
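A minimal sketch of finding the modal class from such a table, assuming the `breaks` and `counts` vectors are copied from the default `hist(TestData01)` output shown earlier. `which.max` returns only the first maximum, so `which` is used here to collect ties as well.

```r
# Values copied from the default hist(TestData01) output
breaks <- c(0.745, 0.750, 0.755, 0.760, 0.765, 0.770, 0.775, 0.780, 0.785)
counts <- c(4, 2, 19, 24, 23, 13, 11, 4)

# Indices of all classes that reach the maximum frequency (handles ties)
modal <- which(counts == max(counts))

# Report each modal class with its midpoint as a representative value
data.frame(lower = breaks[modal],
           upper = breaks[modal + 1],
           mid = (breaks[modal] + breaks[modal + 1]) / 2,
           frequency = counts[modal])
```

For these eight classes the modal class is 0.760 to 0.765, with 24 observations. In the finer table produced with breaks=50, three classes are tied at a frequency of 7, which is why the sketch collects ties instead of taking only the first maximum.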