Commit 31e3b5b: Docs (1)

1 parent a02672a

5 files changed: 148 additions & 133 deletions

README.rst (19 additions & 13 deletions)
@@ -27,7 +27,7 @@ CompStats
 Collaborative competitions have gained popularity in the scientific and technological fields. These competitions involve defining tasks, selecting evaluation scores, and devising result verification methods. In the standard scenario, participants receive a training set and are expected to provide a solution for a held-out dataset kept by the organizers. An essential challenge for organizers arises when comparing algorithms' performance, assessing multiple participants, and ranking them. Statistical tools are often used for this purpose; however, traditional statistical methods often fail to capture decisive differences between systems' performance. CompStats implements an evaluation methodology for statistically analyzing competition results. CompStats offers several advantages, including off-the-shelf comparisons with correction mechanisms and the inclusion of confidence intervals.

-To illustrate the use of `CompStats`, the following snippets show an example. The instructions load the necessary libraries, including the one to obtain the problem (e.g., digits), three different classifiers, and the last line is the score used to measure the performance and compare the algorithms.
+To illustrate the use of `CompStats`, the following snippets show an example. The instructions load the necessary libraries, including the one to obtain the problem (e.g., digits), four different classifiers, and the last line is the score used to measure the performance and compare the algorithms.

 >>> from sklearn.svm import LinearSVC
 >>> from sklearn.naive_bayes import GaussianNB
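(The diff omits the rest of the setup between this hunk and the next. For context, here is a minimal sketch of what the full example presumably looks like; the digits loader, the split parameters, and the CompStats.metrics import path are assumptions inferred from the surrounding text, not lines from this commit:)

>>> from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier
>>> from sklearn.datasets import load_digits
>>> from sklearn.model_selection import train_test_split
>>> from CompStats.metrics import f1_score  # assumed import path; wraps sklearn's f1_score
>>> X, y = load_digits(return_X_y=True)
>>> X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3)
>>> alg = LinearSVC().fit(X_train, y_train)  # shown as alg-1 in the output below
>>> hy = alg.predict(X_val)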
@@ -51,10 +51,10 @@ Once the predictions are available, it is time to measure the algorithm's perfor
 >>> score
 <Perf(score_func=f1_score, statistic=0.9435, se=0.0099)>

-The previous code shows the macro-f1 score and, in parentheses, its standard error. The actual performance value is stored in the `statistic` attribute.
+The previous code shows the macro-f1 score and its standard error. The actual performance values are stored in the attributes `statistic` and `se`.

->>> score.statistic
-0.9434834454375508
+>>> score.statistic, score.se
+(0.9521479775366307, 0.009717884979482313)

 Continuing with the example, let us assume that one wants to test another classifier on the same problem, in this case, a random forest, as can be seen in the following two lines. The second line predicts the validation set and adds it to the analysis.
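(The two lines the paragraph refers to fall between this hunk and the next; per the matching snippet in the metrics_api.rst diff further down, they read:)

>>> ens = RandomForestClassifier().fit(X_train, y_train)
>>> score(ens.predict(X_val), name='Random Forest')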

@@ -63,28 +63,34 @@ Continuing with the example, let us assume that one wants to test another classi
 <Perf(score_func=f1_score)>
 Statistic with its standard error (se)
 statistic (se)
-0.9655 (0.0077) <= Random Forest
-0.9435 (0.0099) <= alg-1
+0.9720 (0.0076) <= Random Forest
+0.9521 (0.0097) <= alg-1

-Let us incorporate another prediction, now with the Naive Bayes classifier, as seen below.
+Let us incorporate two more predictions, now with the Naive Bayes classifier and Histogram Gradient Boosting, as seen below.

 >>> nb = GaussianNB().fit(X_train, y_train)
 >>> score(nb.predict(X_val), name='Naive Bayes')
 <Perf(score_func=f1_score)>
 Statistic with its standard error (se)
 statistic (se)
-0.9655 (0.0077) <= Random Forest
-0.9435 (0.0099) <= alg-1
-0.8549 (0.0153) <= Naive Bayes
+0.9759 (0.0068) <= Hist. Grad. Boost. Tree
+0.9720 (0.0076) <= Random Forest
+0.9521 (0.0097) <= alg-1
+0.8266 (0.0159) <= Naive Bayes
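(Note: the output above already lists Hist. Grad. Boost. Tree, but this README hunk never trains that model. The matching metrics_api.rst hunk below adds the corresponding lines, which presumably belong here as well:)

>>> hist = HistGradientBoostingClassifier().fit(X_train, y_train)
>>> score(hist.predict(X_val), name='Hist. Grad. Boost. Tree')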

-The final step is to compare the performance of the three classifiers, which can be done with the `difference` method, as seen next.
+The performance, its confidence interval (5%), and a statistical comparison (5%) between the best-performing system and the rest of the algorithms are depicted in the following figure.
+
+>>> score.plot()
+
+The final step is to compare the performance of the four classifiers, which can be done with the `difference` method, as seen next.
8287
>>> diff = score.difference()
8388
>>> diff
8489
<Difference>
85-
difference p-values w.r.t Random Forest
90+
difference p-values w.r.t Hist. Grad. Boost. Tree
8691
0.0000 <= Naive Bayes
87-
0.0120 <= alg-1
92+
0.0100 <= alg-1
93+
0.3240 <= Random Forest
8894

8995
The class `Difference` has the `plot` method that can be used to depict the difference with respect to the best.
9096
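(The commit also adds docs/source/digits_difference.png, presumably the output of that method; a minimal usage sketch, assuming plot needs no arguments:)

>>> diff.plot()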

docs/CompStats_metrics.ipynb (103 additions & 101 deletions)

Large diffs are not rendered by default.

docs/source/digits_difference.png (2.86 KB)

docs/source/digits_perf.png (19.9 KB)

docs/source/metrics_api.rst (26 additions & 19 deletions)
@@ -27,7 +27,7 @@
 :py:mod:`CompStats.metrics` aims to facilitate performance measurement (with standard errors and confidence intervals) and statistical comparisons between algorithms on a single problem, wrapping the different scores and loss functions found in :py:mod:`~sklearn.metrics`.

-To illustrate the use of :py:mod:`CompStats.metrics`, the following snippets show an example. The instructions load the necessary libraries, including the one to obtain the problem (e.g., digits), three different classifiers, and the last line is the score used to measure the performance and compare the algorithms.
+To illustrate the use of :py:mod:`CompStats.metrics`, the following snippets show an example. The instructions load the necessary libraries, including the one to obtain the problem (e.g., digits), four different classifiers, and the last line is the score used to measure the performance and compare the algorithms.

 >>> from sklearn.svm import LinearSVC
 >>> from sklearn.naive_bayes import GaussianNB
@@ -49,45 +49,52 @@ Once the predictions are available, it is time to measure the algorithm's perfor

 >>> score = f1_score(y_val, hy, average='macro')
 >>> score
-<Perf>
-Statistic with its standard error (se)
-statistic (se)
-0.9332 (0.0113) <= alg-1
+<Perf(score_func=f1_score, statistic=0.9521, se=0.0097)>

-The previous code shows the macro-f1 score and, in parentheses, its standard error. The actual performance value is stored in the :py:func:`~CompStats.interface.Perf.statistic` attribute.
+The previous code shows the macro-f1 score and, in parentheses, its standard error. The actual performance values are stored in the attributes :py:func:`~CompStats.interface.Perf.statistic` and :py:func:`~CompStats.interface.Perf.se`.

->>> score.statistic
-{'alg-1': 0.9332035615949114}
+>>> score.statistic, score.se
+(0.9521479775366307, 0.009717884979482313)

 Continuing with the example, let us assume that one wants to test another classifier on the same problem, in this case, a random forest, as can be seen in the following two lines. The second line predicts the validation set and adds it to the analysis.

 >>> ens = RandomForestClassifier().fit(X_train, y_train)
 >>> score(ens.predict(X_val), name='Random Forest')
-<Perf>
+<Perf(score_func=f1_score)>
 Statistic with its standard error (se)
 statistic (se)
-0.9756 (0.0061) <= Random Forest
-0.9332 (0.0113) <= alg-1
+0.9720 (0.0076) <= Random Forest
+0.9521 (0.0097) <= alg-1

-Let us incorporate another prediction, now with the Naive Bayes classifier, as seen below.
+Let us incorporate two more predictions, now with the Naive Bayes classifier and Histogram Gradient Boosting, as seen below.

 >>> nb = GaussianNB().fit(X_train, y_train)
 >>> score(nb.predict(X_val), name='Naive Bayes')
-<Perf>
+>>> hist = HistGradientBoostingClassifier().fit(X_train, y_train)
+>>> score(hist.predict(X_val), name='Hist. Grad. Boost. Tree')
+<Perf(score_func=f1_score)>
 Statistic with its standard error (se)
 statistic (se)
-0.9756 (0.0061) <= Random Forest
-0.9332 (0.0113) <= alg-1
-0.8198 (0.0144) <= Naive Bayes
+0.9759 (0.0068) <= Hist. Grad. Boost. Tree
+0.9720 (0.0076) <= Random Forest
+0.9521 (0.0097) <= alg-1
+0.8266 (0.0159) <= Naive Bayes
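(The added lines assume HistGradientBoostingClassifier is already imported; the example presumably needs the following alongside the other imports, although this hunk does not show it:)

>>> from sklearn.ensemble import HistGradientBoostingClassifier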
+
+The performance, its confidence interval (5%), and a statistical comparison (5%) between the best-performing system and the rest of the algorithms are depicted in the following figure.
+
+>>> score.plot()
+
+.. image:: digits_perf.png

-The final step is to compare the performance of the three classifiers, which can be done with the :py:func:`~CompStats.interface.Perf.difference` method, as seen next.
+The final step is to compare the performance of the four classifiers, which can be done with the :py:func:`~CompStats.interface.Perf.difference` method, as seen next.

 >>> diff = score.difference()
 >>> diff
 <Difference>
-difference p-values w.r.t Random Forest
-0.0000 <= alg-1
+difference p-values w.r.t Hist. Grad. Boost. Tree
 0.0000 <= Naive Bayes
+0.0100 <= alg-1
+0.3240 <= Random Forest

 The class :py:class:`~CompStats.Difference` has the :py:class:`~CompStats.Difference.plot` method that can be used to depict the difference with respect to the best.
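(Given that digits_difference.png is added by this commit, the documentation presumably continues with the plot call and the figure; a minimal sketch, assuming a no-argument call:)

>>> diff.plot()

.. image:: digits_difference.png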
