k-nearest neighbor algorithm with p-adic distance functions
The k-nearest neighbor (k-NN) algorithm is a widely used method for classification and prediction. Its performance is strongly influenced by the chosen distance (or similarity) measure. By Ostrowski’s theorem, any non-trivial absolute value on $\mathbb{Q}$ is, up to equivalence, either the usual absolute value or a $p$-adic absolute value for some prime $p$. In this project, $p$-adic absolute values are coupled with k-NN to define alternative distance functions on $\mathbb{Q}^n$.
If you want to try $p$-adic calculations in Python, first import the p_adic.py script.
Related works
This work accompanies papers published in Neurocomputing and the Journal of Classification.
Kartal, E., Çalışkan, F., Eskişehirli, B. B., & Özen, Z. (in press). p-adic Distance Functions and Comparative Performance Analysis of Distance Functions with the k-NN Classifier. Journal of Classification.
Funding
This work was supported by the Scientific and Technological Research Council of Türkiye (TÜBİTAK) (Grant No. 123E293).
The $p$-adic metric on $\mathbb{Q}$
Let $p$ be a prime. For a nonzero integer $\alpha$, the $p$-adic valuation $\mathrm{ord}_p(\alpha)$ is the unique nonnegative integer such that

$$
\alpha=p^{\mathrm{ord}_p(\alpha)}\,\beta,\qquad p\nmid\beta,
$$

and it extends to $\mathbb{Q}$ by $\mathrm{ord}_p(a/b)=\mathrm{ord}_p(a)-\mathrm{ord}_p(b)$. The $p$-adic absolute value is $|x|_p=p^{-\mathrm{ord}_p(x)}$ for $x\neq 0$ and $|0|_p=0$, and the induced $p$-adic metric is

$$
d_p(x,y)=|x-y|_p,
$$

where $x,y\in\mathbb{Q}$.

Let $x=(x_1,\dots,x_n),\quad y=(y_1,\dots,y_n)\in\mathbb{Q}^n$. Using coordinate-wise differences in $\mathbb{Q}^n$, three $p$-adic distance functions are defined:

$p$-adic Manhattan distance

$$
d_{\mathrm{pManh}}(x,y)=\sum_{i=1}^{n}|x_i-y_i|_p
$$

$p$-adic Euclidean distance

$$
d_{\mathrm{pEuc}}(x,y)=\left(\sum_{i=1}^{n}|x_i-y_i|_p^{2}\right)^{1/2}
$$

$p$-adic Chebyshev distance

$$
d_{\mathrm{pCheb}}(x,y)=\max_{1\le i\le n}|x_i-y_i|_p
$$

The terminology is adopted because distances defined via the $p$-adic absolute value provide natural $p$-adic counterparts to the classical distances induced by the usual absolute value.
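These definitions can be sketched in a few lines of Python. This is an independent illustration using exact rational arithmetic, not the API of the repository's p_adic.py:

```python
from fractions import Fraction
from math import sqrt

def ord_p(x, p):
    """p-adic valuation of a nonzero rational x (int or Fraction)."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("ord_p(0) is +infinity by convention")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:   # factor p out of the numerator
        num //= p
        v += 1
    while den % p == 0:   # factor p out of the denominator
        den //= p
        v -= 1
    return v

def abs_p(x, p):
    """p-adic absolute value |x|_p = p^(-ord_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(p) ** (-ord_p(x, p))

def d_p_manhattan(x, y, p):
    """p-adic Manhattan distance: sum of coordinate-wise |x_i - y_i|_p."""
    return sum(abs_p(xi - yi, p) for xi, yi in zip(x, y))

def d_p_euclidean(x, y, p):
    """p-adic Euclidean distance: square root of the sum of squares."""
    return sqrt(sum(abs_p(xi - yi, p) ** 2 for xi, yi in zip(x, y)))

def d_p_chebyshev(x, y, p):
    """p-adic Chebyshev distance: maximum coordinate-wise |x_i - y_i|_p."""
    return max(abs_p(xi - yi, p) for xi, yi in zip(x, y))
```

For instance, `abs_p(122, 5)` returns `Fraction(1, 1)` because $5\nmid 122$, and `abs_p(27, 3)` returns `Fraction(1, 27)` because $\mathrm{ord}_3(27)=3$.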
In earlier text/code in this repository, a Manhattan-based $p$-adic distance function for the $k$-NN algorithm was introduced in Kartal et al. (2024), although it was not assigned a specific name. For clarity and consistency, it is referred to here as the $p$-adic Manhattan distance ($d_{\mathrm{pManh}}$) (Kartal et al., in press).
Example 1 ($p=5$)
$$
d_5(3,125)=|3-125|_5=|-122|_5=|122|_5.
$$
Since $5\nmid 122$ and $\mathrm{ord}_5(122)=0$, $|122|_5=5^{0}=1$. Therefore, $d_5(3,125)=1$.
Decimal rounding (dec) for numerical and mixed datasets
When $p$-adic distances are computed on numerical or mixed datasets, input values are often rounded before $p$-adic computations. The rounding precision is denoted by dec. Since $|x|_p=p^{-\mathrm{ord}_p(x)}$, a small rounding change may alter $\mathrm{ord}_p(\cdot)$ and therefore change $|\cdot|_p$ and the resulting distances.
Demonstration of the effect of dec (using the $p=3$ example)
Case B (dec = 1): $0.117\mapsto 0.1=\frac{1}{10}$, hence $\left|\frac{1}{10}\right|_3=1$. The rounded difference vector becomes $\left(\frac{1}{10},27,-27,7\right)$. Then

$$
d_{\mathrm{pManh}}=\left|\tfrac{1}{10}\right|_3+|27|_3+|-27|_3+|7|_3=1+\tfrac{1}{27}+\tfrac{1}{27}+1=\tfrac{56}{27}.
$$
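The Case B computation can be checked numerically with exact rationals. The `abs_p` helper below is a hypothetical stand-in for the $p$-adic absolute value, not the repository's p_adic.py API:

```python
from fractions import Fraction

def abs_p(x, p):
    """p-adic absolute value |x|_p = p^(-ord_p(x)) for rational x, |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(p) ** (-v)

p = 3
# Rounded difference vector from Case B (dec = 1)
diff = [Fraction(1, 10), Fraction(27), Fraction(-27), Fraction(7)]
values = [abs_p(d, p) for d in diff]   # [1, 1/27, 1/27, 1]
manhattan = sum(values)                # 56/27
```

This illustrates the sensitivity discussed above: rounding to one decimal changes $\mathrm{ord}_3(\cdot)$ of the first coordinate, and the computed distance changes with it.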
The pseudocode for the k-nearest neighbor algorithm with $p$-adic distance functions
Let the labeled dataset be
$$
X=\left((x_i,y_i)\right)_{i=1}^{n},
$$
where $x_i\in\mathbb{Q}^m$ and $y_i$ is the class label. For a query sample $z$, the predicted label is determined by the majority vote among the $k$ nearest neighbors:

$$
\hat{y}(z)=\underset{c}{\arg\max}\sum_{(x_i,y_i)\in P}\mathbb{1}\left(y_i=c\right),
$$

where $P\subseteq X$ contains the $k$ nearest neighbors of $z$ under the selected $p$-adic distance (e.g., $d_{\mathrm{pManh}}$, $d_{\mathrm{pEuc}}$, or $d_{\mathrm{pCheb}}$).
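The majority-vote scheme can be sketched in plain Python. This is a minimal self-contained illustration (the `abs_p` and `d_p_manhattan` helpers are assumptions standing in for the repository's implementations):

```python
from collections import Counter
from fractions import Fraction

def abs_p(x, p):
    """p-adic absolute value |x|_p = p^(-ord_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(p) ** (-v)

def d_p_manhattan(x, y, p):
    """p-adic Manhattan distance on Q^m."""
    return sum(abs_p(xi - yi, p) for xi, yi in zip(x, y))

def knn_predict(X, z, k, p, dist=d_p_manhattan):
    """Majority vote over the k nearest neighbors of query z.

    X is a list of (features, label) pairs with rational features.
    """
    neighbors = sorted(X, key=lambda pair: dist(pair[0], z, p))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy usage (hypothetical data): two samples per class
X = [((0, 0), "a"), ((27, 0), "a"), ((1, 1), "b"), ((2, 1), "b")]
label = knn_predict(X, (27, 27), k=3, p=3)   # "a"
```

Here the two "a" samples are $3$-adically closest to the query (their coordinate differences are divisible by high powers of 3), so the vote among $k=3$ neighbors returns "a".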
In Python, these distances can be integrated into sklearn.neighbors.KNeighborsClassifier via a user-defined metric function. Datasets can be loaded with the sklearn.datasets module, and $p$-adic computations can be performed after importing p_adic.py.
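One way this integration can look is sketched below. It is an assumption-laden illustration, not the repository's own pipeline: the `abs_p` helper stands in for p_adic.py, the dec rounding convention (`DEC`) and the choice $p=3$ are arbitrary, and `algorithm="brute"` is used because scikit-learn evaluates callable metrics by brute force:

```python
from fractions import Fraction

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

P = 3      # prime (assumed choice)
DEC = 2    # rounding precision for numerical features (assumed convention)

def abs_p(x, p):
    """p-adic absolute value of a rational x, as a float."""
    if x == 0:
        return 0.0
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return float(p) ** (-v)

def p_adic_manhattan(u, v):
    """User-defined metric for KNeighborsClassifier: p-adic Manhattan."""
    total = 0.0
    for a, b in zip(u, v):
        # round each feature to DEC decimals exactly, as a rational number
        qa = Fraction(int(round(float(a) * 10**DEC)), 10**DEC)
        qb = Fraction(int(round(float(b) * 10**DEC)), 10**DEC)
        total += abs_p(qa - qb, P)
    return total

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5, metric=p_adic_manhattan,
                           algorithm="brute")
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

Any of the three $p$-adic distances can be swapped in as the metric callable; scikit-learn only requires that it take two 1-D arrays and return a single float.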