Questions for the KNN Algorithm

asheesh kumar singhal
2 min read · Sep 1, 2020

Hi All, here are some of the common questions and queries related to K-nearest neighbors (KNN).

  1. What is K in KNN?
    It is the number of nearest data points (xi) that the algorithm looks for around the query point xq.
  2. What should be the value of K?
    K should be odd. If K is even, there is a possibility that half of the neighbors are -ve and the other half are +ve; in such a case it is difficult to decide the class of xq. K should also not be a multiple of the number of output categories or classes.
  3. Where could KNN fail?
    a) if the data is spread randomly and no clusters are formed
    b) if the query point is far from the data points in the training dataset.
  4. What is the relation between Euclidean distance and cosine distance?
    For unit-length vectors A and B, ||A - B||^2 = 2(1 - cos(A, B)). A quick numerical check of this identity is included after this list.
  5. What are the two steps involved in KNN?
    Compute the K nearest neighbors of the query point, then take a majority vote over their labels (see the from-scratch sketch after this list).
  6. How to determine the best K for KNN?
    We check for over-fitting and under-fitting. Typically, as K increases, accuracy also increases at first, reaches a maximum, and then starts to fall as K is increased further. So we choose the value of K where the validation accuracy is maximum. It is better to use K-fold cross validation to pick a more reliable K (a short sketch appears after this list).
  7. What is the training complexity of KNN?
    O(1). At a simple, theoretical level it is O(1), since plain KNN just stores the training data. However, if a KD-tree or another spatial index is built so that queries can be answered faster, this complexity changes (building a KD-tree, for example, takes O(N log N)).
  8. What is the testing complexity of KNN?
    O(N) per query, if a naive approach is chosen where we compute the distance to every data point and then keep the K nearest ones. However, we generally use indexed structures, e.g. a KD-tree, for faster queries (a comparison of both approaches is sketched after this list).
  9. What is the disadvantage of K-Fold Cross Validation?
    The training time increases by a factor of K, since the model is trained and evaluated K times, once for each train/validation split.
  10. What is the advantage of K-Fold Cross Validation?
    It allows us to utilize the complete labelled dataset (train dataset and validation dataset). If we do not use K-fold cross validation, we train on only a portion of the dataset, i.e. only D-train. With K-fold, every point acts as training data in some of the K folds and as validation data in exactly one fold, so the effective learning dataset is larger and the performance estimate is more reliable.
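
To make questions 1 and 5 concrete, here is a minimal from-scratch sketch of the two KNN steps in Python with NumPy. The names knn_predict, X_train, y_train and x_q are placeholders for illustration, not part of any library.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_q, k=5):
    # Step 1: compute the Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_q, axis=1)
    nearest = np.argsort(distances)[:k]          # indices of the K closest points
    # Step 2: majority vote among the labels of those K neighbours
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny made-up example
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 1.05]), k=3))  # -> 1
```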
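
The identity from question 4 holds once A and B are normalized to unit length. A quick numerical sanity check, using randomly generated vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=5); A /= np.linalg.norm(A)   # normalise to unit length
B = rng.normal(size=5); B /= np.linalg.norm(B)

lhs = np.linalg.norm(A - B) ** 2                 # squared Euclidean distance
rhs = 2 * (1 - np.dot(A, B))                     # 2 * (1 - cos(A, B)), since |A| = |B| = 1
print(np.isclose(lhs, rhs))                      # True
```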
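
For questions 6, 9 and 10, here is a minimal sketch of picking K with K-fold cross validation. It assumes scikit-learn is available and uses its built-in iris dataset purely as stand-in data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_score = None, -1.0
for k in range(1, 21, 2):                             # odd K values only, as discussed above
    clf = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(clf, X, y, cv=5).mean()   # 5-fold cross validation accuracy
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 3))
```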
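
For questions 7 and 8, the sketch below compares the naive O(N)-per-query search with a KD-tree index. It assumes SciPy is available; scipy.spatial.cKDTree is used here only as one example of a spatial index, and its query speed-up can degrade in high dimensions.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 3))
x_q = rng.normal(size=3)

# Naive search: one distance per training point -> O(N) per query
naive_idx = np.argsort(np.linalg.norm(X_train - x_q, axis=1))[:5]

# KD-tree: built once up front (O(N log N)), then typically much faster per query
tree = cKDTree(X_train)
dist, tree_idx = tree.query(x_q, k=5)

print(sorted(naive_idx) == sorted(tree_idx))   # both return the same 5 neighbours
```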
