
The KNN Algorithm and the Kd-Tree in Python

Overview

The nearest-neighbor and k-nearest-neighbor methods

In the figure below there are only three kinds of beans, plus three beans whose kind is unknown. How do we decide which kind they belong to?

One way to think about it: whichever kind of bean the unknown bean lies closest to, we treat the unknown bean as belonging to that kind. This leads to the definition of the nearest-neighbor algorithm: to determine the class of an unknown sample, take all training samples as representative points, compute the distance from the unknown sample to every training sample, and use the class of the nearest neighbor as the sole basis for deciding the class of the unknown sample. The nearest-neighbor algorithm has an obvious weakness, though, as the following example shows: given an unknown shape (the green dot in the figure), how do we decide what shape it is?
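To make the rule concrete, here is a minimal sketch of the nearest-neighbor decision in NumPy (the function name, the toy data, and the values are illustrative, not part of the article): compute the distance from the unknown sample to every training sample and return the label of the closest one.

import numpy as np

def nearest_neighbor(unknown, samples, labels):
    # Euclidean distance from the unknown sample to every training sample
    dists = np.sqrt(((samples - unknown) ** 2).sum(axis=1))
    # class of the single closest training sample decides the result
    return labels[np.argmin(dists)]

# toy data: two kinds of "beans" described by two features
samples = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(nearest_neighbor(np.array([0.2, 0.3]), samples, labels))  # -> 'B'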

Clearly, the weakness of the nearest-neighbor algorithm is that it is overly sensitive to noisy data. To fix this, we can take several of the samples surrounding the unknown sample into account, enlarging the set of samples that take part in the decision so that no single data point decides the result by itself. This leads to the K-nearest-neighbors algorithm, an extension of the nearest-neighbor algorithm. The basic idea is: pick a fixed number K of samples within a certain range of the unknown sample; if the majority of those K samples belong to some class, classify the unknown sample as that class.

How to choose the best value of K depends on the data. In general, a larger K reduces the influence of noise on classification, but it also blurs the boundaries between classes. The test sample (the green circle) could be assigned either to the red-triangle class or to the blue-square class. If K is 3, the figure shows that its 3 neighbors lie inside the solid inner circle, so by majority vote it belongs to the red-triangle class. If K is 5, its 5 nearest samples lie inside the dashed circle, and by the same voting rule it belongs to the blue-square class. In practice, K is first set to a fairly small value and then tuned step by step with cross-validation until the value best suited to the data is found, as sketched below.
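The article gives no code for this tuning step; the following is a rough sketch of what it could look like using scikit-learn's KNeighborsClassifier and cross_val_score (scikit-learn and the random stand-in data are assumptions, not something the article uses): a few odd values of K are tried, and the one with the best cross-validated accuracy is kept.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# X, y would be the real training features and labels; random data stands in here
rng = np.random.RandomState(0)
X = rng.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

best_k, best_score = None, -1.0
for k in [1, 3, 5, 7, 9]:          # odd K values avoid ties in binary voting
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    if scores.mean() > best_score:
        best_k, best_score = k, scores.mean()
print("best K:", best_k, "cross-validated accuracy:", round(best_score, 3))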

KNN algorithm implementation
Basic steps of the algorithm:

1) Compute the distance between the point to be classified and each point of known class

2) Sort the points in order of increasing distance

3) Select the k points closest to the point to be classified

4) Count how many times each class appears among those k points

5) Return the class that appears most often among the k points as the predicted class of the point to be classified

  Below is a simple Python example that follows these basic steps; it predicts the class of an unknown point (the gray point in the figure) from 4 already-classified sample points.

from numpy import *

# create a dataset which contains 4 samples with 2 classes
def createDataSet():
  # create a matrix: each row is one sample
  group = array([[1.0,0.9],[1.0,1.0],[0.1,0.2],[0.0,0.1]])
  labels = ['A','A','B','B'] # four samples and two classes
  return group,labels

# classify using kNN (k Nearest Neighbors)
# Input:  newInput: 1 x N vector to classify
#         dataSet:  M x N matrix (M samples, N features)
#         labels:   1 x M list of class labels
#         k:        number of neighbors to use for comparison
# Output: the most popular class label among the k nearest neighbors
def kNNClassify(newInput,dataSet,labels,k):
  numSamples = dataSet.shape[0] # shape[0] is the number of rows (samples)

  ## step 1: calculate Euclidean distance
  # tile(A,reps): construct an array by repeating A reps times;
  # here newInput is repeated numSamples times to match dataSet
  diff = tile(newInput,(numSamples,1)) - dataSet # element-wise subtraction
  squaredDiff = diff ** 2 # square the differences
  squaredDist = sum(squaredDiff,axis = 1) # sum over each row
  distance = squaredDist ** 0.5

  ## step 2: sort the distances
  # argsort() returns the indices that would sort the array in ascending order
  sortedDistIndices = argsort(distance)

  classCount = {} # dictionary mapping class label -> vote count
  for i in xrange(k):
    ## step 3: take the k samples with the smallest distances
    voteLabel = labels[sortedDistIndices[i]]

    ## step 4: count how many times each label occurs
    # when the key voteLabel is not yet in classCount, get() returns 0
    classCount[voteLabel] = classCount.get(voteLabel,0) + 1

  ## step 5: return the class with the most votes
  maxCount = 0
  for key,value in classCount.items():
    if value > maxCount:
      maxCount = value
      maxIndex = key
  return maxIndex

if __name__ == "__main__":
  dataSet,labels = createDataSet()

  testX = array([1.2,1.0])
  k = 3
  outputLabel = kNNClassify(testX,dataSet,labels,k)
  print "Your input is:",testX,"and classified to class: ",outputLabel

  testX = array([0.1,0.3])
  outputLabel = kNNClassify(testX,dataSet,labels,k)
  print "Your input is:",testX,"and classified to class: ",outputLabel

The output is as follows:
Your input is: [ 1.2 1. ] and classified to class: A
Your input is: [ 0.1 0.3] and classified to class: B
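As a sanity check, not part of the original article, the same four-sample toy set can be fed to scikit-learn's KNeighborsClassifier, which should agree with the hand-written classifier above:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# same toy data as createDataSet() above
group = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']

clf = KNeighborsClassifier(n_neighbors=3).fit(group, labels)
print(clf.predict([[1.2, 1.0], [0.1, 0.3]]))  # expected: ['A' 'B']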

  OpenCV also provides machine-learning algorithms; the most basic example of its KNN implementation is shown below.

import numpy as np
import matplotlib.pyplot as plt
import cv2
# Feature set containing (x,y) values of 25 known/training data points
trainData = np.random.randint(0,100,(25,2)).astype(np.float32)
# Label each one either Red or Blue with numbers 0 and 1
responses = np.random.randint(0,2,(25,1)).astype(np.float32)
# Take the Red family and plot it
red = trainData[responses.ravel()==0]
plt.scatter(red[:,0],red[:,1],80,'r','^')
# Take the Blue family and plot it
blue = trainData[responses.ravel()==1]
plt.scatter(blue[:,0],blue[:,1],80,'b','s')
# Testing data
newcomer = np.random.randint(0,100,(1,2)).astype(np.float32)
plt.scatter(newcomer[:,0],newcomer[:,1],80,'g','o')
knn = cv2.KNearest()
knn.train(trainData,responses) # Train the model
# Find the neighbours and predict the response for the input vector
ret,results,neighbours,dist = knn.find_nearest(newcomer,3)
print "result: ",results,"\n"
print "neighbours: ",neighbours,"\n"
print "distance: ",dist
plt.show()
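Note that cv2.KNearest() and find_nearest belong to the old OpenCV 2.4 Python API. With OpenCV 3 and later the same example would instead go through the cv2.ml module; a rough equivalent (a sketch assuming OpenCV 3+ and Python 3, with the training data recreated for self-containment) looks like this:

import cv2
import numpy as np

trainData = np.random.randint(0, 100, (25, 2)).astype(np.float32)
responses = np.random.randint(0, 2, (25, 1)).astype(np.float32)
newcomer = np.random.randint(0, 100, (1, 2)).astype(np.float32)

knn = cv2.ml.KNearest_create()
knn.train(trainData, cv2.ml.ROW_SAMPLE, responses)   # each row is one sample
ret, results, neighbours, dist = knn.findNearest(newcomer, 3)
print("result:", results)
print("neighbours:", neighbours)
print("distance:", dist)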

