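By looking more carefully at the C code, I found that this apparent contradiction is due to the fact that ratio treats the "replace" edit operation differently from the other operations (it gives it a cost of 2), whereas distance treats all operations the same, with a cost of 1.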
This can be seen in the call to the internal levenshtein_common function made inside the ratio_py function:
https://github.com/miohtama/python-Levenshtein/blob/master/Levenshtein.c#L727
static PyObject*
ratio_py(PyObject *self, PyObject *args)
{
  size_t lensum;
  long int ldist;

  if ((ldist = levenshtein_common(args, "ratio", 1, &lensum)) < 0) //Call
    return NULL;

  if (lensum == 0)
    return PyFloat_FromDouble(1.0);

  return PyFloat_FromDouble((double)(lensum - ldist)/(lensum));
}
And in the distance_py function:
https://github.com/miohtama/python-Levenshtein/blob/master/Levenshtein.c#L715
static PyObject*
distance_py(PyObject *self, PyObject *args)
{
  size_t lensum;
  long int ldist;

  if ((ldist = levenshtein_common(args, "distance", 0, &lensum)) < 0)
    return NULL;

  return PyInt_FromLong((long)ldist);
}
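The third argument differs between the two calls (1 for ratio, 0 for distance), which ultimately results in a different cost argument being passed to another internal function, lev_edit_distance, which has the following doc snippet: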
@xcost: If nonzero, the replace operation has weight 2, otherwise all
edit operations have equal weights of 1.
The code of lev_edit_distance():
/**
 * lev_edit_distance:
 * @len1: The length of @string1.
 * @string1: A sequence of bytes of length @len1, may contain NUL characters.
 * @len2: The length of @string2.
 * @string2: A sequence of bytes of length @len2, may contain NUL characters.
 * @xcost: If nonzero, the replace operation has weight 2, otherwise all
 *         edit operations have equal weights of 1.
 *
 * Computes Levenshtein edit distance of two strings.
 *
 * Returns: The edit distance.
 **/
_LEV_STATIC_PY size_t
lev_edit_distance(size_t len1, const lev_byte *string1,
                  size_t len2, const lev_byte *string2,
                  int xcost)
{
  size_t i;
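So in my example,
ratio('ab', 'ac')
implies one replace operation (cost of 2) over the total length of the two strings (4). Plugging this into the formula in ratio_py, (lensum - ldist)/lensum, gives (4 - 2)/4 = 0.5.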
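To check the arithmetic without reading the C, here is a minimal pure-Python sketch of the same idea. It is not the library's implementation: the names edit_distance and ratio_sketch are hypothetical, and the only things taken from the source above are the meaning of xcost (replace costs 2 when it is nonzero, otherwise 1) and the (lensum - ldist)/lensum formula from ratio_py.

```python
def edit_distance(s1, s2, xcost=False):
    """Levenshtein distance between s1 and s2.

    Hypothetical helper mirroring the documented xcost behaviour:
    a replace costs 2 when xcost is true, otherwise 1.
    Inserts and deletes always cost 1.
    """
    sub_cost = 2 if xcost else 1
    # prev[j] is the distance between the processed prefix of s1 and s2[:j]
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # distance from s1[:i] to the empty string
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,                                  # delete c1
                curr[j - 1] + 1,                              # insert c2
                prev[j - 1] + (0 if c1 == c2 else sub_cost),  # match or replace
            ))
        prev = curr
    return prev[-1]


def ratio_sketch(s1, s2):
    """Similarity ratio using the formula seen in ratio_py (replace cost 2)."""
    lensum = len(s1) + len(s2)
    if lensum == 0:
        return 1.0
    ldist = edit_distance(s1, s2, xcost=True)
    return (lensum - ldist) / lensum


print(edit_distance('ab', 'ac'))              # 1  (what distance reports)
print(edit_distance('ab', 'ac', xcost=True))  # 2  (what ratio uses internally)
print(ratio_sketch('ab', 'ac'))               # 0.5
```

With equal weights the single replace costs 1, which matches distance; with xcost set it costs 2, which gives the 0.5 that ratio returns.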
That explains the "how". I guess the only remaining question is the "why", but for the moment I'm satisfied with this understanding.