How python-Levenshtein.ratio is computed

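Looking more closely at the C code, I found that the apparent contradiction comes from the fact that ratio treats the "replace" edit operation differently from the other operations (it gives it a cost of 2), whereas distance treats all operations the same, with a cost of 1.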

This can be seen in the call to the internal levenshtein_common function made inside ratio_py:

https://github.com/miohtama/python-Levenshtein/blob/master/Levenshtein.c#L727

static PyObject*
ratio_py(PyObject *self, PyObject *args)
{
  size_t lensum;
  long int ldist;

  if ((ldist = levenshtein_common(args, "ratio", 1, &lensum)) < 0) //Call
    return NULL;

  if (lensum == 0)
    return PyFloat_FromDouble(1.0);

  return PyFloat_FromDouble((double)(lensum - ldist)/(lensum));
}
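
To make the final arithmetic of ratio_py easier to follow, here is a minimal Python sketch of just its last two steps. The function name ratio_from_distance is hypothetical and is not part of the library; lensum is assumed to be the sum of the two string lengths, and ldist the distance that levenshtein_common returned.

```python
def ratio_from_distance(lensum: int, ldist: int) -> float:
    """Hypothetical helper mirroring the tail of ratio_py (not library API)."""
    if lensum == 0:
        # Same special case as the C code: nothing to compare, so return 1.0.
        return 1.0
    # Share of the combined length that did not have to be edited.
    return (lensum - ldist) / lensum


# Assumed inputs: combined length 4, weighted distance 2.
print(ratio_from_distance(4, 2))
```

With a distance of 0 the result is 1.0, and it falls as the weighted distance grows relative to the combined length.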

The distance_py function makes the same call, but passes 0 instead of 1 as the third argument:

https://github.com/miohtama/python-Levenshtein/blob/master/Levenshtein.c#L715

static PyObject*
distance_py(PyObject *self, PyObject *args)
{
  size_t lensum;
  long int ldist;

  if ((ldist = levenshtein_common(args, "distance", 0, &lensum)) < 0)
    return NULL;

  return PyInt_FromLong((long)ldist);
}

These calls ultimately pass different cost arguments to another internal function, lev_edit_distance, whose documentation contains the following fragment:

@xcost: If nonzero, the replace operation has weight 2, otherwise all
        edit operations have equal weights of 1.

The code of lev_edit_distance() begins as follows:

/**
 * lev_edit_distance:
 * @len1: The length of @string1.
 * @string1: A sequence of bytes of length @len1, may contain NUL characters.
 * @len2: The length of @string2.
 * @string2: A sequence of bytes of length @len2, may contain NUL characters.
 * @xcost: If nonzero, the replace operation has weight 2, otherwise all
 *         edit operations have equal weights of 1.
 *
 * Computes Levenshtein edit distance of two strings.
 *
 * Returns: The edit distance.
 **/
_LEV_STATIC_PY size_t
lev_edit_distance(size_t len1, const lev_byte *string1,
                  size_t len2, const lev_byte *string2,
                  int xcost)
{
  size_t i;

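So in my example, ratio('ab', 'ac') involves one replace operation (cost 2) over the combined length of the two strings (4), so the result is (4 - 2) / 4 = 2/4 = 0.5.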
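
The example can be checked end to end with a small pure-Python sketch. It is a textbook dynamic-programming edit distance with a switchable replace cost, written for illustration only; it is not the library's C implementation, and the names edit_distance_sketch and ratio_sketch are hypothetical.

```python
def edit_distance_sketch(s1: str, s2: str, xcost: int = 0) -> int:
    """Edit distance where a replace costs 2 if xcost is nonzero, else 1."""
    sub_cost = 2 if xcost else 1
    # prev[j] is the distance between the processed prefix of s1 and s2[:j].
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # turning s1[:i] into the empty string takes i deletions
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,                                  # delete c1
                curr[j - 1] + 1,                              # insert c2
                prev[j - 1] + (0 if c1 == c2 else sub_cost),  # keep or replace
            ))
        prev = curr
    return prev[-1]


def ratio_sketch(s1: str, s2: str) -> float:
    """Same formula as ratio_py, using the weighted distance (xcost=1)."""
    lensum = len(s1) + len(s2)
    if lensum == 0:
        return 1.0
    return (lensum - edit_distance_sketch(s1, s2, xcost=1)) / lensum


print(edit_distance_sketch("ab", "ac"))           # 1: one replace at cost 1
print(edit_distance_sketch("ab", "ac", xcost=1))  # 2: the same replace at cost 2
print(ratio_sketch("ab", "ac"))                   # 0.5: (4 - 2) / 4
```

The two distance values differ only because of the replace weight, which is the discrepancy described above.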

That explains the "how". I suppose the only remaining question is the "why", but for now I am satisfied with this understanding.

python 2022/1/1 18:26:54, 190 views
