直接在Scipy稀疏矩阵上使用Intel MKL库来计算内存较少的A点AT

查看scipy稀疏产品的Python代码。注意，它在2次调用中调用了已编译的代码。

看起来mkl代码做同样的事情

https://software.intel.com/zh-CN/node/468640

如果request = 1，则例程仅计算长度为m + 1的数组ic的值，该数组的内存必须事先分配。退出时，值ic（m + 1）-1是数组c和jc中元素的实际数量。

如果request = 2，则例程先前已使用参数request = 1进行了调用，输出数组jc和c在调用程序中分配，并且它们的长度至少为ic（m + 1）-1。

首先ic根据行的数量进行分配C（从输入中得知），然后使用调用mkl代码request=1。

因为request=2您必须根据中的大小分配c和jc数组ic(m+1) - 1。这nnz与输入数组中的数目不同。

您正在使用request1 = c_int(0)，这要求c数组的大小正确-在没有实际执行dot（或a request 1）的情况下是未知的。

==================

File:        /usr/lib/python3/dist-packages/scipy/sparse/compressed.py
DeFinition:  A._mul_sparse_matrix(self, other)

传递1分配indptr（音符大小），并传递指针（此传递中的数据无关紧要）

    indptr = np.empty(major_axis + 1, dtype=idx_dtype)

    fn = getattr(_sparsetools, self.format + '_matmat_pass1')
    fn(M, N,
       np.asarray(self.indptr, dtype=idx_dtype),
       np.asarray(self.indices, dtype=idx_dtype),
       np.asarray(other.indptr, dtype=idx_dtype),
       np.asarray(other.indices, dtype=idx_dtype),
       indptr)

    nnz = indptr[-1]

传递2分配indptr（不同大小），并基于nnz indices和data。

    indptr = np.asarray(indptr, dtype=idx_dtype)
    indices = np.empty(nnz, dtype=idx_dtype)
    data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))

    fn = getattr(_sparsetools, self.format + '_matmat_pass2')
    fn(M, N, np.asarray(self.indptr, dtype=idx_dtype),
       np.asarray(self.indices, dtype=idx_dtype),
       self.data,
       np.asarray(other.indptr, dtype=idx_dtype),
       np.asarray(other.indices, dtype=idx_dtype),
       other.data,
       indptr, indices, data)

最后使用这些数组制作一个新数组。

    return self.__class__((data,indices,indptr),shape=(M,N))

该mkl库应以相同的方式使用。

===================

https://github.com/scipy/scipy/blob/master/scipy/sparse/sparsetools/csr.h

有C代码用于csr_matmat_pass1和csr_matmat_pass2

====================

如果有帮助，这里是这些传递的纯Python实现。文字转换，无需尝试利用任何数组操作。

def pass1(A, B):
    nrow,ncol=A.shape
    Aptr=A.indptr
    Bptr=B.indptr
    Cp=np.zeros(nrow+1,int)
    mask=np.zeros(ncol,int)-1
    nnz=0
    for i in range(nrow):
        row_nnz=0
        for jj in range(Aptr[i],Aptr[i+1]):
            j=A.indices[jj]
            for kk in range(Bptr[j],Bptr[j+1]):
                k=B.indices[kk]
                if mask[k]!=i:
                    mask[k]=i
                    row_nnz += 1
        nnz += row_nnz
        Cp[i+1]=nnz
    return Cp

def pass2(A,B,Cnnz):
    nrow,ncol=A.shape
    Ap,Aj,Ax=A.indptr,A.indices,A.data
    Bp,Bj,Bx=B.indptr,B.indices,B.data

    next = np.zeros(ncol,int)-1
    sums = np.zeros(ncol,A.dtype)

    Cp=np.zeros(nrow+1,int)
    Cj=np.zeros(Cnnz,int)
    Cx=np.zeros(Cnnz,A.dtype)
    nnz = 0
    for i in range(nrow):
        head = -2
        length = 0
        for jj in range(Ap[i], Ap[i+1]):
            j, v = Aj[jj], Ax[jj]
            for kk in range(Bp[j], Bp[j+1]):
                k = Bj[kk]
                sums[k] += v*Bx[kk]
                if (next[k]==-1):
                    next[k], head=head, k
                    length += 1
        print(i,sums, next)
        for _ in range(length):
            if sums[head] !=0:
                Cj[nnz], Cx[nnz] = head, sums[head]
                nnz += 1
            temp = head
            head = next[head]
            next[temp], sums[temp] = -1, 0
        Cp[i+1] = nnz            
    return Cp, Cj, Cx

def pass0(A,B):
    Cp = pass1(A,B)
    nnz = Cp[-1]
    Cp,Cj,Cx=pass2(A,B,nnz)
    N,M=A.shape[0], B.shape[1]
    C=sparse.csr_matrix((Cx, Cj, Cp), shape=(N,M))
    return C

无论A和B必须csr。它使用需要转换的转置，例如B = A.T.tocsr()。

其他 2022/1/1 18:31:55 有552人围观

撰写回答

你尚未登录，登录后可以

和开发者交流问题的细节

关注并接收问题和回答的更新提醒

参与内容的编辑和改进，让解决方法与时俱进

请先登录

直接在Scipy稀疏矩阵上使用Intel MKL库来计算内存较少的A点AT

撰写回答

推荐问题

在SQL结果中（在mysql或perl端）填充空日期的最直接方法是什么？

防止直接访问由ajax函数调用的文件

JavaScript使用原型相对于直接在构造函数中定义方法的优势？

Jenkins触发的代码部署在ApplicationStop步骤失败，即使通过代码部署直接进行的相同部署组已成功运行

是否可以通过带有“伪造”请求的代码直接调用Express Router？

向页面添加复杂元素时的事件委托与直接绑定

直接提供HTML页面时，如何应用servlet过滤器？

直接调用分配给对象属性的闭包

我可以将div用作UL的直接子级吗？

LESS中的直接子选择器

使用CSV存储引擎直接从CSV文件创建mysql表？

实际上，为什么我不能直接修改组件的状态？

Flyway或Liquibase可以生成更新脚本而不是直接更新数据库吗？

为什么JSON直接编码UTF-16代理对而不是Unicode代码点？

如何直接在Windows桌面C＃上绘制？

直接从Django中的urls.py重定向到命名的url模式？

没有JSX的React.js-“警告：某些东西正在直接调用React组件。改用工厂或JSX”

直接将require（'chromedriver）.path传递给selenium-webdriver

我可以将div用作UL的直接子代吗？

将组件直接添加到JFrame，还是将它们放在JPanel中？

分类汇总

您的鼓励是对我最大的支持