如果您担心实际速度,则一定要使用numpy(因为聪明的算法调整可能比使用数组操作获得的收益要小得多)。这是三个都计算相同功能的解决方案。该is_pareto_efficient_dumb
解决方案在大多数情况下较慢,但随着成本增加而变得更快,在许多点上,该is_pareto_efficient_simple
解决方案都比哑解决方案有效得多,并且最终is_pareto_efficient
函数可读性较差,但最快(所以所有这些都是帕累托高效的!)。
import numpy as np
# Very slow for many datapoints. Fastest for many costs, most readable
def is_pareto_efficient_dumb(costs):
"""
Find the pareto-efficient points
:param costs: An (n_points, n_costs) array
:return: A (n_points, ) boolean array, indicating whether each point is Pareto efficient
"""
is_efficient = np.ones(costs.shape[0], dtype = bool)
for i, c in enumerate(costs):
is_efficient[i] = np.all(np.any(costs[:i]>c, axis=1)) and np.all(np.any(costs[i+1:]>c, axis=1))
return is_efficient
# Fairly fast for many datapoints, less fast for many costs, somewhat readable
def is_pareto_efficient_simple(costs):
"""
Find the pareto-efficient points
:param costs: An (n_points, n_costs) array
:return: A (n_points, ) boolean array, indicating whether each point is Pareto efficient
"""
is_efficient = np.ones(costs.shape[0], dtype = bool)
for i, c in enumerate(costs):
if is_efficient[i]:
is_efficient[is_efficient] = np.any(costs[is_efficient]<c, axis=1) # Keep any point with a lower cost
is_efficient[i] = True # And keep self
return is_efficient
# Faster than is_pareto_efficient_simple, but less readable.
def is_pareto_efficient(costs, return_mask = True):
"""
Find the pareto-efficient points
:param costs: An (n_points, n_costs) array
:param return_mask: True to return a mask
:return: An array of indices of pareto-efficient points.
If return_mask is True, this will be an (n_points, ) boolean array
Otherwise it will be a (n_efficient_points, ) integer array of indices.
"""
is_efficient = np.arange(costs.shape[0])
n_points = costs.shape[0]
next_point_index = 0 # Next index in the is_efficient array to search for
while next_point_index<len(costs):
nondominated_point_mask = np.any(costs<costs[next_point_index], axis=1)
nondominated_point_mask[next_point_index] = True
is_efficient = is_efficient[nondominated_point_mask] # Remove dominated points
costs = costs[nondominated_point_mask]
next_point_index = np.sum(nondominated_point_mask[:next_point_index])+1
if return_mask:
is_efficient_mask = np.zeros(n_points, dtype = bool)
is_efficient_mask[is_efficient] = True
return is_efficient_mask
else:
return is_efficient
分析测试(使用从正态分布中得出的点):
含10,000个样本,有2个成本:
is_pareto_efficient_dumb: Elapsed time is 1.586s
is_pareto_efficient_simple: Elapsed time is 0.009653s
is_pareto_efficient: Elapsed time is 0.005479s
拥有1,000,000个样本,有2个成本:
is_pareto_efficient_dumb: Really, really, slow
is_pareto_efficient_simple: Elapsed time is 1.174s
is_pareto_efficient: Elapsed time is 0.4033s
使用10,000个样本,需要15个费用:
is_pareto_efficient_dumb: Elapsed time is 4.019s
is_pareto_efficient_simple: Elapsed time is 6.466s
is_pareto_efficient: Elapsed time is 6.41s
请注意,如果您担心效率问题,可以通过预先对数据重新排序来进一步提高2倍左右的速度,请参见此处。