最有效的方法来删除python中的几个列表中的几个项目?(Most efficient way to remove several items in several lists in python?)

我有几个项目清单。 没有重复项目,每个项目每个项目最多显示一次(通常在所有项目中只有一次)。 我还有一个要从这个数据集中删除的项目列表。 如何以最清洁和最有效的方式完成?

我已经阅读了python,创建一个新对象通常比过滤一个新对象更简单快捷。 但我在基本测试中没有观察到:

data = [[i*j for j in range(1, 1000)] for i in range(1, 1000)] kill = [1456, 1368, 2200, 36, 850, 9585, 59588, 60325, 9520, 9592, 210, 3] # Method 1 : 0.1990 seconds for j in kill: for i in data: if j in i: i.remove(j) # Method 2 : 0.1920 seconds for i in data: for j in kill: if j in i: i.remove(j) # Method 3 : 0.2790 seconds data = [[j for j in i if j not in kill] for i in data]

哪种方法最适合在Python中使用?

I have several lists of items. There are no duplicates, each item appears at most once per list (and normally, only once in all lists). I also have a list of items to remove from this dataset. How can it be done in the cleanest and most efficient way?

I have read that in python, creating a new object is often simplier and faster than filtering an existant one. But I do not observe that in my basic tests :

data = [[i*j for j in range(1, 1000)] for i in range(1, 1000)] kill = [1456, 1368, 2200, 36, 850, 9585, 59588, 60325, 9520, 9592, 210, 3] # Method 1 : 0.1990 seconds for j in kill: for i in data: if j in i: i.remove(j) # Method 2 : 0.1920 seconds for i in data: for j in kill: if j in i: i.remove(j) # Method 3 : 0.2790 seconds data = [[j for j in i if j not in kill] for i in data]

Which method is the best to use in Python ?

最满意答案

https://wiki.python.org/moin/TimeComplexity

remove是O(n),因为它首先在列表中线性搜索,然后如果找到它,则删除的对象之后的每个元素都会在内存中向左移动一个位置。 由于这个remove是相当昂贵的操作。

因此,从长度列表中移除M项目N将出现O(N*M)

in列表中也是O(n)因为我们需要按顺序搜索整个列表。 因此用滤波器构建一个新的列表也是O(N*M) 。 然而,由于散列使得我们的过滤器O(N)成为O(1) O(N)

因此,最好的解决方案是(我只是为了简单而使用平面列表,而不是嵌套)

def remove_kill_from_data(data, kill): s = set(kill) return [i for i in data if i not in kill]

如果你不关心保持订单,这将更快(由于在C级完成,它仍然是O(N) )

def remove_kill_from_data_unordered(data, kill): s = set(kill) d = set(data) return d - s

应用于您的列表列表

kill_set = set(kill) [remove_kill_from_data(d, kill_set) for d in data]

一些定时(每个从静态data开始复制)

%timeit method1(data, kill) 210 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) %timeit method2(data, kill) 208 ms ± 2.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) %timeit method3(data, kill) 272 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) %timeit method4(data, kill) # using remove_kill_from_data 69.6 ms ± 1.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) %timeit method5(data, kill) # using remove_kill_from_data_unordered 59.5 ms ± 3.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

https://wiki.python.org/moin/TimeComplexity

remove is O(n) because it first searches linearly through the list and then, if it finds it, every element after the removed object gets shifted one position to the left in memory. Because of this remove is quite an expensive operation.

Hence remove M items from a list of length N be comes O(N*M)

in on lists is also O(n) because we need to search through the whole list in order. Hence building a new list with a filter is also O(N*M). However, in on sets is O(1) due to hashing making our filter O(N)

Hence the best solution is (I'm just going to use a flat list for simplicity, not nested)

def remove_kill_from_data(data, kill): s = set(kill) return [i for i in data if i not in kill]

If you don't care about keeping the order, this would be even faster (due to being done at the C level, it's still O(N))

def remove_kill_from_data_unordered(data, kill): s = set(kill) d = set(data) return d - s

Applying to your list of lists

kill_set = set(kill) [remove_kill_from_data(d, kill_set) for d in data]

Some timings (each copies from a static data first)

%timeit method1(data, kill) 210 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) %timeit method2(data, kill) 208 ms ± 2.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) %timeit method3(data, kill) 272 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) %timeit method4(data, kill) # using remove_kill_from_data 69.6 ms ± 1.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) %timeit method5(data, kill) # using remove_kill_from_data_unordered 59.5 ms ± 3.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

更多推荐