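I recently ran into a problem: removing duplicates from a list while iterating over it did not remove all of them. Now that I have solved it, here is my thinking.
The actual scenario
The figure below shows the records in one table (only these fields are considered). At a glance most of them are duplicates; after full deduplication only three would remain. But after my script ran the method I had written, there were still many records left, which confused me and left me without a clue.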
The code I actually used was:
for i in abc:
    if abc.count(i) != 1:
        abc.remove(i)
To avoid leaking company data, I will use the list abc to demonstrate:
def test_012(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    # First version: removes from abc while iterating over abc itself
    for i in abc:
        if abc.count(i) != 1:
            abc.remove(i)
    print(len(abc), 'length after processing')
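This was my first version of the code, and it looked fine to me. But the run reported a length of 7 after processing, which is clearly wrong: fully deduplicated, the list should have 4 items. So why 7?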
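Solutions
The idea above was: iterate over the list abc, and if the number of times the current element occurs in the list is not 1, remove the current element. But each removal changes the list itself (its length and its elements), and the loop carries on over the changed list. The for loop advances by position, so after a removal the remaining elements shift one place to the left and the element that slides into the current position is never visited. See the figure below: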
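The skipping can be made visible with a small trace. This is only a sketch: the list items is a simplified stand-in for abc (plain integers with the same 5/4/4/2 duplicate pattern), and the names items and visited are assumptions for illustration.

```python
# Simplified stand-in for abc: same 5/4/4/2 duplicate pattern, plain integers.
items = [1] * 5 + [2] * 4 + [3] * 4 + [4] * 2

visited = []  # positions the for loop actually looked at
for position, value in enumerate(items):
    visited.append(position)
    if items.count(value) != 1:
        items.remove(value)  # everything after the removed element shifts left

print(visited)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(items)    # [1, 1, 2, 2, 3, 3, 4]
```

The loop stops after position 7 because the list has shrunk to 7 elements by then, so only 8 of the original 15 elements were ever examined.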
So how can this be done correctly?
Idea 1a: iterate over a list that has the same elements but does not change, in place of abc. Plain assignment only gives the same list a second name, so a real copy is needed, and slicing (abc[:]) creates one:
def test_234(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    # Iterate over a slice copy, so removals from abc do not disturb the loop
    for i in abc[:]:
        if abc.count(i) != 1:
            abc.remove(i)
    print(len(abc), 'length after processing')
    print(abc)
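The difference between a second name and a slice copy can be checked directly. A minimal sketch, where the names original, alias and copied are assumptions for illustration:

```python
original = [1, 1, 2]

alias = original       # same list object under a second name
copied = original[:]   # new list object holding the same elements

original.remove(1)     # mutate the original list

print(alias)                # [1, 2]     the alias sees the removal
print(copied)               # [1, 1, 2]  the slice copy does not
print(alias is original)    # True
print(copied is original)   # False
```

list(original) and original.copy() produce the same kind of shallow copy as the slice, so either could be used in the loop header instead.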
A reversed copy of the list (abc[::-1], which is also a new list) can stand in for abc in the same way:
def test_345(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    # Iterate over a reversed copy; abc itself is the list being modified
    for i in abc[::-1]:
        if abc.count(i) != 1:
            abc.remove(i)
    print(len(abc), 'length after processing')
    print(abc)
Idea 1b: iterate over a tuple with the same elements, in place of the list abc
def test_45678(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    # tuple(abc) is an immutable snapshot of the elements
    for i in tuple(abc):
        if abc.count(i) != 1:
            abc.remove(i)
    print(len(abc), 'length after processing')
    print(abc)
Idea 2: what if I insist on iterating over the changing list abc itself? What I came up with is to keep repeating the traversal until the length of abc is the same before and after a pass, and then break out of the loop:
def test_123(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    while True:
        length1 = len(abc)  # length before this pass
        for i in abc:
            if abc.count(i) != 1:
                abc.remove(i)
        length2 = len(abc)  # length after this pass
        # A pass that removes nothing visited every element, so no duplicates remain
        if length1 == length2:
            break
    print(len(abc), 'length after processing')
    print(abc)
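To see what this repetition costs, the same loop can be wrapped in a function that counts its passes. This is a sketch only: dedupe_until_stable is a hypothetical helper name, and items is a simplified integer stand-in for abc with the same 5/4/4/2 duplicate pattern.

```python
def dedupe_until_stable(items):
    """Repeat the remove-while-iterating pass until the length stops changing.

    Hypothetical helper for illustration. Mutates items in place and returns
    the number of passes, including the final pass that removes nothing.
    """
    passes = 0
    while True:
        before = len(items)
        for value in items:
            if items.count(value) != 1:
                items.remove(value)
        passes += 1
        if len(items) == before:
            return passes


items = [1] * 5 + [2] * 4 + [3] * 4 + [4] * 2
passes = dedupe_until_stable(items)
print(passes, items)  # 3 [1, 2, 3, 4]
```

For this input the list shrinks from 15 to 7 in the first pass and from 7 to 4 in the second; the third pass only confirms that nothing is left to remove.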
Idea 3: put the elements of abc whose count is 1 into a new list
def test_012(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    ABC = list()
    for i in abc[:]:
        if abc.count(i) != 1:
            abc.remove(i)  # still a duplicate: drop one occurrence from abc
        else:
            ABC.append(i)  # last remaining occurrence: keep it in the new list
    print(len(ABC), 'length after processing')
    print(ABC)
def test_7892(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    # Variant: copy the elements, empty abc, then re-add each element only once
    temp = abc[:]
    abc.clear()
    for e in temp:
        if e not in abc:
            abc.append(e)
    print(len(abc), 'length after processing')
    print(abc, 'new')
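Result of running the code above: both versions end with 4 elements in their original order, namely the '115' tuple, the '66666' tuple, 66666 and 7878.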
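As a further option, not from my original code, the standard library can do the same order-preserving deduplication in one line with dict.fromkeys. This sketch assumes Python 3.7 or later (where dicts keep insertion order) and that every element is hashable; the tuples are shortened here purely for readability.

```python
# Shortened stand-in for abc (two-field tuples instead of four).
abc = [('115', 'C2019093015002'), ('115', 'C2019093015002'),
       ('66666', 'C2019093015002'), ('66666', 'C2019093015002'),
       66666, 66666, 7878, 7878]

# dict keys are unique and keep first-insertion order
unique = list(dict.fromkeys(abc))

print(len(unique))  # 4
print(unique)
# [('115', 'C2019093015002'), ('66666', 'C2019093015002'), 66666, 7878]
```

Unlike the count()/remove() loops, this makes a single pass over the list instead of rescanning it for every element.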
Idea 4: if the order of the elements in the list does not particularly matter, use the set() function
def test_789(self):
    abc = [('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('115', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           ('66666', 'C2019093015002', 1569826868000, 1569826868000),
           66666, 66666, 66666, 66666,
           7878, 7878]
    print(len(abc), 'initial length')
    # A set keeps one copy of each distinct element, in no particular order
    set_abc = set(abc)
    print(len(set_abc), 'length after processing')
    print(list(set_abc), 'new')
Running this part gives 4 elements, but they may come out in a different order from the original list.
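One caveat with set() is that it only accepts hashable elements. The tuples and integers in abc qualify, but rows stored as lists would not. A small sketch with hypothetical list-shaped rows, falling back to the membership test from Idea 3:

```python
# Hypothetical rows stored as lists instead of tuples.
rows = [['115', 'C2019093015002'],
        ['115', 'C2019093015002'],
        ['66666', 'C2019093015002']]

try:
    set(rows)
    set_worked = True
except TypeError:
    set_worked = False  # lists are unhashable, so set() rejects them

# The "not in" approach only needs equality, so it still works.
unique = []
for row in rows:
    if row not in unique:
        unique.append(row)

print(set_worked)  # False
print(unique)      # [['115', 'C2019093015002'], ['66666', 'C2019093015002']]
```

If the rows can be converted with tuple(row) first, set() becomes usable again.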
For technical discussion, feel free to add me on QQ: 153132336 (zy)
Personal blog: https://blog.csdn.net/zyooooxie