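Run the code below and you will get the data you need from that table. To try it on this element, all you need to do is wrap the whole html element pasted above in html=''' '''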
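import csv
from bs4 import BeautifulSoup

# html is the table markup wrapped in html = ''' ... '''
tree = BeautifulSoup(html, "lxml")
table_tag = tree.select("table")[0]

# One inner list per <tr>, holding the text of each header/data cell
tab_data = [[item.text for item in row_data.select("th,td")]
            for row_data in table_tag.select("tr")]

# The with block closes the file once all rows are written
with open("table_data.csv", "w", newline='') as outfile:
    writer = csv.writer(outfile)
    for data in tab_data:
        writer.writerow(data)
        print(' '.join(data))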
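To try the snippet above without the original page, here is a minimal sketch under a few assumptions: the table in `html` is a small made-up stand-in (not the real markup), the parser is `"html.parser"` so that lxml does not have to be installed, and the rows go to an in-memory buffer rather than `table_data.csv`.

```python
import csv
import io
from bs4 import BeautifulSoup

# Hypothetical stand-in for the pasted element; not the real page markup.
html = '''
<table>
  <tr><th>Date</th><th>Open</th><th>Close</th></tr>
  <tr><td>Jan 1, 2000</td><td>1.00</td><td>2.00</td></tr>
  <tr><td>Jan 2, 2000</td><td>2.00</td><td>3.50</td></tr>
</table>
'''

# Assumption: "html.parser" (bundled with Python) instead of "lxml".
tree = BeautifulSoup(html, "html.parser")
table_tag = tree.select("table")[0]
tab_data = [[item.text for item in row_data.select("th,td")]
            for row_data in table_tag.select("tr")]

# Write to an in-memory buffer so nothing touches the disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(tab_data)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the date cells contain a comma, csv.writer quotes them, so the columns stay aligned when the file is read back.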
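I have tried to break the code into parts to make it easier to follow. The list comprehension in the first snippet is really a nested for loop. Here it is written out step by step: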
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "lxml")
table = soup.find('table')

list_of_rows = []
for row in table.find_all('tr'):
    list_of_cells = []
    # Collect the header (th) and data (td) cells of this row
    for cell in row.find_all(["th", "td"]):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

for item in list_of_rows:
    print(' '.join(item))
Result:
Date Open High Low Close Volume Market Cap
Sep 14, 2017 3875.37 3920.60 3153.86 3154.95 2,716,310,000 64,191,600,000
Sep 13, 2017 4131.98 3789.92 3882.59 2,219,410,000 68,432,200,000
Sep 12, 2017 4168.88 4344.65 4085.22 4130.81 1,864,530,000 69,033,400,000
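In the output above, the Sep 13, 2017 row shows one value fewer than the header has columns, which is consistent with a blank cell in the table. Once the cells are joined with spaces, such a gap is easy to miss. Here is a small sketch, assuming blank cells should show up as a visible placeholder. The `format_row` helper, the `"-"` placeholder and the sample rows are illustrative and not part of the original answer.

```python
def format_row(cells, placeholder="-"):
    """Join cells with tabs, replacing blank cells with a placeholder."""
    return "\t".join(cell.strip() or placeholder for cell in cells)

# Made-up rows for illustration; the second one has an empty cell.
list_of_rows = [
    ["Date", "Open", "High", "Low"],
    ["Jan 1, 2000", "1.00", "", "0.50"],
]

for item in list_of_rows:
    print(format_row(item))
```

The CSV file itself is not affected, since csv.writer keeps an empty field for the blank cell. Only the space-joined printout hides it.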