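I would suggest the following approach: read each sheet without a header, treat the first row that has no missing values as the header row, and drop everything above it.
This code works for me: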
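import pandas as pd

for sheet in range(3):
    # header=None: read every row as data so the header row can be located manually
    # (older pandas versions called this parameter sheetname instead of sheet_name)
    raw_data = pd.read_excel('blank_rows.xlsx', sheet_name=sheet, header=None)
    print(raw_data)

    # looking for the header row: the first row with no missing values
    for i, row in raw_data.iterrows():
        if row.notnull().all():
            data = raw_data.iloc[(i+1):].reset_index(drop=True)
            data.columns = list(raw_data.iloc[i])
            break

    # transforming columns to numeric where possible
    # (errors='ignore' is deprecated in recent pandas, so catch the error instead)
    for c in data.columns:
        try:
            data[c] = pd.to_numeric(data[c])
        except (ValueError, TypeError):
            pass  # leave non-numeric columns unchanged
    print(data)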
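The header search loop can also be written without `iterrows()`. This is a minimal sketch, assuming the same rule as above (the header is the first row with no missing values); the helper name `find_header_row` is hypothetical and not part of pandas:

```python
import pandas as pd


def find_header_row(raw: pd.DataFrame) -> int:
    """Return the position of the first row with no missing values (hypothetical helper)."""
    # Boolean array: True where every cell in the row is populated
    complete = raw.notna().all(axis=1).to_numpy()
    if not complete.any():
        raise ValueError("no fully populated row found")
    # argmax on a boolean array returns the position of the first True value
    return int(complete.argmax())


raw = pd.DataFrame([[None, None, None],
                    ["Country", "Company", "Product"],
                    ["US", "ABC", "XYZ"]])
print(find_header_row(raw))  # 1
```

The returned position can then be used with `raw.iloc[pos + 1:]` and `raw.iloc[pos]` exactly as in the loop above.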
Based on your example, I tested it with this toy data. From the raw data frames
         0        1        2
0  Country  Company  Product
1       US      ABC      XYZ
2       US      ABD      XYY

         0        1        2
0      NaN      NaN      NaN
1      NaN      NaN      NaN
2      NaN      NaN      NaN
3  Country  Company  Product
4       US      ABC      XYZ
5       US      ABD      XYY

                                       0        1        2
0  Product summary table for East region      NaN      NaN
1                    Date: 1st Sep, 2016      NaN      NaN
2                                    NaN      NaN      NaN
3                                Country  Company  Product
4                                     US      ABC      XYZ
5                                     US      ABD      XYY
the script produces the same table:
   Country  Company  Product
0       US      ABC      XYZ
1       US      ABD      XYY
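To try this without creating `blank_rows.xlsx`, here is a minimal sketch that rebuilds the three toy sheets as in-memory DataFrames (values copied from the printouts above) and applies the same steps. The function name `clean_sheet` is hypothetical; the `for ... else` raises an error if a sheet has no fully populated row, a case the original loop does not handle:

```python
import numpy as np
import pandas as pd

nan = np.nan
header = ["Country", "Company", "Product"]
rows = [["US", "ABC", "XYZ"], ["US", "ABD", "XYY"]]

# The three toy sheets, laid out as read_excel(..., header=None) would return them
sheets = [
    pd.DataFrame([header] + rows),
    pd.DataFrame([[nan] * 3] * 3 + [header] + rows),
    pd.DataFrame([["Product summary table for East region", nan, nan],
                  ["Date: 1st Sep, 2016", nan, nan],
                  [nan] * 3,
                  header] + rows),
]


def clean_sheet(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop everything above the first fully populated row and use that row as the header."""
    for i, row in raw.iterrows():
        if row.notnull().all():
            data = raw.iloc[(i + 1):].reset_index(drop=True)
            data.columns = list(raw.iloc[i])
            break
    else:
        raise ValueError("no header row found")
    # Convert columns to numeric where possible, leaving text columns untouched
    for c in data.columns:
        try:
            data[c] = pd.to_numeric(data[c])
        except (ValueError, TypeError):
            pass
    return data


results = [clean_sheet(s) for s in sheets]
for r in results:
    print(r)
```

All three sheets should reduce to the same three-column table.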