
Scraping 100 pages of Dangdang book listings with Python's requests library (2023)!

2023-07-15 21:30 Author: 夕陽是唯一

The full code is below; set your search keyword and run!

import re
import concurrent.futures
import pandas as pd
from time import sleep
from bs4 import BeautifulSoup
import requests


def process_book(book):
    # Extract title, price, review count, author, publish date, and publisher from one <li> entry.
    try:
        title = book.find("a", class_="pic").img.get("alt", "")
        price = float(book.find("span", class_="search_now_price").get_text(strip=True).replace("¥", ""))
        rating_text = book.find("a", class_="search_comment_num").get_text(strip=True)
        rating_match = re.search(r"\d+", rating_text)
        rating_count = int(rating_match.group()) if rating_match else 0
        author_info = book.find("p", class_="search_book_author").get_text(strip=True).split("/")
        author = author_info[0] if len(author_info) > 0 else ""
        publish_date = author_info[1] if len(author_info) > 1 else ""
        publisher = author_info[2].split("加")[0] if len(author_info) > 2 else ""
        return [title, price, rating_count, author, publish_date, publisher]
    except (AttributeError, ValueError, IndexError):
        return None


def fetch_page(page, url_template, headers):
    # Download one results page, pause to be polite to the site, then parse every book entry on it.
    url = url_template.format(page=page)
    response = requests.get(url, headers=headers)
    sleep(5)
    soup = BeautifulSoup(response.text, "html.parser")
    books = soup.find_all("li", class_=re.compile(r"line\d+"))
    parsed = (process_book(book) for book in books)
    return [row for row in parsed if row is not None]


def fetch_data():
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36"
    }
    url_template = "http://search.dangdang.com/?key=人工智能&page_index={page}"
    # Fetch pages 1-100 with 5 worker threads; adjust the range (and the list lengths) to change how many pages are scraped.
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        pages_data = list(executor.map(fetch_page, range(1, 101), [url_template] * 100, [headers] * 100))
    data = [item for sublist in pages_data for item in sublist]  # flatten the per-page lists
    df = pd.DataFrame(data, columns=["書名", "價格", "評論數", "作者", "出版年份", "出版社"])
    for _, row in df.iterrows():
        print("書名:", row["書名"])
        print("價格:", row["價格"])
        print("評論數:", row["評論數"])
        print("作者:", row["作者"])
        print("出版年份:", row["出版年份"])
        print("出版社:", row["出版社"])
        print("---------------------------------")
    df.to_csv("book_data.csv", index=False)


fetch_data()
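The script above hard-codes the search keyword 人工智能 inside url_template. Below is a minimal sketch, not part of the original post, of how the keyword could instead be read at runtime and percent-encoded into the same template. The helper name build_url_template is an assumption, and whether Dangdang's key parameter prefers UTF-8 or GBK encoding may vary, so treat this as illustrative only.

from urllib.parse import quote


def build_url_template(keyword: str) -> str:
    # Hypothetical helper: percent-encode the keyword (UTF-8 by default), which matches
    # what requests does with the hard-coded "人工智能" in the script above.
    return "http://search.dangdang.com/?key=" + quote(keyword) + "&page_index={page}"


if __name__ == "__main__":
    template = build_url_template(input("Search keyword: ").strip())
    print(template)  # pass this template to executor.map in place of the hard-coded url_template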
