黑松山资源网 Design By www.paidiu.com
本文介绍了python爬虫之BeautifulSoup 使用select方法详解 ,分享给大家。具体如下:
<html><head><title>The Dormouse's story</title></head> <body> <p class="title" name="dromouse"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" class="sister" id="link1"><!-- Elsie --></a>, <a href="http://example.com/lacie" rel="external nofollow" rel="external nofollow" rel="external nofollow" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" rel="external nofollow" rel="external nofollow" rel="external nofollow" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """
我们在写 CSS 时,标签名不加任何修饰,类名前加点,id名前加 #,在这里我们也可以利用类似的方法来筛选元素,用到的方法是 soup.select(),返回类型是 list
(1)通过标签名查找
print soup.select('title') #[<title>The Dormouse's story</title>] print soup.select('a') #[<a class="sister" href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link3">Tillie</a>] print soup.select('b') #[<b>The Dormouse's story</b>]
(2)通过类名查找
print soup.select('.sister') #[<a class="sister" href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link3">Tillie</a>]
(3)通过 id 名查找
print soup.select('#link1') #[<a class="sister" href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>]
(4)组合查找
组合查找即和写 class 文件时,标签名与类名、id名进行的组合原理是一样的,例如查找 p 标签中,id 等于 link1的内容,二者需要用空格分开
print soup.select('p #link1') #[<a class="sister" href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>]
直接子标签查找
print soup.select("head > title") #[<title>The Dormouse's story</title>]
(5)属性查找
查找时还可以加入属性元素,属性需要用中括号括起来,注意属性和标签属于同一节点,所以中间不能加空格,否则会无法匹配到。
print soup.select("head > title") #[<title>The Dormouse's story</title>] print soup.select('a[href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" ]') #[<a class="sister" href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>]
同样,属性仍然可以与上述查找方式组合,不在同一节点的空格隔开,同一节点的不加空格
print soup.select('p a[href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" ]') #[<a class="sister" href="http://example.com/elsie" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" rel="external nofollow" id="link1"><!-- Elsie --></a>]
以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持。
黑松山资源网 Design By www.paidiu.com
广告合作:本站广告合作请联系QQ:858582 申请时备注:广告合作(否则不回)
免责声明:本站资源来自互联网收集,仅供用于学习和交流,请遵循相关法律法规,本站一切资源不代表本站立场,如有侵权、后门、不妥请联系本站删除!
免责声明:本站资源来自互联网收集,仅供用于学习和交流,请遵循相关法律法规,本站一切资源不代表本站立场,如有侵权、后门、不妥请联系本站删除!
黑松山资源网 Design By www.paidiu.com
暂无评论...
更新日志
2024年10月09日
2024年10月09日
- 【原神】V5.0攻略 | 林尼攻略一图流
- 李翊君.1993-相思的烈酒【上华】【WAV+CUE】
- 古巨基.1998-LEO.KU(国)【千禧年代】【WAV+CUE】
- 郭子.2001-原来你什么都不想要创作集丫滚石】【WAV+CUE】
- 《使命召唤:黑色行动6》新预告公布!10月25日发售
- Atlus《暗喻幻想》媒体评分汇总:高分好评如潮!
- 2024金摇杆奖提名揭晓 《黑神话》角逐最佳视觉设计!
- 群星《新说唱2024 第3期 (上)》[320K/MP3][32.76MB]
- 群星《新说唱2024 第3期 (上)》[FLAC/分轨][95.38MB]
- 群星《新说唱2024 第3期 (下)》[320K/MP3][31.36MB]
- 幻兽帕鲁手游什么时候正式上线 最新消息一览
- 西普大陆BOSS位置盘点 解锁天启纪元玩法
- 西普大陆精灵进阶培养攻略 精灵养成指南
- dnf手游法控法系职业哪个强 dnf手游法控法系职业强度排行
- 魔兽世界血藤护目镜图纸在哪买 wlk血藤护目镜图纸购买位置介绍