When we use XPath to match elements on a page, we often run into a tag that contains further nested tags, when all we want is the text inside it.

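For example, suppose we want all the text of the p elements under div[@class='display-content'] in the figure. In that case we can use XPath's string(.) function, which returns the concatenated text of a node and all of its descendants.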

Here is the code:

import requests
from lxml import etree


# Fetch the article body text
def get_details(url):
    headers = {
        'Accept': "*/*",
        'Accept-Encoding': "gzip, deflate",
        'Accept-Language': "zh-CN,zh;q=0.9,en;q=0.8",
        'Cache-Control': "no-cache",
        'Connection': "keep-alive",
        # Session cookie copied from a browser; replace it with your own
        'Cookie': "SUV=1811281936496730; gidinf=x099980109ee0edb269b528280008252b495807e917b; _muid_=1548315571095387; IPLOC=CN4403; reqtype=pc; t=1557797597640; MTV_SRC=10010001",
        'Origin': "http://m.sohu.com",
        'Pragma': "no-cache",
        'Referer': "http://m.sohu.com/ch/8/?_f=m-index_important_hsdh&spm=smwp.home.nav-ch.1.1557825265945dy1ukUW",
        'User-Agent': "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Mobile Safari/537.36",
    }
    # requests fills in the Host header from the URL, so it is not hardcoded here
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    text = etree.HTML(response.text)
    tt = text.xpath("//div[@class='display-content']")
    if not tt:
        # The body div was not found on this page
        return ""
    # string(.) concatenates the text of the div and all of its descendants
    info = tt[0].xpath("string(.)")
    return info
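To see what string(.) does without hitting the network, here is a minimal offline sketch. It assumes lxml is installed, and the HTML snippet is made up for illustration; it only mimics the structure of the div described above. It contrasts ./text(), which returns only the div's own direct text nodes (none of the paragraph text), with string(.), which concatenates every descendant text node in document order.

```python
from lxml import etree

# Hypothetical snippet shaped like the page's body div
html = """
<div class="display-content">
  <p>First paragraph with <strong>bold</strong> text.</p>
  <p>Second paragraph.</p>
</div>
"""

tree = etree.HTML(html)
div = tree.xpath("//div[@class='display-content']")[0]

# ./text() only returns text nodes that are direct children of the div,
# so the text inside the nested <p> and <strong> tags is missed
direct = div.xpath("./text()")

# string(.) returns the string-value of the div: all descendant text joined together
full = div.xpath("string(.)")

print(repr(direct))
print(repr(full))
```

Note that string(.) keeps the whitespace and line breaks between tags as they appear in the source, so the result usually needs some cleanup before use.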

The returned result is shown in the figure.
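If you would rather get one clean string per paragraph instead of a single block with stray whitespace, one option is to call normalize-space(.) on each p separately. This is a sketch under the assumption that the body text lives in direct p children of the div; extract_paragraphs is a hypothetical helper name, and the HTML is again a made-up sample. A loop is used because lxml implements XPath 1.0, which does not allow a function call such as normalize-space() as a path step.

```python
from lxml import etree

# Hypothetical sample with messy whitespace and an empty paragraph
html = """
<div class="display-content">
  <p>First paragraph with <strong>bold</strong> text.</p>
  <p>  Second   paragraph.  </p>
  <p>   </p>
</div>
"""


def extract_paragraphs(html_text):
    """Hypothetical helper: return the cleaned text of each <p> in the body div."""
    tree = etree.HTML(html_text)
    result = []
    for p in tree.xpath("//div[@class='display-content']/p"):
        # normalize-space(.) is string(.) with leading/trailing whitespace trimmed
        # and internal runs of whitespace collapsed to a single space
        text = p.xpath("normalize-space(.)")
        if text:  # skip paragraphs that contain only whitespace
            result.append(str(text))
    return result


print("\n".join(extract_paragraphs(html)))
```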