findAll() in BeautifulSoup missing nodes

The findAll() method in BeautifulSoup does not return all of the elements in the XML. If you look at the code below and open the URL, you can see that there are 10 PubmedArticle nodes in the XML. However, findAll() finds only 6 of them: the output contains 6 asterisks instead of 10. What am I doing wrong?

import urllib2
from bs4 import BeautifulSoup

URL = 'http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&rettype=abstract&id=23858559,23858558,23858557,23858521,23858508,23858506,23858494,23858473,23858461,23858404'

data = urllib2.urlopen(URL).read()
soup = BeautifulSoup(data)

for x in soup.findAll('pubmedarticle'):
    print '*'

Best answer

Edit: I found that findAll() searches relative to the current node, so you can use soup to select the root node first.

The element in the provided XML is named "PubmedArticle", so try the following:

for x in soup.pubmedarticleset.findAll('pubmedarticle'):
    print '*'

I solved this by passing 'xml' as the parser argument, so that the document is parsed as XML rather than HTML. Make sure you have lxml installed.

soup = BeautifulSoup(xmlData, 'xml')
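One detail worth checking when switching to the 'xml' parser: it preserves the case of tag names, so a lowercase search such as 'pubmedarticle' no longer matches and the name has to be written as it appears in the document. Below is a minimal Python 3 sketch of that behavior. It assumes bs4 and lxml are installed, and SAMPLE_XML and article_ids are hypothetical stand-ins for the real efetch response and are not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the efetch response: three PubmedArticle nodes.
SAMPLE_XML = (
    "<PubmedArticleSet>"
    "<PubmedArticle><PMID>1</PMID></PubmedArticle>"
    "<PubmedArticle><PMID>2</PMID></PubmedArticle>"
    "<PubmedArticle><PMID>3</PMID></PubmedArticle>"
    "</PubmedArticleSet>"
)


def article_ids(xml_text):
    """Parse with the lxml-backed XML parser and return the PMID of each article."""
    soup = BeautifulSoup(xml_text, "xml")
    # The XML parser keeps tag names case-sensitive, so the name
    # must match the document exactly: 'PubmedArticle'.
    return [
        article.find("PMID").get_text()
        for article in soup.find_all("PubmedArticle")
    ]


if __name__ == "__main__":
    for pmid in article_ids(SAMPLE_XML):
        print("*", pmid)
```

With the real response, the same call would be made on the downloaded data instead of SAMPLE_XML.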
