Beautifulsoup problem of scraping text in array
Data=
<div class="dojoxGridView" id="dojox_grid__View_1" role="presentation" style="width: 1900px; height: 721px; left: 1px; top: 0px;" widgetid="dojox_grid__View_1">
<input class="dojoxGridHiddenFocus" dojoattachpoint="hiddenFocusNode" role="presentation" type="checkbox"/>
<input class="dojoxGridHiddenFocus" role="presentation" type="checkbox"/>
<div class="dojoxGridScrollbox" dojoattachpoint="scrollboxNode" role="presentation" style="height: 721px;">
<div class="dojoxGridContent" dojoattachpoint="contentNode" hidefocus="hidefocus" role="presentation" style="height: 504px; width: 1900px;">
<div role="presentation" style="position: absolute; left: 0px; top: 0px;">
<div aria-selected="false" class="dojoxGridRow" role="row" style="">
<table border="0" cellpadding="0" cellspacing="0" class="dojoxGridRowTable" role="presentation" style="width: 1900px;">
<tbody>
<tr>
<td class="dojoxGridCell" idx="0" role="gridcell" style="display:none;width:100px;" tabindex="-1">
78126
</td>
<td class="dojoxGridCell" idx="1" role="gridcell" style="width:10%;" tabindex="-1">
Approved Plan
</td>
<td class="dojoxGridCell" idx="2" role="gridcell" style="width:10%;" tabindex="-1">
G-10
</td>
<td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">
ROOF PLAN
</td>
</tr>
</tbody>
</table>
</div>
Input=
from bs4 import BeautifulSoup

# driver is the browser driver created elsewhere in the asker's script
source = driver.page_source
soup = BeautifulSoup(source, "lxml")
print(soup.prettify())
for article in soup.find_all('div', class_='dojoxGridContent'):
    drawing_no = article.find_all('td', class_='dojoxGridCell', idx='3')
    # ->need one more line to extract text
    print(drawing_no)
Output=
<td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">ROOF PLAN</td> ...
I just want to extract "ROOF PLAN". How should I edit my code?
I tried drawing_no.text and drawing_no.value, but it said "no attribute".
Thanks for your help!
python-3.x web-scraping beautifulsoup
Just try drawing_no.getText()
– Pavan Kumar T S
Mar 7 at 5:19
It gives an error: AttributeError: ResultSet object has no attribute 'getText'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
– Lucas K.C.L.
Mar 7 at 5:23
It is due to find_all. Either add another for loop over the result, or, if there is only one element, replace find_all with find.
– Pavan Kumar T S
Mar 7 at 5:25
I edited it to drawing_no = article.find('td', class_='dojoxGridCell', idx='3'). It gives: AttributeError: 'NoneType' object has no attribute 'getText'
– Lucas K.C.L.
Mar 7 at 5:27
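The two errors in the comments above come from the different return types of find_all() and find(). The following is a minimal sketch of that difference, not the asker's actual page: SAMPLE_HTML is a trimmed copy of the markup from the question, and the empty second container is a hypothetical addition to show the no-match case.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the grid markup from the question. The empty second
# container is hypothetical and only demonstrates the no-match case.
SAMPLE_HTML = """
<div class="dojoxGridContent">
  <table><tbody><tr>
    <td class="dojoxGridCell" idx="2">G-10</td>
    <td class="dojoxGridCell" idx="3">
      ROOF PLAN
    </td>
  </tr></tbody></table>
</div>
<div class="dojoxGridContent"></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

titles = []
for article in soup.find_all("div", class_="dojoxGridContent"):
    # find_all() returns a ResultSet (a list of Tag objects), so the text
    # has to be read from each element, not from the list itself.
    cells = article.find_all("td", class_="dojoxGridCell", idx="3")
    titles.extend(cell.get_text(strip=True) for cell in cells)

    # find() returns a single Tag, or None when nothing matches.
    first = article.find("td", class_="dojoxGridCell", idx="3")
    if first is None:
        print("no idx=3 cell in this container")

print(titles)
```

If the real page has containers without a matching cell, that would explain the 'NoneType' error after switching to find(), which is why the None check is there.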
asked Mar 7 at 4:59
Lucas K.C.L.
3 Answers
Please try the code below. Note that selecting on idx="3" matches only that one column. If you want to extract text from multiple elements, you might want to use a more general identifier.
from lxml import html
html_string = """
<div class="dojoxGridView" id="dojox_grid__View_1" role="presentation" style="width: 1900px; height: 721px; left: 1px; top: 0px;" widgetid="dojox_grid__View_1">
<input class="dojoxGridHiddenFocus" dojoattachpoint="hiddenFocusNode" role="presentation" type="checkbox"/>
<input class="dojoxGridHiddenFocus" role="presentation" type="checkbox"/>
<div class="dojoxGridScrollbox" dojoattachpoint="scrollboxNode" role="presentation" style="height: 721px;">
<div class="dojoxGridContent" dojoattachpoint="contentNode" hidefocus="hidefocus" role="presentation" style="height: 504px; width: 1900px;">
<div role="presentation" style="position: absolute; left: 0px; top: 0px;">
<div aria-selected="false" class="dojoxGridRow" role="row" style="">
<table border="0" cellpadding="0" cellspacing="0" class="dojoxGridRowTable" role="presentation" style="width: 1900px;">
<tbody>
<tr>
<td class="dojoxGridCell" idx="0" role="gridcell" style="display:none;width:100px;" tabindex="-1">
78126
</td>
<td class="dojoxGridCell" idx="1" role="gridcell" style="width:10%;" tabindex="-1">
Approved Plan
</td>
<td class="dojoxGridCell" idx="2" role="gridcell" style="width:10%;" tabindex="-1">
G-10
</td>
<td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">
ROOF PLAN
</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
"""
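# Parse the HTML string into an element tree.
tree = html.fromstring(html_string)
# xpath() returns a list of text nodes, one per matching <td>.
ROOFPLAN = tree.xpath('//tbody/tr//td[@idx="3"]/text()')
print(''.join(ROOFPLAN).strip())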
TypeError: expected string or bytes-like object
– Lucas K.C.L.
Mar 7 at 5:56
On which line? repl.it/repls/CylindricalMonumentalProperties
– Y Y
Mar 7 at 6:02
Oh, it works for some reason now! Great!
– Lucas K.C.L.
Mar 7 at 6:07
Upvoted. By the way, unlike bs4, it puts no "," between the strings. Can I export it to CSV or pandas? It gives something like ROOF PLANLOWER GROUND FL. PLANGROUND FL. PLAN1ST FL. PLAN2ND FL. PLAN3RD FL. PLAN TO 14TH FL. PLAN & 16TH FL. PLAN TO 18TH FL. PLAN15TH FIRE RELIEF FL.19TH FL. PLAN20TH FL. PLAN TO 25TH FL. PLANCALCULATIONSHADOW AREA CALCULATIONSECTION A - ASECTION B - BNORTH-WEST ELEVATIONNORTH-EAST ELEVATIONSOUTH-EAST, without spaces.
– Lucas K.C.L.
Mar 7 at 6:09
It would be clearer if you could share the source code, as I don't know from the above where the text is stored.
– Y Y
Mar 7 at 6:17
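The run-together output in the comment above is what ''.join() produces when the XPath matches several cells. One possible way around it, sketched here under assumptions, is to keep the XPath result as a list and write it out with the standard csv module. The two-row SAMPLE_HTML, the second row's values, and the column name drawing_title are all hypothetical, and io.StringIO stands in for a real output file.

```python
import csv
import io

from lxml import html

# Hypothetical two-row version of the grid from the question.
SAMPLE_HTML = """
<table><tbody>
  <tr><td idx="2">G-10</td><td idx="3">
    ROOF PLAN
  </td></tr>
  <tr><td idx="2">G-11</td><td idx="3">
    GROUND FL. PLAN
  </td></tr>
</tbody></table>
"""

tree = html.fromstring(SAMPLE_HTML)

# xpath() returns a list with one string per matching text node; strip each
# item instead of joining them into a single string.
titles = [text.strip() for text in tree.xpath('//td[@idx="3"]/text()')]

# Write one title per CSV row; io.StringIO stands in for a real file here.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["drawing_title"])  # hypothetical column name
writer.writerows([title] for title in titles)

print(titles)
print(buffer.getvalue())
```

The same list could also be handed to pandas.DataFrame if a table is preferred over a CSV file.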
Try the following code:
source="""<div class="dojoxGridView" id="dojox_grid__View_1" role="presentation" style="width: 1900px; height: 721px; left: 1px; top: 0px;" widgetid="dojox_grid__View_1">
<input class="dojoxGridHiddenFocus" dojoattachpoint="hiddenFocusNode" role="presentation" type="checkbox"/>
<input class="dojoxGridHiddenFocus" role="presentation" type="checkbox"/>
<div class="dojoxGridScrollbox" dojoattachpoint="scrollboxNode" role="presentation" style="height: 721px;">
<div class="dojoxGridContent" dojoattachpoint="contentNode" hidefocus="hidefocus" role="presentation" style="height: 504px; width: 1900px;">
<div role="presentation" style="position: absolute; left: 0px; top: 0px;">
<div aria-selected="false" class="dojoxGridRow" role="row" style="">
<table border="0" cellpadding="0" cellspacing="0" class="dojoxGridRowTable" role="presentation" style="width: 1900px;">
<tbody>
<tr>
<td class="dojoxGridCell" idx="0" role="gridcell" style="display:none;width:100px;" tabindex="-1">
78126
</td>
<td class="dojoxGridCell" idx="1" role="gridcell" style="width:10%;" tabindex="-1">
Approved Plan
</td>
<td class="dojoxGridCell" idx="2" role="gridcell" style="width:10%;" tabindex="-1">
G-10
</td>
<td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">
ROOF PLAN
</td>
</tr>
</tbody>
</table>
</div>"""
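from bs4 import BeautifulSoup

soup = BeautifulSoup(source, "html.parser")
for article in soup.find_all('div', class_='dojoxGridContent'):
    # find() returns the first matching cell, or None if there is no match
    drawing_no = article.find('td', class_='dojoxGridCell', idx='3')
    if drawing_no is not None:
        # strip=True removes the surrounding newlines and spaces
        print(drawing_no.get_text(strip=True))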
Error: AttributeError: 'NoneType' object has no attribute 'get_text'
– Lucas K.C.L.
Mar 7 at 5:38
That is possible if certain articles don't have a table element with a matching td. Check the updated answer; it only prints if the element exists.
– Pavan Kumar T S
Mar 7 at 5:47
Same error :( AttributeError: ResultSet object has no attribute 'get_text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
– Lucas K.C.L.
Mar 7 at 5:53
Did you notice the article.find change? I replaced find_all.
– Pavan Kumar T S
Mar 7 at 6:16
Oh no, I did not. It works! It gives: ROOF PLAN. But how do I get all the elements, though?
– Lucas K.C.L.
Mar 7 at 6:22
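For the follow-up question about getting all elements, one option is to walk every grid row and collect its cells keyed by their idx attribute. This is only a sketch: it assumes each row on the real page is a div with class dojoxGridRow, as in the question's sample, and the second row below (78127, G-11) is made up for illustration.

```python
from bs4 import BeautifulSoup

# First row is taken from the question; the second row is hypothetical.
SAMPLE_HTML = """
<div class="dojoxGridContent">
  <div class="dojoxGridRow"><table><tbody><tr>
    <td class="dojoxGridCell" idx="0">78126</td>
    <td class="dojoxGridCell" idx="1">Approved Plan</td>
    <td class="dojoxGridCell" idx="2">G-10</td>
    <td class="dojoxGridCell" idx="3">ROOF PLAN</td>
  </tr></tbody></table></div>
  <div class="dojoxGridRow"><table><tbody><tr>
    <td class="dojoxGridCell" idx="0">78127</td>
    <td class="dojoxGridCell" idx="1">Approved Plan</td>
    <td class="dojoxGridCell" idx="2">G-11</td>
    <td class="dojoxGridCell" idx="3">GROUND FL. PLAN</td>
  </tr></tbody></table></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for row in soup.find_all("div", class_="dojoxGridRow"):
    # Map each cell's idx attribute to its stripped text.
    record = {
        cell["idx"]: cell.get_text(strip=True)
        for cell in row.find_all("td", class_="dojoxGridCell")
    }
    rows.append(record)

# Pull one column out of the collected records; .get() tolerates rows
# that lack an idx="3" cell.
drawing_titles = [record.get("3") for record in rows]
print(drawing_titles)
```

Keeping whole rows as dictionaries means the drawing number (idx 2) and the title (idx 3) stay paired, which is convenient if the data is later exported.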
You can use the idx attribute and select by its value with a CSS attribute selector:
print(soup.select_one("[idx='3']").text.strip())
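As a small sketch of how this selector approach extends to several rows: select_one() returns only the first match, while select() returns every match. The sample markup below is a hypothetical two-row reduction of the grid, and the td.dojoxGridCell prefix is an optional narrowing added here, not part of the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row reduction of the grid from the question.
SAMPLE_HTML = """
<table><tbody>
  <tr><td class="dojoxGridCell" idx="3"> ROOF PLAN </td></tr>
  <tr><td class="dojoxGridCell" idx="3"> GROUND FL. PLAN </td></tr>
</tbody></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select_one() returns the first match (or None); select() returns them all.
first = soup.select_one("td.dojoxGridCell[idx='3']")
all_titles = [
    td.get_text(strip=True)
    for td in soup.select("td.dojoxGridCell[idx='3']")
]

print(first.get_text(strip=True))
print(all_titles)
```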
answered Mar 7 at 5:27
Y Y (author of the lxml answer)
TypeError: expected string or bytes-like object
– Lucas K.C.L.
Mar 7 at 5:56
on which line? repl.it/repls/CylindricalMonumentalProperties
– Y Y
Mar 7 at 6:02
oh it works for some reason now!! Great!
– Lucas K.C.L.
Mar 7 at 6:07
upvoted, by the way unlike bs4 it has no ",' between string, can i export it to csv or pandas? it gives something like ROOF PLANLOWER GROUND FL. PLANGROUND FL. PLAN1ST FL. PLAN2ND FL. PLAN3RD FL. PLAN TO 14TH FL. PLAN & 16TH FL. PLAN TO 18TH FL. PLAN15TH FIRE RELIEF FL.19TH FL. PLAN20TH FL. PLAN TO 25TH FL. PLANCALCULATIONSHADOW AREA CALCULATIONSECTION A - ASECTION B - BNORTH-WEST ELEVATIONNORTH-EAST ELEVATIONSOUTH-EAST ,without space
– Lucas K.C.L.
Mar 7 at 6:09
it would be more clear if you can share the source code as I dont know from above where the text is stored
– Y Y
Mar 7 at 6:17
Try the following code:
source="""<div class="dojoxGridView" id="dojox_grid__View_1" role="presentation" style="width: 1900px; height: 721px; left: 1px; top: 0px;" widgetid="dojox_grid__View_1">
<input class="dojoxGridHiddenFocus" dojoattachpoint="hiddenFocusNode" role="presentation" type="checkbox"/>
<input class="dojoxGridHiddenFocus" role="presentation" type="checkbox"/>
<div class="dojoxGridScrollbox" dojoattachpoint="scrollboxNode" role="presentation" style="height: 721px;">
<div class="dojoxGridContent" dojoattachpoint="contentNode" hidefocus="hidefocus" role="presentation" style="height: 504px; width: 1900px;">
<div role="presentation" style="position: absolute; left: 0px; top: 0px;">
<div aria-selected="false" class="dojoxGridRow" role="row" style="">
<table border="0" cellpadding="0" cellspacing="0" class="dojoxGridRowTable" role="presentation" style="width: 1900px;">
<tbody>
<tr>
<td class="dojoxGridCell" idx="0" role="gridcell" style="display:none;width:100px;" tabindex="-1">
78126
</td>
<td class="dojoxGridCell" idx="1" role="gridcell" style="width:10%;" tabindex="-1">
Approved Plan
</td>
<td class="dojoxGridCell" idx="2" role="gridcell" style="width:10%;" tabindex="-1">
G-10
</td>
<td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">
ROOF PLAN
</td>
</tr>
</tbody>
</table>
</div>"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(source, "html.parser")
for article in soup.find_all('div', class_='dojoxGridContent'):
    # find() returns a single Tag, or None when nothing matches
    drawing_no = article.find('td', class_='dojoxGridCell', idx='3')
    if drawing_no:
        print(drawing_no.get_text())
Error: AttributeError: 'NoneType' object has no attribute 'get_text'
– Lucas K.C.L.
Mar 7 at 5:38
That is possible if certain articles don't have a table element with a matching td. Check the updated answer; it only prints if the element exists.
– Pavan Kumar T S
Mar 7 at 5:47
Same error :( AttributeError: ResultSet object has no attribute 'get_text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
– Lucas K.C.L.
Mar 7 at 5:53
Did you notice the change to article.find? I replaced find_all.
– Pavan Kumar T S
Mar 7 at 6:16
Oh no, I did not. It works! It gives: ROOF PLAN. But how do I get all the elements?
– Lucas K.C.L.
Mar 7 at 6:22
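To collect every matching cell rather than only the first, one option is to loop over the rows and call `find` on each, skipping rows where it returns `None`. A sketch, assuming each `dojoxGridRow` holds one table row as in the snippet above; the second and third rows are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two grid rows plus one row lacking an idx="3" cell.
source = """
<div class="dojoxGridContent">
  <div class="dojoxGridRow"><table><tbody><tr>
    <td class="dojoxGridCell" idx="2">G-10</td>
    <td class="dojoxGridCell" idx="3"> ROOF PLAN </td>
  </tr></tbody></table></div>
  <div class="dojoxGridRow"><table><tbody><tr>
    <td class="dojoxGridCell" idx="2">G-11</td>
    <td class="dojoxGridCell" idx="3"> GROUND FL. PLAN </td>
  </tr></tbody></table></div>
  <div class="dojoxGridRow"><table><tbody><tr>
    <td class="dojoxGridCell" idx="2">G-12</td>
  </tr></tbody></table></div>
</div>
"""

soup = BeautifulSoup(source, "html.parser")

drawing_titles = []
for row in soup.find_all("div", class_="dojoxGridRow"):
    # find() returns one Tag or None, so guard before calling get_text().
    cell = row.find("td", attrs={"class": "dojoxGridCell", "idx": "3"})
    if cell is not None:
        drawing_titles.append(cell.get_text(strip=True))

print(drawing_titles)
```

Keeping the results in a list (rather than printing inside the loop) makes it straightforward to pass them on to CSV or pandas later.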
edited Mar 7 at 5:45
answered Mar 7 at 5:28
Pavan Kumar T S
You can use the idx attribute and select by its value:
print(soup.select_one("[idx='3']").text.strip())
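`select_one` returns only the first match. If several cells share `idx='3'`, `select` returns all of them as a list. A small sketch on made-up markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two cells that share idx="3".
source = """
<table><tbody>
  <tr><td idx="2">G-10</td><td idx="3"> ROOF PLAN </td></tr>
  <tr><td idx="2">G-11</td><td idx="3"> GROUND FL. PLAN </td></tr>
</tbody></table>
"""

soup = BeautifulSoup(source, "html.parser")

# select_one() gives the first matching Tag (or None if nothing matches).
first = soup.select_one("td[idx='3']")
print(first.text.strip())

# select() gives every match, so a list comprehension collects all titles.
all_titles = [td.get_text(strip=True) for td in soup.select("td[idx='3']")]
print(all_titles)
```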
answered Mar 7 at 6:24
QHarr
Just try with drawing_no.getText()
– Pavan Kumar T S
Mar 7 at 5:19
It gives an error: AttributeError: ResultSet object has no attribute 'getText'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
– Lucas K.C.L.
Mar 7 at 5:23
It is due to find_all. Just add another for loop after it, or, if there is only one element, replace find_all with find.
– Pavan Kumar T S
Mar 7 at 5:25
I edited it to drawing_no = article.find('td', class_='dojoxGridCell', idx='3'). It gives: AttributeError: 'NoneType' object has no attribute 'getText'
– Lucas K.C.L.
Mar 7 at 5:27
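The two errors in this thread come from the return types: `find_all` returns a `ResultSet` (a list of tags), which has no `getText`, while `find` returns a single `Tag`, or `None` when nothing matches. A short sketch on made-up markup:

```python
from bs4 import BeautifulSoup

# Hypothetical one-cell document for illustration.
soup = BeautifulSoup(
    '<table><tr><td idx="3">ROOF PLAN</td></tr></table>', "html.parser"
)

# find_all() always returns a list-like ResultSet, even for a single match.
cells = soup.find_all("td", attrs={"idx": "3"})
texts = [cell.get_text() for cell in cells]  # call get_text() on each Tag
print(texts)

# find() returns the first Tag, or None when there is no match.
missing = soup.find("td", attrs={"idx": "9"})
print(missing is None)

# Guarding against None avoids the 'NoneType' AttributeError.
label = missing.get_text() if missing is not None else ""
print(repr(label))
```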