Beautifulsoup problem of scraping text in array

Data=



<div class="dojoxGridView" id="dojox_grid__View_1" role="presentation" style="width: 1900px; height: 721px; left: 1px; top: 0px;" widgetid="dojox_grid__View_1">
<input class="dojoxGridHiddenFocus" dojoattachpoint="hiddenFocusNode" role="presentation" type="checkbox"/>
<input class="dojoxGridHiddenFocus" role="presentation" type="checkbox"/>
<div class="dojoxGridScrollbox" dojoattachpoint="scrollboxNode" role="presentation" style="height: 721px;">
<div class="dojoxGridContent" dojoattachpoint="contentNode" hidefocus="hidefocus" role="presentation" style="height: 504px; width: 1900px;">
<div role="presentation" style="position: absolute; left: 0px; top: 0px;">
<div aria-selected="false" class="dojoxGridRow" role="row" style="">
<table border="0" cellpadding="0" cellspacing="0" class="dojoxGridRowTable" role="presentation" style="width: 1900px;">
<tbody>
<tr>
<td class="dojoxGridCell" idx="0" role="gridcell" style="display:none;width:100px;" tabindex="-1">
78126
</td>
<td class="dojoxGridCell" idx="1" role="gridcell" style="width:10%;" tabindex="-1">
Approved Plan
</td>
<td class="dojoxGridCell" idx="2" role="gridcell" style="width:10%;" tabindex="-1">
G-10
</td>
<td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">
ROOF PLAN
</td>
</tr>
</tbody>
</table>
</div>


Input=



from bs4 import BeautifulSoup

source = driver.page_source
soup = BeautifulSoup(source, "lxml")
print(soup.prettify())
for article in soup.find_all('div', class_='dojoxGridContent'):
    drawing_no = article.find_all('td', class_='dojoxGridCell', idx='3')
    # -> need one more line to extract text
    print(drawing_no)


Output=



<td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">ROOF PLAN</td> ...


I just want to extract "ROOF PLAN". How should I edit my code?
I tried drawing_no.text and drawing_no.value, but both failed with a "no attribute" error.
Thanks for your help!






























  • just try with drawing_no.getText()

    – Pavan Kumar T S
    Mar 7 at 5:19











  • It gives an error: AttributeError: ResultSet object has no attribute 'getText'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

    – Lucas K.C.L.
    Mar 7 at 5:23











  • It is due to find_all, which returns a list of results. Either add another for loop over the results or, if there is only one element, replace find_all with find.

    – Pavan Kumar T S
    Mar 7 at 5:25











  • I edited it to drawing_no = article.find('td', class_='dojoxGridCell', idx='3'). It gives: AttributeError: 'NoneType' object has no attribute 'getText'

    – Lucas K.C.L.
    Mar 7 at 5:27
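
Both errors in this thread come down to return types: find_all() gives back a list-like ResultSet, while find() gives back a single Tag, or None when nothing matches. Here is a minimal sketch on made-up, trimmed-down markup. The two-div layout is an assumption, since the full page was not posted; only the second div contains an idx="3" cell.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two grid divs, only the second has an idx="3" cell.
markup = """
<div class="dojoxGridContent"><table><tr>
  <td class="dojoxGridCell" idx="0">header</td>
</tr></table></div>
<div class="dojoxGridContent"><table><tr>
  <td class="dojoxGridCell" idx="3">
  ROOF PLAN
  </td>
</tr></table></div>
"""

soup = BeautifulSoup(markup, "html.parser")

titles = []
for article in soup.find_all("div", class_="dojoxGridContent"):
    # find_all() always returns a ResultSet (possibly empty), never a single tag.
    cells = article.find_all("td", class_="dojoxGridCell", idx="3")
    # find() returns the first matching Tag, or None if there is no match.
    first = article.find("td", class_="dojoxGridCell", idx="3")
    print(type(cells).__name__, len(cells), first is None)
    # Text has to be read from each Tag inside the ResultSet.
    for cell in cells:
        titles.append(cell.get_text(strip=True))

print(titles)
```

Calling .text or .getText() on `cells` fails because it is the container, not a tag. Calling it on `first` fails only for the div where nothing matched.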
















python-3.x web-scraping beautifulsoup















asked Mar 7 at 4:59









Lucas K.C.L.






3 Answers
































Please try the code below. In general, though, if you pass in idx=3 it will only return a single element. If you want to extract text from multiple elements, you might want to use a more general identifier.



import lxml
from lxml import html

html_string = """
<div class="dojoxGridView" id="dojox_grid__View_1" role="presentation" style="width: 1900px; height: 721px; left: 1px; top: 0px;" widgetid="dojox_grid__View_1">
<input class="dojoxGridHiddenFocus" dojoattachpoint="hiddenFocusNode" role="presentation" type="checkbox"/>
<input class="dojoxGridHiddenFocus" role="presentation" type="checkbox"/>
<div class="dojoxGridScrollbox" dojoattachpoint="scrollboxNode" role="presentation" style="height: 721px;">
<div class="dojoxGridContent" dojoattachpoint="contentNode" hidefocus="hidefocus" role="presentation" style="height: 504px; width: 1900px;">
<div role="presentation" style="position: absolute; left: 0px; top: 0px;">
<div aria-selected="false" class="dojoxGridRow" role="row" style="">
<table border="0" cellpadding="0" cellspacing="0" class="dojoxGridRowTable" role="presentation" style="width: 1900px;">
<tbody>
<tr>
<td class="dojoxGridCell" idx="0" role="gridcell" style="display:none;width:100px;" tabindex="-1">
78126
</td>
<td class="dojoxGridCell" idx="1" role="gridcell" style="width:10%;" tabindex="-1">
Approved Plan
</td>
<td class="dojoxGridCell" idx="2" role="gridcell" style="width:10%;" tabindex="-1">
G-10
</td>
<td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">
ROOF PLAN
</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
"""

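# Parse the HTML string into an element tree
tree = html.fromstring(html_string)
# xpath() returns a list of the text nodes inside every td with idx="3"
ROOFPLAN = tree.xpath('//tbody/tr//td[@idx="3"]/text()')
# Join the pieces and trim the surrounding whitespace
print(''.join(ROOFPLAN).strip())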




























  • TypeError: expected string or bytes-like object

    – Lucas K.C.L.
    Mar 7 at 5:56











  • on which line? repl.it/repls/CylindricalMonumentalProperties

    – Y Y
    Mar 7 at 6:02











  • oh it works for some reason now!! Great!

    – Lucas K.C.L.
    Mar 7 at 6:07











  • Upvoted. By the way, unlike bs4, the output has no ",' between the strings. Can I export it to CSV or pandas? It gives something like ROOF PLANLOWER GROUND FL. PLANGROUND FL. PLAN1ST FL. PLAN2ND FL. PLAN3RD FL. PLAN TO 14TH FL. PLAN & 16TH FL. PLAN TO 18TH FL. PLAN15TH FIRE RELIEF FL.19TH FL. PLAN20TH FL. PLAN TO 25TH FL. PLANCALCULATIONSHADOW AREA CALCULATIONSECTION A - ASECTION B - BNORTH-WEST ELEVATIONNORTH-EAST ELEVATIONSOUTH-EAST, without spaces.

    – Lucas K.C.L.
    Mar 7 at 6:09












  • It would be clearer if you could share the source code, as I don't know from the above where the text is stored.

    – Y Y
    Mar 7 at 6:17
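
On the run-together output mentioned above: xpath() with text() already returns a list with one string per cell, so the titles only merge because of ''.join(...). Here is a rough sketch that keeps them separate and writes them out with the standard csv module. The three-row table is invented sample data, since the real page was not shared, and the "title" column header is just a placeholder name.

```python
import csv
import io

from lxml import html

# Hypothetical sample rows; the titles are taken from the output quoted above.
html_string = """
<table><tbody>
<tr><td idx="3">
ROOF PLAN
</td></tr>
<tr><td idx="3">
LOWER GROUND FL. PLAN
</td></tr>
<tr><td idx="3">
GROUND FL. PLAN
</td></tr>
</tbody></table>
"""

tree = html.fromstring(html_string)

# One list entry per matching text node; strip each one instead of joining them.
raw = tree.xpath('//td[@idx="3"]/text()')
titles = [t.strip() for t in raw if t.strip()]

# Write one title per CSV row (an in-memory buffer stands in for a real file).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title"])  # placeholder column name
writer.writerows([t] for t in titles)

print(titles)
print(buffer.getvalue())
```

If pandas is preferred, the same `titles` list could be passed to pandas.DataFrame({"title": titles}) and saved with to_csv().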

































Try the following code:



source="""<div class="dojoxGridView" id="dojox_grid__View_1" role="presentation" style="width: 1900px; height: 721px; left: 1px; top: 0px;" widgetid="dojox_grid__View_1">
<input class="dojoxGridHiddenFocus" dojoattachpoint="hiddenFocusNode" role="presentation" type="checkbox"/>
<input class="dojoxGridHiddenFocus" role="presentation" type="checkbox"/>
<div class="dojoxGridScrollbox" dojoattachpoint="scrollboxNode" role="presentation" style="height: 721px;">
<div class="dojoxGridContent" dojoattachpoint="contentNode" hidefocus="hidefocus" role="presentation" style="height: 504px; width: 1900px;">
<div role="presentation" style="position: absolute; left: 0px; top: 0px;">
<div aria-selected="false" class="dojoxGridRow" role="row" style="">
<table border="0" cellpadding="0" cellspacing="0" class="dojoxGridRowTable" role="presentation" style="width: 1900px;">
<tbody>
<tr>
<td class="dojoxGridCell" idx="0" role="gridcell" style="display:none;width:100px;" tabindex="-1">
78126
</td>
<td class="dojoxGridCell" idx="1" role="gridcell" style="width:10%;" tabindex="-1">
Approved Plan
</td>
<td class="dojoxGridCell" idx="2" role="gridcell" style="width:10%;" tabindex="-1">
G-10
</td>
<td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">
ROOF PLAN
</td>
</tr>
</tbody>
</table>
</div>"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(source, "html.parser")
for article in soup.find_all('div', class_='dojoxGridContent'):
    # find() returns the first matching cell, or None if this div has none
    drawing_no = article.find('td', class_='dojoxGridCell', idx='3')
    if drawing_no:
        print(drawing_no.get_text())






























  • Error: AttributeError: 'NoneType' object has no attribute 'get_text'

    – Lucas K.C.L.
    Mar 7 at 5:38











  • That is possible if certain articles don't have a table element with a matching td. Check the updated answer; it only prints if the element exists.

    – Pavan Kumar T S
    Mar 7 at 5:47











  • Same error :( AttributeError: ResultSet object has no attribute 'get_text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

    – Lucas K.C.L.
    Mar 7 at 5:53











  • Did you notice the article.find change? I replaced find_all.

    – Pavan Kumar T S
    Mar 7 at 6:16











  • Oh no, I did not. It works! It gives: ROOF PLAN. But how do I get all the elements, though?

    – Lucas K.C.L.
    Mar 7 at 6:22
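
To get every row rather than only the first match, one option is to loop over the row containers and read each cell in turn. This is a rough sketch, assuming every row is a div with class dojoxGridRow as in the posted snippet. The second row and its values are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample in the shape of the grid from the question.
source = """
<div class="dojoxGridContent">
  <div class="dojoxGridRow"><table><tbody><tr>
    <td class="dojoxGridCell" idx="2">G-10</td>
    <td class="dojoxGridCell" idx="3">ROOF PLAN</td>
  </tr></tbody></table></div>
  <div class="dojoxGridRow"><table><tbody><tr>
    <td class="dojoxGridCell" idx="2">G-11</td>
    <td class="dojoxGridCell" idx="3">GROUND FL. PLAN</td>
  </tr></tbody></table></div>
</div>
"""

soup = BeautifulSoup(source, "html.parser")

rows = []
for row in soup.find_all("div", class_="dojoxGridRow"):
    # Map each cell's idx attribute to its whitespace-stripped text.
    cells = {
        td["idx"]: td.get_text(strip=True)
        for td in row.find_all("td", class_="dojoxGridCell")
    }
    rows.append(cells)

# Pull out just the idx="3" column, skipping rows that lack it.
drawing_titles = [r["3"] for r in rows if "3" in r]
print(drawing_titles)
```

Keeping each row as a dict keyed by idx makes it easy to pick up the other columns later without parsing the page again.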
































You can use the idx attribute and select by its value:



print(soup.select_one("[idx='3']").text.strip())
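
Note that select_one() returns None when nothing matches, so a guarded version may be safer on pages where some grids lack that column. select() with the same CSS selector returns every match. A small sketch on made-up markup, where the td.dojoxGridCell prefix is an optional narrowing added here:

```python
from bs4 import BeautifulSoup

# Hypothetical one-row sample based on the markup in the question.
markup = """
<table><tr>
  <td class="dojoxGridCell" idx="2">G-10</td>
  <td class="dojoxGridCell" idx="3">
  ROOF PLAN
  </td>
</tr></table>
"""

soup = BeautifulSoup(markup, "html.parser")

# First match only; guard against None before reading the text.
cell = soup.select_one("td.dojoxGridCell[idx='3']")
title = cell.get_text(strip=True) if cell is not None else None

# Every match, as a plain list of strings.
all_titles = [td.get_text(strip=True) for td in soup.select("td.dojoxGridCell[idx='3']")]

# A selector with no match gives None rather than raising.
missing = soup.select_one("td[idx='99']")

print(title, all_titles, missing)
```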



























    Your Answer






    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );






    Lucas K.C.L. is a new contributor. Be nice, and check out our Code of Conduct.









    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55036389%2fbeautifulsoup-problem-of-scraping-text-in-array%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    please try below code. But in general if you pass in idx=3 it will only return one single element. If you want to extract text from multiple element you might want to use a more general identifier.



    import lxml
    from lxml import html

    html_string = """
    <div class="dojoxGridView" id="dojox_grid__View_1" role="presentation" style="width: 1900px; height: 721px; left: 1px; top: 0px;" widgetid="dojox_grid__View_1">
    <input class="dojoxGridHiddenFocus" dojoattachpoint="hiddenFocusNode" role="presentation" type="checkbox"/>
    <input class="dojoxGridHiddenFocus" role="presentation" type="checkbox"/>
    <div class="dojoxGridScrollbox" dojoattachpoint="scrollboxNode" role="presentation" style="height: 721px;">
    <div class="dojoxGridContent" dojoattachpoint="contentNode" hidefocus="hidefocus" role="presentation" style="height: 504px; width: 1900px;">
    <div role="presentation" style="position: absolute; left: 0px; top: 0px;">
    <div aria-selected="false" class="dojoxGridRow" role="row" style="">
    <table border="0" cellpadding="0" cellspacing="0" class="dojoxGridRowTable" role="presentation" style="width: 1900px;">
    <tbody>
    <tr>
    <td class="dojoxGridCell" idx="0" role="gridcell" style="display:none;width:100px;" tabindex="-1">
    78126
    </td>
    <td class="dojoxGridCell" idx="1" role="gridcell" style="width:10%;" tabindex="-1">
    Approved Plan
    </td>
    <td class="dojoxGridCell" idx="2" role="gridcell" style="width:10%;" tabindex="-1">
    G-10
    </td>
    <td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">
    ROOF PLAN
    </td>
    </tr>
    </tbody>
    </table>
    </div>
    </div>
    </div>
    </div>
    </div>
    """

    tree = html.fromstring(html_string)
    ROOFPLAN = tree.xpath('//tbody/tr//td[@idx="3"]/text()')
    print(''.join(ROOFPLAN).strip())





    share|improve this answer























    • TypeError: expected string or bytes-like object

      – Lucas K.C.L.
      Mar 7 at 5:56











    • on which line? repl.it/repls/CylindricalMonumentalProperties

      – Y Y
      Mar 7 at 6:02











    • oh it works for some reason now!! Great!

      – Lucas K.C.L.
      Mar 7 at 6:07











    • upvoted, by the way unlike bs4 it has no ",' between string, can i export it to csv or pandas? it gives something like ROOF PLANLOWER GROUND FL. PLANGROUND FL. PLAN1ST FL. PLAN2ND FL. PLAN3RD FL. PLAN TO 14TH FL. PLAN & 16TH FL. PLAN TO 18TH FL. PLAN15TH FIRE RELIEF FL.19TH FL. PLAN20TH FL. PLAN TO 25TH FL. PLANCALCULATIONSHADOW AREA CALCULATIONSECTION A - ASECTION B - BNORTH-WEST ELEVATIONNORTH-EAST ELEVATIONSOUTH-EAST ,without space

      – Lucas K.C.L.
      Mar 7 at 6:09












    • it would be more clear if you can share the source code as I dont know from above where the text is stored

      – Y Y
      Mar 7 at 6:17
















    0














    please try below code. But in general if you pass in idx=3 it will only return one single element. If you want to extract text from multiple element you might want to use a more general identifier.



    import lxml
    from lxml import html

    html_string = """
    <div class="dojoxGridView" id="dojox_grid__View_1" role="presentation" style="width: 1900px; height: 721px; left: 1px; top: 0px;" widgetid="dojox_grid__View_1">
    <input class="dojoxGridHiddenFocus" dojoattachpoint="hiddenFocusNode" role="presentation" type="checkbox"/>
    <input class="dojoxGridHiddenFocus" role="presentation" type="checkbox"/>
    <div class="dojoxGridScrollbox" dojoattachpoint="scrollboxNode" role="presentation" style="height: 721px;">
    <div class="dojoxGridContent" dojoattachpoint="contentNode" hidefocus="hidefocus" role="presentation" style="height: 504px; width: 1900px;">
    <div role="presentation" style="position: absolute; left: 0px; top: 0px;">
    <div aria-selected="false" class="dojoxGridRow" role="row" style="">
    <table border="0" cellpadding="0" cellspacing="0" class="dojoxGridRowTable" role="presentation" style="width: 1900px;">
    <tbody>
    <tr>
    <td class="dojoxGridCell" idx="0" role="gridcell" style="display:none;width:100px;" tabindex="-1">
    78126
    </td>
    <td class="dojoxGridCell" idx="1" role="gridcell" style="width:10%;" tabindex="-1">
    Approved Plan
    </td>
    <td class="dojoxGridCell" idx="2" role="gridcell" style="width:10%;" tabindex="-1">
    G-10
    </td>
    <td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">
    ROOF PLAN
    </td>
    </tr>
    </tbody>
    </table>
    </div>
    </div>
    </div>
    </div>
    </div>
    """

    tree = html.fromstring(html_string)
    ROOFPLAN = tree.xpath('//tbody/tr//td[@idx="3"]/text()')
    print(''.join(ROOFPLAN).strip())





    share|improve this answer























    • TypeError: expected string or bytes-like object

      – Lucas K.C.L.
      Mar 7 at 5:56











    • on which line? repl.it/repls/CylindricalMonumentalProperties

      – Y Y
      Mar 7 at 6:02











    • oh it works for some reason now!! Great!

      – Lucas K.C.L.
      Mar 7 at 6:07











    • upvoted, by the way unlike bs4 it has no ",' between string, can i export it to csv or pandas? it gives something like ROOF PLANLOWER GROUND FL. PLANGROUND FL. PLAN1ST FL. PLAN2ND FL. PLAN3RD FL. PLAN TO 14TH FL. PLAN & 16TH FL. PLAN TO 18TH FL. PLAN15TH FIRE RELIEF FL.19TH FL. PLAN20TH FL. PLAN TO 25TH FL. PLANCALCULATIONSHADOW AREA CALCULATIONSECTION A - ASECTION B - BNORTH-WEST ELEVATIONNORTH-EAST ELEVATIONSOUTH-EAST ,without space

      – Lucas K.C.L.
      Mar 7 at 6:09












    • it would be more clear if you can share the source code as I dont know from above where the text is stored

      – Y Y
      Mar 7 at 6:17














    0












    0








    0







    please try below code. But in general if you pass in idx=3 it will only return one single element. If you want to extract text from multiple element you might want to use a more general identifier.



    import lxml
    from lxml import html

    html_string = """
    <div class="dojoxGridView" id="dojox_grid__View_1" role="presentation" style="width: 1900px; height: 721px; left: 1px; top: 0px;" widgetid="dojox_grid__View_1">
    <input class="dojoxGridHiddenFocus" dojoattachpoint="hiddenFocusNode" role="presentation" type="checkbox"/>
    <input class="dojoxGridHiddenFocus" role="presentation" type="checkbox"/>
    <div class="dojoxGridScrollbox" dojoattachpoint="scrollboxNode" role="presentation" style="height: 721px;">
    <div class="dojoxGridContent" dojoattachpoint="contentNode" hidefocus="hidefocus" role="presentation" style="height: 504px; width: 1900px;">
    <div role="presentation" style="position: absolute; left: 0px; top: 0px;">
    <div aria-selected="false" class="dojoxGridRow" role="row" style="">
    <table border="0" cellpadding="0" cellspacing="0" class="dojoxGridRowTable" role="presentation" style="width: 1900px;">
    <tbody>
    <tr>
    <td class="dojoxGridCell" idx="0" role="gridcell" style="display:none;width:100px;" tabindex="-1">
    78126
    </td>
    <td class="dojoxGridCell" idx="1" role="gridcell" style="width:10%;" tabindex="-1">
    Approved Plan
    </td>
    <td class="dojoxGridCell" idx="2" role="gridcell" style="width:10%;" tabindex="-1">
    G-10
    </td>
    <td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">
    ROOF PLAN
    </td>
    </tr>
    </tbody>
    </table>
    </div>
    </div>
    </div>
    </div>
    </div>
    """

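    tree = html.fromstring(html_string)
    # text() returns the text nodes of every <td idx="3">; join them and trim the surrounding whitespace
    ROOFPLAN = tree.xpath('//tbody/tr//td[@idx="3"]/text()')
    print(''.join(ROOFPLAN).strip())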
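As a rough sketch of the "more general identifier" idea, the XPath below walks every row and keeps each cell's text as a separate list item instead of joining everything into one string. The two-row `sample_html` and the names `rows` and `cells` are made up for illustration; the real page's markup may differ.

```python
from lxml import html

# Hypothetical two-row sample modelled on the grid markup above.
sample_html = """
<table class="dojoxGridRowTable"><tbody><tr>
<td class="dojoxGridCell" idx="2">G-10</td>
<td class="dojoxGridCell" idx="3">ROOF PLAN</td>
</tr></tbody></table>
<table class="dojoxGridRowTable"><tbody><tr>
<td class="dojoxGridCell" idx="2">G-11</td>
<td class="dojoxGridCell" idx="3">GROUND FL. PLAN</td>
</tr></tbody></table>
"""

tree = html.fromstring(sample_html)

rows = []
# One <tr> per grid row; collect the stripped text of every cell in that row.
for tr in tree.xpath('//table[@class="dojoxGridRowTable"]//tr'):
    cells = [td.text_content().strip() for td in tr.xpath('./td')]
    rows.append(cells)

print(rows)
```

Keeping one list per row means the values stay separate, so they can be written out later without having to split a concatenated string.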





    answered Mar 7 at 5:27

    Y Y

    • TypeError: expected string or bytes-like object

      – Lucas K.C.L.
      Mar 7 at 5:56

    • On which line? repl.it/repls/CylindricalMonumentalProperties

      – Y Y
      Mar 7 at 6:02

    • Oh, it works for some reason now! Great!

      – Lucas K.C.L.
      Mar 7 at 6:07

    • Upvoted. By the way, unlike bs4, the output has no "," between the strings. Can I export it to CSV or pandas? It gives something like ROOF PLANLOWER GROUND FL. PLANGROUND FL. PLAN1ST FL. PLAN2ND FL. PLAN3RD FL. PLAN TO 14TH FL. PLAN & 16TH FL. PLAN TO 18TH FL. PLAN15TH FIRE RELIEF FL.19TH FL. PLAN20TH FL. PLAN TO 25TH FL. PLANCALCULATIONSHADOW AREA CALCULATIONSECTION A - ASECTION B - BNORTH-WEST ELEVATIONNORTH-EAST ELEVATIONSOUTH-EAST, without spaces.

      – Lucas K.C.L.
      Mar 7 at 6:09

    • It would be clearer if you could share the source code, as I can't tell from the above where the text is stored.

      – Y Y
      Mar 7 at 6:17
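On the CSV question: the run-together output presumably comes from `''.join(...)`, which glues every matched text node together with no separator. A minimal sketch, assuming the titles have already been collected as a list of strings (the `titles` values and the `drawing_title` column name are invented for illustration), writes one title per row with the standard `csv` module:

```python
import csv
import io

# Hypothetical list of cell texts, one entry per matched <td>.
titles = ["ROOF PLAN", "LOWER GROUND FL. PLAN", "GROUND FL. PLAN"]

# Write to an in-memory buffer here; for a real file, use
# open("plans.csv", "w", newline="") instead of io.StringIO().
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["drawing_title"])               # header row (made-up column name)
writer.writerows([title] for title in titles)    # one title per CSV row

csv_text = buffer.getvalue()
print(csv_text)
```

If pandas is preferred, `pandas.DataFrame({"drawing_title": titles}).to_csv("plans.csv", index=False)` should give an equivalent file.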


















    0

    Try the following code:



    source="""<div class="dojoxGridView" id="dojox_grid__View_1" role="presentation" style="width: 1900px; height: 721px; left: 1px; top: 0px;" widgetid="dojox_grid__View_1">
    <input class="dojoxGridHiddenFocus" dojoattachpoint="hiddenFocusNode" role="presentation" type="checkbox"/>
    <input class="dojoxGridHiddenFocus" role="presentation" type="checkbox"/>
    <div class="dojoxGridScrollbox" dojoattachpoint="scrollboxNode" role="presentation" style="height: 721px;">
    <div class="dojoxGridContent" dojoattachpoint="contentNode" hidefocus="hidefocus" role="presentation" style="height: 504px; width: 1900px;">
    <div role="presentation" style="position: absolute; left: 0px; top: 0px;">
    <div aria-selected="false" class="dojoxGridRow" role="row" style="">
    <table border="0" cellpadding="0" cellspacing="0" class="dojoxGridRowTable" role="presentation" style="width: 1900px;">
    <tbody>
    <tr>
    <td class="dojoxGridCell" idx="0" role="gridcell" style="display:none;width:100px;" tabindex="-1">
    78126
    </td>
    <td class="dojoxGridCell" idx="1" role="gridcell" style="width:10%;" tabindex="-1">
    Approved Plan
    </td>
    <td class="dojoxGridCell" idx="2" role="gridcell" style="width:10%;" tabindex="-1">
    G-10
    </td>
    <td class="dojoxGridCell" idx="3" role="gridcell" style="width:40%;" tabindex="-1">
    ROOF PLAN
    </td>
    </tr>
    </tbody>
    </table>
    </div>"""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(source, "html.parser")
    for article in soup.find_all('div', class_='dojoxGridContent'):
        # find() returns the first matching <td>, or None when there is no match
        drawing_no = article.find('td', class_='dojoxGridCell', idx='3')
        if drawing_no:
            print(drawing_no.get_text())

    edited Mar 7 at 5:45

    answered Mar 7 at 5:28

    Pavan Kumar T S

    • Error: AttributeError: 'NoneType' object has no attribute 'get_text'

      – Lucas K.C.L.
      Mar 7 at 5:38

    • That can happen if certain articles don't have a table element with a matching td. Check the updated answer; it only prints if the element exists.

      – Pavan Kumar T S
      Mar 7 at 5:47

    • Same error :( "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key AttributeError: ResultSet object has no attribute 'get_text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

      – Lucas K.C.L.
      Mar 7 at 5:53

    • Did you notice the change to article.find? I replaced find_all.

      – Pavan Kumar T S
      Mar 7 at 6:16

    • Oh no, I did not. It works! It gives: ROOF PLAN. But how do I get all the elements?

      – Lucas K.C.L.
      Mar 7 at 6:22
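To get every matching cell rather than only the first, one option is to loop over `find_all()` instead of calling `find()`. This is only a sketch: the two-row `sample` markup and the name `titles` are assumptions, not the asker's real page.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample modelled on the grid markup in the answer.
sample = """
<div class="dojoxGridContent">
<table><tbody><tr>
<td class="dojoxGridCell" idx="3">ROOF PLAN</td>
</tr></tbody></table>
<table><tbody><tr>
<td class="dojoxGridCell" idx="3">GROUND FL. PLAN</td>
</tr></tbody></table>
</div>
"""

soup = BeautifulSoup(sample, "html.parser")

# find_all() returns a ResultSet (a list of Tags), so get_text() has to be
# called on each Tag in turn, not on the ResultSet itself.
titles = [
    td.get_text(strip=True)
    for td in soup.find_all("td", class_="dojoxGridCell", attrs={"idx": "3"})
]

print(titles)
```

Calling `get_text()` directly on the result of `find_all()` is what raises the "ResultSet object has no attribute" error quoted in the comments above.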

















    0

    You can use the idx attribute and select by its value:



    print(soup.select_one("[idx='3']").text.strip())
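For completeness, here is a small sketch of the same selector with `select()`, which returns all matches, plus a guard for `select_one()` returning `None` when nothing matches. The `sample` markup is a made-up stand-in for the page, and `soup` is assumed to be a BeautifulSoup object as in the other answers.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table; the real grid has more attributes per cell.
sample = """
<table><tbody>
<tr><td idx="2">G-10</td><td idx="3">ROOF PLAN</td></tr>
<tr><td idx="2">G-11</td><td idx="3">GROUND FL. PLAN</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(sample, "html.parser")

# select_one() gives the first match, or None if the selector matches nothing.
first = soup.select_one("[idx='3']")
first_text = first.text.strip() if first is not None else None

# select() gives a list of every match.
all_texts = [cell.text.strip() for cell in soup.select("[idx='3']")]

# No cell has idx="9", so this is None and .text would raise AttributeError.
missing = soup.select_one("[idx='9']")

print(first_text, all_texts, missing)
```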





    answered Mar 7 at 6:24

    QHarr