How to make BeautifulSoup 'replace_with' attribute work with a 'unicode' object?
Here is my HTML:
<html>
<body>
<h2>Pizza</h2>
<p>This is some random paragraph without child tags.</p>
<p>Delicious homebaked pizza.<br><em></em>$8.99 pp</em></p>
<h2>Eggplant Parmesan</h2>
<p>Try the authentic <i>Italian flavor</i> of baked aubergine.<br><em>$6.99 pp</em></p>
<h2>Italian Ice Cream</h2>
<p>Our dessert specialty.<br><em>$3.99 pp</em></p>
</body>
</html>
Using BeautifulSoup, I want to grab the text that is displayed for the h2 and p tags, replace them with a prefixed version in the tree, and also print them out on screen. For the h2 tags, this works fine:
from bs4 import BeautifulSoup

with open("/var/www/html/Test/index.html", "r") as f:
    soup = BeautifulSoup(f, "lxml")

f = open("/var/www/html/Test/I18N_index.html", "w+")

for h2 in soup.find_all('h2'):
    i18n_string = "I18N_"+h2.string
    h2.string.replace_with(i18n_string)
    print(h2.string)

f.write(str(soup))

###Output:##############################################
# $ python ./test.py
# I18N_Pizza
# I18N_Eggplant Parmesan
# I18N_Italian Ice Cream
########################################################
In my I18N_index.html, all 3 strings appear correctly prefixed with 'I18N_'.
However, my p tags contain child tags, and for these p.string returns None. As a result, the concatenation no longer works:
for p in soup.find_all('p'):
    i18n_string = "I18N_"+p.string
    p.string.replace_with(i18n_string)
    print(p.string)

f.write(str(soup))

###Output:##################################################
# $ python ./test.py
# I18N_Pizza
# I18N_Eggplant Parmesan
# I18N_Italian Ice Cream
# I18N_This is some random paragraph without child tags.
# Traceback (most recent call last):
#   File "./test.py", line 15, in <module>
#     i18n_string = "I18N_"+p.string
# TypeError: cannot concatenate 'str' and 'NoneType' objects
############################################################
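The None result can be reproduced in isolation. The following is a minimal sketch, not the asker's script: the two-paragraph HTML snippet and the choice of the built-in "html.parser" (instead of "lxml") are assumptions made to keep the example self-contained. It shows that .string only returns a value when a tag has exactly one child, while get_text() concatenates all nested text.

```python
from bs4 import BeautifulSoup

# Hypothetical two-paragraph sample: one <p> without child tags, one with.
html = "<p>Plain text.</p><p>Our dessert specialty.<br><em>$3.99 pp</em></p>"

# "html.parser" is used here only to avoid the lxml dependency.
soup = BeautifulSoup(html, "html.parser")
plain, mixed = soup.find_all("p")

plain_string = plain.string   # the single text child
mixed_string = mixed.string   # None: the tag has several children
mixed_text = mixed.get_text() # all nested text, concatenated

print(plain_string)
print(mixed_string)
print(mixed_text)
```

Because mixed_string is None, any expression such as "I18N_" + p.string raises the TypeError shown in the traceback above.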
From this thread I learned about the join function. It lets me do the concatenation and print out the resulting strings on screen, but not the replacement in the soup tree:
for p in soup.find_all('p'):
    joined = ''.join(p.strings)
    i18n_string = "I18N_"+joined
    #joined.replace_with(i18n_string)
    print(i18n_string)

###Output with 'joined.replace_with(i18n_string)' DISABLED:###
# I18N_Pizza
# I18N_Eggplant Parmesan
# I18N_Italian Ice Cream
# I18N_This is some random paragraph without child tags.
# I18N_Delicious homebaked pizza.$8.99 pp
# I18N_Try the authentic Italian flavor of baked aubergine.$6.99 pp
# I18N_Our dessert specialty.$3.99 pp
############################################################
###Output with 'joined.replace_with(i18n_string)' ENABLED:#####
# I18N_Pizza
# I18N_Eggplant Parmesan
# I18N_Italian Ice Cream
# Traceback (most recent call last):
#   File "./test.py", line 41, in <module>
#     joined.replace_with(i18n_string)
# AttributeError: 'unicode' object has no attribute 'replace_with'
############################################################
In that thread, another solution based on isinstance is mentioned, but I could not make that work.
If I understand correctly, the join function joins the strings but returns a 'unicode' object, not a string object, and this is why the 'replace_with' attribute doesn't work. How can I work around this? Any help is much appreciated.
python beautifulsoup
asked yesterday by cbp
2 Answers
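The replace_with() call fails not because joined is a unicode object, but because replace_with() is a method specific to bs4 objects (tags and the strings inside the tree). A plain Python string has no such method. See this: BeautifulSoup-replace_with
By the way, the join() method returns a plain string (str in Python 3, unicode in the Python 2 run shown in your traceback). See this: python3-join
Now, to give you a solution, I would simply remove the .string after p and call replace_with() on the p tag itself: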
from bs4 import BeautifulSoup

with open("index.html", "r") as f:
    soup = BeautifulSoup(f, "lxml")

f = open("I18N_index.html", "w+")

for h2 in soup.find_all('h2'):
    i18n_string = "I18N_"+h2.string
    h2.string.replace_with(i18n_string)
    print(h2.string)

for p in soup.find_all('p'):
    joined = ''.join(p.strings)
    i18n_string = "I18N_"+joined
    p.replace_with(i18n_string)
    print(i18n_string)

f.write(str(soup))
OUTPUT:
I18N_Pizza
I18N_Eggplant Parmesan
I18N_Italian Ice Cream
I18N_This is some random paragraph without child tags.
I18N_Delicious homebaked pizza.$8.99 pp
I18N_Try the authentic Italian flavor of baked aubergine.$6.99 pp
I18N_Our dessert specialty.$3.99 pp
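One detail worth checking in the written file: calling replace_with() on the p tag replaces the whole element, so the text ends up in the tree without its p wrapper. If the wrapper should stay, one option is to assign to the tag's .string instead, which replaces only the tag's contents. The sketch below compares the two; the helper names, the one-paragraph sample, and the use of "html.parser" are assumptions for illustration, not part of the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical one-paragraph sample taken from the question's menu.
HTML = "<p>Our dessert specialty.<br><em>$3.99 pp</em></p>"


def replace_whole_tag(html):
    """Replace the <p> element itself with the prefixed text."""
    soup = BeautifulSoup(html, "html.parser")
    p = soup.p
    p.replace_with("I18N_" + p.get_text())
    return str(soup)


def replace_tag_contents(html):
    """Keep the <p> element and replace only what is inside it."""
    soup = BeautifulSoup(html, "html.parser")
    p = soup.p
    p.string = "I18N_" + p.get_text()
    return str(soup)


print(replace_whole_tag(HTML))     # I18N_Our dessert specialty.$3.99 pp
print(replace_tag_contents(HTML))  # <p>I18N_Our dessert specialty.$3.99 pp</p>
```

Both variants discard the br and em child tags; which one is preferable depends on whether the surrounding markup matters for the output file.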
answered yesterday by Maaz
This solution works. Thanks a lot, also for the additional information. – cbp, yesterday
You are welcome :-) – Maaz, yesterday
With a simplified version of your code (that is, just taking care of the p tags issue), it looks like you have to replace p.string with p.text:
# [your html] is a placeholder for the HTML from the question
soup = BeautifulSoup([your html], "lxml")
for p in soup.find_all('p'):
    print('before: ',p.text)
    i18n_string = "I18N_"+p.text
    print('after ',i18n_string)
Output:
before: This is some random paragraph without child tags.
after I18N_This is some random paragraph without child tags.
before: Delicious homebaked pizza.$8.99 pp
after I18N_Delicious homebaked pizza.$8.99 pp
before: Try the authentic Italian flavor of baked aubergine.$6.99 pp
after I18N_Try the authentic Italian flavor of baked aubergine.$6.99 pp
before: Our dessert specialty.$3.99 pp
after I18N_Our dessert specialty.$3.99 pp
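Reading p.text produces a plain Python string, which is why it cannot be written back with replace_with(). The individual text nodes yielded by p.strings, however, are NavigableString objects that live in the tree and do support replace_with(). The following sketch uses that to prefix only the first text node, so child tags such as i and em are preserved. The helper name prefix_first_text, the sample paragraph, and the "html.parser" choice are assumptions for illustration; prefixing only the first node is one possible reading of the requirement.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def prefix_first_text(tag, prefix="I18N_"):
    """Prefix the first non-blank text node in `tag`; child tags stay in place."""
    # Snapshot the generator first, because the tree is modified inside the loop.
    for node in list(tag.strings):
        if node.strip():
            node.replace_with(NavigableString(prefix + node))
            return


html = ("<p>Try the authentic <i>Italian flavor</i> of baked aubergine."
        "<br><em>$6.99 pp</em></p>")
soup = BeautifulSoup(html, "html.parser")
p = soup.p

joined = "".join(p.strings)   # plain string, detached from the tree
first_node = next(p.strings)  # NavigableString, part of the tree

prefix_first_text(p)
print(p)
```

After the call, the paragraph still contains its i, br and em children, and only the leading text carries the prefix.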
answered yesterday (edited yesterday) by Jack Fleeting
Thanks for your reply. I had tried 'text' before, but it did not resolve my inability to use 'replace_with'. – cbp, yesterday