What regex can match similar characters? [duplicate]
This question already has an answer here:
Converting Symbols, Accent Letters to English Alphabet
12 answers
What regex could match similar characters, such as ä and a, or, in Russian, и and й?
Here is my code:
String text1 = " Passagiere noch auf ihr fehlendes Gepäck";
String text2 = " Passagiere noch auf ihr fehlendes Gepack";
Pattern p1 = Pattern.compile("\\b" + "Gepack");
Pattern p2 = Pattern.compile("\\b" + "Gepack");
Matcher m1 = p1.matcher(text1); // finds no occurrence
Matcher m2 = p2.matcher(text2); // finds one occurrence
java
marked as duplicate by Wiktor Stribiżew
Mar 7 at 14:34
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
I'm not sure this is the right duplicate, as the linked question is more about transliteration than normalisation.
– JGNI
Mar 7 at 14:41
asked Mar 7 at 14:00 by hamid
edited Mar 7 at 14:01
1 Answer
You could build a character class containing all the characters you want to match, so pattern one becomes
Pattern p1 = Pattern.compile("\\b" + "Gep[aä]ck");
But this becomes burdensome very quickly.
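As a rough illustration of the character-class idea, here is a minimal sketch. The class and method names are made up for this example, and ä is written as the escape \u00e4 so the code does not depend on the source file's encoding.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the original question.
final class CharacterClassMatch {
    // \u00e4 is a-umlaut; the class [a\u00e4] accepts either letter in that position.
    static final Pattern GEPAECK = Pattern.compile("\\bGep[a\u00e4]ck");

    static boolean containsWord(String text) {
        return GEPAECK.matcher(text).find();
    }

    public static void main(String[] args) {
        // Accented spelling
        System.out.println(containsWord(" Passagiere noch auf ihr fehlendes Gep\u00e4ck")); // true
        // Unaccented spelling
        System.out.println(containsWord(" Passagiere noch auf ihr fehlendes Gepack")); // true
    }
}
```

Note that this only covers the precomposed ä (U+00E4). Every further accented letter needs its own class, which is where the approach stops scaling.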
Unicode has a mechanism called normalisation that lets you reformat a string so it can be compared in different ways.
Normalisation Form Canonical Decomposition (NFD) takes a string containing accented character code points and splits each one into multiple code points: the base character first, followed by the code points corresponding to the combining-character versions of its accents, in a well-defined order for each accented character.
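To make the decomposition concrete, the sketch below runs the two letters from the question through java.text.Normalizer and lists the resulting code points. The class and helper names are hypothetical and chosen only for this example.

```java
import java.text.Normalizer;

// Hypothetical demo class; shows what NFD does to a single accented letter.
final class NfdDemo {
    static String toNfd(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFD);
    }

    // Renders each code point as U+XXXX, separated by spaces.
    static String describeCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // a-umlaut (U+00E4) becomes 'a' plus COMBINING DIAERESIS
        System.out.println(describeCodePoints(toNfd("\u00e4"))); // U+0061 U+0308
        // Cyrillic short i (U+0439) becomes Cyrillic i plus COMBINING BREVE
        System.out.println(describeCodePoints(toNfd("\u0439"))); // U+0438 U+0306
    }
}
```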
Having done this to your input, you can use a regex to remove all the accents from the string, as they all have the Unicode general category Mark, abbreviated M (\p{M} in a regex).
This gives you a string containing only base characters, which your regex can then match against.
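Putting the steps together, one possible sketch is shown below. It assumes it is acceptable to strip accents from both the input text and the search word before matching. The names AccentInsensitiveMatch, stripAccents and containsWord are invented for this example.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper; a sketch of normalise-then-strip, not a definitive implementation.
final class AccentInsensitiveMatch {
    // \p{M} matches any code point in the Unicode general category Mark.
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    // Decompose to NFD, then delete the combining marks, leaving base characters.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    // Looks for the word at a word boundary, ignoring accents on both sides.
    static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(stripAccents(word)));
        return p.matcher(stripAccents(text)).find();
    }

    public static void main(String[] args) {
        String text1 = " Passagiere noch auf ihr fehlendes Gep\u00e4ck"; // accented
        String text2 = " Passagiere noch auf ihr fehlendes Gepack";      // unaccented
        System.out.println(containsWord(text1, "Gepack")); // true
        System.out.println(containsWord(text2, "Gepack")); // true
    }
}
```

Stripping marks is lossy: words that differ only by an accent become indistinguishable, so this fits search-style matching better than exact comparison.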
answered Mar 7 at 14:28 by JGNI