Android - How to filter emoji (emoticons) from a string?
I'm working on an Android app, and I do not want people to use emoji in the input.
How can I remove emoji characters from a string?
android emoji
asked Mar 4 '14 at 17:06 by Jochem Kuijpers
Regular expressions are an option. Or if the list of emojis is well known, a simple list that you can iterate through and remove matches in your input would work well.
– Justin C
Mar 4 '14 at 17:10
See stackoverflow.com/questions/12013341/…
– Sujen
Mar 4 '14 at 17:10
You can use the Character class: stackoverflow.com/questions/28366172/check-if-letter-is-emoji/…
– user2474486
Dec 14 '16 at 16:31
@user2474486 That's not what was being asked here. The Character class can indeed recognize surrogate pairs, but that does not mean the character is an emoji. E.g. U+1D120 is not an emoji, yet it is encoded as a surrogate pair.
– Jochem Kuijpers
Dec 15 '16 at 2:34
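The point made in the last comment can be checked with a few calls on java.lang.Character. The sketch below is only an illustration (the class name is made up for this example): it shows that U+1D120, a musical symbol, needs a surrogate pair, while the emoji U+2764 (❤) does not, so "is a surrogate pair" and "is an emoji" are independent properties.

```java
final class SurrogateVsEmoji {
    /** True when the code point lies above U+FFFF and therefore needs two Java chars. */
    static boolean needsSurrogatePair(int codePoint) {
        return Character.isSupplementaryCodePoint(codePoint);
    }

    public static void main(String[] args) {
        int musicalSymbol = 0x1D120; // code point named in the comment above; not an emoji
        int heavyHeart = 0x2764;     // an emoji that fits in a single char

        char[] units = Character.toChars(musicalSymbol);
        System.out.println("U+1D120 uses " + units.length + " UTF-16 code units");
        System.out.println("first unit is a high surrogate: " + Character.isHighSurrogate(units[0]));
        System.out.println("U+2764 needs a pair: " + needsSurrogatePair(heavyHeart));
    }
}
```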
3 Answers
Emoji can be found in the following Unicode ranges (source):
- U+2190 to U+21FF
- U+2600 to U+26FF
- U+2700 to U+27BF
- U+3000 to U+303F
- U+1F300 to U+1F64F
- U+1F680 to U+1F6FF
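You can use this line in your code to filter them all at once. In Java the pattern has to go to replaceAll (replace matches literal text only), and code points above U+FFFF need the \x{...} form because \u takes exactly four hex digits:
    text = text.replaceAll("[\\u2190-\\u21FF\\u2600-\\u26FF\\u2700-\\u27BF\\u3000-\\u303F\\x{1F300}-\\x{1F64F}\\x{1F680}-\\x{1F6FF}]", "");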
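As an alternative to a regular expression, the same ranges can be checked one code point at a time. This is a minimal sketch, not the answer's own code: the class and method names are invented for illustration, and the range table simply copies the list above.

```java
final class EmojiRangeFilter {
    // Ranges quoted in the answer above, as inclusive {start, end} code point pairs.
    private static final int[][] RANGES = {
        {0x2190, 0x21FF}, {0x2600, 0x26FF}, {0x2700, 0x27BF},
        {0x3000, 0x303F}, {0x1F300, 0x1F64F}, {0x1F680, 0x1F6FF},
    };

    /** Returns true if the code point falls in one of the listed ranges. */
    static boolean inListedRange(int codePoint) {
        for (int[] range : RANGES) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    /** Copies every code point that is outside the listed ranges. */
    static String strip(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i); // joins a surrogate pair into one value
            if (!inListedRange(codePoint)) {
                out.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint); // 1 for BMP characters, 2 for pairs
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("Hi \uD83D\uDE0A there \u2708!"));
    }
}
```

Note that U+2190 to U+21FF is the Arrows block and U+3000 to U+303F is CJK Symbols and Punctuation, so this list also removes arrows and punctuation such as the ideographic comma. Trim the table if that is not wanted.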
answered Mar 16 '14 at 10:27 by Faez Mehrabani, edited Sep 1 '15 at 11:42 by JREN
this is one potential answer but does not handle all cases. But nonetheless
– user210504
Jun 7 '14 at 1:50
@user210504 what cases does it not handle? It's not useful to say "this doesn't handle all cases" if you don't have an example.
– Carl Anderson
Sep 26 '14 at 18:02
Not working on Xperia Z 4.4
– JiTHiN
Sep 10 '15 at 20:44
\u expects 4 digits -- how is this supposed to work for 1f300 etc.?
– Stefan Haustein
Apr 24 '17 at 23:28
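For reference, here is one way to write a code point such as U+1F300 in Java despite the four-digit limit. This is a sketch with illustrative names: a string literal spells out the surrogate pair, Character.toChars builds the same pair from the numeric value, and java.util.regex accepts \x{...} for code points of any length.

```java
import java.util.regex.Pattern;

final class FiveDigitCodePoints {
    // U+1F300 written as its UTF-16 surrogate pair in a string literal.
    static final String CYCLONE_LITERAL = "\uD83C\uDF00";

    /** Builds the same string from the numeric code point. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    /** Matches U+1F300 with the regex form that accepts more than four hex digits. */
    static boolean matchesCyclone(String s) {
        return Pattern.matches("\\x{1F300}", s);
    }

    public static void main(String[] args) {
        System.out.println(CYCLONE_LITERAL.equals(fromCodePoint(0x1F300)));
        System.out.println(matchesCyclone(CYCLONE_LITERAL));
    }
}
```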
Not working. In the end I used github.com/vdurmont/emoji-java. For example, removing all emojis: EmojiParser.removeAllEmojis(text);
– Aviv Mor
Dec 3 '17 at 18:56
The latest emoji data can be found here:
http://unicode.org/Public/emoji/
There is one folder per emoji version.
As an app developer, it is a good idea to use the latest version available.
Inside a version folder you will find several text files.
Check emoji-data.txt. It lists the standard emoji code points.
Emoji are spread over many small code ranges.
For the best coverage, check all of them in your app.
Some people ask why there are 5-digit codes when only 4 digits can follow \u.
Those code points are encoded as surrogate pairs: two UTF-16 code units (Java chars) together encode one such emoji.
For example, say we have a string.
    String s = ...;
Get its UTF-16 representation.
    byte[] utf16 = s.getBytes("UTF-16BE");
Iterate over the UTF-16 bytes, two at a time.
    char high = 0;
    for (int i = 0; i < utf16.length; i += 2) {
Get one char.
        char c = (char) (((utf16[i] & 0xff) << 8) | (utf16[i + 1] & 0xff));
        long unicode;
Now check for surrogate pairs. These emoji are located in plane 1 (U+10000..U+1FFFF), so check the first part of the pair in the range 0xd800..0xd83f.
        if (c >= 0xd800 && c <= 0xd83f) {
            high = c;
            continue;
        }
For the second part of the surrogate pair, the range is 0xdc00..0xdfff. The pair can now be converted to one 5-digit code.
        else if (c >= 0xdc00 && c <= 0xdfff) {
            char low = c;
            unicode = ((long) high - 0xd800) * 0x400 + ((long) low - 0xdc00) + 0x10000;
        }
All other symbols are not pairs, so process them as they are.
        else {
            unicode = c;
        }
Now use the data from emoji-data.txt to check whether unicode is an emoji.
If it is, skip it. If not, copy its bytes to the output byte array.
    }
Finally, the byte array is converted to a String:
    String out = new String(outarray, Charset.forName("UTF-16BE"));
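The steps above can be sketched more compactly with String.codePointAt, which does the surrogate arithmetic itself. The example below is an assumption-laden illustration, not the answer's code: the class and method names are invented, and the input lines in main are hand-written samples in the "start..end ; Property # comment" syntax of emoji-data.txt rather than a copy of the real file. It filters on one named property, because the plain Emoji property in that file also covers '#', '*' and the digits 0 to 9, which most apps will want to keep.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EmojiDataRanges {
    private final List<int[]> ranges = new ArrayList<>();

    /** Parses lines such as "1F600..1F64F ; Emoji_Presentation # comment". */
    static EmojiDataRanges parse(Iterable<String> lines, String property) {
        EmojiDataRanges result = new EmojiDataRanges();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) {
                continue; // comment-only or blank line
            }
            String[] fields = data.split(";");
            if (fields.length < 2 || !fields[1].trim().equals(property)) {
                continue; // a different property
            }
            String[] bounds = fields[0].trim().split("\\.\\.");
            int start = Integer.parseInt(bounds[0], 16);
            int end = bounds.length > 1 ? Integer.parseInt(bounds[1], 16) : start;
            result.ranges.add(new int[] {start, end});
        }
        return result;
    }

    boolean contains(int codePoint) {
        for (int[] range : ranges) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    /** Drops every code point covered by the parsed ranges. */
    String strip(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (!contains(codePoint)) {
                out.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    /** The surrogate arithmetic from the answer, for comparison with Character.toCodePoint. */
    static int combine(char high, char low) {
        return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "# hand-written sample lines, not the real file",
            "1F600..1F64F ; Emoji_Presentation # sample range",
            "0023 ; Emoji # number sign");
        EmojiDataRanges emoji = parse(sample, "Emoji_Presentation");
        System.out.println(emoji.strip("ok \uD83D\uDE0A #1"));
    }
}
```

In an app, the lines would come from a copy of emoji-data.txt bundled with the build, and the property name would be chosen to match what should be removed.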
answered Sep 6 '17 at 2:36 by NoAngel
P.S. If you want to remove some additional symbols, more Unicode ranges can be found here: jrgraphix.net/research/unicode.php
– NoAngel
Oct 3 '17 at 3:39
Here is what I use to remove emojis. Note: this only works on API 24 and later.
    // needs java.util.ArrayList and android.os.Build
    public String remove_Emojis_For_Devices_API_24_Onwards(String name) {
        // we will store all the non-emoji characters in this array list
        ArrayList<Character> nonEmoji = new ArrayList<>();
        // this is where we will store the reassembled name
        String newName = "";
        // Character.UnicodeScript.of() was not added until API 24, so this is a 24-and-up solution;
        // on older devices nothing is copied and an empty string is returned
        if (Build.VERSION.SDK_INT > 23) {
            /* we are going to cycle through the word checking each character
               to find its Unicode script */
            for (int i = 0; i < name.length(); i++) {
                // charAt() yields single UTF-16 code units; each half of a surrogate pair
                // reports UNKNOWN, which is what drops emoji outside the BMP
                if (Character.UnicodeScript.of(name.charAt(i)) != Character.UnicodeScript.UNKNOWN) {
                    nonEmoji.add(name.charAt(i)); // it's not an emoji, so we add it
                }
            }
        }
        // we then cycle through, rebuilding the string
        for (int i = 0; i < nonEmoji.size(); i++) {
            newName += nonEmoji.get(i);
        }
        return newName;
    }
So if we pass in a string:
    remove_Emojis_For_Devices_API_24_Onwards("😊 test 😊 Indic:ढ Japanese:な 😊 Korean:ㅂ");
it returns: test Indic:ढ Japanese:な Korean:ㅂ
Emoji placement or count doesn't matter.
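To see where this approach stops working, the same check can be run off-device with plain Java. This is a sketch with an invented class name and no Android dependency: it mirrors the per-char test and prints the script of the full code point for comparison. Emoji in the Basic Multilingual Plane, such as ❤ (U+2764), have the script COMMON and pass through, and any non-emoji character stored as a surrogate pair is removed as well.

```java
import java.lang.Character.UnicodeScript;

final class ScriptFilterDemo {
    /** Mirrors the approach above: judge each UTF-16 code unit on its own. */
    static String dropUnknownScriptChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char unit = input.charAt(i);
            if (UnicodeScript.of(unit) != UnicodeScript.UNKNOWN) {
                out.append(unit);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Script of the whole code point versus one half of its surrogate pair.
        System.out.println(UnicodeScript.of(0x1F60A));  // the smiling face as a code point
        System.out.println(UnicodeScript.of('\uD83D')); // its high surrogate alone
        System.out.println(dropUnknownScriptChars("a\uD83D\uDE0Ab"));
        System.out.println(dropUnknownScriptChars("a\u2764b"));
    }
}
```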
answered May 18 '17 at 20:05 by Andrew Moreau
Really interesting but not perfect. This couldn't filter "❤" and "☤", which reside in the Dingbats and Miscellaneous Symbols blocks.
– Jenix
Aug 31 '18 at 0:52