Android - How to filter emoji (emoticons) from a string?

















I'm working on an Android app, and I do not want people to use emoji in the input.



How can I remove emoji characters from a string?







Tags: android, emoji
















asked Mar 4 '14 at 17:06 by Jochem Kuijpers












  • Regular expressions are an option. Or if the list of emojis is well known, a simple list that you can iterate through and remove matches in your input would work well.

    – Justin C
    Mar 4 '14 at 17:10







  • See stackoverflow.com/questions/12013341/…

    – Sujen
    Mar 4 '14 at 17:10











  • You can use the Character class: stackoverflow.com/questions/28366172/check-if-letter-is-emoji/…

    – user2474486
    Dec 14 '16 at 16:31











  • @user2474486 That's not what was being asked here. The Character class can indeed recognize surrogate pairs, but that does not mean the character is an emoji. E.g. U+1D120 (from the Musical Symbols block) is encoded as a surrogate pair but is not an emoji.

    – Jochem Kuijpers
    Dec 15 '16 at 2:34
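To make the point of the last comment concrete, here is a minimal plain-Java sketch (the class and method names are illustrative, not from the thread). It shows that a surrogate-pair check such as Character.isSupplementaryCodePoint answers true both for U+1D120, a musical symbol, and for U+1F60A, an emoji, so on its own it cannot tell emoji apart from other characters above U+FFFF.

```java
public class SurrogateIsNotEmoji {
    // True if the code point lies above U+FFFF, i.e. needs a surrogate pair in UTF-16.
    static boolean needsSurrogatePair(int codePoint) {
        return Character.isSupplementaryCodePoint(codePoint);
    }

    public static void main(String[] args) {
        int musicalSymbol = 0x1D120; // Musical Symbols block, not an emoji
        int smilingFace = 0x1F60A;   // an emoji
        System.out.println(needsSurrogatePair(musicalSymbol)); // true
        System.out.println(needsSurrogatePair(smilingFace));   // true
        // Both occupy two chars in a Java String.
        System.out.println(new String(Character.toChars(musicalSymbol)).length()); // 2
        System.out.println(new String(Character.toChars(smilingFace)).length());   // 2
    }
}
```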










































3 Answers
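Many emoji can be found in the following Unicode ranges (source), although the list is not exhaustive and some of these blocks also contain ordinary, non-emoji symbols: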



  • U+2190 to U+21FF

  • U+2600 to U+26FF

  • U+2700 to U+27BF

  • U+3000 to U+303F

  • U+1F300 to U+1F64F

  • U+1F680 to U+1F6FF

In Java you can filter them all at once with a single regular expression:



text = text.replaceAll("[\\x{2190}-\\x{21FF}\\x{2600}-\\x{26FF}\\x{2700}-\\x{27BF}\\x{3000}-\\x{303F}\\x{1F300}-\\x{1F64F}\\x{1F680}-\\x{1F6FF}]", "");
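A minimal self-contained sketch of the same range filter, assuming plain Java 7 or later; Android's regex engine should accept the \x{...} syntax too, but it is worth testing on your target devices. EmojiRangeFilter and removeEmoji are illustrative names. The pattern is compiled once and reused instead of being rebuilt on every replaceAll call.

```java
import java.util.regex.Pattern;

public class EmojiRangeFilter {
    // The six ranges listed above, written with \x{...} so that code points
    // above U+FFFF are matched as whole code points, not as single chars.
    private static final Pattern EMOJI_RANGES = Pattern.compile(
            "[\\x{2190}-\\x{21FF}"       // Arrows
            + "\\x{2600}-\\x{26FF}"      // Miscellaneous Symbols
            + "\\x{2700}-\\x{27BF}"      // Dingbats
            + "\\x{3000}-\\x{303F}"      // CJK Symbols and Punctuation
            + "\\x{1F300}-\\x{1F64F}"    // Misc Symbols and Pictographs, Emoticons
            + "\\x{1F680}-\\x{1F6FF}]"); // Transport and Map Symbols

    // Returns a copy of text with every character in those ranges removed.
    public static String removeEmoji(String text) {
        return EMOJI_RANGES.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // U+1F60A (smiling face) and U+2708 (airplane) are both in range.
        System.out.println(removeEmoji("Hi \uD83D\uDE0A there \u2708!"));
    }
}
```

Keep the limits of these ranges in mind: U+3000..U+303F is the CJK Symbols and Punctuation block, so the filter also deletes the ideographic comma and full stop (、 and 。); emoji outside the list, such as U+1F914 from the Supplemental Symbols and Pictographs block, pass through; and a variation selector (U+FE0F) that follows a symbol like ❤ is left behind.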















edited Sep 1 '15 at 11:42 by JREN










answered Mar 16 '14 at 10:27 by Faez Mehrabani












  • this is one potential answer but does not handle all cases. But nonetheless

    – user210504
    Jun 7 '14 at 1:50






  • @user210504 what cases does it not handle? It's not useful to say "this doesn't handle all cases" if you don't have an example.

    – Carl Anderson
    Sep 26 '14 at 18:02











  • Not working on Xperia Z 4.4

    – JiTHiN
    Sep 10 '15 at 20:44






  • \u expects 4 digits -- how is this supposed to work for 1f300 etc?

    – Stefan Haustein
    Apr 24 '17 at 23:28






  • Not working. In the end I used github.com/vdurmont/emoji-java. For example, removing all emojis: EmojiParser.removeAllEmojis(text);

    – Aviv Mor
    Dec 3 '17 at 18:56






























The latest emoji data can be found here:



http://unicode.org/Public/emoji/



The data is organized in folders named after the emoji version. As an app developer, it is a good idea to use the latest version available.



Each folder contains several text files. The one to check is emoji-data.txt, which lists the code points of all standard emoji.



Emoji are spread across many small code point ranges, so for the best support your app should check all of them.



Some people ask why there are 5-digit codes when only 4 hex digits can follow \u. These are code points above U+FFFF, which UTF-16 stores as surrogate pairs, so a single emoji usually takes two chars.
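As a quick check of the surrogate-pair arithmetic used in this answer, the sketch below (class name illustrative) decodes U+1F60A from its two chars by hand and compares the result with the standard Character.toCodePoint, which performs the same calculation.

```java
public class SurrogateMath {
    // Manual decoding with the same formula the answer uses for a high/low pair.
    static long decode(char high, char low) {
        return (((long) high - 0xd800) * 0x400) + ((long) low - 0xdc00) + 0x10000;
    }

    public static void main(String[] args) {
        String smile = new String(Character.toChars(0x1F60A)); // one emoji
        char high = smile.charAt(0); // 0xD83D
        char low = smile.charAt(1);  // 0xDE0A
        System.out.println(smile.length());                                        // 2
        System.out.println(Long.toHexString(decode(high, low)));                   // 1f60a
        System.out.println(Integer.toHexString(Character.toCodePoint(high, low))); // 1f60a
    }
}
```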



For example, suppose we have a string:



String s = ...;


Get its UTF-16 representation (big-endian, without a byte-order mark):



byte[] utf16 = s.getBytes(Charset.forName("UTF-16BE"));


Iterate over the UTF-16 data two bytes (one char) at a time:



for(int i = 0; i < utf16.length; i += 2) {


Combine the two bytes into one char:



char c = (char)((char)(utf16[i] & 0xff) << 8 | (char)(utf16[i + 1] & 0xff));


Now check for surrogate pairs. Emoji above U+FFFF are located in plane 1 (U+10000..U+1FFFF), and the first part of a pair (the high surrogate) for that plane is in the range 0xD800..0xD83F:



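if (c >= 0xd800 && c <= 0xd83f) {
    high = c; // remember it and wait for the low surrogate (high, low and unicode are declared before the loop)
    continue;
}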



The second part of a surrogate pair (the low surrogate) is in the range 0xDC00..0xDFFF. Once we have both parts, we can convert the pair into one 5-digit code point:



else if (c >= 0xdc00 && c <= 0xdfff) {
    low = c;
    unicode = (((long) high - 0xd800) * 0x400) + ((long) low - 0xdc00) + 0x10000;
}



All other chars are not part of a pair, so process them as they are:



else {
    unicode = c;
}



Now use the data from emoji-data.txt to check whether this code point is an emoji. If it is, skip it; if not, copy its bytes to the output byte array (four bytes for a surrogate pair, two otherwise).



Finally, the output byte array is converted back to a String:



String out = new String(outarray, Charset.forName("UTF-16BE"));
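Putting the steps together, here is a minimal runnable sketch of this approach. Instead of decoding UTF-16 bytes by hand, it walks the String with codePointAt, which combines surrogate pairs for you. It assumes the emoji-data.txt line format "CODE or START..END ; Property # comment"; the sample lines in main are illustrative, not copied from the real file, and EmojiDataFilter, parse and removeEmoji are hypothetical names. Which property to filter on is a design choice: the Emoji property also covers the digits 0 to 9, '#' and '*' (they can start keycap sequences), so filtering on it alone would strip digits, while Emoji_Presentation avoids that but misses text-style emoji such as U+2764.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmojiDataFilter {
    // One inclusive code point range taken from an emoji-data.txt line.
    static final class Range {
        final int start;
        final int end;
        Range(int start, int end) { this.start = start; this.end = end; }
        boolean contains(int cp) { return cp >= start && cp <= end; }
    }

    // Keeps the ranges whose property field equals `property`, e.g. "Emoji_Presentation".
    static List<Range> parse(List<String> lines, String property) {
        List<Range> ranges = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) continue; // blank or comment-only line
            String[] fields = data.split(";");
            if (fields.length < 2 || !fields[1].trim().equals(property)) continue;
            String[] bounds = fields[0].trim().split("\\.\\.");
            int start = Integer.parseInt(bounds[0], 16);
            int end = bounds.length > 1 ? Integer.parseInt(bounds[1], 16) : start;
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    // Copies every code point that is not inside one of the ranges.
    static String removeEmoji(String s, List<Range> ranges) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i); // combines a surrogate pair into one code point
            boolean isEmoji = false;
            for (Range r : ranges) {
                if (r.contains(cp)) { isEmoji = true; break; }
            }
            if (!isEmoji) out.appendCodePoint(cp);
            i += Character.charCount(cp); // 2 for a surrogate pair, else 1
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative lines in emoji-data.txt format; load the real file in practice.
        List<String> lines = Arrays.asList(
                "# comment lines and blank lines are skipped",
                "",
                "1F600..1F64F  ; Emoji_Presentation   # sample range",
                "0030..0039    ; Emoji                # sample range");
        List<Range> ranges = parse(lines, "Emoji_Presentation");
        System.out.println(removeEmoji("ok \uD83D\uDE00 123", ranges));
    }
}
```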
















answered Sep 6 '17 at 2:36 by NoAngel












  • P.S. If you want to remove some additional symbols, you can find Unicode block ranges here: jrgraphix.net/research/unicode.php

    – NoAngel
    Oct 3 '17 at 3:39































Here is what I use to remove emojis. Note: this only works on API 24 and later.



public String remove_Emojis_For_Devices_API_24_Onwards(String name) {
    // Character.UnicodeScript.of() was not added until API 24, so on older
    // devices the input is returned unchanged instead of as an empty string
    if (android.os.Build.VERSION.SDK_INT < 24) {
        return name;
    }

    // this is where we will store the reassembled name
    StringBuilder newName = new StringBuilder(name.length());

    /* we are going to cycle through the word checking each char
       to find its Unicode script */
    for (int i = 0; i < name.length(); i++) {
        char c = name.charAt(i);
        // Emoji themselves have the COMMON script, but each half of a surrogate
        // pair reports UNKNOWN, so every character above U+FFFF (where most
        // emoji live) is dropped; BMP emoji such as U+2764 are kept
        if (Character.UnicodeScript.of(c) != Character.UnicodeScript.UNKNOWN) {
            newName.append(c); // not UNKNOWN, so we keep it
        }
    }

    return newName.toString();
}




So if we pass in a string:



remove_Emojis_For_Devices_API_24_Onwards("😊 test 😊 Indic:ढ Japanese:な 😊 Korean:ㅂ");



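it returns: test Indic:ढ Japanese:な Korean:ㅂ (the spaces that surrounded each emoji are kept, so the actual result has a leading space and two doubled spaces)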



Emoji placement and count don't matter.
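Why this works, and why a symbol like ❤ survives: the plain-Java sketch below repeats the same per-char check without the Android version test (ScriptFilterDemo and keepKnownScripts are illustrative names). The emoji's surrogate halves report UNKNOWN, while the emoji code point itself, and BMP symbols such as U+2764, report COMMON. A side effect is that every other character above U+FFFF, for example the CJK Extension B ideograph U+20000, is removed as well.

```java
public class ScriptFilterDemo {
    // Same per-char test as the answer, minus the Android SDK_INT check.
    static String keepKnownScripts(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.UnicodeScript.of(c) != Character.UnicodeScript.UNKNOWN) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String smile = new String(Character.toChars(0x1F60A));
        System.out.println(Character.UnicodeScript.of(smile.charAt(0)));      // UNKNOWN (high surrogate)
        System.out.println(Character.UnicodeScript.of(smile.codePointAt(0))); // COMMON
        System.out.println(Character.UnicodeScript.of(0x2764));               // COMMON, so U+2764 is kept
        System.out.println(keepKnownScripts("a" + smile + "b"));              // ab
        // A non-emoji supplementary character is dropped too.
        System.out.println(keepKnownScripts(new String(Character.toChars(0x20000))).isEmpty()); // true
    }
}
```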






























  • Really interesting but not perfect. This couldn't filter "❤" and "☤", which reside in the Dingbats and Miscellaneous Symbols blocks.

    – Jenix
    Aug 31 '18 at 0:52

























answered May 18 '17 at 20:05 by Andrew Moreau (393 reputation)




























