Alex Egorov PRO

alexander-egorov

AI & ML interests

None yet

Recent Activity

Organizations

None yet

alexander-egorov's activity

reacted to nyuuzyou's post with 🔥👍 about 2 months ago
I am planning to release *something big* this week, but in the meantime I was bored, so I quickly made a small dataset in as-is format.

📱 Sponsr.ru Dataset - nyuuzyou/sponsr

Collection of 44,138 posts from Sponsr.ru, a Russian content subscription platform featuring:
- Comprehensive metadata including project details, post information, and pricing
- Detailed content categorization with images, videos, and text formats
- Monolingual Russian content from diverse creator projects
reacted to aifeifei798's post with 👍🔥 2 months ago
😊 This program removes emojis from text. It uses a regular expression (regex) to match emojis and replace them with an empty string. The pattern is a character class built from Unicode ranges that cover emoticons, symbols, flags, and other emoji blocks. This is useful for cleaning text data before processing or analysis, or in any application where emojis are not wanted. 💻
import re

def remove_emojis(text):
    # Match common emoji ranges. Each range is listed explicitly so that
    # ordinary text (for example CJK characters) is left untouched.
    emoji_pattern = re.compile(
        "["
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F1E0-\U0001F1FF"  # regional indicators (flags)
        "\U00002702-\U000027B0"  # dingbats
        "\U000024C2"             # circled letter M
        "\U0001F200-\U0001F251"  # enclosed ideographic supplement
        "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
        "\U0001FA00-\U0001FA6F"  # chess symbols and more emojis
        "\U0001FA70-\U0001FAFF"  # symbols and pictographs extended-A
        "\U00002600-\U000026FF"  # miscellaneous symbols
        "\U00002B50-\U00002B59"  # stars and circles
        "\U0000200D"             # zero width joiner
        "\U0000200C"             # zero width non-joiner (also used in Persian and Indic text)
        "\U0000FE0F"             # emoji variation selector
        "]+"
    )
    return emoji_pattern.sub("", text)
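
A quick way to sanity-check a pattern like this is to run it on mixed text and confirm that only emojis disappear. The sketch below is not the post author's code: `EMOJI_RE` and `strip_emojis` are hypothetical names, and the pattern is deliberately narrow, so it will miss some emojis. It only illustrates the technique of substituting a character class of Unicode ranges with an empty string.

```python
import re

# Hypothetical, deliberately narrow pattern for illustration only.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # main emoji blocks (pictographs, emoticons, transport, ...)
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U00002600-\U000027BF"  # miscellaneous symbols and dingbats
    "\U0000FE0F"             # variation selector-16
    "\U0000200D"             # zero width joiner
    "]+"
)


def strip_emojis(text: str) -> str:
    """Return text with every run of matched emoji characters removed."""
    return EMOJI_RE.sub("", text)


samples = ["Hello 😊 world", "日本語のテキスト 🔥", "Flag: 🇷🇺 done"]
for sample in samples:
    # Surrounding spaces are kept, so a removed emoji can leave a double space.
    print(repr(strip_emojis(sample)))
```

Non-emoji scripts such as Japanese pass through unchanged because none of the listed ranges overlap them. Whitespace left behind by a removed emoji is not collapsed; that would need a separate step.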