Feelings to Emoji: Technical Reference
This document provides technical details about the implementation of the Feelings to Emoji application.
Project Structure
The application is organized into several Python modules:
- `app.py` - Main application file with Gradio interface
- `emoji_processor.py` - Core processing logic for emoji matching
- `config.py` - Configuration settings
- `utils.py` - Utility functions
- `generate_embeddings.py` - Standalone tool to pre-generate embeddings
Embedding Models
The system uses the following sentence embedding models from the Sentence Transformers library:
| Model Key | Model ID | Size | Description |
|---|---|---|---|
| mpnet | all-mpnet-base-v2 | 110M | Balanced, great general-purpose model |
| gte | thenlper/gte-large | 335M | Context-rich, good for emotion & nuance |
| bge | BAAI/bge-large-en-v1.5 | 350M | Tuned for ranking & high-precision similarity |
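As a reference point, a model key from this table can be resolved to a loaded model in a few lines with Sentence Transformers. The `MODEL_IDS` mapping below is illustrative; the application's actual mapping lives in `EMBEDDING_MODELS` in `config.py`:

```python
from sentence_transformers import SentenceTransformer

# Illustrative key -> model ID mapping (see config.py for the real one)
MODEL_IDS = {
    "mpnet": "all-mpnet-base-v2",
    "gte": "thenlper/gte-large",
    "bge": "BAAI/bge-large-en-v1.5",
}

model = SentenceTransformer(MODEL_IDS["mpnet"])
embedding = model.encode("I'm feeling happy today!")
print(embedding.shape)  # e.g. (768,) for all-mpnet-base-v2
```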
Emoji Matching Algorithm
The application uses cosine similarity between sentence embeddings to match text with emojis:
For each emoji category (emotion and event):
- Embed descriptions using the selected model
- Calculate cosine similarity between the input text embedding and each emoji description embedding
- Return the emoji with the highest similarity score
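A minimal sketch of this matching step using `sentence_transformers.util.cos_sim`. The two-entry emoji dictionary here is made up for illustration; the application loads its dictionaries from the data files described later:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")

# Hypothetical emotion dictionary: emoji -> description
emotion_emojis = {"😀": "happy, joyful, cheerful", "😢": "sad, crying, upset"}

# Embed the descriptions once, then compare against the input text
desc_embeddings = model.encode(list(emotion_emojis.values()), convert_to_tensor=True)
text_embedding = model.encode("I'm feeling happy today!", convert_to_tensor=True)

scores = util.cos_sim(text_embedding, desc_embeddings)[0]  # shape: (num_emojis,)
best = scores.argmax().item()
print(list(emotion_emojis.keys())[best])  # -> 😀
```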
The embeddings are pre-computed and cached to improve performance:
- Stored as pickle files in the `embeddings/` directory
- Generated using `generate_embeddings.py`
- Loaded at startup to minimize processing time
Module Reference
config.py
Contains configuration settings including:
- `CONFIG`: Dictionary with basic application settings (model name, file paths, etc.)
- `EMBEDDING_MODELS`: Dictionary defining the available embedding models
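An illustrative shape for these two dictionaries; only the names `CONFIG` and `EMBEDDING_MODELS` are documented above, so the keys inside each entry are assumptions:

```python
# Key names inside each entry are assumptions, not the actual schema
CONFIG = {
    "default_model": "mpnet",
    "emotion_file": "google-emoji-kitchen-emotion.txt",
    "item_file": "google-emoji-kitchen-item.txt",
    "embeddings_dir": "embeddings",
}

EMBEDDING_MODELS = {
    "mpnet": {"id": "all-mpnet-base-v2", "description": "Balanced, great general-purpose model"},
    "gte": {"id": "thenlper/gte-large", "description": "Context-rich, good for emotion & nuance"},
    "bge": {"id": "BAAI/bge-large-en-v1.5", "description": "Tuned for ranking & high-precision similarity"},
}
```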
utils.py
Utility functions including:
- `setup_logging()`: Configures application logging
- `kitchen_txt_to_dict(filepath)`: Parses emoji dictionary files
- `save_embeddings_to_pickle(embeddings, filepath)`: Saves embeddings to pickle files
- `load_embeddings_from_pickle(filepath)`: Loads embeddings from pickle files
- `get_embeddings_pickle_path(model_id, emoji_type)`: Generates consistent paths for embedding files
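A plausible sketch of the pickle helpers. The signatures follow the list above, but the directory handling and the slash sanitization for Hugging Face model IDs are assumptions:

```python
import os
import pickle

def save_embeddings_to_pickle(embeddings, filepath):
    """Write an {emoji: embedding} dict to disk."""
    os.makedirs(os.path.dirname(filepath) or ".", exist_ok=True)
    with open(filepath, "wb") as f:
        pickle.dump(embeddings, f)

def load_embeddings_from_pickle(filepath):
    """Return the cached dict, or None if no cache file exists yet."""
    if not os.path.exists(filepath):
        return None
    with open(filepath, "rb") as f:
        return pickle.load(f)

def get_embeddings_pickle_path(model_id, emoji_type):
    """Build an embeddings/[model_id]_[emoji_type].pkl path (sanitization assumed)."""
    return os.path.join("embeddings", f"{model_id.replace('/', '_')}_{emoji_type}.pkl")
```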
emoji_processor.py
Core processing logic:
- `EmojiProcessor`: Main class for emoji matching and processing
- `__init__(model_name=None, model_key=None, use_cached_embeddings=True)`: Initializes the processor with a specific model
- `load_emoji_dictionaries(emotion_file, item_file)`: Loads emoji dictionaries from text files
- `switch_model(model_key)`: Switches to a different embedding model
- `sentence_to_emojis(sentence)`: Processes text to find matching emojis and generate a mashup
- `find_top_emojis(embedding, emoji_embeddings, top_n=1)`: Finds top matching emojis using cosine similarity
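As a sketch only, `find_top_emojis` could rank emojis roughly as follows, assuming `emoji_embeddings` maps each emoji to its embedding vector (the actual implementation may differ):

```python
from sentence_transformers import util

def find_top_emojis(embedding, emoji_embeddings, top_n=1):
    """Rank emojis by cosine similarity to the input embedding (sketch)."""
    scored = [
        (emoji, util.cos_sim(embedding, emoji_embedding).item())
        for emoji, emoji_embedding in emoji_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```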
app.py
Gradio interface:
- `EmojiMashupApp`: Main application class
- `create_interface()`: Creates the Gradio interface
- `process_with_model(model_selection, text, use_cached_embeddings)`: Processes text with the selected model
- `get_random_example()`: Gets a random example sentence for demonstration
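A minimal, hypothetical Gradio layout along these lines; the real `create_interface()` also wires in random example sentences via `get_random_example()`, and the assumption here is that `process_with_model` returns an emotion emoji, an event emoji, and the mashup image:

```python
import gradio as gr

def build_demo(process_with_model):
    """Hypothetical layout comparable to EmojiMashupApp.create_interface()."""
    with gr.Blocks() as demo:
        model = gr.Dropdown(["mpnet", "gte", "bge"], value="mpnet", label="Embedding model")
        text = gr.Textbox(label="How are you feeling?")
        cached = gr.Checkbox(value=True, label="Use cached embeddings")
        button = gr.Button("Generate mashup")
        emotion = gr.Textbox(label="Emotion emoji")
        event = gr.Textbox(label="Event emoji")
        mashup = gr.Image(label="Emoji mashup")
        button.click(process_with_model, inputs=[model, text, cached],
                     outputs=[emotion, event, mashup])
    return demo
```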
generate_embeddings.py
Standalone utility to pre-generate embeddings:
- `generate_embeddings_for_model(model_key, model_info)`: Generates embeddings for a specific model
- `main()`: Main function that processes all models and saves embeddings
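A self-contained sketch of this workflow. The inline `EMBEDDING_MODELS` and `EMOJI_DICTS` stand in for the real config and the dictionaries parsed by `kitchen_txt_to_dict()`, and the file naming follows the cache layout described below:

```python
import os
import pickle

from sentence_transformers import SentenceTransformer

# Stand-ins for the real config and parsed emoji dictionaries
EMBEDDING_MODELS = {"mpnet": {"id": "all-mpnet-base-v2"}}
EMOJI_DICTS = {
    "emotion": {"😀": "happy, joyful, cheerful"},
    "event": {"🎉": "party, celebration"},
}

def generate_embeddings_for_model(model_key, model_info):
    """Embed every emoji description with this model and cache the results."""
    model = SentenceTransformer(model_info["id"])
    for emoji_type, emoji_dict in EMOJI_DICTS.items():
        embeddings = {emoji: model.encode(desc) for emoji, desc in emoji_dict.items()}
        os.makedirs("embeddings", exist_ok=True)
        path = os.path.join("embeddings",
                            f"{model_info['id'].replace('/', '_')}_{emoji_type}.pkl")
        with open(path, "wb") as f:
            pickle.dump(embeddings, f)

def main():
    for model_key, model_info in EMBEDDING_MODELS.items():
        generate_embeddings_for_model(model_key, model_info)

if __name__ == "__main__":
    main()
```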
Emoji Data Files
- `google-emoji-kitchen-emotion.txt`: Emotion emojis with descriptions
- `google-emoji-kitchen-item.txt`: Event/object emojis with descriptions
- `google-emoji-kitchen-compatible.txt`: Compatibility information for emoji combinations
Embedding Cache Structure
The embeddings/ directory contains pre-generated embeddings in pickle format:
- `[model_id]_emotion.pkl`: Embeddings for emotion emojis
- `[model_id]_event.pkl`: Embeddings for event/object emojis
API Usage Examples
Using the EmojiProcessor Directly
```python
from emoji_processor import EmojiProcessor

# Initialize with default model (mpnet)
processor = EmojiProcessor()
processor.load_emoji_dictionaries()

# Process a sentence
emotion, event, image = processor.sentence_to_emojis("I'm feeling happy today!")
print(f"Emotion emoji: {emotion}")
print(f"Event emoji: {event}")
# image contains the PIL Image object of the mashup
```
Switching Models
```python
# Switch to a different model
processor.switch_model("gte")

# Process with the new model
emotion, event, image = processor.sentence_to_emojis("I'm feeling anxious about tomorrow.")
```
Performance Considerations
- Embedding generation is computationally intensive but only happens once per model
- Using cached embeddings significantly improves response time
- Larger models (GTE, BGE) may provide better accuracy but require more resources
- The MPNet model offers a good balance of performance and accuracy for most use cases
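A quick way to gauge the effect of cached embeddings on startup, using the constructor flags documented above (exact timings depend on hardware and model size):

```python
import time

from emoji_processor import EmojiProcessor

# Compare startup with and without the embedding cache
for cached in (True, False):
    start = time.perf_counter()
    processor = EmojiProcessor(model_key="mpnet", use_cached_embeddings=cached)
    processor.load_emoji_dictionaries()
    print(f"use_cached_embeddings={cached}: {time.perf_counter() - start:.2f}s")
```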