DeepCritical / AUDIO_INPUT_FIX.md
Joseph Pollack
# Audio Input Display Fix
## Issue
The audio input (microphone button) was not displaying in the ChatInterface multimodal textbox.
## Root Cause
Setting `multimodal=True` on `gr.ChatInterface` is expected to show the image and audio buttons automatically. Three things could still keep the microphone button from appearing:
1. The buttons may be collapsed into a dropdown menu
2. Browser permissions may be blocking microphone access
3. The `file_types` parameter was not set explicitly
## Fix Applied
### 1. Added `file_types` Parameter
Explicitly specified which file types are accepted to ensure audio is enabled:
```python
gr.ChatInterface(
    fn=research_agent,
    multimodal=True,
    file_types=["image", "audio", "video"],  # Explicitly enable image, audio, and video
    ...
)
```
**File:** `src/app.py` (line 929)
### 2. Enhanced UI Description
Updated the description to make it clearer where to find the audio input:
- Added explicit instructions about clicking the 📷 and 🎤 icons
- Added a tip about looking for icons in the text input box
- Clarified drag & drop functionality
**File:** `src/app.py` (lines 942-948)
## How It Works Now
1. **Audio Recording Button**: The 🎤 microphone icon should appear in the textbox toolbar when `multimodal=True` is set
2. **File Upload**: Users can drag & drop audio files into the textbox or click to upload them
3. **Browser Permissions**: The browser prompts for microphone access when the user clicks the audio button
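
Once audio reaches the handler, it usually has to be told apart from other uploads. The sketch below is illustrative only: `split_audio_files` is a hypothetical helper, not part of `src/app.py`, and it assumes the multimodal message arrives as a dict with `"text"` and `"files"` keys, where `"files"` holds local file paths. Check that shape against the installed Gradio version before relying on it.

```python
import mimetypes


def split_audio_files(message):
    """Separate audio uploads from other files in a multimodal message.

    Assumption: `message` is a dict with "text" and "files" keys, and
    "files" is a list of local file paths (hypothetical shape, verify it).
    """
    audio, other = [], []
    for path in message.get("files") or []:
        # Guess the MIME type from the file extension only.
        mime, _ = mimetypes.guess_type(path)
        if mime and mime.startswith("audio/"):
            audio.append(path)
        else:
            other.append(path)
    return message.get("text", ""), audio, other
```

A handler such as `research_agent` could call this first and route the audio paths to transcription while treating the remaining files as before. Because the check is extension-based, a recording saved without a recognizable extension would land in the `other` list.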
## Testing
To verify the fix:
1. Look for the 🎤 microphone icon in the text input box
2. Click it to start recording (browser will ask for microphone permission)
3. Alternatively, drag & drop an audio file into the textbox
4. Check browser console for any permission errors
## Browser Requirements
- **Chrome/Edge**: Should work with microphone permissions
- **Firefox**: Should work with microphone permissions
- **Safari**: May require additional configuration
- **HTTPS Required**: Microphone access typically requires HTTPS (or localhost)
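
The HTTPS rule can be checked before deploying. The helper below is a simplified approximation of the browser's secure-context rule, not the full specification: `allows_microphone` and `LOCAL_HOSTS` are names invented for this sketch, and the URLs in the usage note are examples only.

```python
from urllib.parse import urlparse

# Hosts treated as local for this sketch (assumption: a simplified list).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def allows_microphone(url):
    """Return True if the URL is HTTPS or points at a local host."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme == "https":
        return True
    return host in LOCAL_HOSTS or host.endswith(".localhost")
```

For example, `allows_microphone("http://localhost:7860")` returns `True`, while `allows_microphone("http://192.168.1.10:7860")` returns `False`, which matches the common case where the app works on the developer machine but the microphone is blocked when opened from another device over plain HTTP.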
## Troubleshooting
If audio input still doesn't appear:
1. **Check Browser Permissions**:
   - Open browser settings
   - Check microphone permissions for the site
   - Ensure microphone is not blocked
2. **Check Browser Console**:
   - Open Developer Tools (F12)
   - Look for permission errors or warnings
   - Check for any JavaScript errors
3. **Try Different Browser**:
   - Some browsers have stricter permission policies
   - Try Chrome or Firefox if Safari doesn't work
4. **Check Gradio Version**:
   - Ensure `gradio>=6.0.0` is installed
   - Update if needed: `pip install --upgrade gradio`
5. **HTTPS Requirement**:
   - Microphone access requires HTTPS (or localhost)
   - If deploying, ensure SSL is configured
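
The Gradio version check from step 4 can also be done from Python. This is a minimal sketch using only the standard library. `version_tuple` and `gradio_meets_minimum` are hypothetical helper names, and the comparison ignores pre-release suffixes, so `6.0.0rc1` is treated the same as `6.0.0`.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '6.0.1' into (6, 0, 1), ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short versions such as '5.49' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def gradio_meets_minimum(minimum="6.0.0"):
    """Return True/False for the installed gradio, or None if it is missing."""
    try:
        installed = metadata.version("gradio")
    except metadata.PackageNotFoundError:
        return None
    return version_tuple(installed) >= version_tuple(minimum)
```

A result of `None` means the package is not installed in the active environment, which is worth ruling out before investigating browser settings.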
## Additional Notes
- The audio button is part of the MultimodalTextbox component
- It should appear as an icon in the textbox toolbar
- If it's still not visible, it might be in a dropdown menu (click the "+" or "..." button)
- The `file_types` parameter ensures audio files are accepted for upload
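
If the microphone button remains hidden, one further option to try is passing an explicitly configured `gr.MultimodalTextbox` to the interface. The sketch below is not taken from `src/app.py`: the helper names are hypothetical, and it assumes the installed Gradio release supports a `sources` option on `MultimodalTextbox` and a `textbox` argument on `ChatInterface`. Confirm both against the Gradio documentation for the version in use.

```python
def multimodal_textbox_options():
    """Collect the textbox options in one place (hypothetical helper)."""
    return {
        "file_types": ["image", "audio", "video"],
        # Assumption: `sources` is accepted by the installed Gradio release.
        "sources": ["upload", "microphone"],
    }


def build_chat_interface(fn):
    """Build a ChatInterface whose textbox explicitly requests a microphone."""
    import gradio as gr  # imported lazily so the options helper has no dependencies

    textbox = gr.MultimodalTextbox(**multimodal_textbox_options())
    return gr.ChatInterface(fn=fn, multimodal=True, textbox=textbox)
```

Calling `build_chat_interface(research_agent).launch()` would then start the app with the customized textbox. Keeping the options in a plain dict makes it easy to log or adjust them while debugging which setting controls the button.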