# Audio Input Display Fix

## Issue
The audio input (microphone button) was not displaying in the ChatInterface multimodal textbox.

## Root Cause
Setting `multimodal=True` on `gr.ChatInterface` should show the image and audio buttons automatically. The microphone button can still be missing for any of these reasons:
1. The buttons might be hidden in a dropdown menu
2. Browser permissions might be blocking microphone access
3. The `file_types` parameter might not have been explicitly set

## Fix Applied

### 1. Added `file_types` Parameter
Explicitly specified which file types are accepted to ensure audio is enabled:

```python
gr.ChatInterface(
    fn=research_agent,
    multimodal=True,
    file_types=["image", "audio", "video"],  # Explicitly enable image, audio, and video
    ...
)
```

**File:** `src/app.py` (line 929)

### 2. Enhanced UI Description
Updated the description to make it clearer where to find the audio input:

- Added explicit instructions about clicking the 📷 and 🎤 icons
- Added a tip about looking for icons in the text input box
- Clarified drag & drop functionality

**File:** `src/app.py` (lines 942-948)

## How It Works Now

1. **Audio Recording Button**: The 🎤 microphone icon should appear in the textbox toolbar when `multimodal=True` is set
2. **File Upload**: Users can drag & drop audio files or click to upload
3. **Browser Permissions**: The browser prompts for microphone access when the user clicks the audio button
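
Once audio reaches the handler, it arrives alongside any other uploads. The sketch below shows one way to separate audio files from the rest. It assumes the multimodal message is a dict with `"text"` and `"files"` keys and that each file entry is either a path string or a dict with a `"path"` key; `split_uploads` is a hypothetical helper, not part of `src/app.py` or Gradio.

```python
import mimetypes


def split_uploads(message):
    """Return (text, audio_paths, other_paths) for a multimodal message.

    Assumption: `message` looks like {"text": str, "files": [...]}.
    """
    audio, other = [], []
    for item in message.get("files", []):
        # Entries may be plain paths or dicts carrying a "path" key (assumed).
        path = item["path"] if isinstance(item, dict) else str(item)
        mime, _ = mimetypes.guess_type(path)
        if mime and mime.startswith("audio/"):
            audio.append(path)
        else:
            other.append(path)
    return message.get("text", ""), audio, other
```

A handler such as `research_agent` could call this first and route the audio paths to transcription. Classification is by file extension only, so a file with a missing or unusual extension ends up in the "other" list.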

## Testing

To verify the fix:
1. Look for the 🎤 microphone icon in the text input box
2. Click it to start recording (the browser will ask for microphone permission)
3. Alternatively, drag & drop an audio file into the textbox
4. Check browser console for any permission errors

## Browser Requirements

- **Chrome/Edge**: Should work once microphone permission is granted
- **Firefox**: Should work once microphone permission is granted
- **Safari**: May require additional configuration
- **HTTPS Required**: Microphone access typically requires HTTPS (or localhost)
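
To check quickly whether a deployment URL is likely to satisfy the HTTPS-or-localhost requirement, a rough test along these lines can help. It is a minimal sketch that approximates the rule stated above, not the browser's full secure-context logic; `is_probably_secure_origin` is a hypothetical helper name.

```python
import ipaddress
from urllib.parse import urlparse


def is_probably_secure_origin(url):
    """Approximate check: HTTPS, localhost, or a loopback IP address."""
    parts = urlparse(url)
    host = parts.hostname or ""
    if parts.scheme == "https":
        return True
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        # Loopback addresses such as 127.0.0.1 and ::1 are treated like localhost.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, and not HTTPS or localhost.
        return False
```

For example, `is_probably_secure_origin("http://192.168.1.5:7860")` returns `False`, which matches the common case where the microphone works on the development machine but not from another device on the LAN.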

## Troubleshooting

If audio input still doesn't appear:

1. **Check Browser Permissions**:
   - Open browser settings
   - Check microphone permissions for the site
   - Ensure microphone is not blocked

2. **Check Browser Console**:
   - Open Developer Tools (F12)
   - Look for permission errors or warnings
   - Check for any JavaScript errors

3. **Try Different Browser**:
   - Some browsers have stricter permission policies
   - Try Chrome or Firefox if Safari doesn't work

4. **Check Gradio Version**:
   - Ensure `gradio>=6.0.0` is installed
   - Update if needed: `pip install --upgrade gradio`

5. **HTTPS Requirement**:
   - Microphone access requires HTTPS (or localhost)
   - If deploying, ensure SSL is configured
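
The version check in step 4 can also be scripted. The sketch below reads the installed version through `importlib.metadata` and compares it with the `6.0.0` minimum named above. The helper names are hypothetical, and the comparison only looks at the leading numeric components, so a pre-release such as `6.0.0rc1` is treated the same as `6.0.0`.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Leading numeric components, padded to three: '6.1' -> (6, 1, 0)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="6.0.0"):
    """True if `installed` is at least `minimum` (numeric parts only)."""
    return release_tuple(installed) >= release_tuple(minimum)


def gradio_status(minimum="6.0.0"):
    """Describe whether the installed gradio satisfies the minimum."""
    try:
        installed = version("gradio")
    except PackageNotFoundError:
        return "gradio is not installed"
    verdict = "OK" if meets_minimum(installed, minimum) else "too old"
    return f"gradio {installed}: {verdict} (need >= {minimum})"
```

Running `print(gradio_status())` in the app's environment shows whether an upgrade is needed before looking into browser-side causes.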

## Additional Notes

- The audio button is part of the MultimodalTextbox component
- It should appear as an icon in the textbox toolbar
- If it's still not visible, it might be in a dropdown menu (click the "+" or "..." button)
- The `file_types` parameter ensures audio files are accepted for upload