# 🏦 Crypto Data Bank
## 📋 Overview
**Crypto Data Bank** is a powerful cryptocurrency data aggregation system running on HuggingFace Spaces that acts as an intelligent gateway between data consumers and 200+ free data sources.
### 🎯 Key Features
- **100% FREE Data Sources** - No API keys required for basic functionality
- **Real-time Price Data** - From 5+ free providers (CoinCap, CoinGecko, Binance, Kraken, CryptoCompare)
- **News Aggregation** - 8+ RSS feeds (CoinTelegraph, CoinDesk, Bitcoin Magazine, etc.)
- **Market Sentiment** - Fear & Greed Index, BTC Dominance, Global Stats
- **HuggingFace AI Models** - Sentiment analysis with FinBERT, categorization with BART
- **Intelligent Caching** - Database-backed caching for fast responses
- **Background Collection** - Continuous data gathering in the background
- **REST API Gateway** - FastAPI-based API with automatic documentation
---
## 🏗️ Architecture
```
API Gateway (FastAPI) - http://localhost:8888
  Endpoints:
  • /api/prices          - Real-time cryptocurrency prices
  • /api/news            - Aggregated crypto news
  • /api/sentiment       - Market sentiment analysis
  • /api/market/overview - Complete market overview
  • /api/trending        - Trending coins from news
  • /api/ai/analysis     - AI-powered analysis
          │
Orchestrator Layer (background data collection)
  • Prices:    collected every 60 seconds
  • News:      collected every 5 minutes
  • Sentiment: collected every 3 minutes
          │
Collector Layer
  • Price Collector (5 sources)
  • News Collector (8 sources)
  • Sentiment Collector
          │
AI Analysis Layer (HuggingFace models)
  • FinBERT   - Financial sentiment analysis
  • BART-MNLI - News categorization
  • Aggregated sentiment calculation
          │
Database Layer (SQLite)
  Tables:
  • prices           - Historical price data
  • ohlcv            - Candlestick data
  • news             - News articles with AI analysis
  • market_sentiment - Sentiment indicators
  • ai_analysis      - AI model outputs
  • api_cache        - Response caching
          │
Free Data Sources
  Price sources (no API key):
  • CoinCap.io            • CoinGecko (free tier)
  • Binance Public API    • Kraken Public API
  • CryptoCompare         • Alternative.me (Fear & Greed)
  News sources (RSS feeds):
  • CoinTelegraph         • CoinDesk
  • Bitcoin Magazine      • Decrypt
  • The Block             • CryptoPotato
  • NewsBTC               • Bitcoinist
```
---
## 📂 Project Structure
```
crypto_data_bank/
├── __init__.py # Package initialization
├── database.py # SQLite database layer
├── orchestrator.py # Data collection orchestrator
├── api_gateway.py # Main FastAPI gateway
├── requirements.txt # Python dependencies
├── collectors/ # Data collectors
│ ├── __init__.py
│ ├── free_price_collector.py # FREE price collection (5 sources)
│ ├── rss_news_collector.py # RSS news aggregation (8 feeds)
│ └── sentiment_collector.py # Market sentiment collection
└── ai/ # AI/ML components
├── __init__.py
└── huggingface_models.py # HuggingFace model integration
```
---
## 🚀 Quick Start
### 1. Install Dependencies
```bash
cd crypto_data_bank
pip install -r requirements.txt
```
### 2. Start the API Gateway
```bash
python api_gateway.py
```
The server will start on `http://localhost:8888`
### 3. Access the API
**Interactive Documentation:**
- Swagger UI: http://localhost:8888/docs
- ReDoc: http://localhost:8888/redoc
**Example API Calls:**
```bash
# Quote URLs so the shell does not treat "?" as a glob character
# Get latest prices
curl "http://localhost:8888/api/prices?symbols=BTC,ETH,SOL"
# Get crypto news
curl "http://localhost:8888/api/news?limit=10"
# Get market sentiment
curl "http://localhost:8888/api/sentiment"
# Get market overview
curl "http://localhost:8888/api/market/overview"
# Get trending coins
curl "http://localhost:8888/api/trending"
```
---
## 📊 API Endpoints
### Core Endpoints
#### `GET /`
Root endpoint with API information
#### `GET /api/health`
Health check and system status
#### `GET /api/stats`
Complete database and collection statistics
### Price Endpoints
#### `GET /api/prices`
Get cryptocurrency prices
**Parameters:**
- `symbols` (optional): Comma-separated symbols (e.g., BTC,ETH,SOL)
- `limit` (default: 100): Number of results
- `force_refresh` (default: false): Force fresh data collection
**Example:**
```http
GET /api/prices?symbols=BTC,ETH&limit=10&force_refresh=true
```
**Response:**
```json
{
"success": true,
"source": "live_collection",
"count": 2,
"data": [
{
"symbol": "BTC",
"price": 50000.00,
"change24h": 2.5,
"volume24h": 25000000000,
"marketCap": 980000000000,
"sources_count": 5,
"sources": ["coincap", "coingecko", "binance", "kraken", "cryptocompare"]
}
],
"timestamp": "2024-11-14T10:30:00"
}
```
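A minimal Python client sketch for this endpoint, using only the standard library. It assumes the gateway runs at the default `http://localhost:8888` and that responses have the `success`/`data` shape shown above; `build_prices_url`, `price_map`, and `fetch_prices` are hypothetical helper names, not part of the project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8888"  # assumption: default local port from this README


def build_prices_url(symbols, limit=100, force_refresh=False, base_url=BASE_URL):
    """Build a /api/prices URL from the documented query parameters."""
    params = {"limit": limit, "force_refresh": str(force_refresh).lower()}
    if symbols:
        # Symbols are sent comma-separated; urlencode escapes the comma as %2C
        params["symbols"] = ",".join(s.upper() for s in symbols)
    return f"{base_url}/api/prices?{urlencode(params)}"


def price_map(payload):
    """Reduce a /api/prices response body to {symbol: price}."""
    if not payload.get("success"):
        raise RuntimeError("API reported failure")
    return {item["symbol"]: item["price"] for item in payload.get("data", [])}


def fetch_prices(symbols, **kwargs):
    """Call the running gateway (requires the server from Quick Start)."""
    with urlopen(build_prices_url(symbols, **kwargs), timeout=30) as resp:
        return price_map(json.load(resp))
```

With the server running, `fetch_prices(["BTC", "ETH"], limit=10)` returns a dict keyed by symbol.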
#### `GET /api/prices/{symbol}`
Get a single cryptocurrency with its price history
**Parameters:**
- `history_hours` (default: 24): Hours of price history
### News Endpoints
#### `GET /api/news`
Get cryptocurrency news
**Parameters:**
- `limit` (default: 50): Number of news items
- `category` (optional): Filter by category
- `coin` (optional): Filter by coin symbol
- `force_refresh` (default: false): Force fresh collection
**Example:**
```http
GET /api/news?coin=BTC&limit=20
```
#### `GET /api/trending`
Get trending coins based on news mentions
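The README does not specify how mentions are counted. One plausible approach, sketched below, is to scan headlines for coin names or tickers and count each coin at most once per article. `COIN_ALIASES`, `coins_in`, and `trending` are hypothetical; the alias table is deliberately tiny, and a real collector would need a larger list and care with short tickers that are also ordinary words.

```python
import re
from collections import Counter

# Hypothetical symbol -> alias table; the project's real list is not documented here
COIN_ALIASES = {
    "BTC": ["bitcoin", "btc"],
    "ETH": ["ethereum", "eth", "ether"],
    "SOL": ["solana", "sol"],
}


def coins_in(text):
    """Return the set of symbols whose aliases appear as whole words in text."""
    lowered = text.lower()
    found = set()
    for symbol, aliases in COIN_ALIASES.items():
        # \b word boundaries stop "eth" from matching inside "ethereum" or "method"
        if any(re.search(rf"\b{re.escape(a)}\b", lowered) for a in aliases):
            found.add(symbol)
    return found


def trending(headlines, top_n=5):
    """Count each coin at most once per headline and rank by mention count."""
    counts = Counter()
    for headline in headlines:
        counts.update(coins_in(headline))
    return counts.most_common(top_n)
```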
### Sentiment Endpoints
#### `GET /api/sentiment`
Get market sentiment analysis
**Response:**
```json
{
"success": true,
"data": {
"fear_greed": {
"fear_greed_value": 65,
"fear_greed_classification": "Greed"
},
"btc_dominance": {
"btc_dominance": 48.5
},
"overall_sentiment": {
"overall_sentiment": "Greed",
"sentiment_score": 62.5,
"confidence": 0.85
}
}
}
```
#### `GET /api/market/overview`
Complete market overview with prices, sentiment, and news
### AI Analysis Endpoints
#### `GET /api/ai/analysis`
Get stored AI analyses from the database
**Parameters:**
- `symbol` (optional): Filter by symbol
- `limit` (default: 50): Number of results
#### `POST /api/ai/analyze/news`
Analyze news sentiment with AI
**Parameters:**
- `text`: News text to analyze
**Response:**
```json
{
"success": true,
"analysis": {
"sentiment": "bullish",
"confidence": 0.92,
"model": "finbert"
}
}
```
### Collection Control Endpoints
#### `POST /api/collection/start`
Start background data collection
#### `POST /api/collection/stop`
Stop background data collection
#### `GET /api/collection/status`
Get collection status
---
## 🤖 HuggingFace AI Models
### FinBERT - Sentiment Analysis
- **Model:** `ProsusAI/finbert`
- **Purpose:** Financial sentiment analysis of news
- **Output:** bullish / bearish / neutral (mapped from FinBERT's positive / negative / neutral labels)
- **Use Case:** Analyze crypto news sentiment
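ProsusAI/finbert labels text as `positive`, `negative`, or `neutral`, so a mapping step is needed to produce the bullish/bearish/neutral output above. The sketch below assumes that one-to-one mapping and the response shape of `POST /api/ai/analyze/news`; `LABEL_TO_MARKET`, `to_market_sentiment`, and `analyze_news` are hypothetical names. It uses the `transformers` `pipeline` API and imports it lazily so the rest of the app can still run when the library is missing.

```python
from functools import lru_cache

# Assumed mapping from FinBERT's labels to the market terms used by this API
LABEL_TO_MARKET = {"positive": "bullish", "negative": "bearish", "neutral": "neutral"}


@lru_cache(maxsize=1)
def _finbert():
    # Lazy import: only needed (and the model only downloaded) on first use
    from transformers import pipeline
    return pipeline("text-classification", model="ProsusAI/finbert")


def to_market_sentiment(result):
    """Convert one pipeline result dict ({"label", "score"}) to the API shape."""
    return {
        "sentiment": LABEL_TO_MARKET.get(result["label"].lower(), "neutral"),
        "confidence": round(float(result["score"]), 4),
        "model": "finbert",
    }


def analyze_news(text):
    # truncation=True keeps long articles within the model's maximum input length
    return to_market_sentiment(_finbert()(text, truncation=True)[0])
```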
### BART-MNLI - Zero-Shot Classification
- **Model:** `facebook/bart-large-mnli`
- **Purpose:** News categorization
- **Categories:** price_movement, regulation, technology, adoption, security, defi, nft, etc.
- **Use Case:** Automatically categorize news articles
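With zero-shot classification, the candidate categories are passed as labels at inference time, so no fine-tuning is needed. The sketch below assumes the category list above (the README's "etc." means the real list may be longer); the 0.3 threshold, the `general` fallback label, and the helper names are hypothetical choices for illustration. It relies on the `transformers` zero-shot pipeline returning a dict with `labels` and `scores` sorted from highest to lowest score.

```python
from functools import lru_cache

# Categories listed in this README; the project's full list may differ
CATEGORIES = ["price_movement", "regulation", "technology", "adoption",
              "security", "defi", "nft"]


@lru_cache(maxsize=1)
def _zero_shot():
    # Lazy import: the model is downloaded on first use
    from transformers import pipeline
    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


def top_category(result, threshold=0.3, fallback="general"):
    """Pick the best label from a zero-shot result, or a fallback if unsure."""
    best_label, best_score = result["labels"][0], result["scores"][0]
    return best_label if best_score >= threshold else fallback


def categorize(text):
    return top_category(_zero_shot()(text, candidate_labels=CATEGORIES))
```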
### Simple Analyzer (Fallback)
- **Method:** Keyword-based sentiment
- **Use Case:** When the `transformers` library is not available
- **Performance:** Fast but less accurate
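A keyword-based fallback can be as simple as counting bullish and bearish terms. The word lists, the `keyword` model name, and the confidence formula below are illustrative assumptions, not the project's actual analyzer. The example also shows why this approach is less accurate: negations such as "no crash" still count as bearish.

```python
import re

# Illustrative keyword lists; the project's actual keywords are not documented
BULLISH = {"surge", "rally", "gain", "gains", "bullish", "adoption", "approval", "record", "soars"}
BEARISH = {"crash", "drop", "plunge", "bearish", "hack", "ban", "lawsuit", "selloff", "falls"}


def simple_sentiment(text):
    """Keyword-count sentiment: fast and dependency-free, but easily fooled."""
    words = re.findall(r"[a-z]+", text.lower())
    bull = sum(w in BULLISH for w in words)
    bear = sum(w in BEARISH for w in words)
    if bull > bear:
        label = "bullish"
    elif bear > bull:
        label = "bearish"
    else:
        label = "neutral"
    # Crude confidence: more matched keywords -> higher, capped at 0.9
    hits = bull + bear
    confidence = round(min(0.5 + 0.1 * hits, 0.9), 2)
    return {"sentiment": label, "confidence": confidence, "model": "keyword"}
```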
---
## 💾 Database Schema
### `prices` Table
Stores real-time cryptocurrency prices
**Columns:**
- `id`: Primary key
- `symbol`: Crypto symbol (BTC, ETH, etc.)
- `price`: Current price in USD
- `change_1h`, `change_24h`, `change_7d`: Price changes
- `volume_24h`: 24-hour trading volume
- `market_cap`: Market capitalization
- `rank`: Market cap rank
- `source`: Data source
- `timestamp`: Collection time
### `news` Table
Stores crypto news articles
**Columns:**
- `id`: Primary key
- `title`: News title
- `description`: News description
- `url`: Article URL (unique)
- `source`: News source
- `published_at`: Publication date
- `sentiment`: AI sentiment score
- `coins`: Related cryptocurrencies (JSON)
- `category`: News category
### `market_sentiment` Table
Stores market sentiment indicators
**Columns:**
- `fear_greed_value`: Fear & Greed Index value (0-100)
- `fear_greed_classification`: Classification (Fear/Greed/etc.)
- `overall_sentiment`: Calculated overall sentiment
- `sentiment_score`: Aggregated sentiment score
- `confidence`: Confidence level
### `ai_analysis` Table
Stores AI model analysis results
**Columns:**
- `symbol`: Cryptocurrency symbol
- `analysis_type`: Type of analysis
- `model_used`: AI model name
- `input_data`: Input data (JSON)
- `output_data`: Analysis output (JSON)
- `confidence`: Confidence score
### `api_cache` Table
Caches API responses for performance
**Columns:**
- `endpoint`: API endpoint
- `params`: Request parameters
- `response`: Cached response (JSON)
- `ttl`: Time to live (seconds)
- `expires_at`: Expiration timestamp
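The `api_cache` columns above map naturally onto a key-value cache with expiry. The following minimal sketch uses Python's built-in `sqlite3` module with parameterized queries. The column types, the `UNIQUE (endpoint, params)` constraint, Unix-second timestamps, and the `cache_set`/`cache_get` helpers are assumptions, since the README lists only column names.

```python
import json
import sqlite3
import time

# Column names follow the api_cache table; types and constraint are assumed
SCHEMA = """
CREATE TABLE IF NOT EXISTS api_cache (
    endpoint   TEXT NOT NULL,
    params     TEXT NOT NULL,
    response   TEXT NOT NULL,
    ttl        INTEGER NOT NULL,
    expires_at REAL NOT NULL,
    UNIQUE (endpoint, params)
)
"""


def cache_set(conn, endpoint, params, response, ttl):
    key = json.dumps(params, sort_keys=True)  # stable key regardless of dict order
    # INSERT OR REPLACE overwrites an existing row for the same (endpoint, params)
    conn.execute(
        "INSERT OR REPLACE INTO api_cache VALUES (?, ?, ?, ?, ?)",
        (endpoint, key, json.dumps(response), ttl, time.time() + ttl),
    )
    conn.commit()


def cache_get(conn, endpoint, params, allow_stale=False):
    """Return the cached response, or None if missing (or expired, unless allow_stale)."""
    row = conn.execute(
        "SELECT response, expires_at FROM api_cache WHERE endpoint = ? AND params = ?",
        (endpoint, json.dumps(params, sort_keys=True)),
    ).fetchone()
    if row is None or (row[1] < time.time() and not allow_stale):
        return None
    return json.loads(row[0])
```

The `allow_stale` flag is one way to support serving old data when live collection fails.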
---
## 🔄 Data Collection Flow
### Background Collection (Auto-started)
1. **Price Collection** (Every 60 seconds)
- Fetch from 5 free sources simultaneously
- Aggregate using median price
- Save to database
- Cache for fast API responses
2. **News Collection** (Every 5 minutes)
- Fetch from 8 RSS feeds
- Deduplicate articles
- Analyze sentiment with AI
- Extract mentioned coins
- Save to database
3. **Sentiment Collection** (Every 3 minutes)
- Fetch Fear & Greed Index
- Calculate BTC dominance
- Get global market stats
- Aggregate overall sentiment
- Save to database
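Median aggregation keeps a single misbehaving source from skewing the result. A minimal sketch for one symbol, assuming each source yields a USD price or `None` on failure; the output keys mirror `sources_count` and `sources` from the `/api/prices` response, while `aggregate_price` itself is a hypothetical name.

```python
from statistics import median


def aggregate_price(quotes):
    """Combine per-source quotes for one symbol into a single record.

    `quotes` maps source name -> USD price, with None for sources that failed.
    """
    # Drop failed sources and obviously invalid prices
    valid = {src: p for src, p in quotes.items() if p is not None and p > 0}
    if not valid:
        raise ValueError("no usable quotes")
    return {
        "price": median(valid.values()),
        "sources_count": len(valid),
        "sources": sorted(valid),
    }
```

With an even number of valid quotes, `statistics.median` returns the mean of the two middle values, so an outlier such as a stale 65000 quote has no effect beyond shifting which values sit in the middle.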
### API Request Flow
```
User Request
    ↓
API Gateway
    ↓
Check Database Cache
    ↓
Cache hit? → Return Cached Data (Fast!)
    ↓ cache miss or force_refresh=true
Collect Fresh Data
    ↓
Save to Database
    ↓
Return Fresh Data
```
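The cache-first flow above, combined with the "Intelligent Fallback" behavior listed under Caching Strategy, can be sketched as follows. An in-memory dict stands in for the database cache; the `cache` and `stale_cache` source labels and the `get_with_cache` helper are hypothetical, while `live_collection` mirrors the example `/api/prices` response.

```python
import time

# In-memory stand-in for the database cache: key -> (expires_at, data)
_cache = {}


def get_with_cache(key, collect, ttl=60, force_refresh=False):
    """Serve from cache when fresh; otherwise collect, store, and return.

    If live collection raises, fall back to the last cached value (even if
    stale) so the endpoint still answers.
    """
    now = time.time()
    entry = _cache.get(key)
    if entry and not force_refresh and entry[0] > now:
        return {"source": "cache", "data": entry[1]}
    try:
        data = collect()
    except Exception:
        if entry:
            return {"source": "stale_cache", "data": entry[1]}
        raise  # nothing cached to fall back on
    _cache[key] = (now + ttl, data)
    return {"source": "live_collection", "data": data}
```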
---
## 📈 Performance
### Response Times
- **Cached Responses:** < 50ms
- **Fresh Price Collection:** 2-5 seconds
- **Fresh News Collection:** 5-15 seconds
- **AI Analysis:** 1-3 seconds per news item
### Caching Strategy
- **Default TTL:** 60 seconds for prices, 300 seconds for news
- **Database-backed:** Persistent across restarts
- **Intelligent Fallback:** Serves cached data if live collection fails
### Resource Usage
- **Memory:** ~200-500 MB (with AI models loaded)
- **CPU:** Low (mostly I/O bound)
- **Disk:** Grows ~1-5 MB per day (depending on collection frequency)
- **Network:** Low (periodic polling of public REST APIs and RSS feeds)
---
## 🌐 Data Sources
### Price Sources (5 sources, NO API KEY)
| Source | URL | Free Tier | Rate Limit | Notes |
|--------|-----|-----------|------------|-------|
| CoinCap | coincap.io | ✅ Unlimited | None | Best for market cap data |
| CoinGecko | coingecko.com | ✅ Yes | 10-30/min | Most comprehensive |
| Binance Public | binance.com | ✅ Yes | 1200/min | Real-time prices |
| Kraken Public | kraken.com | ✅ Yes | 1/sec | Reliable exchange data |
| CryptoCompare | cryptocompare.com | ✅ Yes | 100K/month | Good fallback |
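The limits in this table can be respected by spacing out requests per source. A minimal `asyncio` sketch, assuming the collectors are async; the intervals in `MIN_INTERVAL` are derived from the table (Kraken 1/sec, CoinGecko at the conservative end of 10-30/min, Binance 1200/min), and `SourceRateLimiter` is a hypothetical name, not the project's class.

```python
import asyncio
import time

# Minimum seconds between requests per source, derived from the table above
MIN_INTERVAL = {"kraken": 1.0, "coingecko": 6.0, "binance": 0.05}


class SourceRateLimiter:
    """Space out calls to each source so none exceeds its documented rate."""

    def __init__(self, min_interval):
        self.min_interval = dict(min_interval)
        self._next_ok = {}  # source -> earliest monotonic time for the next call
        self._locks = {}    # one lock per source serializes waiting callers

    async def wait(self, source):
        lock = self._locks.setdefault(source, asyncio.Lock())
        async with lock:
            now = time.monotonic()
            ready_at = self._next_ok.get(source, now)
            if ready_at > now:
                await asyncio.sleep(ready_at - now)
            # Unknown sources get no spacing (interval 0)
            gap = self.min_interval.get(source, 0.0)
            self._next_ok[source] = max(ready_at, now) + gap
```

A collector would call `await limiter.wait("kraken")` immediately before each HTTP request to that source.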
### News Sources (8 sources, RSS feeds)
| Source | URL | Update Frequency | Quality |
|--------|-----|-----------------|---------|
| CoinTelegraph | cointelegraph.com | Every 30 min | ⭐⭐⭐⭐⭐ |
| CoinDesk | coindesk.com | Every hour | ⭐⭐⭐⭐⭐ |
| Bitcoin Magazine | bitcoinmagazine.com | Daily | ⭐⭐⭐⭐ |
| Decrypt | decrypt.co | Every hour | ⭐⭐⭐⭐ |
| The Block | theblock.co | Every hour | ⭐⭐⭐⭐⭐ |
| CryptoPotato | cryptopotato.com | Every 30 min | ⭐⭐⭐ |
| NewsBTC | newsbtc.com | Every hour | ⭐⭐⭐ |
| Bitcoinist | bitcoinist.com | Every hour | ⭐⭐⭐ |
### Sentiment Sources (3 sources, FREE)
| Source | Metric | Update | Quality |
|--------|--------|--------|---------|
| Alternative.me | Fear & Greed Index | Daily | ⭐⭐⭐⭐⭐ |
| CoinCap | BTC Dominance | Real-time | ⭐⭐⭐⭐ |
| CoinGecko | Global Market Stats | Every 10 min | ⭐⭐⭐⭐⭐ |
---
## 🚀 Deployment to HuggingFace Spaces
### Prerequisites
1. HuggingFace account
2. Git installed
3. HuggingFace CLI (optional)
### Steps
1. **Create New Space**
- Go to https://huggingface.co/new-space
- Choose "Docker" as Space SDK
- Select appropriate hardware (CPU is sufficient)
2. **Clone Repository**
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/crypto-data-bank
cd crypto-data-bank
```
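3. **Copy Files**
Copy the package folder itself (not only its contents), because the Dockerfile below copies from `crypto_data_bank/`:
```bash
cp -r /path/to/crypto_data_bank .
```
4. **Create Dockerfile**
Add the Dockerfile from the Docker Deployment section below to the root of the Space repository.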
5. **Push to HuggingFace**
```bash
git add .
git commit -m "Initial deployment"
git push
```
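6. **Configure Space**
- Docker Spaces expect the app on port 7860 by default; set `app_port: 8888` in the YAML header of the Space's `README.md`
- Enable persistent storage (a paid upgrade) if the SQLite database must survive restarts
- Wait for the build to complete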
7. **Access Your Space**
- URL: https://YOUR_USERNAME-crypto-data-bank.hf.space
- API Docs: https://YOUR_USERNAME-crypto-data-bank.hf.space/docs
---
## 🐳 Docker Deployment
**Dockerfile:**
```dockerfile
FROM python:3.10-slim
WORKDIR /app
# Install dependencies
COPY crypto_data_bank/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY crypto_data_bank/ /app/
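# Create a data directory that non-root users can write to
# (HuggingFace Spaces runs containers as user ID 1000)
RUN mkdir -p /app/data && chmod 777 /app/data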
# Expose port
EXPOSE 8888
# Run application
CMD ["python", "api_gateway.py"]
```
**Build and Run:**
```bash
# Build image
docker build -t crypto-data-bank .
# Run container
docker run -p 8888:8888 -v "$(pwd)/data:/app/data" crypto-data-bank
```
---
## 🧪 Testing
### Test Individual Collectors
Run these commands from the directory that contains `crypto_data_bank/`.
```bash
# Test price collector
python crypto_data_bank/collectors/free_price_collector.py
# Test news collector
python crypto_data_bank/collectors/rss_news_collector.py
# Test sentiment collector
python crypto_data_bank/collectors/sentiment_collector.py
# Test AI models
python crypto_data_bank/ai/huggingface_models.py
# Test orchestrator
python crypto_data_bank/orchestrator.py
```
### Test API Gateway
```bash
# Start server
python crypto_data_bank/api_gateway.py
# In another terminal, test endpoints
curl "http://localhost:8888/api/health"
curl "http://localhost:8888/api/prices?symbols=BTC"
curl "http://localhost:8888/api/news?limit=5"
```
---
## 📝 Configuration
### Collection Intervals
Edit in `orchestrator.py`:
```python
self.intervals = {
'prices': 60, # Every 1 minute
'news': 300, # Every 5 minutes
'sentiment': 180, # Every 3 minutes
}
```
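Fixed-interval background collection like this can be expressed as one `asyncio` task per collector. The sketch below is hypothetical: `MiniOrchestrator` and the collector callables are placeholders, and `start`/`stop` only illustrate what `POST /api/collection/start` and `/api/collection/stop` might do; it is not the project's `orchestrator.py`.

```python
import asyncio


class MiniOrchestrator:
    """Run each collector on its own fixed interval until stop() is called."""

    def __init__(self, collectors, intervals):
        self.collectors = collectors  # name -> async callable
        self.intervals = intervals    # name -> seconds between runs
        self._tasks = []

    async def _loop(self, name):
        while True:
            try:
                await self.collectors[name]()
            except Exception as exc:  # one failing source must not kill the loop
                print(f"{name} collection failed: {exc}")
            await asyncio.sleep(self.intervals[name])

    def start(self):
        # Must be called from inside a running event loop
        self._tasks = [asyncio.create_task(self._loop(n)) for n in self.collectors]

    async def stop(self):
        for task in self._tasks:
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)
        self._tasks = []

    @property
    def running(self):
        return any(not t.done() for t in self._tasks)
```

Sleeping after each run means the interval is measured from the end of one collection to the start of the next, so a slow source delays only its own loop.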
### Database Location
Edit in `database.py`:
```python
def __init__(self, db_path: str = "data/crypto_bank.db"):
```
### API Port
Edit in `api_gateway.py`:
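```python
import uvicorn

# Guard so the module-level run does not execute again when uvicorn imports "api_gateway:app"
if __name__ == "__main__":
    uvicorn.run(
        "api_gateway:app",
        host="0.0.0.0",
        port=8888,  # Change port here
    )
```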
---
## 🔒 Security Considerations
- **No API Keys Stored** - All data sources are free and public
- **Read-Only Operations** - Only fetches data; never modifies external sources
- **Rate Limiting** - Respects source rate limits
- **Input Validation** - Pydantic models validate all inputs
- **SQL Injection Protection** - Uses parameterized queries
- **CORS Enabled** - Configure as needed for your use case
---
## 🎓 Use Cases
### 1. Trading Bots
Use the API to get real-time prices and sentiment for automated trading
### 2. Portfolio Trackers
Build a portfolio tracker with historical price data
### 3. News Aggregators
Create a crypto news dashboard with AI sentiment analysis
### 4. Market Analysis
Analyze market trends using sentiment and price data
### 5. Research & Education
Study cryptocurrency market behavior and sentiment correlation
---
## 🤝 Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request
---
## 📄 License
Same license as the main project.
---
## 🙏 Acknowledgments
**Data Sources:**
- CoinCap, CoinGecko, Binance, Kraken, CryptoCompare
- Alternative.me (Fear & Greed Index)
- CoinTelegraph, CoinDesk, and other news sources
**Technologies:**
- FastAPI - Web framework
- HuggingFace Transformers - AI models
- SQLite - Database
- httpx - HTTP client
- feedparser - RSS parsing
- BeautifulSoup - HTML parsing
**AI Models:**
- ProsusAI/finbert - Financial sentiment
- facebook/bart-large-mnli - Classification
---
## 📞 Support
- **Documentation:** See the `/docs` endpoint while the server is running
- **Issues:** Report them in the GitHub repository
- **Contact:** See the main project README
---
## 🎉 Status
- **Version:** 1.0.0
- **Status:** ✅ Production Ready
- **Last Updated:** 2024-11-14
- **Deployment:** Ready for HuggingFace Spaces
---
**Built with ❤️ for the crypto community**