- LLM for sales, marketing, promotion
- LLM for Website Revision System
- increasing quality of communication with customers
- helping clients access information faster
- saving people from financial troubles
The Qwen3.5 Multimodal Understanding Demo, powered by Qwen3.5-2B, is now available on HF Spaces! It is a lightweight model designed for fast image and video reasoning. Built with Gradio, the demo showcases Image QA, Video QA, object detection, and 2D point tracking, along with real-time token streaming.
The AI benchmark ecosystem has three structural problems. Major benchmarks like MMLU have surpassed 90%, losing discriminative power. Most leaderboards publish unverified self-reported scores β our cross-verification found Claude Opus 4.6's ARC-AGI-2 listed as 37.6% (actual: 68.8%), Gemini 3.1 Pro as 88.1% (actual: 77.1%). OpenAI's own audit confirmed 59.4% of SWE-bench Verified tasks are defective, yet it remains widely used.
ALL Bench addresses this by comparing 91 models across 6 modalities (LLM Β· VLM Β· Agent Β· Image Β· Video Β· Music) with 3-tier confidence badges (ββ cross-verified Β· β single-source Β· ~ self-reported). Composite scoring uses a 5-Axis Framework and replaces SWE-Verified with contamination-resistant LiveCodeBench.
Key finding: metacognition is the largest blind spot. FINAL Bench shows Error Recovery explains 94.8% of self-correction variance, yet only 9 of 42 models are even measured. The 9.2-point spread (Kimi K2.5: 68.71 β rank 9: 59.5) is 3Γ the GPQA top-model spread, suggesting metacognition may be the single biggest differentiator among frontier models today.
VLM cross-verification revealed rank reversals β Claude Opus 4.6 leads MMMU-Pro (85.1%) while Gemini 3 Flash leads MMMU (87.6%), producing contradictory rankings between the two benchmarks.
Nanochat Moroccan is the first language model family built specifically for Moroccan Darija.
This project brings together a small family of models and datasets centered on Darija, with the goal of building something genuinely useful for a language that is still underserved in AI.
Moroccan Darija is spoken by millions of people, yet it remains underrepresented in language technology. Nanochat Moroccan is a step toward building tools that take the language seriously.
Recently, I have open-sourced an AI emotional companion product based on openclaw, called opensoul.
On this platform, you can create a "soulmate" that matches your personality, and configure it with the skills, tools you want it to have, as well as the platforms it can integrate with (such as Telegram, Discord, etc.). You can even create group chats, invite multiple agents and your friends to chat about recent events, discuss projects together, and so on.
On the one hand, I hope it can better accompany you in daily life by virtue of its unique memory mechanism, self-feedback and iteration mechanism, and the modeling of users' emotions. On the other hand, I also hope it can help you better handle your work with its unique skills, tools and ability to deal with complex task scenarios.
Although the entire product has taken shape, I think there are still many areas that need adjustment and optimization. I also hope to rely on the strength of the community to do a good job in AI emotional companionship.