
ginipick

AI & ML interests

None yet

Recent Activity

liked a Space about 10 hours ago
VIDraft/vkae
reacted to ginigen-ai's post with 🔥 about 15 hours ago
🧠 Does your LLM know when it's about to be wrong?
Most leaderboards measure accuracy. We measure metacognition: whether a model catches its own errors. Benchmark + leaderboard + adapters, all open.
🎉 The surprise: even a K-AI #1 model (JGOS-31B-Citizen) is the strongest on multiple-choice traps (trap_rate 0.005, ~2 misses in 400) yet blind to its own free-form mistakes (self-confidence AUROC = 0.5, pure random). A tiny base-frozen adapter recovers that signal.
Two independent axes (never compared across a row):
① trap_rate: does it fall for tempting trap options? (lower = stronger)
② adapter gain Δ: how much a lightweight adapter catches errors the model itself misses. (higher = more adapter value)
What's open:
📊 300+100 trap problems (each with a hidden trap + TICOS type)
🏆 24-model leaderboard
🧩 11 per-model adapters: adapters, NOT fine-tunes (base stays frozen; the adapter just reads the hidden state → P(wrong))
Submit any HF model → auto-scored daily at 09:00 KST and added to the board.
🏆 Leaderboard → https://huggingface.co/spaces/ginigen-ai/Metacognition-Leaderboard-Space
📊 Benchmark → https://huggingface.co/datasets/ginigen-ai/Metacognition-Bench
🧩 Adapters → https://huggingface.co/collections/FINAL-Bench/metacognition-adapters-6a42c032e6beb803dd032961
📊 Article → https://huggingface.co/blog/ginigen-ai/metacognition
Benchmark by ginigen-ai · Adapters by FINAL-Bench (Darwin/Chimera platform + AETHER metacognition tech).
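The two quantities named in the post, trap_rate and a confidence AUROC, can be illustrated with a small Python sketch. This is not the benchmark's scoring code: the function names, the input layout, and the reading of "adapter gain Δ" as a difference of two AUROC values are all assumptions made here for illustration. Scores are treated as an estimate of P(wrong), so a self-reported confidence would be passed as 1 - confidence.

```python
from typing import Sequence


def trap_rate(chosen: Sequence[str], trap: Sequence[str]) -> float:
    """Fraction of items where the model picked the designated trap option."""
    if len(chosen) != len(trap) or len(chosen) == 0:
        raise ValueError("chosen and trap must be non-empty and equally long")
    return sum(c == t for c, t in zip(chosen, trap)) / len(chosen)


def auroc(p_wrong: Sequence[float], is_wrong: Sequence[bool]) -> float:
    """Chance that a random wrong answer scores higher than a random right one.

    Ties count as half a win, so a constant score yields exactly 0.5.
    """
    wrong = [s for s, w in zip(p_wrong, is_wrong) if w]
    right = [s for s, w in zip(p_wrong, is_wrong) if not w]
    if not wrong or not right:
        raise ValueError("need at least one wrong and one right answer")
    wins = sum((w > r) + 0.5 * (w == r) for w in wrong for r in right)
    return wins / (len(wrong) * len(right))


# Hypothetical data: 400 multiple-choice items, the trap is always option "B",
# and the model falls for it twice.
chosen = ["A"] * 398 + ["B"] * 2
trap = ["B"] * 400
rate = trap_rate(chosen, trap)  # 2 / 400

# Hypothetical free-form answers: the model's own signal is uninformative,
# while an external probe separates wrong from right answers.
is_wrong = [True, False, True, False]
self_scores = [0.5, 0.5, 0.5, 0.5]
adapter_scores = [0.9, 0.1, 0.8, 0.2]

self_auroc = auroc(self_scores, is_wrong)
adapter_auroc = auroc(adapter_scores, is_wrong)

# Assumed definition of the gain: AUROC difference (not confirmed by the post).
adapter_gain = adapter_auroc - self_auroc
```

With these made-up inputs, the trap rate matches the post's "~2 misses in 400" arithmetic, and a constant self-confidence signal lands at the chance level of 0.5 that the post calls "pure random".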
reacted to ginigen-ai's post with ❤️ about 15 hours ago

Organizations

Tune a video concepts library, ginigen, VIDraft, korea forestry, PowergenAI, OpenFree_AI, GiniGen-LAB