Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math Paper • 2602.06291 • Published 21 days ago • 23
What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models Paper • 2601.06165 • Published Jan 7 • 16
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 19
Distribution-Level Feature Distancing for Machine Unlearning: Towards a Better Trade-off Between Model Utility and Forgetting Paper • 2409.14747 • Published Sep 23, 2024
COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs Paper • 2601.01836 • Published Jan 5 • 10
Measuring Sycophancy of Language Models in Multi-turn Dialogues Paper • 2505.23840 • Published May 28, 2025 • 2
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published Jul 1, 2025 • 79
OptimalThinkingBench: Evaluating Over and Underthinking in LLMs Paper • 2508.13141 • Published Aug 18, 2025
VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding Paper • 2509.21451 • Published Sep 25, 2025
SPICE: Self-Play In Corpus Environments Improves Reasoning Paper • 2510.24684 • Published Oct 28, 2025 • 18
RefineBench: Evaluating Refinement Capability of Language Models via Checklists Paper • 2511.22173 • Published Nov 27, 2025 • 15
AI PB: A Grounded Generative Agent for Personalized Investment Insights Paper • 2510.20099 • Published Oct 23, 2025