benchmarks

Benchmarking egocentric multimodal goal inference for assistive wearable agents

We present a benchmark for egocentric multimodal goal inference in assistive wearable agents. It evaluates how well AI systems infer user goals from egocentric video, audio, and other sensor modalities in real-world scenarios. …

Accelerating scientific discovery with the common task framework

We propose the common task framework as a mechanism for accelerating scientific discovery through collaborative machine learning. This framework establishes shared datasets, evaluation metrics, and benchmarks that enable the AI and scientific …