Ground-Truth or DAER: Selective Re-query of Secondary Information
Lemmer, S. J., Corso, J. J.
To be published in Proceedings of the 2021 International Conference on Computer Vision (ICCV)

“Many vision tasks use secondary information at inference time — a seed — to assist a computer vision model in solving a problem. For example, an initial bounding box is needed to initialize visual object tracking. To date, all such work makes the tacit assumption that the seed is a good one. However, in practice, from crowdsourcing to noisy automated seeds, this is often not the case. We hence propose the problem of seed rejection — determining whether to reject a seed based on the expected performance degradation when it is provided in place of a gold-standard seed. We provide a formal definition to this problem, and focus on two meaningful subgoals: understanding causes of error and understanding the model’s response to noisy seeds conditioned on the primary input. With these goals in mind, we propose a novel training method and evaluation metrics for the seed rejection problem. We then use seeded versions of viewpoint estimation and fine-grained classification tasks to evaluate these contributions. In these experiments, we show our method can reduce the number of seeds that need to be reviewed for a target performance by over 23% compared to strong baselines.”

Crowdsourcing More Effective Initializations for Single-Target Trackers Through Automatic Re-querying
Lemmer, S. J., Song, J. Y., Corso, J. J.
In Proceedings of the 2021 Conference on Human Factors in Computing Systems (CHI)

“In single-target video object tracking, an initial bounding box is drawn around a target object and propagated through a video. When this bounding box is provided by a careful human expert, it is expected to yield strong overall tracking performance that can be mimicked at scale by novice crowd workers with the help of advanced quality control methods. However, we show through an investigation of 900 crowdsourced initializations that such quality control strategies are inadequate for this task in two major ways: first, the high level of redundancy in these methods (e.g., averaging multiple responses to reduce error) is unnecessary, as 23% of crowdsourced initializations perform just as well as the gold-standard initialization. Second, even nearly perfect initializations can lead to degraded long-term performance due to the complexity of object tracking. Considering these findings, we evaluate novel approaches for automatically selecting bounding boxes to re-query, and introduce Smart Replacement, an efficient method that decides whether to use the crowdsourced replacement initialization.”

Popup: reconstructing 3D video using particle filtering to aggregate crowd responses.
Song, J. Y., Lemmer, S. J., Liu, M.X., Yan, S., Kim, J., Corso, J. J., & Lasecki, W.S.
In Proceedings of the 24th International Conference on Intelligent User Interfaces (IUI)

“Collecting a sufficient amount of 3D training data for autonomous vehicles to handle rare, but critical, traffic events (e.g., collisions) may take decades of deployment. Abundant video data of such events from municipal traffic cameras and video sharing sites (e.g., YouTube) could provide a potential alternative, but generating realistic training data in the form of 3D video reconstructions is a challenging task beyond the current capabilities of computer vision. Crowdsourcing the annotation of necessary information could bridge this gap, but the level of accuracy required to obtain usable reconstructions makes this task nearly impossible for non-experts. In this paper, we propose a novel hybrid intelligence method that combines annotations from workers viewing different instances (video frames) of the same target (3D object), and uses particle filtering to aggregate responses. Our approach can leverage temporal dependencies between video frames, enabling higher quality through more aggressive filtering. The proposed method results in a 33% reduction in the relative error of position estimation compared to a state-of-the-art baseline. Moreover, our method enables skipping (self-filtering) challenging annotations, reducing the total annotation time for hard-to-annotate frames by 16%. Our approach provides a generalizable means of aggregating more accurate crowd responses in settings where annotation is especially challenging or

Subtle Anomaly Detection in the Global Dynamics of Connected Vehicle Systems
Sturgeon, P.K., Mott, C.M., Lemmer, S.J., & Brown, M.A.
In Proceedings of ITS World Congress, October 10-14, 2016.

Loosening the Reins: Decentralized Allocation of Resources and Tasks for Heterogeneous Multi-Agent Systems
Lemmer, S.J.
In Proceedings of XPONENTIAL 2019