Crowdsourcing More Effective Initializations for Single-Target Trackers Through Automatic Re-querying
Lemmer, S. J., Song, J. Y., Corso, J. J.
To be published in Proceedings of the 2021 Conference on Human Factors in Computing Systems (CHI)

“In single-target video object tracking, an initial bounding box is drawn around a target object and propagated through a video. When this bounding box is provided by a careful human expert, it is expected to yield strong overall tracking performance that can be mimicked at scale by novice crowd workers with the help of advanced quality control methods. However, we show through an investigation of 900 crowdsourced initializations that such quality control strategies are inadequate for this task in two major ways: first, the high level of redundancy in these methods (e.g., averaging multiple responses to reduce error) is unnecessary, as 23% of crowdsourced initializations perform just as well as the gold-standard initialization. Second, even nearly perfect initializations can lead to degraded long-term performance due to the complexity of object tracking. Considering these findings, we evaluate novel approaches for automatically selecting bounding boxes to re-query, and introduce Smart Replacement, an efficient method that decides whether to use the crowdsourced replacement initialization.”

Popup: Reconstructing 3D Video Using Particle Filtering to Aggregate Crowd Responses
Song, J. Y., Lemmer, S. J., Liu, M.X., Yan, S., Kim, J., Corso, J. J., & Lasecki, W.S.
In Proceedings of the 24th International Conference on Intelligent User Interfaces

“Collecting a sufficient amount of 3D training data for autonomous vehicles to handle rare, but critical, traffic events (e.g., collisions) may take decades of deployment. Abundant video data of such events from municipal traffic cameras and video sharing sites (e.g., YouTube) could provide a potential alternative, but generating realistic training data in the form of 3D video reconstructions is a challenging task beyond the current capabilities of computer vision. Crowdsourcing the annotation of necessary information could bridge this gap, but the level of accuracy required to obtain usable reconstructions makes this task nearly impossible for non-experts. In this paper, we propose a novel hybrid intelligence method that combines annotations from workers viewing different instances (video frames) of the same target (3D object), and uses particle filtering to aggregate responses. Our approach can leverage temporal dependencies between video frames, enabling higher quality through more aggressive filtering. The proposed method results in a 33% reduction in the relative error of position estimation compared to a state-of-the-art baseline. Moreover, our method enables skipping (self-filtering) challenging annotations, reducing the total annotation time for hard-to-annotate frames by 16%. Our approach provides a generalizable means of aggregating more accurate crowd responses in settings where annotation is especially challenging or

Subtle Anomaly Detection in the Global Dynamics of Connected Vehicle Systems
Sturgeon, P.K., Mott, C.M., Lemmer, S.J., & Brown, M.A.
In Proceedings of ITS World Congress, October 10-14, 2016.

Loosening the Reins: Decentralized Allocation of Resources and Tasks for Heterogeneous Multi-Agent Systems
Lemmer, S.J.
In Proceedings of XPONENTIAL 2019