Academic Talk by Researcher Shiqi Jiang, Microsoft Research Asia

Posted: October 26, 2022, 08:14

Speaker: Shiqi Jiang, Researcher, Microsoft Research Asia

Venue: Tencent Meeting, ID 209-856-458

Time: Thursday, October 27, 2022, 10:00 AM

Title: Towards Efficient and Accurate Edge Video Analytics

Biography:

Shiqi Jiang is a Senior Researcher at MSRA Shanghai. He received his Ph.D. in computer science from Nanyang Technological University in 2018, supervised by Prof. Mo Li, and his bachelor's degree in computer engineering from Zhejiang University. Before joining MSRA in 2019, he worked as a Senior Engineer at Ant Lab. His research interests span edge computing, mobile sensing, the Internet of Things (IoT), and wearables.

Abstract:

In this talk, we introduce two of our recent projects on edge video analytics: Remix and Turbo.

Remix: Object detection is a fundamental building block of video analytics applications. While neural network (NN)-based object detection models have shown excellent accuracy on benchmark datasets, they are not well suited to high-resolution image inference on resource-constrained edge devices. Common approaches, including down-sampling inputs and scaling up neural networks, fall short of adapting to video content changes and varying latency requirements. This paper presents Remix, a flexible framework for high-resolution object detection on edge devices. Remix takes a latency budget as input and produces an image partition and model execution plan that runs off-the-shelf neural networks on non-uniformly partitioned image blocks. As a result, it maximizes the overall detection accuracy by allocating varying amounts of compute power to different areas of an image. We evaluate Remix on a public dataset as well as real-world videos that we collected. Experimental results show that Remix can either improve the detection accuracy by 18%-120% for a given latency budget, or achieve up to 8.1x inference speedup with accuracy on par with state-of-the-art NNs.
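To make the idea of a budget-constrained execution plan concrete, here is a minimal Python sketch. It is not Remix's actual algorithm: it assumes the image partition is already fixed, and the model profiles, the block weights, and the `plan_execution` function are all hypothetical names and numbers chosen for illustration. It simply enumerates every assignment of one model per block and keeps the best one that fits the latency budget.

```python
from itertools import product

# Hypothetical per-model profiles: (name, latency in ms, estimated accuracy in [0, 1]).
MODELS = [("small", 10, 0.50), ("medium", 25, 0.70), ("large", 60, 0.85)]


def plan_execution(block_weights, budget_ms, models=MODELS):
    """Pick one model per image block to maximize weighted accuracy.

    block_weights: assumed importance of each block (for example, how many
    objects are expected there). Returns (score, [model name per block]),
    or None when no assignment fits within budget_ms.
    """
    best = None
    # Exhaustive search over all assignments; only practical for a few blocks.
    for choice in product(models, repeat=len(block_weights)):
        latency = sum(m[1] for m in choice)
        if latency > budget_ms:
            continue  # this plan would exceed the latency budget
        score = sum(w * m[2] for w, m in zip(block_weights, choice))
        if best is None or score > best[0]:
            best = (score, [m[0] for m in choice])
    return best


if __name__ == "__main__":
    # Three blocks; the first is assumed to contain most of the objects.
    print(plan_execution([0.7, 0.2, 0.1], budget_ms=100))
```

In this toy setting the most expensive model is spent on the block with the highest weight, which mirrors the abstract's point about allocating different amounts of compute to different image areas. A real system would also have to search over the partition itself and would need something faster than brute force.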

Turbo: Edge computing is being widely used for video analytics. To alleviate the inherent tension between accuracy and cost, various video analytics pipelines have been proposed to optimize the usage of GPUs on edge nodes. Nonetheless, we find that GPU compute resources provisioned for edge nodes are commonly under-utilized due to video content variations, subsampling, and filtering at different stages of a pipeline. As opposed to model and pipeline optimization, in this work we study the problem of opportunistic data enhancement using the non-deterministic and fragmented idle GPU resources. Specifically, we propose a task-specific discrimination and enhancement module and a model-aware adversarial training mechanism, providing a way to identify and transform low-quality images that are specific to a video pipeline in an accurate and efficient manner. A multi-exit model structure and a resource-aware scheduler are further developed to make online enhancement decisions and perform fine-grained inference execution under latency and GPU resource constraints. Experiments across multiple video analytics pipelines and datasets reveal that by judiciously allocating a small amount of idle resources to frames that tend to yield greater marginal benefits from enhancement, our system boosts DNN object detection accuracy by 7.3%-11.3% without incurring any latency costs.
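The allocation of idle resources to the frames with the greatest marginal benefit can be illustrated with a small greedy sketch in Python. This is not Turbo's scheduler: the function name `schedule_enhancement`, the per-frame gain estimates, and the cost figures are assumptions made up for the example, and the multi-exit model and adversarial training are left out entirely.

```python
def schedule_enhancement(frames, idle_gpu_ms):
    """Greedily choose frames to enhance within an idle GPU time budget.

    frames: list of (frame_id, predicted_gain, cost_ms) tuples, where
    predicted_gain is an assumed estimate of the accuracy improvement from
    enhancing that frame and cost_ms (> 0) is the GPU time it would take.
    Returns the ids of the selected frames in the order they were chosen.
    """
    # Rank frames by estimated gain per millisecond of GPU time.
    ranked = sorted(frames, key=lambda f: f[1] / f[2], reverse=True)
    selected = []
    remaining = idle_gpu_ms
    for frame_id, _gain, cost_ms in ranked:
        if cost_ms <= remaining:
            selected.append(frame_id)
            remaining -= cost_ms
    return selected


if __name__ == "__main__":
    frames = [
        ("f0", 0.02, 5.0),
        ("f1", 0.10, 5.0),
        ("f2", 0.06, 10.0),
        ("f3", 0.01, 4.0),
    ]
    print(schedule_enhancement(frames, idle_gpu_ms=12.0))
```

Because only otherwise idle GPU time is spent, a scheduler of this shape adds no work to the main inference path, which is the intuition behind improving accuracy without extra latency. A greedy ratio rule is a common heuristic for this kind of budgeted selection and is not guaranteed to be optimal.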