Futures: Audio, Video, Image, and Augmented Reality Processing

Wireless networks have empowered smart cities and homes as well as work, gaming, and entertainment in the cloud. The combination of media streaming and wireless networks further expands the reach of on-the-go entertainment. Media content such as audio and video is the most engaging source of information and entertainment. Many teenagers spend more than three times as much of their spare time watching TV and listening to music as they spend on social media. The internet has expanded the global footprint of live and on-demand media services and opened up new ways to discover, share, and consume media content anywhere, anytime, and on any device. At the same time, it takes nothing more than a smartphone and a YouTube channel to become a global broadcaster and producer of live events. While wireless and internet technologies have brought people closer together by allowing instant connectivity, recent advances in artificial intelligence (AI) will broaden that convergence by unifying computers and content to provide smart and predictive analysis, and even award-winning content far beyond the capabilities of the human brain.

This course provides a detailed description of audio, video, image, and augmented reality processing. In addition to key audio coding standards, it discusses the coding of video content in emerging fields, including high dynamic range (HDR), computer-generated screen content, and immersive applications such as omnidirectional (360-degree) video and augmented reality (AR). The course also examines the role of AI in image and natural language processing. The instructor will present demos.

Learning Outcomes:

  • Learn the architecture of popular and emerging audio codecs (AAC, SBC, LC3, EVS).

  • Understand the core modules of video codecs (AVC, HEVC, AV1, VVC) and development platforms (FFmpeg).

  • Understand the tradeoffs between coding efficiency and scene complexity for typical scenarios.

  • Describe adaptive streaming platforms (Apple HTTP Live Streaming, MPEG-DASH).

  • Review the essentials of immersive communications using AR.

  • Learn the basics of JPEG AI, natural language processing, speech recognition, and ChatGPT.

Course Number: EE-90009
Credit: 3.00 unit(s)
