DRM for WebRTC: End-to-end security for video streaming

Web real-time communications (WebRTC) has emerged as the go-to peer-to-peer video communication technology in recent times.

Using common JavaScript APIs for low-latency streaming of audio and video, it has become a highly sought-after free solution for business applications such as live streaming and desktop sharing.

While WebRTC’s open-source nature makes it highly versatile, the technology’s growing adoption has also led to increased security concerns.

There is an urgent call for additional security layers beyond standard transport layer security (TLS) encryption to protect paid premium content from piracy and data leaks.

In this article, we’ll explore the various use cases for WebRTC in the video streaming industry, delve into its technology, and discuss how digital rights management (DRM) software can provide end-to-end security to safeguard WebRTC streams.

What’s WebRTC

WebRTC is an open-source set of standards, communication protocols, and application programming interfaces (APIs) that allows peer-to-peer (P2P) real-time connections between two or more browsers without requiring the users to install plug-ins or any other third-party software.

With WebRTC, users can seamlessly share real-time voice, video, and arbitrary data between web browsers or other compatible applications without requiring server-side file hosting or native apps, making it a simple and cost-effective alternative to traditional video teleconferencing (VTC) technologies.

This means that users can integrate real-time video communication capabilities into their web applications with ease. Its simplicity and flexibility make it an ideal building block for any use case that requires real-time communication between users.

Support for WebRTC is available across mobile and desktop for most web browsers, including Google Chrome, Apple Safari, Microsoft Edge, and Mozilla Firefox, positioning it as a widely used technology for real-time communication.

WebRTC relevance

WebRTC is poised to become more popular. The tool is already used by tech powerhouses like Google, Meta, and Discord to propel their communications platform as a service (CPaaS) through ultra-low latency (less than 1 second) streams.

WebRTC enables low-latency media delivery and networking, is device agnostic, and interoperates with voice over internet protocol (VoIP) and video systems. Because it runs in any modern web browser, it also simplifies presenting content in different formats across devices.

Furthermore, the use of WebRTC in the video industry is redefining the technical roadmap for real-time streaming use cases. Hence, companies are eager to leverage the ultra-low latency potential of WebRTC to improve end users’ quality of experience (QoE) and boost their revenue streams.

WebRTC use cases

But, what exactly does WebRTC do? Is it a video call solution? A cloud gaming tool? A remote screen-sharing app? Well, it’s a little bit of everything. Since the technology allows two computers to share information through a web-based protocol, the potential applications are virtually endless.

One of the key advantages of WebRTC is that it allows users to communicate directly with each other without the need for any add-ons or software. This makes it an incredibly versatile and scalable tool for integrating live communication into a wide range of use cases.

WebRTC is particularly well-suited for video conferencing, live sports events and concerts, e-sports, and live betting. Increasingly, it’s also harnessed as a file-sharing and remote desktop solution.

WebRTC for live streaming

Although it was initially conceived for video conferencing, WebRTC is maturing into the preferred alternative for real-time streaming of premium content.

The near-real-time streaming of audio and video makes WebRTC an attractive way to improve end users’ quality of experience (QoE) compared with Real-Time Messaging Protocol (RTMP), HTTP Live Streaming (HLS), or Dynamic Adaptive Streaming over HTTP (MPEG-DASH).

Take live sports events, for example. Fans don’t want to miss a beat when massive sporting events are taking place. Every second counts: delays between the time the action happens on the field and the time it’s displayed on the viewer’s screen can take a toll on end users’ engagement and satisfaction.

With traditional streaming technologies like low-latency HLS or tuned RTMP, latency typically ranges from 2 to 5 seconds. With WebRTC, fans can experience the excitement of the game in almost real time, with less than 1 second of latency, which greatly enhances their viewing experience.

In addition to low latency, WebRTC also supports high-quality video codecs, such as VP8 and VP9, which provide a more immersive and engaging experience for viewers. With every detail of the game displayed in crystal-clear quality, fans can feel like they are part of the action, experiencing the excitement of the game from their own screen.

How does WebRTC work?

WebRTC employs a combination of different technologies such as HTML5, JavaScript, and video codecs to ensure communication between browsers.

To establish communication between peers, WebRTC relies on a signaling server to relay Session Description Protocol (SDP) messages and securely exchange information about the connection status between browsers. This is known as the signaling path.
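The caller’s side of that exchange can be sketched roughly as follows. This is only an illustration: `startCall` and the `signaling` object (anything with a `send` method and an `onmessage` callback, such as a thin WebSocket wrapper) are hypothetical names, while the `pc` methods are the standard `RTCPeerConnection` calls.

```javascript
// Sketch of the caller side of WebRTC offer/answer signaling.
// `pc` is an RTCPeerConnection; `signaling` is a hypothetical channel object
// exposing send(message) and an assignable onmessage callback.
async function startCall(pc, signaling) {
  // Forward each local ICE candidate to the remote peer as it is discovered.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      signaling.send({ type: 'candidate', candidate: event.candidate });
    }
  };

  // Apply whatever the remote peer sends back over the signaling path.
  signaling.onmessage = async (message) => {
    if (message.type === 'answer') {
      await pc.setRemoteDescription({ type: 'answer', sdp: message.sdp });
    } else if (message.type === 'candidate') {
      await pc.addIceCandidate(message.candidate);
    }
  };

  // Create the SDP offer, store it locally, then hand it to the signaling server.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send({ type: 'offer', sdp: offer.sdp });
}
```

In a browser this would be called as `startCall(new RTCPeerConnection(), signaling)`. Only session descriptions and candidates pass through the signaling object; no media does.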

A commonly used model for WebRTC deployment is depicted in the diagram below:

WebRTC diagram depicting the signaling path and the media

Because user devices usually sit behind firewalls and network address translation (NAT), which hides their private IP addresses behind a shared public one, WebRTC applies interactive connectivity establishment (ICE) to coordinate the discovery of the addresses at which peers can reach each other and allow communication between them.

To do so, each peer queries a session traversal utilities for NAT (STUN) server to learn its public IP address and port. Both peers then exchange their lists of ICE candidates, each containing an IP address and an available port, and ICE tests candidate pairs until it finds one that connects user one to user two.
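To make the idea of an ICE candidate more concrete, here is a minimal sketch that parses one candidate line into its fields. The function name `parseIceCandidate` is ours, and the sample line uses documentation-only IP addresses; the field order follows the standard SDP candidate attribute.

```javascript
// Parse a single ICE candidate line, as found in SDP ("a=candidate:...")
// or in RTCIceCandidate.candidate ("candidate:...").
function parseIceCandidate(line) {
  const parts = line.trim().replace(/^a=/, '').split(/\s+/);
  if (parts.length < 8 || !parts[0].startsWith('candidate:') || parts[6] !== 'typ') {
    throw new Error('Not an ICE candidate line');
  }
  const candidate = {
    foundation: parts[0].slice('candidate:'.length),
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7], // host, srflx (learned via STUN), prflx, or relay
  };
  // Optional attributes follow as name/value pairs, e.g. raddr and rport.
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (parts[i] === 'raddr') candidate.relatedAddress = parts[i + 1];
    if (parts[i] === 'rport') candidate.relatedPort = Number(parts[i + 1]);
  }
  return candidate;
}

const exampleCandidate = parseIceCandidate(
  'candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.10 rport 46154'
);
console.log(exampleCandidate.type, exampleCandidate.address, exampleCandidate.port);
```

A `srflx` (server-reflexive) candidate like this one carries the public address a STUN server reported, with the private address kept in `raddr`.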

It’s important to note that the signaling path only handles the interaction between peers, and the media data transmitted in the media path is not touched by the signaling server.

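In terms of security, WebRTC protects audio and video streams traveling between browsers with the secure real-time transport protocol (SRTP), using keys negotiated over datagram transport layer security (DTLS), while signaling typically runs over TLS. Together, this transport-level protection is referred to below as TLS/DTLS encryption.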

WebRTC architecture

Although originally developed for P2P connections, WebRTC is also adaptable to various application architectures based on specific use cases and security requirements. Advanced applications often require support for media handling and distribution, such as multicasting a stream to multiple users.

As a result, the WebRTC infrastructure can vary depending on the type of distribution. Standard P2P communication involves direct media content exchange between two browsers and typically employs TLS/DTLS encryption to ensure end-to-end security between the participants, as shown below:

WebRTC diagram explaining peer-to-peer connections

Things get slightly more complicated when you want to share your stream with more than one user. Broadcasting a single stream from a browser (e.g., video game streaming) or media source (e.g., live sporting events) to multiple participants requires streaming the same media several times.

To achieve this, the WebRTC workflow is deployed through additional servers, usually selective forwarding units (SFUs) for multicasting. This is the most common approach, where the SFUs, also known as media servers, act as intermediaries to forward data between the media source and multiple peers. The architecture is illustrated below:

WebRTC multicast diagram using media servers explained

This scheme is particularly valuable when handling high-quality video content that’s live-streamed. However, there is a catch to it. This pipeline doesn’t support end-to-end media encryption, which results in unencrypted data available to the media server, and therefore a potential exposure point for piracy.

WebRTC security risks

While it offers numerous advantages, WebRTC also raises security concerns regarding privacy, network security, and piracy risk. In 2022, for example, attackers exploited heap buffer-overflow vulnerabilities in Chrome’s WebRTC implementation.

As mentioned in the previous section, critical exposure points appear when encrypted media goes through SFUs. This raises the concern about WebRTC not providing end-to-end encryption (E2EE) as it only features TLS/DTLS between socket endpoints, which jeopardizes the security of the overall stream.

Usually, TLS/DTLS encrypts data between the host and the media server, and between the media server and the consumers. These are two separate, independent connections with their own encryption keys. The media server receives TLS-encrypted traffic from the host, decrypts it, and encrypts it again with TLS before forwarding it to consumers. This is shown in the following diagram.
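The difference between hop-by-hop and end-to-end protection can be illustrated with a toy model. The sketch below performs no real cryptography: it only tracks which party holds which key, and every name in it (`seal`, `open`, `forwardThroughSfu`, the key labels) is made up for illustration.

```javascript
// Toy model of key possession, NOT real cryptography: an "envelope" can only
// be opened by a party whose keyring contains the matching key label.
function seal(keyId, payload) {
  return { sealedWith: keyId, payload };
}

function open(envelope, keyring) {
  if (!keyring.includes(envelope.sealedWith)) {
    throw new Error(`no key for ${envelope.sealedWith}`);
  }
  return envelope.payload;
}

// The media server terminates the host-side transport session and starts a
// new one toward the viewer, so it sees whatever the transport layer carried.
function forwardThroughSfu(packet, sfuKeyring) {
  const seenBySfu = open(packet, sfuKeyring);
  return { seenBySfu, outgoing: seal('transport:sfu-viewer', seenBySfu) };
}

const sfuKeys = ['transport:host-sfu', 'transport:sfu-viewer'];
const viewerKeys = ['transport:sfu-viewer', 'content-key'];

// 1) Transport encryption only: the frame is in the clear at the media server.
const plainHop = forwardThroughSfu(seal('transport:host-sfu', 'frame-bytes'), sfuKeys);

// 2) Extra content layer: the media server only ever handles a sealed frame.
const protectedFrame = seal('content-key', 'frame-bytes');
const drmHop = forwardThroughSfu(seal('transport:host-sfu', protectedFrame), sfuKeys);
const atViewer = open(open(drmHop.outgoing, viewerKeys), viewerKeys);

console.log(plainHop.seenBySfu);          // frame-bytes
console.log(drmHop.seenBySfu.sealedWith); // content-key
console.log(atViewer);                    // frame-bytes
```

In the first case the media server can read the frame. In the second it forwards a payload it cannot open, because the content key never reaches it.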

WebRTC streams face security risks when deployed through media servers

Therefore, the main security risk of this approach is the lack of end-to-end protection due to unencrypted media in third-party SFUs, where the sender doesn’t have full control of the hardware and software. This also opens risks for interception, man-in-the-middle (MITM) attacks, and content piracy.

Although not directly linked to piracy, WebRTC vulnerabilities can be leveraged to facilitate copyright infringement. For instance, in file-sharing applications for remote desktops, perpetrators may exploit such vulnerabilities to distribute copyrighted material without authorization. Live event streaming is another example, where unauthorized users can hijack or share WebRTC streams, causing copyright infringement and revenue losses.

DRM for WebRTC

DRM is the industry standard for video protection. DRM solutions like DRMtoday not only shield video assets from piracy but also provide key control with features like geoblocking (restricting access based on geographical location), concurrent stream limiting (controlling the number of streams allowed per user), and stream takedown (stopping streams for users flagged as pirating content).

While WebRTC media transmissions are secured by TLS/DTLS encryption, WebRTC didn’t support DRM out of the box. Where DRM was added, it required complicated technical integrations that bypassed the browser’s content decryption module (CDM). Until now.

DRMtoday for WebRTC provides end-to-end media encryption of WebRTC video streams and remote desktop sharing, bringing industry-standard DRM systems such as Google Widevine, Apple FairPlay, and Microsoft PlayReady to WebRTC streams.

DRMtoday for WebRTC media streams diagram

This new layer of protection sitting on top of TLS/DTLS directly addresses the vulnerabilities of traditional WebRTC encryption methods by introducing three fundamental benefits:

End-to-end media encryption: Traditional WebRTC encryption only offers TLS/DTLS between socket endpoints, but that doesn’t protect content through media servers. With DRM for WebRTC, end-to-end encryption is guaranteed.

Multi-DRM support: Adds multi-DRM support for Widevine, PlayReady, and FairPlay Streaming including hardware-secure levels.

Copy protection: Media streams are blacked-out during client-side screen recording to prevent content copying, which was previously unavailable for WebRTC streams.

How does it work?

The process begins with the encryptor creating a key on the server side. This key is then sent to DRMtoday for ingestion. The encryptor uses this key to encrypt the video frames, which are subsequently transmitted through the WebRTC pipeline. On the receiving side, the encrypted video frames are extracted from the WebRTC pipeline and passed to a JavaScript transformer library for further processing. The workflow can be visualized below.

DRMtoday for WebRTC media streams explained

The stream is delivered to the client computer using WebRTC and securely decrypted and rendered inside the browser’s CDM through standard MSE/EME. Media source extensions (MSE) and encrypted media extensions (EME) are JavaScript APIs that provide browsers a method to interact with the CDM and handle key management and decryption.
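For readers unfamiliar with EME, the generic browser-side handshake looks roughly like the sketch below. It is not the DRMtoday or PRESTOplay integration: `setupEme`, `buildKeySystemConfig`, and the `requestLicense` callback are hypothetical, the codec string is an assumed example, and real deployments differ per key system (FairPlay, for instance, also needs a server certificate).

```javascript
// Sketch of the generic EME handshake. `nav` is the browser's `navigator`,
// `video` an HTMLVideoElement, and `requestLicense` a hypothetical callback
// that sends the CDM's license request to a license server and resolves with
// the server's response.
const KEY_SYSTEMS = ['com.widevine.alpha', 'com.microsoft.playready', 'com.apple.fps'];

function buildKeySystemConfig() {
  // Assumed example configuration; adjust init data types and codecs as needed.
  return [{
    initDataTypes: ['cenc'],
    videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.42E01E"' }],
  }];
}

async function setupEme(nav, video, requestLicense) {
  // Pick the first key system that this browser's CDM supports.
  let access = null;
  for (const keySystem of KEY_SYSTEMS) {
    try {
      access = await nav.requestMediaKeySystemAccess(keySystem, buildKeySystemConfig());
      break;
    } catch (err) {
      // Not supported in this browser; try the next key system.
    }
  }
  if (!access) throw new Error('No supported key system');

  const mediaKeys = await access.createMediaKeys();
  await video.setMediaKeys(mediaKeys);

  // Fired when encrypted media carrying initialization data reaches the element.
  video.addEventListener('encrypted', async (event) => {
    const session = mediaKeys.createSession();
    session.addEventListener('message', async (msg) => {
      const license = await requestLicense(msg.message);
      await session.update(license); // the key itself stays inside the CDM
    });
    await session.generateRequest(event.initDataType, event.initData);
  });

  return access.keySystem;
}
```

The page only relays opaque license messages between the CDM and the license server. It never handles the decryption key.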

The browser’s DRM component (Widevine, PlayReady, or FairPlay) along with license key delivery from DRMtoday are used to decrypt content for playback. Additionally, PRESTOplay player technology enables secure WebRTC browser playback as well as DRM processing. This workflow is compatible with all major browsers like Chrome, Edge, Chromium, and Safari.

DRMtoday also enables high-bandwidth digital content protection (HDCP) control, which, combined with video preparation techniques like forensic watermarking, makes the whole solution ideal for protecting both video-on-demand (VOD) and live streams.

DRM-protected desktop sharing

DRM for WebRTC also opens up remote desktop applications with end-to-end protection. This enables secure real-time remote desktop sharing for editing and reviewing premium content through browsers without installing any other software or plug-in.

This innovative approach readily enables remote desktop workflows for today’s studio production environments. DRM protection for WebRTC streams applies the industry security standards of consumer video streaming to meet studio requirements, using only browsers to edit video and audio content. The workflow is shown below:

DRMtoday for desktop sharing applications

The features of castLabs DRM-protected desktop sharing include:

  • Multi-monitor for efficient workflows – work across multiple screen views at once with on-the-fly switching
  • Ultra-low latency for responsive interaction
  • Adaptive bitrate streaming (ABR) to accommodate changes in bandwidth
  • Multi-channel audio: Stereo, 5.1, 7.1
  • Clipboard sharing

How to enable DRM for WebRTC

Now let’s get technical. For desktop sharing, the structure is explained in the following scheme:

WebRTC for remote desktop applications explained

As illustrated, there are three different components:

  • HTTPS signaling server, which tracks the remote machines (hosts) and facilitates SDP/ICE candidates exchange between clients and hosts when a WebRTC connection is being established.
  • Remote desktop agent (the “host app”), a native C++ WebRTC command-line app responsible for screen/audio capturing and injecting mouse/keyboard events coming from the connected client.
  • Browser-based client, a JS app that allows you to connect to and work on host machines.

On the sending side, the native C++ API is used for both media streams and desktop sharing. This API allows the system to obtain every encoded frame and produce an encrypted version of it. This encryption is accomplished by plugging an instance of webrtc::FrameEncryptorInterface into the audio and video pipelines. The encryption process adheres to the ISO/IEC 23001-7 common encryption scheme (CENC) specification, which uses a standard DRM-grade AES-128 encryption method. The encryption key and initialization vector (IV) are ingested into castLabs’ DRMtoday for secure distribution.

For video, the system employs partial sample encryption (currently only H.264/AVC is supported). This method ensures that headers such as sequence and picture parameter sets (SPS, PPS) and video slice headers remain unaltered, while the video coding layer (VCL) data is encrypted. As a result, the output of the encryptor module is virtually indistinguishable from unencrypted video and passes through WebRTC’s internal video parser as if it were a regular H.264/AVC Annex B video stream.
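The idea behind partial sample encryption can be sketched as a map of clear and protected byte ranges over an Annex B stream. The code below is a simplified illustration, not the castLabs encryptor: it assumes a fixed clear lead (`DEFAULT_CLEAR_LEAD`, a made-up value) in place of real slice-header parsing, and it performs no encryption.

```javascript
// Assumed allowance covering the NAL header plus the slice header; a real
// encryptor would parse the slice header to find its exact length.
const DEFAULT_CLEAR_LEAD = 32;

// Locate NAL units in an Annex B byte stream (start codes 00 00 01 / 00 00 00 01).
function findNalUnits(bytes) {
  const starts = [];
  for (let i = 0; i + 2 < bytes.length; i++) {
    if (bytes[i] === 0 && bytes[i + 1] === 0 && bytes[i + 2] === 1) {
      starts.push(i + 3); // index of the NAL header byte
      i += 2;
    }
  }
  return starts.map((start, n) => {
    let end = n + 1 < starts.length ? starts[n + 1] - 3 : bytes.length;
    // Drop the extra leading zero of a following 4-byte start code.
    if (n + 1 < starts.length && end > start && bytes[end - 1] === 0) end -= 1;
    return { start, end, type: bytes[start] & 0x1f };
  });
}

// Build a list of { clearBytes, protectedBytes } entries covering the buffer.
// Only VCL NAL units (types 1 to 5) get a protected range; SPS (7), PPS (8),
// start codes, and the clear lead of each slice stay untouched.
function buildSubsampleMap(bytes, clearLead = DEFAULT_CLEAR_LEAD) {
  const map = [];
  let cursor = 0; // first byte not yet covered by a map entry
  for (const nal of findNalUnits(bytes)) {
    const isVcl = nal.type >= 1 && nal.type <= 5;
    if (!isVcl || nal.end - nal.start <= clearLead) continue; // stays clear
    const protectedStart = nal.start + clearLead;
    map.push({ clearBytes: protectedStart - cursor, protectedBytes: nal.end - protectedStart });
    cursor = nal.end;
  }
  if (cursor < bytes.length) map.push({ clearBytes: bytes.length - cursor, protectedBytes: 0 });
  return map;
}

// SPS, PPS, then one IDR slice with 40 payload bytes.
const sampleStream = Uint8Array.from([
  0, 0, 0, 1, 0x67, 0x42, 0xe0, 0x1e,
  0, 0, 0, 1, 0x68, 0xce,
  0, 0, 1, 0x65, ...new Array(40).fill(0xaa),
]);
console.log(buildSubsampleMap(sampleStream, 8)); // [ { clearBytes: 25, protectedBytes: 33 } ]
```

Because start codes, parameter sets, and the beginning of each slice remain in the clear, a parser that only inspects those parts sees what looks like an ordinary H.264 stream.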

The encryption technique used is transport-agnostic, meaning that it works equally well with GStreamer (the system has a sample GStreamer-based encrypted RTMP streamer and receiver that uses the same encryption library) or any other framework as long as no transcoding is involved.

DRMtoday for WebRTC media streams explained

On the receiver side, a JavaScript library called the transformer module is utilized. This library must be connected to an existing WebRTC connection and an HTML video element. Once connected, it takes responsibility for decrypting and rendering the video frames.

Encrypted video frames are extracted from WebRTC through the WebRTC Encoded Transform API, also known as Insertable Streams. They’re then packaged on the fly in a format that’s suitable for the browser’s CDM and fed into it using the MSE/EME API.
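As a rough illustration of the extraction step, the sketch below builds a `TransformStream` that hands each encoded frame to a callback. `createFrameTap` and the `onFrame` callback are hypothetical names, not part of the transformer module. The sketch also assumes that frames are not forwarded to WebRTC’s own decoder, since rendering would happen through MSE/EME instead.

```javascript
// Build a TransformStream that passes every encoded frame's bytes to `onFrame`.
// In a browser, the chunks are RTCEncodedVideoFrame objects whose `data`
// property is an ArrayBuffer with the (still encrypted) frame payload.
function createFrameTap(onFrame) {
  return new TransformStream({
    transform(encodedFrame, controller) {
      onFrame(new Uint8Array(encodedFrame.data), encodedFrame.timestamp);
      // Assumption: nothing is enqueued, so the regular WebRTC decoder never
      // receives the protected frames; they are rendered elsewhere.
    },
  });
}

// Browser wiring, shown as a comment because it needs a live connection
// created with new RTCPeerConnection({ encodedInsertableStreams: true })
// (the Chromium variant of the API):
//   const { readable, writable } = receiver.createEncodedStreams();
//   readable.pipeThrough(createFrameTap(handleFrame)).pipeTo(writable);
```

Here `handleFrame` would be the place to package each frame for MSE. The standardized variant of the API, `RTCRtpScriptTransform`, runs the same kind of transform inside a worker.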

DRMtoday for WebRTC media streams explained in detail

The browser’s CDM requests the decryption key from DRMtoday and utilizes it to decrypt the media within its secure environment. The decryption key and decrypted video frames are never exposed to the client, and the CDM guarantees that protected video elements are blanked out if the screen is captured.

Conclusion

As the use of WebRTC evolves into more complex workflows involving multiple third-party services, so do security concerns. That’s why DRM-protected media streams stand out from other security methods as they provide end-to-end encryption with secure key exchange. With this structure, security risks are mitigated in both unicast and multicast applications as the encryption is truly node-to-node regardless of server infrastructure.

This is the first time WebRTC streams are protected by DRM technology utilizing the browser’s CDM, which increases the potential for new use cases. For studios and content owners, DRM-protected remote desktop sharing for pre-released videos provides the ultimate protection against piracy and content leaks, which wouldn’t have been possible with other protection systems or traditional TLS/DTLS encryption alone.

Contact us

If you’re interested in implementing DRM for WebRTC and taking your streaming to the next level, get in touch with us to experience a demo first-hand.

Posted by

Jean Navarrete

Digital Marketing Manager
