Solution for developing online conferencing Video / Audio call application

 2020-07-08


Before diving into the technology, think back: have you ever used an audio/video call tool such as Google Hangouts? Have you wondered how such a smooth video call website can be built, with so many people involved at the same time?

Those applications are built around a technology called WebRTC.

What is WebRTC?

WebRTC (Web Real-Time Communication) is a technology that allows web applications to use P2P protocols to exchange data between two devices, enabling audio calls, video calls, file sharing, screen sharing, etc. without any third-party support and without installing additional plugins.

WebRTC is still evolving. It is supported in most browsers, but that support is not entirely consistent. One solution to this problem is the adapter.js shim, which you can read about in the references at the end of the article.

Because the technology is supported directly by the browser, two browsers at two endpoints can interact with each other without passing their data through an intermediate server. The signal travels a shorter path, latency is lower and the application works more efficiently, which is why WebRTC is so popular in online conferencing products today.

Before we look at how the WebRTC APIs are exposed in browsers, we need to understand how two machines can talk to each other directly, which is the core of the technology.

The concept of protocols

Private/Public IP

The concepts that follow are closely tied to two notions: private IP and public IP.

The theory behind private/public IP addressing is quite long; here we only need enough of it to see why the concept exists and what problem it solves.

You can summarize it as follows: a private IP identifies a device inside a local network, for example a company's network; two local networks may use the same address ranges and cannot "see" each other directly. A public IP identifies a single host on the Internet and is unique.

ICE (Interactive Connectivity Establishment)

Normally, two computers in two different private networks cannot talk to each other directly for many reasons. ICE is a technique to solve that problem; within this article it relies on STUN / TURN servers to do so.

NAT (Network Address Translation)

A method to translate the IP information of a packet between the local (private) network and the public network (the Internet). NAT lets computers on the local network talk to machines in the public environment. The translation of packet information from the local network to the public network, and vice versa, is done at the router sitting between the two networks.

STUN (Session Traversal Utilities for NAT)

An auxiliary protocol used by other protocols when working with NAT. It allows a machine on the local network to discover the type of NAT it sits behind, as well as the public IP address and port the NAT has allocated for it to connect through. The limitation of STUN is that it does not work with Symmetric NAT. To use STUN, the Peers must be able to reach an intermediate STUN server.

TURN (Traversal Using Relays around NAT)

Similar to STUN, TURN overcomes STUN's disadvantage with Symmetric NAT: instead of connecting the two Peers directly, all connections and data exchange go through a TURN server. The downside is the load on that server; for example, a video call with a few dozen people where all of the media has to be relayed through one TURN server does not perform well.
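To make this concrete, here is a minimal sketch of how STUN and TURN servers are handed to the browser when a connection is created. The Google STUN address is a well-known public server; the TURN entry and its credentials are placeholders for a server you would run yourself:

  // A minimal sketch of the ICE configuration used when creating a connection.
  // stun.l.google.com is a well-known public STUN server; the TURN entry and
  // its credentials are placeholders for a server you would run yourself.
  const configuration = {
    iceServers: [
      { urls: 'stun:stun.l.google.com:19302' },
      {
        urls: 'turn:turn.example.com:3478', // hypothetical TURN server
        username: 'user',
        credential: 'secret'
      }
    ]
  };
  const peerConnection = new RTCPeerConnection(configuration);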

SDP (Session Description Protocol)

SDP is a standard for describing the content exchanged over a connection, such as resolution, encoding, format, etc. It ensures the Peers agree on how the data they exchange should be interpreted.
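You normally never write SDP by hand; the browser generates it when an offer or answer is created. A minimal sketch of generating an offer and looking at its SDP:

  // Sketch: the browser generates SDP when an offer (or answer) is created.
  async function inspectOffer() {
    const pc = new RTCPeerConnection();
    pc.addTransceiver('audio');        // make sure the offer has an audio line
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);

    // offer.sdp is plain text describing the session, with lines such as:
    //   v=0
    //   m=audio 9 UDP/TLS/RTP/SAVPF 111 ...
    //   a=rtpmap:111 opus/48000/2
    console.log(offer.sdp);
  }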

Main concepts / APIs of WebRTC

Signalling Server

As stated at the beginning of the article, WebRTC is a technology that uses P2P protocols to communicate, so how do the two machines find each other in the first place? Before the Peers can communicate, we need an intermediary to set up the connection between the two machines; in other words we need a server in the middle, usually called the Signaling Server.
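WebRTC itself does not define how signaling works; any channel that can pass text between the two browsers is fine. As a rough sketch, a signaling server can be as simple as a relay that forwards messages between connected clients without understanding them. The example below assumes Node.js with the 'ws' package:

  // Sketch: a signaling server only relays opaque messages between peers.
  // Assumes Node.js and the 'ws' package (npm install ws).
  const { WebSocketServer, WebSocket } = require('ws');
  const wss = new WebSocketServer({ port: 8080 });

  wss.on('connection', (socket) => {
    socket.on('message', (message) => {
      // Forward the offer / answer / ICE candidate to every other client;
      // the server never inspects or modifies the WebRTC payload itself.
      for (const client of wss.clients) {
        if (client !== socket && client.readyState === WebSocket.OPEN) {
          client.send(message.toString());
        }
      }
    });
  });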

RTCPeerConnection

An interface of the WebRTC API provided by the browser. It represents a WebRTC connection from the local machine to a remote machine, and provides methods that let the web application create, maintain and close the connection when the conversation ends.
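A minimal sketch of that lifecycle, using the public Google STUN server purely for illustration:

  // Sketch: the basic lifecycle of an RTCPeerConnection.
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
  });

  // Watch the state move through new -> connecting -> connected -> closed.
  pc.addEventListener('connectionstatechange', () => {
    console.log('connection state:', pc.connectionState);
  });

  // When the conversation ends, close the connection to release resources.
  function hangUp() {
    pc.close();
  }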

MediaStream

An interface of the WebRTC API provided by the browser. It represents a stream of data transmitted between the two machines after the connection has been established. The audio and video of the conversation are carried in streams; the input of the local machine is the output of the remote one.
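A minimal sketch, assuming the page has hypothetical #localVideo and #remoteVideo video elements and an existing RTCPeerConnection:

  // Sketch: capturing the local MediaStream and receiving the remote one.
  // Assumes #localVideo / #remoteVideo elements and an RTCPeerConnection pc.
  async function startMedia(pc) {
    // Ask the browser for camera and microphone access.
    const localStream = await navigator.mediaDevices.getUserMedia({
      video: true,
      audio: true
    });

    // Show our own picture and send every local track to the other peer.
    document.querySelector('#localVideo').srcObject = localStream;
    for (const track of localStream.getTracks()) {
      pc.addTrack(track, localStream);
    }

    // Tracks arriving from the remote peer are attached to another element.
    pc.ontrack = (event) => {
      document.querySelector('#remoteVideo').srcObject = event.streams[0];
    };
  }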


RTCDataChannel

An interface of the WebRTC API provided by the browser. After the connection is established, data channels can be attached to it, allowing the Peers to exchange arbitrary data over the P2P connection. In theory, each connection can carry up to 65,534 data channels, but the actual number depends on the browser.
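A minimal sketch, assuming an existing RTCPeerConnection named pc on each side:

  // Sketch: sending text over an RTCDataChannel (pc is an RTCPeerConnection).
  // Caller side: create a channel before the offer is generated.
  const chat = pc.createDataChannel('chat');
  chat.onopen = () => chat.send('hello from the caller');

  // Callee side: the channel announces itself once the connection is up.
  pc.ondatachannel = (event) => {
    event.channel.onmessage = (e) => console.log('received:', e.data);
  };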

The basic architecture of a WebRTC - P2P application

The architecture of a WebRTC (P2P) application is quite simple; the way it works is described in the following steps:

1: User A sends a message (an offer SDP) to the Signaling server to say that he wants to talk to B.

2: Signaling server somehow informs B that A wants to talk.

3: B accepts (answer SDP).

4: Signaling server notifies A that B has accepted.

5: A connection is established between A and B; from here on, all exchange of video, audio, files, etc. between A and B goes through that established connection.

For more detail, the sketch below walks through these sequential steps and shows which APIs are called to establish a connection between the two machines.
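This is the caller (A) side of steps 1-5; sendToSignaling() and onSignalingMessage() are hypothetical helpers around whatever signaling channel you use:

  // Sketch of the caller (A) side of steps 1-5. sendToSignaling() and
  // onSignalingMessage() are hypothetical helpers around your signaling channel.
  async function callB() {
    const pc = new RTCPeerConnection({
      iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
    });

    // ICE candidates are exchanged so a direct path between A and B can be found.
    pc.onicecandidate = ({ candidate }) => {
      if (candidate) sendToSignaling({ candidate });
    };

    // Step 1: A creates an offer SDP and sends it via the Signaling server.
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    sendToSignaling({ offer: { type: offer.type, sdp: offer.sdp } });

    // Steps 3-4: B's answer SDP (and its candidates) come back the same way.
    onSignalingMessage(async (msg) => {
      if (msg.answer) await pc.setRemoteDescription(msg.answer);
      if (msg.candidate) await pc.addIceCandidate(msg.candidate);
    });

    // Step 5: once connected, video, audio and data flow peer to peer.
    return pc;
  }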

Sample Application

There are many complete open-source WebRTC applications available today. However, before starting with a large codebase, you should experiment with a small source built purely on the WebRTC APIs to understand the technology better.

You can get the sample source from https://github.com/lapth/FirebaseRTC.git, a fork of https://github.com/webrtc/FirebaseRTC provided by http://webrtc.org.

In this article, the author chooses a sample based on Firebase and Firebase Hosting to minimize the discussion of the Signaling Server / BE part. The BE part of a WebRTC application varies from product to product; there are no general rules.

This application is quite small and easy to understand. A few notes when reading the source:

  1. The code related to WebRTC application management is in the public/app.js file.
  2. If you are not familiar with the Firebase commands you can skim through them; you can think of them as RESTful CRUD statements (see the sketch after this list).
  3. Notice the STUN / TURN configuration. These servers are only used as an example; in reality, you need a private server that you can trust.
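As an illustration of how Firestore can serve as the signaling layer, the sketch below shows a caller creating a room and waiting for an answer. It assumes the Firebase v8 SDK is loaded; the exact collection and field names in the repository may differ slightly from this illustration:

  // Sketch of Firestore used as the signaling layer (Firebase v8 SDK assumed;
  // the exact collection / field names in the repository may differ slightly).
  async function createRoom(pc) {
    const db = firebase.firestore();
    const roomRef = await db.collection('rooms').add({});

    // The caller writes its offer into the room document...
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    await roomRef.update({ offer: { type: offer.type, sdp: offer.sdp } });

    // ...and listens for the callee's answer to appear in the same document.
    roomRef.onSnapshot(async (snapshot) => {
      const data = snapshot.data();
      if (data && data.answer && !pc.currentRemoteDescription) {
        await pc.setRemoteDescription(new RTCSessionDescription(data.answer));
      }
    });

    return roomRef.id;
  }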

Which solution for Multiparty Video Conferencing

The previous sections of the article presented the basic concepts and architecture of a WebRTC application in the P2P model. Next, we continue to explore different architectures for group discussion.

Mesh

In the Mesh architecture, browsers connect directly with each other to transfer data. Each browser transmits its data to every other browser in the group, and receives data from each of them.

Pros: simple, low cost, effective for small groups, usually 4 to 6 participants.

Cons: cannot be applied to large groups; it depends on the capability of the device at each Peer, and the upload/download traffic at each Peer is large.
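To see why traffic grows with group size, here is a minimal sketch of a Mesh client keeping one RTCPeerConnection per remote participant; participantIds and localStream are assumed to come from your own signaling and media setup:

  // Sketch: a Mesh client keeps one RTCPeerConnection per remote participant.
  // participantIds and localStream are assumed to come from your own
  // signaling and media setup.
  const connections = new Map();

  function connectToAll(participantIds, localStream) {
    for (const id of participantIds) {
      const pc = new RTCPeerConnection({
        iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
      });
      // The same local tracks are sent separately to every peer, so upload
      // traffic grows with the number of participants.
      for (const track of localStream.getTracks()) {
        pc.addTrack(track, localStream);
      }
      connections.set(id, pc);
    }
    return connections;
  }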

MCU (Multipoint Conferencing Unit)

Unlike Mesh, in the MCU solution each Peer sends its data to a central server. The central server is then responsible for decoding and mixing the streams received from the participants, re-encoding them into a single stream and passing it back to the parties.

Pros: solves the disadvantages of Mesh; does not require many resources at each Peer.

Cons: requires a central server powerful enough to decode and re-encode the data of all participants with the lowest possible latency.

SFU (Selective Forwarding Unit)

Finally, the SFU architecture also needs a central server, but unlike the MCU architecture, the mission of this server is only to forward data: when it receives a stream, the central server forwards that stream to the other participants.

This is a good solution for large group discussion.

Regarding the effectiveness of these three solutions, you can refer to the report at https://testrtc.com/different-multiparty-video-conferencing/

If you are looking for a reliable offshore partner to develop your project in Vietnam, Hachinet is here to help your business.

Hachinet also has experience in the following areas:

  • Microsoft .NET Website Development (ASP, VB.NET, etc.)
  • Front-end Website development
  • Java System / Application Development
  • Mobile Application Development (iOS / Android)
  • COBOL system development

We also provide:

  • Flexible offshore development
  • Dispatching BrSE to Japan

If you are interested in our service, do not hesitate to drop a line at contact@hachinet.com