Voice over Internet Protocol

Definition and Overview


Internet telephony refers to communications services—voice, facsimile, and/or voice-messaging applications—that are transported via the Internet, rather than the public switched telephone network (PSTN). The basic steps involved in originating an Internet telephone call are conversion of the analog voice signal to digital format and compression/translation of the signal into Internet protocol (IP) packets for transmission over the Internet; the process is reversed at the receiving end.
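The originating steps described above (digitize, compress, packetize) and their reversal at the receiving end can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: the 8,000-samples-per-second rate, the 20 ms packet size, and the use of mu-law companding as the "compression" step are all choices made for this sketch, and the function names are hypothetical.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed telephone-quality rate)
FRAME_MS = 20        # milliseconds of audio per packet (assumed)
MU = 255             # mu-law companding constant


def sample_tone(freq_hz, duration_ms):
    """Digitize: sample a sine wave standing in for the analog voice signal."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def mu_law_encode(x):
    """Compress one sample in [-1, 1] into an 8-bit code (0..255)."""
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1) / 2 * 255))


def mu_law_decode(code):
    """Expand an 8-bit code back into a sample in [-1, 1]."""
    y = code / 255 * 2 - 1
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def packetize(codes):
    """Split the encoded stream into (sequence_number, payload) packets."""
    per_packet = SAMPLE_RATE * FRAME_MS // 1000
    return [(seq, bytes(codes[i:i + per_packet]))
            for seq, i in enumerate(range(0, len(codes), per_packet))]


def depacketize(packets):
    """Receiving end: put packets in order, then decode every byte."""
    return [mu_law_decode(b) for _, payload in sorted(packets) for b in payload]
```

Calling `packetize([mu_law_encode(s) for s in sample_tone(440, 100)])` yields packets of 160 bytes each, and `depacketize` recovers an approximation of the original samples. A real sender would also wrap each payload in transport and IP headers before transmission.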


This tutorial discusses the ongoing but rapid evolution of Internet telephony, the market forces fueling that evolution and the benefits that users can realize, as well as the underlying technologies. It also examines the hurdles that must be overcome before Internet telephony can be adopted on a widespread basis.


The possibility of voice communications traveling over the Internet, rather than the PSTN, first became a reality in February 1995 when Vocaltec, Inc. introduced its Internet Phone software. Designed to run on a 486/33-MHz (or higher) personal computer (PC) equipped with a sound card, speakers, microphone, and modem (see Figure 1), the software compresses the voice signal and translates it into IP packets for transmission over the Internet. This PC-to-PC Internet telephony works, however, only if both parties are using Internet Phone software.

Figure 1. PC Configuration for VoIP

In the relatively short period of time since then, Internet telephony has advanced rapidly. Many software developers now offer PC telephony software but, more importantly, gateway servers are emerging to act as an interface between the Internet and the PSTN (see Figure 2). Equipped with voice-processing cards, these gateway servers enable users to communicate via standard telephones.

Figure 2. Topology of PC-to-Phone

Figure 3. Sequence of VoIP Connection: PC-to-Phone

A call goes over the local PSTN to the nearest gateway server, which digitizes the analog voice signal, compresses it, packs it into IP packets, and moves it onto the Internet for transport to a gateway at the receiving end (see Figure 4). With its support for computer-to-telephone calls, telephone-to-computer calls, and telephone-to-telephone calls, Internet telephony represents a significant step toward the integration of voice and data networks.

Figure 4. Sequence of VoIP Connection

Originally regarded as a novelty, Internet telephony is attracting more and more users because it offers tremendous cost savings relative to the PSTN. Users can bypass long-distance carriers and their per-minute usage rates and run their voice traffic over the Internet for a flat monthly Internet-access fee.

Figure 5. PC-to-Phone Connection

Figure 6. Phone-to-Phone Connection

Intranet Telephony Paves the Way for Internet Telephony

Although progressing rapidly, Internet telephony still has some problems with reliability and sound quality, due primarily to limitations both in Internet bandwidth and current compression technology. As a result, most corporations looking to reduce their phone bills today confine their Internet-telephony applications to their intranets. With more predictable bandwidth available than the public Internet, intranets can support full-duplex, real-time voice communications. Corporations generally limit their Internet voice traffic to half-duplex asynchronous applications (e.g., voice messaging).

Internet telephony within an intranet enables users to save on long-distance bills between sites; they can make point-to-point calls via gateway servers attached to the local-area network (LAN). No PC–based telephony software or Internet account is required.

For example, User A in New York wants to make a point-to-point phone call to User B in the company's Geneva office. User A picks up the phone and dials an extension to connect with the gateway server, which is equipped with a telephony board and compression-conversion software; the server configures the private branch exchange (PBX) to digitize the upcoming call. User A then dials the number of the Geneva office, and the gateway server transmits the digitized, IP–packetized call over the IP–based wide-area network (WAN) to the gateway at the Geneva end. The Geneva gateway converts the digital signal back to analog format and delivers it to the called party.

Figure 7. PC–to-Phone Connection

Figure 8. Internet Telephony Gateway

This version of Internet telephony also enables companies to transmit their (digitized) voice and data traffic together over the intranet in support of shared applications and whiteboarding.

Technical Barriers

The ultimate objective of Internet telephony is, of course, reliable, high-quality voice service, the kind that users expect from the PSTN. At the moment, however, that level of reliability and sound quality is not available on the Internet, primarily because of bandwidth limitations that lead to packet loss. In voice communications, packet loss shows up in the form of gaps or periods of silence in the conversation, leading to a clipped-speech effect that is unsatisfactory for most users and unacceptable in business communications.

Figure 9. Internet Telephony

The Internet, a collection of more than 130,000 networks, is gaining in popularity as millions of new users sign on every month. The increasingly heavy use of the Internet's limited bandwidth often results in congestion, which in turn delays packet transmission. Congested routers may discard packets outright, and voice packets that arrive too late to be played out are effectively lost as well.

In addition, because the Internet is a packet-switched, connectionless network, the individual packets of each voice signal may travel over separate network paths and must be reassembled in the proper sequence at their ultimate destination. While this makes for a more efficient use of network resources than the circuit-switched PSTN, which routes a call over a single path, it also increases the chances for packet loss.
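The reassembly step, and the way a lost packet turns into a gap of silence, can be illustrated with a short sketch. It assumes each packet carries a sequence number (as in the `(sequence_number, payload)` tuples used here) and fills each gap with zero bytes as a placeholder; the function name, the 160-byte frame size, and the fill value are assumptions for illustration. A real receiver would use the codec's own silence value or a concealment technique, and would handle sequence-number wraparound.

```python
def reassemble(received, frame_bytes=160):
    """Order packets by sequence number and fill gaps with placeholder silence.

    `received` is a list of (sequence_number, payload) tuples in arrival order.
    Returns (audio_bytes, lost_sequence_numbers).
    """
    if not received:
        return b"", []

    by_seq = dict(received)            # a later duplicate replaces an earlier one
    first, last = min(by_seq), max(by_seq)
    gap_fill = bytes(frame_bytes)      # zero bytes stand in for silence

    audio = bytearray()
    lost = []
    for seq in range(first, last + 1):
        payload = by_seq.get(seq)
        if payload is None:            # never arrived: heard as a gap
            lost.append(seq)
            payload = gap_fill
        audio += payload
    return bytes(audio), lost
```

For instance, if packets 2, 0, and 3 arrive in that order and packet 1 never arrives, the function plays out 0, a silent frame, 2, and 3, and reports packet 1 as lost.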

Network reliability and sound quality also are functions of the voice-encoding techniques and associated voice-processing functions of the gateway servers. To date, most developers of Internet-telephony software, as well as vendors of gateway servers, have been using a variety of speech-compression protocols. The use of various speech-coding algorithms—with their different bit rates and mechanisms for reconstructing voice packets and handling delays—produces varying levels of intelligibility and fidelity in sound transmitted over the Internet. The lack of standardized protocols also means that many Internet-telephony products do not interoperate with each other or with the PSTN.


Over the next few years, the industry will address the bandwidth limitations by upgrading the Internet backbone to asynchronous transfer mode (ATM), the switching fabric designed to handle voice, data, and video traffic. Such network optimization will go a long way toward eliminating network congestion and the associated packet loss. The Internet industry also is tackling the problems of network reliability and sound quality on the Internet through the gradual adoption of standards. Standards-setting efforts are focusing on the three central elements of Internet telephony: the audio codec format; transport protocols; and directory services.

In May 1996, the International Telecommunication Union (ITU) ratified the H.323 specification, which defines how voice, data, and video traffic will be transported over IP–based local-area networks; it also incorporates the T.120 data-conferencing standard (see Figure 10). The recommendation is based on the real-time transport protocol and its companion control protocol (RTP/RTCP) for managing audio and video signals.

Figure 10. H.323 Call Sequence

As such, H.323 addresses the core Internet-telephony applications by defining how delay-sensitive traffic (i.e., voice and video) gets priority transport to ensure real-time communications service over the Internet. (The H.324 specification defines the transport of voice, data, and video over regular telephony networks, while H.320 defines the protocols for transporting voice, data, and video over integrated services digital network [ISDN].)

H.323 is a set of recommendations, one of which is G.729 for audio codecs, which the ITU ratified in November 1995. Despite the ITU recommendation, however, the Voice over IP (VoIP) Forum in March 1997 voted to recommend the G.723.1 specification over the G.729 standard. The industry consortium, which is led by Intel and Microsoft, agreed to sacrifice some sound quality for the sake of greater bandwidth efficiency: G.723.1 requires 6.3 kbps, while G.729 requires 8 kbps. Adoption of the audio codec standard, while an important step, is expected to improve reliability and sound quality mostly for intranet traffic and point-to-point IP connections. To achieve PSTN–like quality, standards that guarantee the quality of Internet connections are also required.
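Codec bit rates such as the 6.3 kbps quoted for G.723.1 cover only the compressed speech; each packet also carries protocol headers. The sketch below estimates the one-way bit rate on the wire under assumptions that are not stated in this tutorial: voice is carried in RTP over UDP over IPv4 with header sizes of 12, 8, and 20 bytes, link-layer framing is ignored, and one G.723.1 frame covers 30 ms of speech. The 64 kbps figure for uncompressed telephone-quality PCM is included only for comparison.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # bytes, fixed RTP header without CSRC entries


def on_wire_kbps(codec_kbps, packet_ms):
    """One-way bit rate including IP/UDP/RTP headers (link layer ignored).

    codec_kbps -- bit rate of the compressed speech, in kilobits per second
    packet_ms  -- milliseconds of speech carried in each packet
    """
    payload_bytes = codec_kbps * packet_ms / 8   # kbit/s * ms = bits
    packets_per_second = 1000 / packet_ms
    total_bytes = payload_bytes + IP_HEADER + UDP_HEADER + RTP_HEADER
    return total_bytes * 8 * packets_per_second / 1000


if __name__ == "__main__":
    for name, rate, ms in [("G.723.1 (30 ms packets)", 6.3, 30),
                           ("64 kbps PCM (20 ms packets)", 64, 20)]:
        print(f"{name}: {on_wire_kbps(rate, ms):.1f} kbps on the wire")
```

Under these assumptions, the headers add roughly 10.7 kbps to a 30 ms packet stream, so the low-rate codec's advantage over PCM remains large but is smaller than the raw codec rates suggest.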

The transport protocol RTP, on which the H.323 recommendation is based, essentially is a new protocol layer for real-time applications; RTP–compliant equipment will include control mechanisms for synchronizing different traffic streams. However, RTP does not have any mechanisms for ensuring the on-time delivery of traffic signals or for recovering lost packets. RTP also does not address the so-called quality of service (QoS) issue related to guaranteed bandwidth availability for specific applications. Currently, there is a draft signaling-protocol standard, the resource reservation protocol (RSVP), aimed at strengthening the Internet's ability to handle real-time traffic reliably (i.e., to dedicate end-to-end transport paths for specific sessions, much as the circuit-switched PSTN does). If adopted, RSVP will be implemented in routers to establish and maintain requested transmission paths and quality-of-service levels.
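To make the role of RTP more concrete, the following sketch packs and parses the 12-byte fixed RTP header as laid out in the RTP specification: version, marker bit, payload type, sequence number, timestamp, and a synchronization source identifier (SSRC). The field layout comes from that specification rather than from this tutorial, and the function names are hypothetical. Note that the header lets a receiver detect loss and reorder or synchronize streams, but nothing in it requests retransmission or reserves bandwidth.

```python
import struct

RTP_VERSION = 2
RTP_FORMAT = "!BBHII"   # network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build a fixed RTP header with no padding, extension, or CSRC list."""
    byte0 = RTP_VERSION << 6                        # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(RTP_FORMAT, byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data):
    """Decode the first 12 bytes of an RTP packet into a dictionary."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack(RTP_FORMAT, data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,          # lets the receiver detect loss and reorder
        "timestamp": timestamp,   # lets the receiver schedule playout
        "ssrc": ssrc,             # identifies the sending stream
    }
```

A receiver comparing the `sequence` values of successive packets can tell that a packet is missing, but recovering it, or preventing the loss in the first place, is left to other mechanisms.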

Finally, there is a need for industry standards in the area of Internet-telephony directory services. Directories are required to ensure interoperability between the Internet and the PSTN, and most current Internet-telephony applications involve proprietary implementations. However, the lightweight directory access protocol (LDAP v3.0) seems to be emerging as the basis for a new standard.

Future of Voice-over-Internet Protocol (VoIP) Telephony

Several factors will influence future developments in VoIP products and services. Currently, the most promising areas for VoIP are corporate intranets and commercial extranets. Their IP–based infrastructures enable operators to control who can—and cannot—use the network.

Another influential element in the ongoing Internet-telephony evolution is the VoIP gateway. As these gateways evolve from PC–based platforms to robust embedded systems, each will be able to handle hundreds of simultaneous calls. Consequently, corporations will deploy large numbers of them in an effort to reduce the expenses associated with high-volume voice, fax, and videoconferencing traffic. The economics of placing all traffic (data, voice, and video) over an IP–based network will pull companies in this direction, simply because IP will act as a unifying agent, regardless of the underlying architecture (i.e., leased lines, frame relay, or ATM) of an organization's network.

Commercial extranets, based on conservatively engineered IP networks, will deliver VoIP and facsimile over Internet protocol (FAXoIP) services to the general public. By guaranteeing specific parameters, such as packet delay, packet jitter, and service interoperability, these extranets will ensure reliable network support for such applications.
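Packet jitter, one of the parameters mentioned above, is the variation in packet transit time. One common way to measure it is the running interarrival-jitter estimate defined in the RTP specification, which smooths the absolute change in transit time with a gain of 1/16. The sketch below applies that formula to lists of send and receive timestamps; the function name and the millisecond units are assumptions for illustration.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate: J = J + (|D| - J) / 16 for each new packet.

    D is the change in transit time between consecutive packets:
    D = (recv[i] - recv[i-1]) - (send[i] - send[i-1]).
    Both lists must use the same time unit (milliseconds here).
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        d = ((recv_times[i] - recv_times[i - 1])
             - (send_times[i] - send_times[i - 1]))
        jitter += (abs(d) - jitter) / 16
    return jitter
```

If every packet experiences the same delay, the estimate stays at zero no matter how large that delay is; only variation in delay raises it. An operator guaranteeing a jitter bound would monitor a figure of this kind alongside absolute delay.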

VoIP products and services transported via the public Internet will be niche markets that can tolerate the varying performance levels of that transport medium. Telecommunications carriers most likely will rely on the public Internet to provide telephone service between/among geographic locations that today are high-tariff areas. It is unlikely that the public Internet's performance characteristics will improve sufficiently within the next two years to stimulate significant growth in VoIP for that medium.

However, the public Internet will be able to handle voice and video services quite reliably within the next three to five years, once two critical changes take place:

1. An increase by several orders of magnitude in backbone bandwidth and access speeds, stemming from the deployment of IP/ATM/synchronous optical network (SONET) and ISDN, cable modems, and x digital subscriber line (xDSL) technologies, respectively

2. The tiering of the public Internet, in which users will be required to pay for the specific service levels they require

On the other hand, FAXoIP products and services via the public Internet will become economically viable more quickly than voice and video, primarily because the technical roadblocks are less challenging. Within two years, corporations will take their fax traffic off the PSTN and move it quickly to the public Internet and corporate intranets, first through FAXoIP gateways and then via IP–capable fax machines. Standards for IP–based fax transmission will be in place by the end of this year.

Throughout the remainder of this decade, videoconferencing (H.323) with data collaboration (T.120) will become the normal method of corporate communications, as network performance and interoperability increase and business organizations appreciate the economics of telecommuting. Soon, the video camera will be a standard piece of computer hardware, both for full-featured multimedia systems and for the less-than-$500 network-computer appliances now starting to appear in the market. The latter in particular should stimulate residential demand and bring VoIP services to the mass market, including the roughly 60 percent of American households that still do not have a PC.

VoIP systems are commonly used for business telephone systems because they are cost-effective and offer many more features than conventional landlines.


ATM - asynchronous transfer mode

DLE - DTM LAN emulation

FAXoIP - facsimile over Internet protocol

IP - Internet protocol

ISDN - integrated services digital network

ITU - International Telecommunication Union

kbps - kilobits per second

LAN - local-area network

LDAP - lightweight directory access protocol

MHz - megahertz

PBX - private branch exchange

PC - personal computer

PSTN - public switched telephone network

QoS - quality of service

RTCP - real-time transport control protocol

RTP - real-time transport protocol

SOHO - small-office/home-office

SONET - synchronous optical network

VoIP - voice over Internet protocol

VPN - virtual private network

WAN - wide-area network

xDSL - x digital subscriber line (e.g., x = A for "asymmetric", x = H for "high bit-rate")

Source: http://www.iec.org