Designing IP Phones for Voice Quality

In the second of this week’s posts on designing IP phones, SNOM Technology AG UK Marketing Manager Lesley Hansen explores the subject of designing IP phones for voice quality.

Voice quality is not a single thing, and it can be highly subjective. Although you can measure the voice quality of a codec used in an IP Phone, each vendor’s implementation of these codecs may be different, resulting in higher or lower voice quality. Voice quality is nevertheless one of the primary requirements in IP Phone design for a professional and enterprise handset, and skimping on voice quality testing is one of the easiest ways for a vendor to avoid development costs and produce a substandard handset.

What is Toll Quality?

Toll Quality is the holy grail. The aim of every VoIP vendor, and the claim of many, is to provide Toll Quality voice: that is, voice quality equal to that of the analogue long-distance public switched network. But Toll Quality itself is not directly measurable. A common benchmark that telephony vendors and carriers use, and that the ITU has adopted, to determine voice quality is the mean opinion score (MOS). MOS is a test that has been used for decades in telephony networks to obtain the human user’s view of the quality of the network. A MOS of 4 means that any impairment is perceptible but not annoying, and 5 is rated as excellent. But MOS provides a subjective measurement based on a single set of circumstances. For instance, the MOS given in a quiet office and that given in an office with extensive background noise would be different.
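As a rough illustration of how a MOS is arrived at, the sketch below averages listener ratings on the five-point scale. The two rating lists are invented purely to show how the same call can score differently in a quiet and a noisy office; they are not measurements.

```python
from statistics import mean

# Listening-quality labels commonly associated with the five-point opinion scale.
RATING_LABELS = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "bad"}

def mean_opinion_score(ratings):
    """Return the arithmetic mean of listener ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in RATING_LABELS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)

# Hypothetical ratings for the same call heard in two environments.
quiet_office = [5, 4, 4, 5, 4, 4]
noisy_office = [4, 3, 3, 4, 3, 2]

print(round(mean_opinion_score(quiet_office), 2))
print(round(mean_opinion_score(noisy_office), 2))
```

The point of the example is only that the score belongs to the listening conditions as much as to the phone.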

Measuring voice quality over IP (VoIP) is more objective, and uses a calculation based on the performance of the IP network over which the voice is carried. The calculation is defined in the ITU-T P.862 (PESQ) standard. Like most standards, the implementation is somewhat open to interpretation by the manufacturers. Even more significant, depending on the implementation by the IP Phone manufacturer, a calculated MOS of 3.9 in a VoIP network may actually sound better than the traditional subjective score of > 4.0 that was considered to be the equivalent of Toll Quality.

Building the Handset

The design of the handset will also affect audio quality. This includes aspects such as the thickness of the plastic selected and the shape of the phone. For the best quality IP Phone design, an audio engineer is involved with the industrial designer from the first stage of each new phone design, so that the audio engineer can explain the audio rules to the designer.

For instance, every speaker needs a chamber to create depth of voice, the curves on the phone will affect how the audio signal is reflected, and the thickness of the plastic used is critical to the final audio quality achieved.

Handset design is a trade-off between the rules of audio and the aesthetic vision of the designer. It is this pursuit of high quality audio combined with pleasing aesthetic design that forces IP Phone developers to improve and come up with new solutions that take them beyond today’s knowledge of how to achieve high quality audio.

Selecting the CODECs

The word codec is a shortening of ‘compressor-decompressor’ or, more commonly, ‘coder-decoder’. A codec encodes a data stream or signal for transmission, storage or encryption, or decodes it for playback or editing. As with conventional telephony, with VoIP the speech is initially captured in analogue form with a microphone. This analogue information is then transferred into a digital format by a converter and encoded by the codec into the corresponding binary audio format.

In order for the data to be converted correctly back into speech after being transported, the receiver must use the same codec as the sender. Depending on the codec used, the data can be compressed to differing extents in this process. Most codecs use a procedure through which information not important for the human ear is omitted. This reduces the amount of data and thus reduces the bandwidth required for transfer. However, if too much information is omitted, the speech quality will suffer.
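To make the encode and decode steps concrete, here is a minimal sketch of mu-law companding, the idea behind one variant of the G.711 codec. It uses the continuous textbook formula, not the bit-exact segmented encoding a real handset would use, and the function names are illustrative only. Each sample is squeezed into 8 bits, so small detail is lost, and the receiver must apply the matching decoder to recover the speech.

```python
import math

MU = 255  # companding constant of the mu-law curve

def mulaw_encode(sample):
    """Compress one sample in [-1.0, 1.0] to an 8-bit code (0..255)."""
    sample = max(-1.0, min(1.0, sample))
    compressed = math.copysign(
        math.log1p(MU * abs(sample)) / math.log1p(MU), sample
    )
    # Map the compressed value from [-1, 1] onto the 256 available codes.
    return int(round((compressed + 1.0) / 2.0 * 255))

def mulaw_decode(code):
    """Expand an 8-bit code back to an approximate sample in [-1.0, 1.0]."""
    compressed = code / 255 * 2.0 - 1.0
    return math.copysign(
        (math.pow(1 + MU, abs(compressed)) - 1) / MU, compressed
    )

for original in (0.01, 0.1, 0.5, -0.9):
    code = mulaw_encode(original)
    print(original, code, round(mulaw_decode(code), 4))
```

Quiet samples are reproduced with a smaller absolute error than loud ones, which is how the scheme keeps the information that matters most to the ear.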

Different codec procedures handle the audio compression with different levels of efficiency. Some are specifically designed to achieve a low bandwidth at any cost. Depending on the codec, therefore, the bandwidth needed and the speech quality will vary. The design skills of the IP Phone manufacturer in the management of codecs create a clear differentiation between vendors.
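How much the bandwidth varies can be estimated with simple arithmetic. The sketch below assumes RTP over UDP over IPv4 with no header compression and ignores the link layer. The two codecs shown, G.711 at 64 kbit/s and G.729 at 8 kbit/s, are examples chosen here for illustration, and the 20 ms packet interval is likewise an assumption.

```python
# Per-packet header overhead in bytes for RTP over UDP over IPv4.
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20

def ip_bandwidth_kbps(codec_bitrate_kbps, packet_interval_ms=20):
    """Estimate one-way IP-layer bandwidth for a constant-bitrate codec."""
    # Bytes of encoded speech carried in each packet.
    payload_bytes = codec_bitrate_kbps * 1000 / 8 * packet_interval_ms / 1000
    packet_bytes = payload_bytes + RTP_HEADER + UDP_HEADER + IPV4_HEADER
    packets_per_second = 1000 / packet_interval_ms
    return packet_bytes * 8 * packets_per_second / 1000

for name, bitrate in (("G.711", 64), ("G.729", 8)):
    print(name, ip_bandwidth_kbps(bitrate), "kbit/s")
```

With these assumptions the headers add the same 16 kbit/s to every stream, so the lower the codec bitrate, the larger the share of bandwidth spent on overhead.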

Refining Voice Quality

Methods such as jitter buffers, echo suppression, echo cancellation and packet loss concealment can be used in IP handset design to improve voice quality.

Echo suppressors work by detecting a voice signal going in one direction on a circuit, and then inserting loss in the other direction. This added loss prevents the speaker from hearing his own voice. Echo cancellation is based on recognizing the originally transmitted signal that re-appears, with some delay, in the transmitted or received signal. Once the echo is recognized, it can be removed by subtracting it from the transmitted or received signal.
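The subtraction step in echo cancellation is usually done with an adaptive filter. The sketch below uses a normalised least mean squares (NLMS) filter, a common textbook choice and not necessarily what any particular handset runs. The filter length, step size and synthetic test signal are all assumptions made for the example.

```python
import random

def nlms_echo_canceller(far_end, microphone, taps=8, step=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of far_end from microphone.

    Returns the residual signal, one sample per input sample.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, microphone):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # Normalise the update by the recent far-end signal power.
        power = sum(h * h for h in history) + eps
        weights = [w + step * error * h / power
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual

def energy(signal):
    return sum(s * s for s in signal)

# Synthetic test: the echo is the far-end signal, 3 samples late, at half level.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [0.0] * 3 + [0.5 * s for s in far_end[:-3]]
residual = nlms_echo_canceller(far_end, echo)

print(energy(echo[-200:]), energy(residual[-200:]))
```

In this idealised case the microphone signal contains only echo, so the residual shrinks towards zero once the filter has learned the delay and level. A real call also contains near-end speech and room noise, which is where the engineering effort goes.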

When silence suppression is on, comfort noise needs to be generated locally by the IP handset at the other end of the call so that the other party will not mistakenly believe that the call has been terminated. Voice quality is improved by preventing echo from being created or by removing echo that is already present; at Snom we call this Automatic Noise Reduction.

IP Phone echo controls are implemented digitally using a digital signal processor (DSP) or software, and at Snom we implement them to the ITU requirements. Digital signal processing is the mathematical manipulation of the information signal to modify or improve it. DSP is not one size fits all: different DSP coefficient pre-sets are needed for different room types. Refining the voice using these techniques will improve the subjective quality. As an additional benefit, the process also makes more effective use of bandwidth, as silence suppression prevents echo from traveling across the voice network.

Transmitting high quality voice over IP is made more difficult by packet loss and jitter. A technique used to reduce the effect of jitter involves buffering audio packets at the receiving handset, so that slower packets arrive in time to be played out in the correct sequence at the appropriate times. The objective of jitter buffering is to keep the packet loss rate low and so improve the voice quality. A fixed method, which uses a fixed buffer size, is easier to implement than an adaptive method, but will result in less satisfactory audio quality because no single delay is optimal when network conditions vary with time.

Snom handsets support adaptive jitter buffers which, although more complex and expensive to implement, perform continuous estimation of the network delays and dynamically adjust the playout delay at the beginning of each transmission, so ensuring a high quality of voice.
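One way to picture the continuous estimation is the running jitter estimate defined for RTP in RFC 3550, which smooths the change in transit time between consecutive packets. The sketch below pairs that estimate with a made-up playout policy (four times the jitter, clamped between 20 ms and 200 ms). The policy, its constants and the function names are assumptions for illustration, not a description of Snom's implementation.

```python
def interarrival_jitter(send_times_ms, arrival_times_ms):
    """Running jitter estimate in the style of RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    estimates = []
    previous_transit = None
    for sent, arrived in zip(send_times_ms, arrival_times_ms):
        transit = arrived - sent
        if previous_transit is not None:
            deviation = abs(transit - previous_transit)
            jitter += (deviation - jitter) / 16.0
        previous_transit = transit
        estimates.append(jitter)
    return estimates

def playout_delay_ms(jitter_ms, minimum_ms=20.0, maximum_ms=200.0,
                     safety_factor=4.0):
    """Hypothetical policy: buffer a multiple of the jitter, within bounds."""
    return max(minimum_ms, min(maximum_ms, safety_factor * jitter_ms))

# 100 packets sent every 20 ms; network delay alternates between 50 and 80 ms.
send_times = [20.0 * i for i in range(100)]
arrival_times = [t + (50.0 if i % 2 == 0 else 80.0)
                 for i, t in enumerate(send_times)]

jitter_estimates = interarrival_jitter(send_times, arrival_times)
print(round(jitter_estimates[-1], 2), playout_delay_ms(jitter_estimates[-1]))
```

A fixed buffer would have to pick one delay in advance. Here the delay follows the estimate, growing when the network becomes erratic and shrinking again when it settles.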

Packet loss concealment (PLC) is a technique to mask the effects of packet loss in VoIP communications. Because the voice signal is sent as packets on a VoIP network, the packets may travel different routes to reach their destination. At the receiver a packet might arrive very late, arrive corrupted, or simply not arrive at all. This could happen where a packet is rejected by a server which has a full buffer and cannot accept any more data. In a VoIP connection, the receiver should be able to cope with packet loss.
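The simplest form of concealment is to replay the last frame that did arrive, a little quieter each time, so that a short gap is heard as a fading sound and not as a click. The sketch below shows only this basic repetition scheme; production handsets typically use more elaborate methods, and the function name, frame size and attenuation value are assumptions.

```python
def conceal_lost_frames(frames, frame_size, attenuation=0.5):
    """Replace each missing frame (None) with a faded copy of the last output.

    Consecutive losses fade further, because each concealed frame becomes
    the reference for the next one.
    """
    output = []
    last_frame = [0.0] * frame_size  # silence until the first frame arrives
    for frame in frames:
        if frame is None:
            frame = [sample * attenuation for sample in last_frame]
        output.append(frame)
        last_frame = frame
    return output

# Two-sample frames keep the example readable; real frames are much longer.
received = [[0.4, -0.4], [0.8, -0.8], None, None, [0.2, -0.2]]
for frame in conceal_lost_frames(received, frame_size=2):
    print(frame)
```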

All these techniques improve voice quality. They are quantifiable and measurable components of high quality IP Phone design, and they should be viewed as absolute requirements in professional and enterprise handsets.

Testing the Voice Quality

Testing voice quality on a new product should begin as soon as a first injection of plastic is produced and continue throughout the life cycle of the product. At Snom we believe in the value of doing our testing in-house and have made a considerable investment in German-engineered, state-of-the-art audio equipment. This equipment not only simulates the voice from the phone handset and speakerphone in relation to the human head, but also tests voice quality under different conditions, such as with background noise from a busy office or factory, and in a variety of network conditions. Ongoing testing ensures the quality of voice provided by VoIP phones and by VoIP accessories and end points, including wired and wireless headsets, speakerphones and conference audio devices.

Accurate and effective audio measurements require time, preparation and patience. Snom’s testing is done in our head office in Berlin using our state-of-the-art audio quality measurement equipment and anechoic chamber facility. Leading measurement technology combined with the know-how and experience of the Snom audio quality team enables comprehensive subjective and objective testing to determine audio quality parameters and maximize VoIP device potential. The measurement system uses the IP phone specifications published in the latest ETSI and TIA releases.