The U.S. Patent and Trademark Office published a Microsoft patent application that reaches back to December 2009 and describes “recording agents” to legally intercept VoIP phone calls.
The “Legal Intercept” patent application is one of Microsoft’s more elaborate and detailed patent papers, which is comprehensive enough to make you think twice about the use of VoIP audio and video communications. The document provides Microsoft’s idea about the nature, positioning and feature set of recording agents that silently record the communication between two or more parties.
The patent was filed well before Microsoft’s acquisition of Skype and there is no reason to believe that the patent was filed with Skype as a Microsoft property in mind. However, the patent mentions Skype explicitly as an example application for this technology and Microsoft may now have to answer questions in which way this patent applies to its new Skype entity and if the technology will become part of Skype.
In the patent descriptions, the company justifies such a feature with the fact that monitoring of calls has been around for a long time for traditional calls, but devices that were used for plain old telephone service (POTS) simply do not work with VoIP anymore. Recording agents are designed to take the place of those outdated devices, but are – not surprisingly – much more capable, can be placed in different locations and automate call interceptions. For example, Microsoft says that recording will be triggered by “events”, or a “sequence of events” – for example when specific callers are involved.
The patent does not mention an eavesdropping module that is integrated into the client software. However, it describes recording agents that can be placed in a multitude of devices, including routers (see image, RA = recording agent). There is also the note of a recording agent software that represents “a software module that logically and/or physically sits between the call server and the network.” According to Microsoft, the agent will have access “to each communication sent to and from the call server,” which clearly refers to the general infrastructure of a VoIP service and network.
The patent lists the following process of a silently recorded call (we removed references to drawings in the description for easier reading):
1. A delivery endpoint is registered with a call server. For example, the intercept requestor may register an IP address/port for delivery of copies of recorded communications associated with a designated VoIP entity.
2. A request to monitor a selected VoIP entity is sent by the requestor to the call server. For example, the intercept requestor may request that the call server record communications for the VoIP entity.
3. An initiating entity negotiates candidate network paths with a media relay. For example, the VoIP entity may talk to a STUN, TURN, and/or other servers to determine what IP address/port of the VoIP entity is visible from the network. For example, if the VoIP entity is connected to a NAT, the NAT may translate IP addresses and port numbers. In STUN/TURN environments, the call gateway may act as a STUN and/or TURN server. The SDP parameters indicated previously are an example of what may result as the entity negotiates candidate communication points with a media relay.
4. The initiating entity sends an invite to the call server. The invite includes data regarding establishing a communication session between at least two entities via a switched packet network for a communication that includes audio. For example, the VoIP entity sends an invite (such as the SDP parameters mentioned previously) to the call server to communicate with a VoIP entity in the enterprise.
5. A copy of the invite is sent to the delivery point. For example, the call server may send a copy of the invite to the intercept requestor or another endpoint designated by the intercept requestor.
6. An invite with no local candidates is sent to the remote entity. For example, the call server sends an SDP with the local candidates deleted to the remote entity of the enterprise . Having no local candidates is synonymous with having “no direct paths.” In STUN/TURN terminology, this means that the VoIP entity needs to employ a TURN server to communicate with the remote entity.
7. The remote entity responds to the invite by sending “OK.” For example, the remote entity in the enterprise responds to the invite by sending an OK to the call server.
8. A copy of the OK is sent to the delivery point. For example, the call server sends a copy of the OK to the intercept requestor or another endpoint designated by the intercept requestor.
9. The OK is sent to the initiating entity. For example, the call server sends the OK to the VoIP entity.
10. The agent that will be recording the subsequent communication between the entities is configured so that it will create a copy of the communication. For example, the call server, the call gateway, or some other server may configure the router to create a copy of the communication to and from the VoIP entity. Note, that the recorded may be configured to record a communication for an entity any time after a monitoring request for the entity is received.
11. The VoIP entity sends a packet to the media relay. For example, the VoIP entity may send a packet to the call gateway.
12. The packet passes to the recorder. For example, the packet may pass to the router.
13. The packet is sent to the remote entity. In addition, a copy of the packet is sent to the delivery point and/or stored for later sending to the delivery point or retrieval by a law enforcement agent. For example, the router sends the packet to the VoIP entity in the enterprise and sends a copy of the packet to the intercept requestor or another endpoint designated by the intercept requestor. This continues until the communication is terminated.
14. Upon termination, the delivery endpoint may be informed that the communication has terminated.
The patent clearly addresses the need of governments and law enforcement to record Internet calls. There is also a certain sense that especially closed networks are targeted with this technology, yet the clear notion that VoIP applications targeted by this patent “may include audio messages transmitted via gaming systems, instant messaging protocols that transmit audio, Skype and Skype-like applications, meeting software, video conferencing software, and the like” may raise privacy concerns and surely the question of how Microsoft intends to use such a patent now that it owns Skype.
So, Microsoft: Will Skype officially include eavesdropping capability in the future?
A request for clarification we sent to Microsoft has remained unanswered so far.
Update: There have been some questions what Microsoft may gain from such a technology and why it is doing this on its own, without a government request. Some may remember that the U.S. government has actually asked for such a capability a while ago and is offering a boatload of money for this feature – the Register was talking about “billions.” The cost for Skype may not have been so high after all.