NAT Traversal fundamentals

NAT history

When the internet pioneers of the early days (before the mid-1990s) designed the network architecture and protocols still used today, one of their primary goals was that the network should provide end-to-end connectivity between any two hosts connected to the internet.

The drastic expansion of the network made public IP addresses a valuable resource. End-to-end connectivity also carried a security issue, because it practically exposes your computer/device directly to potential abuse. For these reasons, NAT (Network Address Translation) devices became popular in the mid-1990s, and today their presence is common in our homes and offices.

NAT devices (routers) enable multiple devices on a local network to share a single public IP, making internet access easy to distribute to any computer/device on a home or office network.

NAT also provides a basic shield for the local network against possible outside attacks, because it does not let any traffic from outside reach a computer/device on the local network unless that computer/device initiates the connection by sending a request to a remote service. NAT keeps records of requests and lets data back in from a remote service only if a local client initiated the transfer. Routers also commonly have a built-in firewall that further increases the protection provided by the device.

This is great, but what about end-to-end connectivity? The original concept had flaws that made it impossible to use without consequences, but no doubt we need it, and it is irreplaceable for some kinds of communication.

Client-controllable port mapping on NAT devices

Facilities to enable end-to-end connectivity were added to NAT (router) devices: protocols that let client applications request a port mapping from the router, such that all data targeting the mapped port reaches the client that requested the mapping. These protocols are:

  UPnP - the most common; port mappings are created using the Universal Plug and Play protocol (based on XML messages)

  NAT-PMP (NAT Port Mapping Protocol) - an older protocol you will rarely find in your router. It saw intensive use in some large airline companies.

  PCP (Port Control Protocol) - proposed by Apple, the latest of these protocols to be adopted as an RFC standard. It is compatible with NAT-PMP, and you will commonly find it in Apple's newer NAT devices.

Presently (2013) you will most likely find UPnP in your router settings, unfortunately disabled by default, because if enabled it can be abused by viruses and trojans already present on your computer. You are expected to enable one of these protocols only when you need it, for example when you want to play an online multiplayer game.
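For illustration, here is a minimal sketch of a client application requesting a UPnP port mapping, using the third-party miniupnpc Python binding (the port number and description are arbitrary placeholders, and this only works if the router has UPnP enabled):

```python
import miniupnpc

# Discover the UPnP-capable gateway (router) on the local network.
upnp = miniupnpc.UPnP()
upnp.discoverdelay = 200          # milliseconds to wait for discovery replies
upnp.discover()                   # broadcast SSDP discovery
upnp.selectigd()                  # select the Internet Gateway Device

print("External IP reported by router:", upnp.externalipaddress())

# Ask the router to forward external UDP port 50000 to this host.
# Effective only if the router itself sits on a public IP.
upnp.addportmapping(50000, 'UDP', upnp.lanaddr, 50000,
                    'example P2P app mapping', '')
```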

Also, these solutions make sense only if your router has a public IP, which is becoming rare these days because internet providers tend to share a public IP between several users. It is also not rare that they use devices and routing software that can fool your WAN device into thinking it is on a public IP.

So if you design software that uses only these methods to create direct connections, in the best case maybe 10% of your users will be able to use it, and that only if you also explicitly tell them to enable the particular protocol. This holds for devices accessing the internet in general. If your users are people behind home routers who play multiplayer online games on a computer, they will most likely have UPnP enabled, because at least some friend will have helped them configure it, so you may get perhaps 20% usability in this case.

Traversal using intended NAT table manipulation

So we have seen that port mapping protocols can be used only in a small number of cases. What can we do now? We can trick our NAT device into creating a mapping by sending a packet to the remote client and then instructing the remote client to send a packet that looks like a response (matching source address and port) back to us. If all goes well and the record in the NAT mapping table is matched (i.e., we are really lucky), the packet from the remote client will reach us. This is the oldest known technique of NAT traversal using intended NAT table manipulation, referred to as "UDP hole punching". In earlier days this technique was really usable and had a great success rate. A TCP connection could even be created afterwards using the same ports, and even TCP hole punching was fairly successful.
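A minimal sketch of the idea, assuming both peers already know each other's public address and port (exchanged out of band, e.g., through a rendezvous server; the peer address below is a placeholder):

```python
import socket

LOCAL_PORT = 40000                      # port we bind on both peers
PEER_ADDR = ("203.0.113.7", 40000)      # placeholder: other peer's public IP/port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", LOCAL_PORT))
sock.settimeout(1.0)

# Both peers run this simultaneously: each outgoing packet creates or
# refreshes a mapping in the local NAT table, so the other side's packets
# then look like responses and are let through.
for attempt in range(10):
    sock.sendto(b"punch", PEER_ADDR)
    try:
        data, addr = sock.recvfrom(1024)
        print("hole punched, received", data, "from", addr)
        break
    except socket.timeout:
        continue  # retry: the first packets are often dropped by the remote NAT
```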

UDP hole punching became even more usable when the STUN technique appeared (standardized in RFC 3489). STUN is used to learn whether a computer is behind NAT, how the NAT behaves, and which ports the router with the public IP has mapped externally (every NAT device can change a packet's source port; the change is recorded in its table so it knows how to rewrite the source port of response packets). RFC 3489 recognized four observable classifications of NAT behavior:

  1. A full cone NAT is one where all requests from the same internal IP address and port are mapped to the same external IP address and port. Furthermore, any external host can send a packet to the internal host by sending a packet to the mapped external address.

  2. A restricted cone NAT is one where all requests from the same internal IP address and port are mapped to the same external IP address and port. Unlike a full cone NAT, an external host (with IP address X) can send a packet to the internal host only if the internal host had previously sent a packet to IP address X.

  3. A port restricted cone NAT is like a restricted cone NAT, but the restriction includes port numbers. Specifically, an external host can send a packet, with source IP address X and source port P, to the internal host only if the internal host had previously sent a packet to IP address X and port P.

  4. A symmetric NAT is one where all requests from the same internal IP address and port, to a specific destination IP address and port, are mapped to the same external IP address and port. If the same host sends a packet with the same source address and port, but to a different destination, a different mapping is used. Furthermore, only the external host that receives a packet can send a UDP packet back to the internal host.
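As an illustration, here is a minimal sketch of a STUN Binding Request (RFC 5389 framing) that asks a public STUN server which external IP and port our NAT mapped for us; the server address is just one well-known public example:

```python
import os, socket, struct

STUN_SERVER = ("stun.l.google.com", 19302)   # any public STUN server works
MAGIC_COOKIE = 0x2112A442

# Header: type=0x0001 (Binding Request), length=0, magic cookie, 12-byte txn id
txn_id = os.urandom(12)
request = struct.pack("!HHI12s", 0x0001, 0, MAGIC_COOKIE, txn_id)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(2.0)
sock.sendto(request, STUN_SERVER)
data, _ = sock.recvfrom(1024)

# Walk attributes after the 20-byte header, look for XOR-MAPPED-ADDRESS (0x0020)
pos = 20
while pos + 4 <= len(data):
    attr_type, attr_len = struct.unpack_from("!HH", data, pos)
    if attr_type == 0x0020:
        # value layout: reserved(1), family(1), x-port(2), x-address(4)
        port = struct.unpack_from("!H", data, pos + 6)[0] ^ (MAGIC_COOKIE >> 16)
        raw_ip = struct.unpack_from("!I", data, pos + 8)[0] ^ MAGIC_COOKIE
        ip = socket.inet_ntoa(struct.pack("!I", raw_ip))
        print("public mapping seen by STUN server:", ip, port)
        break
    pos += 4 + attr_len + ((4 - attr_len % 4) % 4)  # attributes are 32-bit aligned
```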

Since router (NAT) design is not standardized in these terms, it quickly became clear that this classification is not valid, because it can lead us the wrong way. The classification was probably "more valid" at the time it was invented, since it is based on empirical conclusions, but new NAT designs eventually made it outdated. Commonly, when you use a STUN testing client on a computer behind NAT, you can get four different results for the NAT classification of the same router, so this simply cannot be taken as valid information. What you can use is the fact that if you get any of the above four results, you can be sure NAT is present. The mapped ports you get from the responses are also usable, because they tell you the most probable range of values the next mapping will take. (Note that even when NAT is present, a STUN query may very rarely tell you that you are on the open internet with a public IP.)

Why did these techniques become outdated? Unfortunately, security administrators and those of us, IT engineers developing NAT traversal software, are in a constant struggle, and we are basically both right and wrong. They claim NAT traversal is used only by crackers and pirates; we claim NAT traversal is simply needed sometimes, and that the security level does not degrade, because NAT traversal picks some random port for communication and performs application-specific data transfer. So even if an abuser manages to guess one of the 65536 ports, his data will arrive in some application process that will throw an exception on the bogus data or simply fail. Speaking about security, transferring data over direct channels is far, far more secure than using an intermediate server. Communicating through servers is less secure than communicating with a host directly, because servers are well known and are targets of cracker attacks. Client-server communication is also commonly based on well-known protocols, which makes it suitable for injecting arbitrary code. Besides all that, you can never be sure that someone at the cloud hosting company is not peeking at your data.

To return to our story: later NAT devices and networks are not so welcoming to NAT traversal, because some engineers designing routers probably recognized it as a security threat. TCP traversal is almost impossible unless you have the ability to use raw sockets, which is impractical because most operating systems enforce strict security rules for their use or simply don't support them. TCP uses a 3-step handshake involving sequence numbers, session numbers, packet types, and so on, and in most cases you need to match all of them to trick your NAT device, not to mention the possibility that an ICMP packet of type "Destination Port Unreachable" resets your attempt. Basic UDP hole punching will work in a small number of cases. If you have a router from a quality company like Cisco (Linksys) or NETGEAR, the chances of success are greater, because the engineers designing those devices are better and have probably recognized the need, so they design their devices properly. For example, STUN and basic UDP hole punching are enough to traverse a Linksys router's NAT: Linksys preserves the original source port if possible, or takes some nearby value, so NAT traversal on such a quality router is fairly easy.

A modern method of NAT traversal by intended NAT table manipulation should involve prediction of the next external ports, TTL manipulation of the piercing packets (a key factor in the symmetric and port restricted cone NAT cases), multiple retries, and swapping of sides. As we already said, NAT behavior is not standardized, so designing a good "piercing" method involves a lot of testing on different NAT combinations between peers, so that a good routine can be designed from empirical conclusions.
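A minimal sketch of the low-TTL piercing trick: send the mapping-creating packet with a TTL high enough to pass our own NAT but low enough to expire before it reaches the peer's NAT (the TTL value 4 is an assumption; the right value depends on the path and is usually probed empirically):

```python
import socket

PEER_ADDR = ("203.0.113.7", 40000)   # placeholder: predicted peer mapping

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 40000))

# Low TTL: the packet opens a mapping in our NAT table, then expires
# in transit, so the peer's NAT never sees it (and cannot reset anything).
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, 4)
sock.sendto(b"pierce", PEER_ADDR)

# Restore a normal TTL for the real traffic once both sides have mappings.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, 64)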

Final NAT traversal solution

An industrial-strength NAT traversal solution should be able to apply all of the methods mentioned above. It should inspect the network environment of both peers and decide which method of NAT traversal to apply. If one method fails, it should be able to try other methods or retry with the peer sides swapped. If no method of direct tunnel creation succeeds, relaying is the last resort, which we know for sure must work because it is based on the standard client-server model. The relay is the most expensive resource in a peer-to-peer system, so better success with direct tunnel methods makes the system more flexible and cheaper to maintain. A simple calculation demonstrates this:

Let's say we have one server with a connection bandwidth of 100 Mb/s, and we want to support a Video-over-IP application that requires, say, 500 KB/s per peer pair = 4000 Kb/s = 4 Mb/s.

We want a quality service guaranteeing 500 KB/s for each peer-to-peer session under any conditions.

If we don't use NAT traversal, we will be able to support 100/4 = 25 sessions at once per server.

If we use NAT traversal and only 5% of all tunnels are made through the relay, we will be able to support (100/4) + (95%/5%) × (100/4) = 25 + 19 × 25 = 500 sessions at once per server.

Also, if we get users over the proposed limit, the quality of our service will degrade at a 20-times slower rate on the NAT-traversal-equipped system.

So the cost of a system equipped with NAT traversal would be about 20 times less than a pure relay system.
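The same arithmetic as a quick script (the figures are the assumed ones from the example above):

```python
# Relay capacity model from the example above (assumed figures):
server_bw_mbps = 100        # server connection bandwidth
session_mbps = 4            # 500 KB/s per peer pair = 4 Mb/s
relay_fraction = 0.05       # share of tunnels NAT traversal cannot open

relay_only_sessions = server_bw_mbps // session_mbps   # 25 (pure relay)
# each relayed session is 5% of the total, so total = relayed / 0.05
total_sessions = relay_only_sessions / relay_fraction  # 500.0

print(relay_only_sessions, int(total_sessions),
      int(total_sessions / relay_only_sessions))       # 25 500 20
```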

This is just one advantage of NAT-traversal-equipped peer-to-peer systems.

Direct Peer-To-Peer vs Cloud

Peer-to-peer and cloud systems (virtualization) are by their nature totally different things, but in quite a few cases you may use either to achieve the same thing. In those cases you may wonder which to choose for your implementation, so in the following text we will talk about the key differences, advantages, and disadvantages.

Virtualization (the most common use of cloud computing) abstracts the physical infrastructure, which is the most rigid component, and makes it available as a soft component that is easy to use and manage. In relation to peer-to-peer systems we will focus only on the uses of the cloud suitable for comparison.

The most common thing you might do every day over either peer-to-peer or the cloud is file transfer. There are quite a few services offering this, like Dropbox, Google Drive, and so on.

You transfer files by first uploading them from client A to an intermediate cloud server S; the files are then available for download from clients B, C, D... until you explicitly delete them from S. The key advantages of this implementation are that clients A, B, C, D... are not required to be active at the same time, the cloud storage practically serves as a shared network disk, and you can download those same files wherever and whenever you want, as long as you have an internet connection.

But unfortunately there are many downsides, the most important being the security and privacy of your data. Anyone can access your data at any time if they have your username and password. The fact that no employee of the cloud hosting company will peek at your data cannot be guaranteed in any way other than by the company's promise, which is not binding on particular employees. We could also mention organized government surveillance programs, which could bother you a lot if you are another government protecting the sovereignty of your country; good information is the mightiest weapon these days. Imagine you keep the source code of software worth $10,000,000 on such a server, or you are a public figure storing media material there that could compromise you if exposed - that would certainly not be recommended.

Intermediate servers storing this cloud data are also well-known places exposed to attacks. If someone wants to observe your data, they know exactly the right place to look, because it is concentrated in one single place, so in the end you might finish with an apology from the cloud hosting company.

Cloud file storage systems often limit your free storage space, because in the end they need physical disk space to store your data on their servers, and that costs money. This is easily overcome by subscribing to some paid plan that helps them cover the storage expenses.

One thing in which cloud systems can never compare to peer-to-peer systems is real-time communication. Since the cloud is far easier to implement than peer-to-peer systems, there have been attempts to implement cloud streaming, but such systems result in poor performance and enormous cost. The intermediate server simply becomes a hot spot that all clients communicate with, so the total bandwidth is shared between all clients. Peer-to-peer systems overcome this by simply skipping the intermediate hot spots: peers communicate directly, which has practically no impact on the servers.

An encrypted peer-to-peer communication tunnel (a direct tunnel created using NAT traversal) is the most secure and private way of transferring data between two hosts. These are some of the facts that earn it that title:

- The tunnel is stealthy to monitoring/observation/surveillance systems, because it happens on one (the destination port) of 65535 ports, randomly chosen during the traversal operation, and its existence is very hard to detect. Monitoring/surveillance systems usually track the well-known ports you use every day for common client-server communication, like 80, 443, 25, 22, 23, 995..., where they also expect a certain data transfer protocol based on the port value.

- Secure encryption keys generated for a short period are totally secret to third parties. With the cloud your keys may be half-exposed, because you cannot be sure that an attacker is not monitoring the server and already aware of one part of the key.

In the text above we focused on the most important differences. These facts are important to note if you are designing a system that must provide high standards of data security and privacy, or quality real-time communication between a large number of peers. Cloud storage may be a handy and fast solution for everyday small-scale needs serving a small number of people.

In some cases you may even combine cloud virtualization with a peer-to-peer system to get the best result.

+ The most important thing cloud virtualization gives you is always-accessible data

+ The most important thing a peer-to-peer system gives you is secure and totally private real-time communication

 

Direct Peer-To-Peer vs WebRTC

Lately a technology called WebRTC has become popular, mostly because it can be used from JavaScript and it is easy to implement.

WebRTC works like this:

- A client/host opens and maintains a session with a website equipped with a WebRTC service. If it wants to communicate with another peer, that other peer also needs an active session with the same site, so the WebRTC service can create a data bridge between them. It is also possible for multiple servers to work together; in that case peers can connect to any site that is part of the WebRTC network, and if two peers that want to communicate are not served by the same server, the servers will create a server-to-server data channel to carry the peer-to-peer communication. Basically, it can be compared to a direct peer-to-peer system configured to always use just the relay technique, and relaying is the one thing direct peer-to-peer systems avoid at all costs, because it is the most expensive system resource.

Because WebRTC is easy to implement, and because it is available and friendly to people involved only with web development (the majority of developers), WebRTC has become fairly popular. It enables a simple way of adding peer-to-peer capabilities to a web page using just JavaScript and AJAX.

WebRTC is a set of techniques that have been present for a long time, packaged for use by web developers, and it does not bring much technical advantage. Data passes through servers, which is insecure with regard to privacy concerns, and the number of active users is limited by the total throughput of the servers. A system powered by technology like this requires enormous investment, since all data must pass through servers. It is also TCP based, which automatically limits things to about 100 socket connections (peers) per network adapter on the server before degradation begins. It is usually good for things like web page chat or small file transfers, but if you try to make something more serious that requires a more intense data channel, you will find yourself trapped.

So the conclusion is that WebRTC is just a simple technology intended for easy use by web developers working on projects meant for small-scale use.

NAT Traversal/Peer-To-Peer system

In this text we will focus on the key elements any general-purpose peer-to-peer system must have.
The main thing in the peer-to-peer world is certainly the communication channel between two hosts (peers), but to get to that point there must be some way for those two to find each other.

In some cases, peer X may be interesting to peer Y for connection establishment only if it can show certain relevance to peer Y, so there must also be an option to publish some metadata about a peer that others can look up.

Peer X may want to refuse a connection from peer Y for some reason, so there must also be some sort of negotiation before a tunnel is made.

Peer X and peer Y may also want to communicate in a totally secure and private manner, so data encryption and secure key exchange come in very handy in those cases.
So we need to provide these required abstract functionalities:

- NAT traversal for direct tunnel creation. In a small number of cases (~2-3%) NAT traversal may fail, so there must be a relay service to handle such situations. The relay is the most expensive part of the system, so designing a good NAT traversal routine is the key factor of a quality peer-to-peer system.

- Instant messaging for negotiation and other control or short data messages

- Peer lookup by unique identification and/or published metadata

- Peer status notifications for all related peers

Add-on functionalities most applications will find useful:

- Secure key exchange and data encryption

- A virtual user/network/membership service that is closely aware of the states of peers

  This service should be able to permanently store metadata about users, networks, and other needed abstract objects. This metadata should be searchable and editable.
So let us now think about what services we would need to provide to support all these features.

- We need a service to which peers report their presence and status. This service must be able to provide a peer with all the information it needs, so it must be equipped with an instant messaging system that can instantly notify a peer about changes on the network relevant to it, or carry instant messages from one peer and deliver them instantly to another. Since this is the place where information about active peers is available, this service should also provide peer lookup by unique identifier or by searchable metadata. We will call this service "CHECKPOINT" in the further text.
Technically, communication between a peer and CHECKPOINT should be UDP based, and here is why:

- CHECKPOINT is expected to receive and send a massive number of short messages to/from peers.

- Each peer must always be accessible and able to receive notifications, and with UDP this is easily achieved. When a peer sends its first packet to CHECKPOINT through its NAT device, CHECKPOINT gets a permission lasting around X seconds from the NAT device to send a message back. If both the peer and CHECKPOINT stayed inactive for more than X seconds, the NAT would close the gate, and notification messages arriving from CHECKPOINT would be dropped as unsolicited. So the peer should send a keep-alive roughly every X/2 seconds to maintain the NAT table entry, so that CHECKPOINT can send a message to the peer at any time. Achieving this using TCP would be much less convenient and would cause unnecessary data transfer. Also, servers usually have a limit on the number of concurrent TCP connections, so that would degrade performance much faster.
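A minimal sketch of this keep-alive pattern (the CHECKPOINT address, the message formats, and the X = 30 s mapping lifetime are all assumed placeholders):

```python
import socket, threading, time

CHECKPOINT_ADDR = ("checkpoint.example.com", 5000)  # hypothetical service
NAT_TIMEOUT = 30.0                                  # assumed X: NAT mapping lifetime

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"REGISTER", CHECKPOINT_ADDR)   # first packet opens the NAT mapping

def keepalive():
    # refresh the mapping every X/2 seconds so CHECKPOINT can always reach us
    while True:
        time.sleep(NAT_TIMEOUT / 2)
        sock.sendto(b"KEEPALIVE", CHECKPOINT_ADDR)

threading.Thread(target=keepalive, daemon=True).start()

# main loop: notifications from CHECKPOINT arrive as ordinary UDP datagrams
while True:
    data, addr = sock.recvfrom(2048)
    print("notification:", data)
```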


- When two peers decide to establish a direct communication channel, a NAT traversal operation should be invoked. NAT traversal is a complex operation involving multiple steps, where each new step depends on the results of the previous ones. So we introduce a new service we will call "HANDSHAKE" in the further text. HANDSHAKE's purpose is to synchronize the NAT traversal steps between the two peers. When they want to create a tunnel, CHECKPOINT directs them to one of the available HANDSHAKE services, which then manages the NAT traversal operation. The HANDSHAKE procedure uses STUN to decide which methods are best to apply for opening the tunnel, so a pair of STUN services must be available to HANDSHAKE. In some rare situations (~2%), especially if one peer is behind a tight-security corporate network, tunnel opening using NAT traversal will fail; in such cases we can turn to relaying, which always works. The relay is the most expensive resource in the system, so it is crucial that HANDSHAKE does its procedure well and minimizes relay usage. To be able to guarantee 100% connectivity, a relay service must be part of the system, so it can handle the ~2% of tunnels NAT traversal was unable to create. The common standardized technology intended for relaying is TURN; you will find it in most readings on the internet and as an RFC standard. Systems like quickP2P do not use TURN; they use a raw relay instead, mostly because the resulting object of the operation is a common socket you can use like any other socket, and because some networks deliberately detect and refuse TURN packets. TURN also brings a lot of overhead, because its per-packet metadata is sometimes larger than the actual payload. For all these reasons, a raw relay was picked by the quickP2P engineers to handle the 2% of tunnels NAT traversal was unable to create.
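To make the "raw relay" idea concrete, here is a minimal sketch of a relay that forwards UDP datagrams between one pair of peers with no TURN framing (the registration scheme and port are invented for the example):

```python
import socket

# Hypothetical raw relay: after both peers of a session have sent one
# registration datagram, every further datagram is forwarded verbatim
# to the opposite peer, with no TURN headers around the payload.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 6000))

# learn the two public endpoints from their first (registration) packets
addr_a = sock.recvfrom(2048)[1]
addr_b = sock.recvfrom(2048)[1]
peers = {addr_a: addr_b, addr_b: addr_a}

while True:
    data, src = sock.recvfrom(2048)
    dst = peers.get(src)
    if dst is not None:
        sock.sendto(data, dst)   # forward the raw payload to the other peer
```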

So far we have described all the components needed for a basic peer-to-peer NAT traversal system: the CHECKPOINT, HANDSHAKE, STUN, and RELAY services. Commonly, every modern application also requires storage of some permanent data, like user profiles, user groups with their metadata, device data, and so on. So if you want to develop an application that has user accounts and does some sort of data exchange between users, you will certainly need this. You could provide web services of your own; on the other hand, imagine having the ability to store metadata permanently in a system that is closely aware of the availability of peers. That would certainly be more convenient, so we introduce the INDEX service. INDEX's purpose is to perform storage operations for permanent metadata that stays available whether a peer is online or not.

Having all this, we would be ready to create an out-of-the-box peer-to-peer application with no need for additional web services.