TCP/IP Tutorial and Technical Overview

2.3 Internet Protocol (IP)

Figure: Internet Protocol (IP)

IP is a standard protocol with STD number 5 which also includes ICMP (see Internet Control Message Protocol (ICMP)) and IGMP (see Internet Group Management Protocol (IGMP)). Its status is required.

The current IP specification can be found in RFCs 791, 950, 919 and 922, with updates in RFC 1349.

IP is the protocol that hides the underlying physical network by creating a virtual network view. It is an unreliable, best-effort connectionless packet delivery protocol.

It adds no reliability, flow control or error recovery to the underlying network interface protocol. Packets (datagrams) sent by IP may be lost, arrive out of order, or even be duplicated, and IP will not handle these situations. It is up to higher layers to provide these facilities.

IP also assumes little from the underlying network mechanisms, only that the datagrams will "probably" (best-effort) be transported to the addressed host.

2.3.1 IP Datagram

The Internet datagram (IP datagram) is the base transfer packet in the Internet protocol suite. It has a header containing information for IP, and data that is relevant only to the higher level protocols.

Figure: Base IP Datagram

The IP datagram is encapsulated in the underlying network's frame, which usually has a maximum length or frame limitation, depending on the hardware used. For Ethernet, this will typically be 1500 bytes. Instead of limiting the IP datagram length to some maximum size, IP can deal with fragmentation and re-assembly of its datagrams. In particular, the IP standard does not impose a maximum size, but states that all subnetworks should be able to handle datagrams of at least 576 bytes.

Fragments of a datagram all have a header, basically copied from the original datagram, and data following it. They are treated as normal IP datagrams while being transported to their destination. Note, however, that if one of the fragments gets lost, the complete datagram is considered lost since IP does not provide any acknowledgment mechanism, so the remaining fragments will simply be discarded by the destination host.

2.3.1.1 IP Datagram Format

The IP datagram header is a minimum of 20 bytes long:

Figure: IP Datagram Format

Where:

VERS

The version of the IP protocol. The current version is 4. Version 5 is experimental and version 6 is IPng (see IP: The Next Generation (IPng)).

LEN

The length of the IP header counted in 32-bit quantities. This does not include the data field.

Type of Service

The type of service is an indication of the quality of service requested for this IP datagram.

Where:

Precedence

Is a measure of the nature and priority of this datagram:

000: Routine
001: Priority
010: Immediate
011: Flash
100: Flash override
101: Critical
110: Internetwork control
111: Network control

TOS

Specifies the type of service value:

1000: Minimize delay
0100: Maximize throughput
0010: Maximize reliability
0001: Minimize monetary cost
0000: Normal service

MBZ

Reserved for future use ("must be zero" unless participating in an Internet protocol experiment which makes use of this bit)

A detailed description of the type of service can be found in the RFC 1349.
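
The following is a minimal Python sketch of how the Type of Service byte could be split into the three fields listed above. It assumes the bit layout given in RFC 1349 (three precedence bits, then four TOS bits, then the MBZ bit); the function name decode_tos and the lookup tables are illustrative and not part of any standard API.

```python
# Names for the 3-bit precedence field, indexed by its value (000 to 111).
PRECEDENCE = [
    "Routine", "Priority", "Immediate", "Flash",
    "Flash override", "Critical", "Internetwork control", "Network control",
]

# Names for the 4-bit TOS values listed in the text.
TOS = {
    0b1000: "Minimize delay",
    0b0100: "Maximize throughput",
    0b0010: "Maximize reliability",
    0b0001: "Minimize monetary cost",
    0b0000: "Normal service",
}

def decode_tos(byte):
    """Split a Type of Service byte into (precedence, TOS, MBZ)."""
    precedence = (byte >> 5) & 0b111   # three high-order bits
    tos = (byte >> 1) & 0b1111         # next four bits
    mbz = byte & 0b1                   # low-order "must be zero" bit
    return PRECEDENCE[precedence], TOS.get(tos, "Unlisted combination"), mbz

print(decode_tos(0b00010000))
```

Under these assumptions, a byte of 00010000 decodes to routine precedence with the "minimize delay" service.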

Total Length

The total length of the datagram, header and data, specified in bytes.

Identification

A unique number assigned by the sender to aid in reassembling a fragmented datagram. Fragments of a datagram will have the same identification number.

Flags

Various control flags:

Where:

0: Reserved, must be zero
DF: Don't Fragment: 0 means allow fragmentation, 1 means do not allow fragmentation.
MF: More Fragments: 0 means that this is the last fragment of this datagram, 1 means that this is not the last fragment.

Fragment Offset

Used with fragmented datagrams, to aid in reassembly of the full datagram. The value is the number of 64-bit pieces (header bytes are not counted) that are contained in earlier fragments. In the first (or only) fragment, this value is always zero.

Time to Live

Specifies the time (in seconds) this datagram is allowed to travel. Each router through which this datagram passes is supposed to subtract its processing time for the datagram from this field. In practice, a router can process a datagram in less than 1 second; it therefore subtracts one from this field, and the TTL becomes a hop-count metric rather than a time metric. When the value reaches zero, it is assumed that this datagram has been traveling in a closed loop and it is discarded. The initial value should be set by the higher-level protocol which creates the datagram.

Protocol Number

Indicates the higher-level protocol to which IP should deliver the data in this datagram. Some important values are:

0: Reserved
1: Internet Control Message Protocol (ICMP)
2: Internet Group Management Protocol (IGMP)
3: Gateway-to-Gateway Protocol (GGP)
4: IP (IP encapsulation)
5: Stream
6: Transmission Control (TCP)
8: Exterior Gateway Protocol (EGP)
9: Private Interior Routing Protocol
17: User Datagram (UDP)
89: Open Shortest Path First

The full list can be found in STD 2 - Assigned Internet Numbers.

Header Checksum

Is a checksum on the header only. It does not include the data. The checksum is calculated as the 16-bit one's complement of the one's complement sum of all 16-bit words in the header. For the purpose of this calculation, the checksum field is assumed to be zero. If the header checksum does not match the contents, the datagram is discarded because at least one bit in the header is corrupt, and the datagram may even have arrived at the wrong destination.
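
The checksum rule described above can be sketched in Python as follows. This is an illustrative implementation rather than code from any particular IP stack; the function name header_checksum is an assumption, and the sample header bytes are a made-up 20-byte header chosen only to exercise the arithmetic.

```python
import struct

def header_checksum(header):
    """16-bit one's complement of the one's complement sum of all 16-bit words."""
    if len(header) % 2:
        raise ValueError("header length must be a multiple of 2 bytes")
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    # Fold any carries out of the low 16 bits back in (one's complement addition).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Sample header with the checksum field (bytes 10 and 11) set to zero.
zeroed = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
checksum = header_checksum(zeroed)
print(hex(checksum))

# A receiver can run the same sum over the header as received: the result
# is zero when the stored checksum matches the contents.
received = zeroed[:10] + struct.pack("!H", checksum) + zeroed[12:]
print(header_checksum(received))
```

The second call illustrates the receiving side: summing the header with its stored checksum in place yields zero for an undamaged header.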

Source IP Address

The 32-bit IP address of the host sending this datagram.

Destination IP Address

The 32-bit IP address of the destination host for this datagram.

Options

Variable length. An IP implementation is not required to be capable of generating options in the datagrams it creates, but all IP implementations are required to be able to process datagrams containing options. There may be zero or more options. There are two option formats; which one applies depends on the option number found in the first byte:

A type byte alone.
A type byte, a length byte and one or more option data bytes.

The type byte has the same structure in both cases:

Where:

fc

Flag copy indicates whether (1) or not (0) the option field is to be copied when the datagram is fragmented.

class

The option class is a 2-bit unsigned integer:

0: control
1: reserved
2: debugging and measurement
3: reserved

option number

The option number is a 5-bit unsigned integer.

0: End of option list. It has a class of 0, the fc bit is set to zero, and it has no length byte or data. That is, the option list is terminated by a X'00' byte. It is only required if the IP header length (which is a multiple of 4 bytes) does not match the actual length of the options.
1: No operation. It has a class of 0, the fc bit is not set and there is no length byte or data. That is, a X'01' byte is a NOP. It may be used to align fields in the datagram.
2: Security. It has a class of 0, the fc bit is set and there is a length byte with a value of 11, followed by 9 bytes of data. It is used for security information needed by US Department of Defense requirements.
3: Loose Source Routing. It has a class of 0, the fc bit is set and there is a variable length data field. This option is discussed in more detail below.
4: Internet Timestamp. It has a class of 2, the fc bit is not set and there is a variable length data field. The total length may be up to 40 bytes. This option is discussed in more detail below.
7: Record Route. It has a class of 0, the fc bit is not set and there is a variable length data field. This option is discussed in more detail below.
8: Stream ID. It has a class of 0, the fc bit is set and there is a length byte with a value of 4, followed by two data bytes. It is used with the SATNET system.
9: Strict Source Routing. It has a class of 0, the fc bit is set and there is a variable length data field. This option is discussed in more detail below.

length

counts the length (in bytes) of the option, including the type and length fields.

option data

contains data relevant to the option.

padding

If an option is used, the datagram is padded with all-zero bytes up to the next 32-bit boundary.

data

The data contained in the datagram is passed to a higher-level protocol, as specified in the protocol field.
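
To tie the field descriptions together, here is a hedged Python sketch that unpacks the fixed 20-byte part of a header. The function name parse_ip_header, the dictionary keys, and the sample bytes are illustrative assumptions; the field order and widths follow the standard IPv4 header layout.

```python
import ipaddress
import struct

def parse_ip_header(packet):
    """Unpack the fixed part of an IP header into a dictionary."""
    if len(packet) < 20:
        raise ValueError("an IP header is at least 20 bytes long")
    (vers_len, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_length = (vers_len & 0x0F) * 4      # LEN counts 32-bit words
    return {
        "version": vers_len >> 4,
        "header_length": header_length,
        "type_of_service": tos,
        "total_length": total_length,
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset": (flags_frag & 0x1FFF) * 8,   # converted to bytes
        "time_to_live": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
        "options": packet[20:header_length],   # empty when LEN is 5
    }

sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
fields = parse_ip_header(sample)
print(fields["version"], fields["protocol"], fields["source"], fields["destination"])
```

For the sample bytes, the sketch reports version 4, protocol 17 (UDP), the DF flag set, and no options.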

2.3.1.2 Fragmentation

When an IP datagram travels from one host to another, it can cross different physical networks. Physical networks have a maximum frame size, called the Maximum Transmission Unit (MTU), which limits the length of a datagram that can be placed in one physical frame. Therefore, a scheme has been put in place to fragment long IP datagrams into smaller ones, and to reassemble them at the destination host. IP requires that each link has an MTU of at least 68 bytes, so if any network provides a lower value than this, fragmentation and reassembly must be implemented in the network interface layer in a way that is transparent to IP. 68 is the sum of the maximum IP header length of 60 bytes and the minimum possible length of data in a non-final fragment (8 bytes). IP implementations are not required to handle unfragmented datagrams larger than 576 bytes, but most implementations accept larger ones: typically a little over 8192 bytes or more, and rarely less than 1500.

An unfragmented datagram has all-zero fragmentation information. That is, the more fragments flag bit is zero and the fragment offset is zero. When fragmentation is to be done, the following steps are performed:

The DF flag bit is checked to see if fragmentation is allowed. If the bit is set, the datagram will be discarded and an error will be returned to the originator using ICMP.
Based on the MTU value, the data field is split into two or more parts. All newly created data portions must have a length which is a multiple of 8 bytes, with the exception of the last data portion.
All data portions are placed in IP datagrams. The headers of these datagrams are copies of the original one, with some modifications:
- The more fragments flag bit is set in all fragments except the last.
- The fragment offset field in each is set to the location this data portion occupied in the original datagram, relative to the beginning of the original unfragmented datagram. The offset is measured in 8-byte units.
- If options were included in the original datagram, the high order bit of the option type byte determines whether or not they will be copied to all fragment datagrams or just to the first one. For instance, source route options have to be copied in all fragments and therefore they have this bit set.
- The header length field of the new datagram is set.
- The total length field of the new datagram is set.
- The header checksum field is re-calculated.
Each of these fragmented datagrams is now forwarded as a normal IP datagram. IP handles each fragment independently, that is, the fragments may traverse different routers to the intended destination, and they may be subject to further fragmentation if they pass through networks that have smaller MTUs.
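
The splitting rules above can be sketched in Python. This is a simplified illustration, not a full implementation: it assumes every fragment carries a header of the same length (that is, any options are copied to all fragments), and the function name fragment_payload and its return format are hypothetical.

```python
def fragment_payload(data_length, header_length, mtu, dont_fragment=False):
    """Return one (offset in 8-byte units, data length, MF flag) tuple per fragment."""
    if header_length + data_length <= mtu:
        return [(0, data_length, False)]        # fits: all-zero fragmentation info
    if dont_fragment:
        # The text says the datagram is discarded and an ICMP error returned.
        raise ValueError("DF is set and the datagram exceeds the MTU")
    # Every data portion except the last must be a multiple of 8 bytes.
    max_data = (mtu - header_length) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < data_length:
        size = min(max_data, data_length - offset)
        more_fragments = offset + size < data_length
        fragments.append((offset // 8, size, more_fragments))
        offset += size
    return fragments

# A 4000-byte datagram (20-byte header, 3980 data bytes) crossing a 1500-byte MTU.
for fragment in fragment_payload(3980, 20, 1500):
    print(fragment)
```

With these numbers the data is carried in three fragments of 1480, 1480 and 1020 bytes, at offsets 0, 185 and 370 (in 8-byte units), with the more fragments bit set on all but the last.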

At the destination host, the data has to be reassembled into one datagram. The identification field of the datagram was set by the sending host to a unique number (for the source host, within the limits imposed by the use of a 16-bit number). As fragmentation doesn't alter this field, incoming fragments at the receiving side can be identified if this ID field is used together with the Source and Destination IP addresses in the datagram. The Protocol field is also checked for this identification.

In order to reassemble the fragments, the receiving host allocates a buffer in storage as soon as the first fragment arrives. A timer routine is then started. If the timer expires before all of the fragments have been received, the datagram is discarded. The initial value of this timer is called the IP datagram time-to-live (TTL) value. It is implementation dependent, and some implementations allow it to be configured; for example AIX Version 3.2 provides an ipfragttl option with a default value of 60 seconds.

When subsequent fragments of the datagram arrive, before the timer expires, the data is simply copied into the buffer storage, at the location indicated by the fragment offset field. As soon as all fragments have arrived, the complete original unfragmented datagram is restored, and processing continues, just as for unfragmented datagrams.
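
A rough Python sketch of this reassembly buffer follows. It is deliberately simplified: the reassembly timer is left out, fragments are assumed not to overlap, and the class name Reassembler and its methods are hypothetical. The buffer key combines the source address, destination address, protocol and identification fields, as described above.

```python
class Reassembler:
    """Collects fragments per datagram until the data is complete (no timer)."""

    def __init__(self):
        self.buffers = {}

    def add(self, key, offset_units, data, more_fragments):
        """Store one fragment; return the full data once complete, else None."""
        entry = self.buffers.setdefault(key, {"parts": {}, "total": None})
        start = offset_units * 8               # fragment offset is in 8-byte units
        entry["parts"][start] = data
        if not more_fragments:                 # last fragment fixes the total size
            entry["total"] = start + len(data)
        return self._assemble(key)

    def _assemble(self, key):
        entry = self.buffers[key]
        if entry["total"] is None:
            return None                        # last fragment not seen yet
        data = bytearray()
        for start in sorted(entry["parts"]):
            if start != len(data):
                return None                    # a fragment is still missing
            data += entry["parts"][start]
        if len(data) != entry["total"]:
            return None
        del self.buffers[key]                  # release the buffer
        return bytes(data)

# Fragments may arrive in any order; the addresses and ID here are made up.
reassembler = Reassembler()
key = ("129.1.1.1", "129.2.1.1", 17, 42)
payload = bytes(range(24))
first = reassembler.add(key, 2, payload[16:], False)
second = reassembler.add(key, 0, payload[:16], True)
print(first, second == payload)
```

A real implementation would also start the timer when the first fragment arrives and discard the buffer on expiry.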

Note: IP does not provide the reassembly timer. It will treat each and every datagram, fragmented or not, the same way, that is, as individual messages. It is up to the higher layer to implement a timeout and to look after any missing fragments. The higher layer could be TCP for a connection-oriented transport network or the application for connectionless transport networks based upon UDP and IP.

The netstat command may be used on some TCP/IP hosts to list details of fragmentation that is occurring. An example of this is the netstat -i command in TCP/IP for OS/2.

2.3.1.3 IP Datagram Routing Options

The IP datagram Options field allows two methods for the originator of an IP datagram to explicitly provide routing information and one for an IP datagram to determine the route that it travels.

Loose Source Routing

The Loose Source Routing option, also called the Loose Source and Record Route (LSRR) option, provides a means for the source of an IP datagram to supply explicit routing information to be used by the routers in forwarding the datagram to the destination, and to record the route followed.

Figure: Loose Source Routing Option

10000011: (decimal 131) is the value of the option type byte for loose source routing.
length: contains the length of this option field, including the type and length fields.
pointer: points to the option data at the next IP address to be processed. It is counted relative to the beginning of the option, so its minimum value is four. If the pointer is greater than the length of the option, the end of the source route is reached and further routing is to be based on the destination IP address (as for datagrams without this option).
route data: is a series of 32-bit IP addresses.

Whenever a datagram arrives at its destination and the source route is not empty (pointer < length) the receiving host will:

Take the next IP address in the route data field (the one indicated by the pointer field) and put it in the Destination IP address field of the datagram.
Put the local IP address in the source list at the location pointed to by the pointer field. The IP address for this is the local IP address corresponding to the network on which the datagram will be forwarded (routers are attached to multiple physical networks and thus have multiple IP addresses).
Increment pointer by 4.
Transmit the datagram to the new destination IP address.

This procedure ensures that the return route is recorded in the route data (in reverse order) so that the final recipient uses this data to construct a loose source route in the reverse direction. This is a loose source route because the forwarding router is allowed to use any route and any number of intermediate routers to reach the next address in the route.

Note: The originating host puts the IP address of the first intermediate router in the destination address field, and places the IP addresses of the remaining routers in the path, including the target destination, in the source route option. The recorded route in the datagram when it arrives at the target contains the IP addresses of each of the routers that forwarded the datagram. Each router has moved one place in the source route, and normally a different IP address will be used, since the routers record the IP address of the outbound interface but the source route originally contained the IP address of the inbound interface.
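
The per-hop procedure for a source route can be sketched in Python. This is an illustration under stated assumptions: the option is modeled as a pointer plus a list of addresses instead of raw bytes, the function name process_source_route is hypothetical, and all addresses are made up.

```python
def process_source_route(destination, pointer, route, outbound_address):
    """Apply one hop of source-route processing.

    route is the list of addresses in the route data; pointer is counted
    from the start of the option, so the first address sits at position 4.
    Returns the new (destination, pointer, route).
    """
    length = 3 + 4 * len(route)        # type, length and pointer bytes + addresses
    if pointer > length:
        return destination, pointer, route     # route used up: normal routing
    index = (pointer - 4) // 4
    new_route = list(route)
    new_destination = new_route[index]         # next address becomes the destination
    new_route[index] = outbound_address        # record the local outbound address
    return new_destination, pointer + 4, new_route

# The originator addressed the datagram to the first router (10.0.1.1) and
# listed the second router and the target in the option.
state = ("10.0.1.1", 4, ["10.0.2.1", "10.0.9.9"])
state = process_source_route(*state, outbound_address="10.0.2.254")
print(state)
state = process_source_route(*state, outbound_address="10.0.9.254")
print(state)
```

After the second hop the datagram is addressed to the target, the pointer exceeds the option length, and the route data holds the outbound addresses of the two routers that forwarded it.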

Strict Source Routing

The Strict Source Routing option, also called the Strict Source and Record Route (SSRR) option, uses the same principle as loose source routing except that the intermediate router must send the datagram to the next IP address in the source route via a directly connected network and not via an intermediate router. If it cannot do so it reports an error with an ICMP Destination Unreachable message.

Figure: Strict Source Routing Option

10001001: (decimal 137) is the value of the option type byte for strict source routing
length: has the same meaning as for loose source routing
pointer: has the same meaning as for loose source routing
route data: is a series of 32-bit IP addresses

Record Route

This option provides a means to record the route of an IP datagram. It functions similarly to the source routing discussed above, but this time the source host has provided an empty routing data field, which will be filled in as the datagram traverses routers. Note that sufficient space for this routing information must be provided by the source host: if the data field is filled before the datagram reaches its destination, the datagram is forwarded with no further recording of the route.

Figure: Record Route Option

00000111: (decimal 7) is the value of the option type byte for record route
length: has the same meaning as for loose source routing
pointer: has the same meaning as for loose source routing
route data: is a multiple of four bytes in length chosen by the originator of the datagram
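
As a cross-check of the option type values, the short Python sketch below splits a type byte into the fc bit, the 2-bit class and the 5-bit option number described in the datagram format section. The function name decode_option_type is illustrative only.

```python
def decode_option_type(type_byte):
    """Split an option type byte into (fc bit set, option class, option number)."""
    copied = bool(type_byte & 0b10000000)      # fc: copy on fragmentation
    option_class = (type_byte >> 5) & 0b11     # 2-bit class
    number = type_byte & 0b11111               # 5-bit option number
    return copied, option_class, number

for value in (131, 137, 7, 68):
    print(value, format(value, "08b"), decode_option_type(value))
```

The values 131 and 137 decode to option numbers 3 and 9 with the fc bit set, 7 decodes to option number 7 with the fc bit clear, and 68 (the Internet Timestamp option) decodes to class 2, option number 4.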

2.3.1.4 Internet Timestamp

A timestamp is an option forcing some (or all) of the routers on the route to the destination to put a timestamp in the option data. The timestamps are recorded in milliseconds since midnight UT and can be used for debugging purposes. They are of limited use for performance measurement for two reasons:

They may be insufficiently precise, because most IP datagrams will be forwarded in far less than one second and not every router can supply a true millisecond clock.
They are insufficiently accurate because IP routers are not required to have synchronized clocks.

Figure: Internet Timestamp Option

Where

01000100

(Decimal 68) is the value of the option type for the internet time stamp option.

length

Contains the total length of this option, including the type and length-fields.

pointer

Points to the next timestamp to be processed (first free timestamp).

oflw (overflow)

Is a 4-bit unsigned integer that counts the IP modules that could not register timestamps due to a lack of space in the data field.

flag

Is a 4-bit value which indicates how timestamps are to be registered. Values are:

0: Timestamps only, stored in consecutive 32-bit words.
1: Each timestamp is preceded by the IP address of the registering module.
3: The IP address fields are pre-specified, and an IP module only registers when it finds its own address in the list.

timestamp

A 32-bit timestamp recorded in milliseconds since midnight UT (GMT).

The originating host must compose this option with a large enough data area to hold all the timestamps. If the timestamp area becomes full, no further timestamps are added.

2.3.2 IP Routing

An important function of the IP layer is IP routing. It provides the basic mechanism for routers to interconnect different physical networks. This means that an internet host can function as a normal host and a router simultaneously.

A basic router of this type is referred to as a router with partial routing information, because the router only has information about four kinds of destination:

Hosts which are directly attached to one of the physical networks to which the router is attached
Hosts or networks for which the router has been given explicit definitions
Hosts or networks for which the router has received an ICMP redirect message
A default destination for everything else

The last two items allow a basic router to begin with a very limited amount of information and to increase its information because a more sophisticated router will issue an ICMP redirect message if it receives a datagram and it knows of a better router on the same network for the sender to use. This process is repeated each time a basic router of this type is restarted.

Additional protocols are needed to implement a full-function router that can exchange information with other routers in remote networks. Such routers are essential except in small networks, and the protocols they use are discussed in Routing Protocols.

2.3.2.1 Direct and Indirect Destinations

If the destination host is attached to a network to which the source host is also attached, an IP datagram can be sent directly, simply by encapsulating the IP datagram in the physical network frame. This is called direct routing.

Indirect routing occurs when the destination host is not on a network directly attached to the source host. The only way to reach the destination is via one or more routers. The address of the first of these routers (the first hop) is called an indirect route. The first hop address is the only information needed by the source host: the router which receives a datagram has responsibility for the second hop and so on.

Figure: Direct and Indirect IP Routes - Host A has a direct route to hosts B and D, and an indirect route to host C. Host D is a router between the 129.1 and 129.2 networks.

A host can tell whether a route is direct or indirect by examining the network number and subnet number parts of the IP address.

If they match one of the IP addresses of the source host, the route is a direct one.
The host needs to be able to address the target correctly using a lower-level protocol than IP. This can either be done automatically using a network-specific protocol, such as ARP (see Address Resolution Protocol (ARP)), which is used on broadcast LANs, or by statically configuring the host, for example when an MVS host has a TCP/IP connection over an SNA link.
For "indirect" routes, the only knowledge required is the IP address of a router leading to the destination network.
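
The direct/indirect test can be sketched with Python's standard ipaddress module. The function name route_type is hypothetical, and the host addresses are invented for illustration; only the 129.1 and 129.2 network numbers come from the figure caption, and a 16-bit network mask with no subnetting is assumed.

```python
import ipaddress

def route_type(destination, local_interfaces):
    """Return 'direct' if the destination shares a network with a local interface."""
    target = ipaddress.IPv4Address(destination)
    for interface in local_interfaces:
        # The network (and subnet) part of the local address defines the
        # set of hosts reachable without a router.
        if target in ipaddress.IPv4Interface(interface).network:
            return "direct"
    return "indirect"

host_a = ["129.1.1.1/16"]                  # hypothetical address for host A
print(route_type("129.1.1.2", host_a))     # a host on the 129.1 network
print(route_type("129.2.1.1", host_a))     # a host on the 129.2 network
```

A router such as host D would simply pass both of its interface addresses in the list, which makes hosts on either network direct destinations.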

IP implementations may also support explicit host routes, that is, a route to a specific IP address. This is common for dial-up connections using Serial Line Internet Protocol (SLIP), which does not provide a mechanism for two hosts to inform each other of their IP addresses. Such routes may even have the same network number as the host, for example on subnets composed of point-to-point links. In general, however, routing is done by network number and subnet number only.

2.3.2.2 IP Routing Table

Each host keeps the set of mappings between destination IP addresses and the IP addresses of the next hop routers for those destinations in a table called the IP routing table.

Three types of mappings can be found in this table:

Direct routes, for locally attached networks
Indirect routes, for networks reachable via one or more routers
A default route, which contains the IP address of a router to be used for all IP addresses which are not covered by the direct and indirect routes.

See the network in Figure - Example IP Routing Table for an example configuration.

Figure: Example IP Routing Table

The routing table of host D will contain the following entries:

Destination: route via
128.10: direct attachment
128.15: direct attachment
129.7: 128.15.1.2
default: 128.10.1.1

2.3.2.3 IP Routing Algorithm

From the foregoing discussion, one can easily derive the steps that IP must take in order to determine the route for an outgoing datagram. This is called the IP routing algorithm and it is shown schematically in Figure - IP Routing Algorithm.

Figure: IP Routing Algorithm

Note that this is an iterative process. It is applied by every host handling a datagram, except for the host to which the datagram is finally delivered.
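
Since the algorithm is only shown as a figure, here is a hedged Python sketch of one possible lookup, using the example routing table of host D from the previous section. It assumes 16-bit network numbers with no subnetting and omits explicit host routes; the names ROUTING_TABLE, DEFAULT_ROUTER and next_hop are illustrative.

```python
import ipaddress

# Host D's table: a router of None marks a direct attachment.
ROUTING_TABLE = [
    (ipaddress.IPv4Network("128.10.0.0/16"), None),
    (ipaddress.IPv4Network("128.15.0.0/16"), None),
    (ipaddress.IPv4Network("129.7.0.0/16"), "128.15.1.2"),
]
DEFAULT_ROUTER = "128.10.1.1"

def next_hop(destination):
    """Return the IP address to which the datagram is physically sent next."""
    target = ipaddress.IPv4Address(destination)
    for network, router in ROUTING_TABLE:
        if target in network:
            # Direct route: deliver to the destination itself.
            # Indirect route: hand the datagram to the listed router.
            return destination if router is None else router
    if DEFAULT_ROUTER is None:
        raise LookupError("network unreachable")
    return DEFAULT_ROUTER

for address in ("128.10.4.5", "129.7.3.3", "192.0.2.1"):
    print(address, "->", next_hop(address))
```

Each router along the path repeats the same lookup against its own table, which is the iteration the note above refers to.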
