13.2.2. Subnetwork masks
The second ifconfig in the boot process
installs proper masks and broadcast addresses if
subnetting is used to divide a larger IP address space. Default
subnetwork masks and broadcast addresses are assigned based on IP
address class, as
shown in Table 13-3.
Table 13-3. Default broadcast addresses
Address Class | Network Address | Network Mask  | Broadcast Address
Class A       | x.0.0.0         | 255.0.0.0     | x.255.255.255
Class B       | x.y.0.0         | 255.255.0.0   | x.y.255.255
Class C       | x.y.z.0         | 255.255.255.0 | x.y.z.255
The NIS netmasks map contains an
association of network numbers and
subnetwork masks and is used to override the default network masks
corresponding to each class of IP address. A simple example is the
division of a Class B network into Class C-like subnetworks, so that
each subnetwork number can be assigned to a distinct physical
network. To effect such a scheme, the netmasks
NIS map contains a single entry for the Class B address:
131.40.0.0 255.255.255.0
The network number is derived from the network mask and host IP
address by performing a logical AND of the two. The broadcast address
keeps those network bits and fills in the host bits (the bits not
covered by the netmask). Solaris sets the host bits to all ones
(other systems may set them to all zeros).
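The derivation above can be sketched in a few lines of Python. This is an illustration of the bit arithmetic only, not how Solaris implements it. The function name `network_and_broadcast` and its `ones_form` switch are hypothetical, and the standard `ipaddress` module is used only to convert between dotted quads and integers:

```python
import ipaddress

def network_and_broadcast(ip: str, mask: str, ones_form: bool = True):
    """Return (network, broadcast) as dotted-quad strings."""
    addr = int(ipaddress.IPv4Address(ip))
    m = int(ipaddress.IPv4Address(mask))
    network = addr & m                  # logical AND keeps the network bits
    host_bits = ~m & 0xFFFFFFFF         # bits not covered by the netmask
    # Solaris sets the host bits to ones; the zeros form leaves them clear.
    broadcast = network | host_bits if ones_form else network
    return (str(ipaddress.IPv4Address(network)),
            str(ipaddress.IPv4Address(broadcast)))

print(network_and_broadcast("131.40.52.28", "255.255.255.0"))
# ('131.40.52.0', '131.40.52.255')
```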
Network numbers are matched based on the number of octets normally
used for an address of that class. IP address 131.40.52.28 has a
Class B network number, so the first two octets in the IP address are
used as an index into the netmasks map.
Similarly, IP address 89.4.1.3 is a Class A address; therefore, only
the first octet is used as a key into netmasks.
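The octet-matching rule can be modeled as follows. This is a minimal sketch that assumes the traditional class boundaries (a first octet below 128 is Class A, below 192 is Class B, below 224 is Class C). The in-memory `NETMASKS` dictionary is a hypothetical stand-in for the NIS map. When the map has no entry, the sketch falls back to the defaults from Table 13-3:

```python
# Default masks by number of network octets (Table 13-3)
DEFAULT_MASKS = {1: "255.0.0.0", 2: "255.255.0.0", 3: "255.255.255.0"}

def classful_key(ip: str):
    """Return (key, octets): the classful network number used as the key."""
    parts = ip.split(".")
    first = int(parts[0])
    if first < 128:        # Class A: one octet of network number
        octets = 1
    elif first < 192:      # Class B: two octets
        octets = 2
    elif first < 224:      # Class C: three octets
        octets = 3
    else:
        raise ValueError(f"{ip} is not a Class A, B, or C address")
    key = ".".join(parts[:octets] + ["0"] * (4 - octets))
    return key, octets

# Hypothetical in-memory copy of the netmasks map from the example above.
NETMASKS = {"131.40.0.0": "255.255.255.0"}

def lookup_mask(ip: str) -> str:
    key, octets = classful_key(ip)
    # Fall back to the class default when the map has no entry.
    return NETMASKS.get(key, DEFAULT_MASKS[octets])

print(classful_key("131.40.52.28"))   # ('131.40.0.0', 2)
print(lookup_mask("131.40.52.28"))    # 255.255.255.0
print(lookup_mask("89.4.1.3"))        # 255.0.0.0
```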
This scheme simplifies the management of
netmasks. By listing the network number to be
partitioned, you do not have to itemize all subnetworks in the
netmasks file.
Continuing the previous example, consider this
ifconfig:
ipnodes excerpt:
131.40.52.28 mahimahi
netmasks map:
131.40.0.0 255.255.255.0
ifconfig line:
ifconfig hme0 mahimahi netmask +
Resulting interface configuration:
% ifconfig hme0
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 131.40.52.28 netmask ffffff00 broadcast 131.40.52.255
Using a plus sign (+) as the netmask instead of an explicit network
mask forces the second ifconfig to read the NIS
netmasks map for the correct mask. The
four-octet mask is logically AND-ed with the IP address to produce the
network number, and the remaining host bits are set to ones to form
the broadcast address. In the preceding example, the broadcast
address is in the ones form. Note that ifconfig displays the
network mask as a hexadecimal value (ffffff00), not in the
dotted-decimal form of an IP address.
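Because ifconfig prints the mask in hexadecimal, converting between that form and dotted decimal helps when you compare its output with netmasks entries. Here is a small sketch using Python's standard `ipaddress` module. The helper names are illustrative:

```python
import ipaddress

def mask_to_hex(mask: str) -> str:
    """Dotted-quad netmask -> the 8-digit hex form ifconfig prints."""
    return format(int(ipaddress.IPv4Address(mask)), "08x")

def hex_to_mask(hex_mask: str) -> str:
    """ifconfig's hex form -> dotted-quad netmask."""
    return str(ipaddress.IPv4Address(int(hex_mask, 16)))

print(mask_to_hex("255.255.255.0"))   # ffffff00
print(hex_to_mask("ffffffc0"))        # 255.255.255.192
```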
A more complex example involves dividing the Class C network 192.6.4
into four subnetworks. To get four subnetworks, we need an additional
two bits of network number, which are taken from the two most
significant bits of the host number. The netmask is therefore
extended into the next two bits, making it 26 bits instead of the
default 24-bit Class C netmask:
Partitioning requires:
24 bits of Class C network number
2 additional bits of subnetwork number
6 bits left for host number
Last octet has 2 bits of netmask, 6 of host number:
11000000 binary = 192 decimal
Resulting netmasks file entry:
192.6.4.0 255.255.255.192
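The partitioning arithmetic can be checked with a short Python sketch. `prefix_to_mask` is a hypothetical helper that builds a mask from a bit count. The subnet listing uses `ipaddress.ip_network(...).subnets(prefixlen_diff=2)` from the standard library, where the two extra bits yield four subnetworks:

```python
import ipaddress

def prefix_to_mask(bits: int) -> str:
    """Dotted-quad netmask with the top `bits` bits set to one."""
    value = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(value))

# 24 bits of Class C network number plus 2 bits of subnetwork number
print(prefix_to_mask(24 + 2))   # 255.255.255.192

# The four subnetworks of 192.6.4.0, each with 6 bits of host number
for sub in ipaddress.ip_network("192.6.4.0/24").subnets(prefixlen_diff=2):
    print(sub.network_address, "->", sub.broadcast_address, sub.netmask)
```

Each line printed by the loop gives a subnetwork's first address, its broadcast address, and the shared 255.255.255.192 mask.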
Again, only one entry in netmasks is needed, and
the key for the entry matches the Class C network number that is
being divided.
Variable-length subnetting is used with classless IP addressing. You
specify how many bits of the IP address to use for the network, and
how many to use for the host, by setting the appropriate netmask
entry. The format of the netmask entry is the same as before; however,
there should be an entry for each subnet defined. ifconfig uses the
longest possible matching mask. Say your engineering organization has
been given control of the 131.40.86.0 network (addresses 131.40.86.0
-> 131.40.86.255). You decide to partition it into four separate
subnetworks that map to the four groups in your organization: Systems
Engineering, Applications Engineering, Graphics Engineering, and
Customer Support. You plan to use a single system as the gateway
between the four subnets and the enterprise network. The gateway's
enterprise network address is 131.40.7.22, so it is connected to the
131.40.7.0 enterprise network. To partition the 131.40.86 address
space into four separate subnets, you need to use the two upper bits
of the last octet to identify the network.
Table 13-4 shows the distribution of the IP
addresses to the different networks.
Table 13-4. Network assignment
Organization     | Address Range                  | Subnetwork
Systems Eng      | 131.40.86.0 -> 131.40.86.63    | 131.40.86.0
Applications Eng | 131.40.86.64 -> 131.40.86.127  | 131.40.86.64
Graphics Eng     | 131.40.86.128 -> 131.40.86.191 | 131.40.86.128
Customer Support | 131.40.86.192 -> 131.40.86.255 | 131.40.86.192
The last octet of the address will have two bits of netmask and six
of host number:
11000000 binary = 192 decimal
The resulting netmask: 255.255.255.192
The resulting netmasks file is:
131.40.0.0 255.255.255.0
131.40.86.0 255.255.255.192
131.40.86.64 255.255.255.192
131.40.86.128 255.255.255.192
131.40.86.192 255.255.255.192
The first entry indicates that the Class B network 131.40.0.0 is
subnetted. The next four entries represent the four variable-length
subnets for the classless addresses assigned to the different groups.
Addresses 131.40.86.0 through 131.40.86.255 have a subnet mask with
26 bits in the network and subnet fields and 6 bits in the host field.
All other addresses in the range 131.40.0.0 through 131.40.255.255
have a 24-bit mask. The IP address assignments for the five network
interfaces are shown in Table 13-5.
Table 13-5. Assigning addresses to interfaces
Interface | Subnetwork Range               | Broadcast     | Sample IP Address
hme0      | 131.40.7.0 backbone            | 131.40.7.255  | 131.40.7.22
hme1      | 131.40.86.0 -> 131.40.86.63    | 131.40.86.63  | 131.40.86.1
hme2      | 131.40.86.64 -> 131.40.86.127  | 131.40.86.127 | 131.40.86.65
hme3      | 131.40.86.128 -> 131.40.86.191 | 131.40.86.191 | 131.40.86.129
hme4      | 131.40.86.192 -> 131.40.86.255 | 131.40.86.255 | 131.40.86.193
For example, the server would direct network traffic to the
hme0 interface when communicating with IP
address 131.40.7.78, since it is part of the 131.40.7.0 subnet;
hme1 when communicating with 131.40.86.32, since
it is part of the 131.40.86.0 subnet; hme2 when
communicating with 131.40.86.100, since it is part of the
131.40.86.64 subnet; and so on.
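The longest-match rule and the interface choice above can be modeled in a short sketch. It assumes a simplified matching rule. An entry matches an address either when the address falls inside the entry's network under that entry's mask, or when the entry is keyed by the address's classful network number. This illustrates the idea and is not Solaris's actual lookup code. The interface addresses come from Table 13-5. The helper names are hypothetical:

```python
import ipaddress

def to_int(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))

def classful_network(addr: int) -> int:
    """Class A/B/C network number of an address (the classful key)."""
    first = addr >> 24
    if first < 128:
        return addr & 0xFF000000
    if first < 192:
        return addr & 0xFFFF0000
    if first < 224:
        return addr & 0xFFFFFF00
    raise ValueError("not a Class A, B, or C address")

# The netmasks file from this section: (network, mask)
NETMASKS = [
    ("131.40.0.0", "255.255.255.0"),
    ("131.40.86.0", "255.255.255.192"),
    ("131.40.86.64", "255.255.255.192"),
    ("131.40.86.128", "255.255.255.192"),
    ("131.40.86.192", "255.255.255.192"),
]

def netmask_for(ip: str) -> str:
    """Longest mask among entries that match ip (simplified rule)."""
    addr = to_int(ip)
    best = None
    for net, mask in NETMASKS:
        n, m = to_int(net), to_int(mask)
        # Match as a subnet entry (ip inside net/mask) or as the classful
        # entry for ip's Class A/B/C network.
        if (addr & m) == n or n == classful_network(addr):
            # For contiguous masks, a larger integer means more one bits.
            if best is None or m > to_int(best):
                best = mask
    return best

# Interface addresses from Table 13-5
INTERFACES = {
    "hme0": "131.40.7.22",
    "hme1": "131.40.86.1",
    "hme2": "131.40.86.65",
    "hme3": "131.40.86.129",
    "hme4": "131.40.86.193",
}

def interface_for(dest: str):
    """Name of the attached interface whose subnet contains dest, or None."""
    d = to_int(dest)
    for name, ifaddr in INTERFACES.items():
        m = to_int(netmask_for(ifaddr))
        if (d & m) == (to_int(ifaddr) & m):
            return name
    return None

print(netmask_for("131.40.86.100"))   # 255.255.255.192
print(netmask_for("131.40.52.28"))    # 255.255.255.0
print(interface_for("131.40.7.78"))   # hme0
print(interface_for("131.40.86.32"))  # hme1
```

A destination that matches no attached subnet returns None. A real host would hand such a destination to its routing table instead.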
ifconfig only governs the local machine's interface to the network.
If a host cannot exchange packets with a peer host on the same
network, then it is necessary to verify that a datagram circuit to the
remote host exists and that the remote node is properly advertising
itself on the network. Tools that perform these tests are arp and
ping.
13.2.3. IP to MAC address mappings
Applications use IP addresses and hostnames to identify remote nodes, but
packets sent on the Ethernet identify their destinations via a 48-bit
MAC-layer address. The Ethernet interface on each host only receives
packets that have its own MAC address or a broadcast address in the
destination field. IP addresses are completely independent of the
48-bit MAC-level address; several disjoint networks may use the same
sets of IP addresses although the 48-bit addresses to which they map
are unique worldwide.
You can tell who makes an Ethernet interface by looking at the first
three octets of its address. Some of the most popular prefixes are
shown in Table 13-6. Fortunately, newer diagnostic tools such as
ethereal know how to map the prefix number to the vendor of the
interface. ethereal is introduced later in this chapter in Section 13.5.2, "ethereal / tethereal".
Table 13-6. Ethernet address prefixes
Prefix   | Vendor           | Prefix   | Vendor            | Prefix   | Vendor
00:00:0c | Cisco            | 00:20:85 | 3Com              | 00:e0:34 | Cisco
00:00:3c | Auspex           | 00:20:af | 3Com              | 00:e0:4f | Cisco
00:00:63 | Hewlett-Packard  | 00:60:08 | 3Com              | 00:e0:a3 | Cisco
00:00:65 | Network General  | 00:60:09 | Cisco             | 00:e0:f7 | Cisco
00:00:69 | Silicon Graphics | 00:60:2f | Cisco             | 00:e0:f9 | Cisco
00:00:f8 | DEC              | 00:60:3e | Cisco             | 00:e0:fe | Cisco
00:01:fa | Compaq           | 00:60:47 | Cisco             | 02:60:60 | 3Com
00:04:ac | IBM              | 00:60:5c | Cisco             | 02:60:8c | 3Com
00:06:0d | Hewlett-Packard  | 00:60:70 | Cisco             | 08:00:02 | 3Com
00:06:29 | IBM              | 00:60:83 | Cisco             | 08:00:09 | Hewlett-Packard
00:06:7c | Cisco            | 00:60:8c | 3Com              | 08:00:1a | Data General
00:06:c1 | Cisco            | 00:60:97 | 3Com              | 08:00:1b | Data General
00:07:01 | Cisco            | 00:60:b0 | Hewlett-Packard   | 08:00:20 | Sun Microsystems
00:07:0d | Cisco            | 00:80:1c | Cisco             | 08:00:2b | DEC
00:08:c7 | Compaq           | 00:80:5f | Compaq            | 08:00:5a | IBM
00:10:11 | Cisco            | 00:90:27 | Intel             | 08:00:69 | Silicon Graphics
00:10:1f | Cisco            | 00:90:b1 | Cisco             | 08:00:79 | Silicon Graphics
00:10:2f | Cisco            | 00:a0:24 | 3Com              | 10:00:5a | IBM
00:10:4b | 3Com             | 00:aa:00 | Intel             | 10:00:90 | Hewlett-Packard
00:10:79 | Cisco            | 00:c0:4f | Dell              | 10:00:d4 | DEC
00:10:7b | Cisco            | 00:c0:95 | Network Appliance | 3c:00:00 | 3Com
00:10:f6 | Cisco            | 00:e0:14 | Cisco             | aa:00:03 | DEC
00:20:35 | IBM              | 00:e0:1e | Cisco             | aa:00:04 | DEC
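Prefix lookups like the ones ethereal performs are easy to approximate for your own checks. The sketch below copies only a few rows of Table 13-6 into a hypothetical `OUI_VENDORS` dictionary. It normalizes addresses (lowercase, zero-padded octets) in case one is written without leading zeros, and then matches on the first three octets:

```python
# A handful of prefixes copied from Table 13-6; extend as needed.
OUI_VENDORS = {
    "00:00:0c": "Cisco",
    "00:20:af": "3Com",
    "00:90:27": "Intel",
    "08:00:09": "Hewlett-Packard",
    "08:00:20": "Sun Microsystems",
}

HEX = set("0123456789abcdef")

def normalize_mac(mac: str) -> str:
    """Lowercase and zero-pad each octet, e.g. 8:0:20:... -> 08:00:20:..."""
    octets = mac.strip().lower().split(":")
    if len(octets) != 6 or any(not o or len(o) > 2 or set(o) - HEX for o in octets):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(o.zfill(2) for o in octets)

def vendor_of(mac: str) -> str:
    prefix = normalize_mac(mac)[:8]   # "xx:xx:xx", the first three octets
    return OUI_VENDORS.get(prefix, "unknown")

print(vendor_of("08:00:20:b9:2b:f6"))  # Sun Microsystems
print(vendor_of("00:20:AF:9D:7C:92"))  # 3Com
```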
ARP, the Address Resolution Protocol, is used to maintain tables of
32- to 48-bit address translations. The ARP
table is a dynamic collection of MAC-to-IPv4 address
mappings. To fill in the MAC-level Ethernet packet headers, the
sending host must resolve the destination IPv4 address into a 48-bit
address. The host first checks its ARP table for an entry keyed by
the IPv4 address, and if none is found, the host broadcasts an ARP
request containing the recipient's IPv4 address. Any machine
supporting ARP address resolution responds to an ARP request with a
packet containing its MAC address. The requester updates its ARP
table, fills in the MAC address in the Ethernet packet header, and
transmits the packet.
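The lookup-then-broadcast sequence can be expressed as a toy model. Nothing goes on the wire. The `send_request` callable is a hypothetical stand-in for broadcasting an ARP request and returns the MAC address from the reply, or None. The point of the example is that only the first resolution of an address costs a request:

```python
from typing import Callable, Dict, Optional

class ArpResolver:
    """Toy model of ARP resolution: check the table, else broadcast."""

    def __init__(self, send_request: Callable[[str], Optional[str]]):
        # send_request stands in for broadcasting an ARP request; it returns
        # the MAC address from the reply, or None if nobody answers.
        self.table: Dict[str, str] = {}
        self.send_request = send_request
        self.requests_sent = 0

    def resolve(self, ip: str) -> Optional[str]:
        mac = self.table.get(ip)
        if mac is not None:
            return mac                  # found in the ARP table: no traffic
        self.requests_sent += 1         # otherwise broadcast a request
        mac = self.send_request(ip)
        if mac is not None:
            self.table[ip] = mac        # remember the reply for next time
        return mac

# Hypothetical segment: the only host that answers is 131.40.52.44.
segment = {"131.40.52.44": "08:00:20:79:61:eb"}
resolver = ArpResolver(segment.get)
resolver.resolve("131.40.52.44")
resolver.resolve("131.40.52.44")
print(resolver.requests_sent)   # 1
```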
If no reply is received for the ARP request, the transmitting host
sends the request again. Typically, a delay of a second or more is
inserted between consecutive ARP requests to prevent a series of ARP
packets from saturating the network. Flurries of ARP requests
sometimes occur when a malformed packet is sent on the network; some
hosts interpret it as a broadcast packet and attempt to get the
Ethernet address of the sender via an ARP request. If many machines
are affected, the ensuing flood of network activity can consume a
considerable amount of the available bandwidth. This behavior is
referred to as an
ARP storm, and is most
frequently caused by an electrical problem in a transceiver that
damages packets after the host has cleanly written them over its
network interface.
To examine the current ARP table entries, use arp -a:
% arp -a
Net to Media Table: IPv4
Device IP Address Mask Flags Phys Addr
------ -------------------- --------------- ----- ---------------
hme0 caramba 255.255.255.255 08:00:20:b9:2b:f6
hme1 socks 255.255.255.255 08:00:20:e7:91:5d
hme0 copper 255.255.255.255 00:20:af:9d:7c:92
hme0 roger 255.255.255.255 SP 08:00:20:a0:33:90
hme0 universo 255.255.255.255 U
hme0 peggy 255.255.255.255 SP 08:00:20:81:23:f1
hme1 duke 255.255.255.255 00:04:00:20:56:d7
hme0 224.0.0.0 240.0.0.0 SM 01:00:5e:00:00:00
hme1 224.0.0.0 240.0.0.0 SM 01:00:5e:00:00:00
hme1 daisy 255.255.255.255 08:00:20:b5:3d:d7
The arp -a output listing reports the interface over which the ARP
notification arrived, the IP address (or hostname), and its Ethernet
address mapping. The unresolved entry (denoted by the U flag) is for a
host that did not respond to an ARP request; after several minutes the
entry is removed from the table. Complete entries in the ARP table may
be static or dynamic, indicating how the address mappings were added
and the length of their expected lifetimes.
Solaris identifies static entries with the S flag. The host's own
Ethernet address as well as all multicast address entries (identified
by the M flag) are always static. The previous example was run on the
host roger, which explains the static entry for its own Ethernet
address and the static multicast entries. The absence of the S flag
identifies a dynamic or learned entry.
Dynamic entries are added on demand during the course of normal IP
traffic handling. Infrequently used mappings added in this fashion
have a short lifetime; after five minutes without a reference to the
entry, the ARP table management routines remove it. This ongoing
table pruning is necessary to minimize the overhead of ARP table
lookups. The ARP table is accessed using a hash table; a smaller,
sparser table has fewer hash key collisions. A host that communicates
regularly with many other hosts may have an ARP table that is fairly
large, while a host that is quiescent or exchanging packets with only
a few peers has a small ARP table.
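The five-minute pruning rule can be sketched with a Python dictionary, which is itself a hash table. This is a model of the policy the paragraph describes, not the Solaris kernel code. A reference refreshes an entry's age, dynamic entries unreferenced for more than five minutes are dropped, and static entries stay. The injectable `clock` lets the example run instantly. For readability, entries are keyed by the hostnames from the arp -a listing:

```python
import time
from typing import Callable, Dict, Optional, Tuple

AGE_LIMIT = 5 * 60  # seconds without a reference before a dynamic entry goes

class AgingArpTable:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        # key -> (mac, time of last reference, is_static)
        self.entries: Dict[str, Tuple[str, float, bool]] = {}

    def add(self, key: str, mac: str, static: bool = False) -> None:
        self.entries[key] = (mac, self.clock(), static)

    def lookup(self, key: str) -> Optional[str]:
        entry = self.entries.get(key)
        if entry is None:
            return None
        mac, _, static = entry
        self.entries[key] = (mac, self.clock(), static)  # reference resets age
        return mac

    def prune(self) -> None:
        now = self.clock()
        for key, (_, last_ref, static) in list(self.entries.items()):
            if not static and now - last_ref > AGE_LIMIT:
                del self.entries[key]

# Drive the table with a fake clock so the example runs instantly.
now = [0.0]
table = AgingArpTable(clock=lambda: now[0])
table.add("caramba", "08:00:20:b9:2b:f6")            # learned (dynamic)
table.add("roger", "08:00:20:a0:33:90", static=True)  # host's own entry
now[0] = 301.0
table.prune()
print(sorted(table.entries))   # ['roger']
```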
The difference between dynamic and permanent entries is how they are
added to the ARP table. Dynamic entries are added on the fly, as a
result of replies to ARP requests. Permanent entries are loaded into
the ARP table once at boot time, and are useful if a host must
communicate with a node that cannot respond to an ARP request during
some part of its startup procedure. For example, a diskless client may
not have ARP support embedded in the boot PROM, requiring its boot
server to have a permanent ARP table entry for it. Once the diskless
node is running the Unix kernel, it should be able to respond to ARP
requests to complete dynamic ARP table entries on other hosts.
The arp -a output reports a mask for every
entry. This mask is used during lookup of an entry in the ARP table.
The lookup function in the kernel applies the mask to the address
being queried and compares it with the one in the table. If the
resulting addresses match, the lookup is successful. A mask of
255.255.255.255 (all ones) means that the two addresses need to be
exactly the same in order to be considered equivalent. A mask of
240.0.0.0 means that only the upper four bits of the address are used
to find a matching address. In the previous example, all multicast
addresses use the Ethernet address corresponding to the 240.0.0.0
entry. The ARP mask does not provide much useful information to the
regular user. Be sure not to confuse this ARP mask with the netmask
specified by the ifconfig command. The ARP mask
is generated and used only by the internal kernel routines to reduce
the number of entries that need to be stored in the table. The
netmask specified by the ifconfig command is
used for IP routing.
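The masked comparison can be illustrated with a two-row table modeled on the arp -a listing. The unicast address is a hypothetical stand-in for one of the hostnames shown there. The sketch only demonstrates how a 240.0.0.0 mask lets one entry cover every multicast address. It returns the MAC address stored with the matching entry and does not model how the kernel builds the final multicast Ethernet address:

```python
import ipaddress
from typing import List, Optional, Tuple

def to_int(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))

# (address, mask, MAC) rows modeled on the arp -a listing; the unicast
# address is a hypothetical stand-in for a host name in that listing.
ARP_TABLE: List[Tuple[str, str, str]] = [
    ("131.40.52.44", "255.255.255.255", "08:00:20:79:61:eb"),
    ("224.0.0.0", "240.0.0.0", "01:00:5e:00:00:00"),
]

def arp_lookup(query: str) -> Optional[str]:
    q = to_int(query)
    for addr, mask, mac in ARP_TABLE:
        m = to_int(mask)
        # Apply the entry's mask to both addresses and compare.
        if (q & m) == (to_int(addr) & m):
            return mac
    return None

print(arp_lookup("224.0.1.9"))      # 01:00:5e:00:00:00 (matched by 240.0.0.0)
print(arp_lookup("131.40.52.45"))   # None (all-ones mask needs exact match)
```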
A variation of the permanent ARP table entry is a published mapping.
Published mappings are denoted by the P flag. Published entries
include the IP address for the current host and the addresses that
have been explicitly added with the -s or -f options (explained later
in this chapter).
Publishing ARP table entries turns a host into an ARP server.
Normally, a host replies only to requests for its own IP address, but
if it has published entries then it replies for multiple IP
addresses. If an ARP request is broadcast requesting the IP address
of a published entry, the host publishing that entry returns an ARP
reply to the sender, even though the IP address in the ARP request
does not match its own.
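The reply decision an ARP server makes can be reduced to a few lines. In this sketch, `ArpServer` and its methods are hypothetical names. irie's IP and MAC addresses are invented, with the IPs taken from the 192.0.2.0/24 documentation range. Only relax's MAC address comes from the example in this section. The host answers for its own address and for anything it has published, and otherwise stays silent:

```python
from typing import Dict, Optional

class ArpServer:
    """Decide whether to answer an ARP request, honoring published entries."""

    def __init__(self, own_ip: str, own_mac: str):
        self.own_ip = own_ip
        self.own_mac = own_mac
        self.published: Dict[str, str] = {}   # ip -> mac loaded with "pub"

    def publish(self, ip: str, mac: str) -> None:
        self.published[ip] = mac

    def reply_for(self, target_ip: str) -> Optional[str]:
        """MAC to put in the ARP reply, or None to stay silent."""
        if target_ip == self.own_ip:
            return self.own_mac
        return self.published.get(target_ip)

# Hypothetical addresses (192.0.2.0/24 is reserved for documentation).
irie = ArpServer("192.0.2.10", "08:00:20:00:00:01")
irie.publish("192.0.2.20", "08:00:20:73:3e:ec")   # relax's MAC from the text
print(irie.reply_for("192.0.2.20"))  # 08:00:20:73:3e:ec
print(irie.reply_for("192.0.2.99"))  # None
```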
This mechanism is used to cope with machines that cannot respond to
ARP requests due to lack of ARP support or because they are isolated
from broadcast packets by a piece of network partitioning hardware
that filters out broadcast packets. This mechanism is also useful in
SLIP or PPP configurations. When any of these situations exist, a
machine is designated as an ARP server and is loaded with ARP entries
from a file containing hostnames, Ethernet addresses, and the
pub qualifier. For example, to publish the ARP
entries for hosts relax and
stress on server irie, we
put the ARP information into a configuration file
/etc/arptable and then load it using
arp -f:
irie# cat /etc/arptable
relax 08:00:20:73:3e:ec pub
stress 08:00:20:b9:18:3d pub
irie# arp -f /etc/arptable
The -f option forces arp to read the named file for entries;
alternatively, the -s option can be used to add a single mapping
from the command line:
irie# arp -s relax 08:00:20:73:3e:ec pub
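A typo in /etc/arptable publishes a bad mapping, so it can help to check the file before loading it. The sketch below is a hypothetical pre-flight check, not part of arp. It expects the "hostname mac [keywords]" layout shown above. It validates that the second field looks like a 48-bit MAC address and collects any trailing keywords such as pub without checking them against arp's own list:

```python
from typing import List, NamedTuple, Set

class ArpFileEntry(NamedTuple):
    host: str
    mac: str
    flags: Set[str]

HEX = set("0123456789abcdef")

def valid_mac(mac: str) -> bool:
    octets = mac.lower().split(":")
    return len(octets) == 6 and all(
        o and len(o) <= 2 and not set(o) - HEX for o in octets)

def parse_arptable(text: str) -> List[ArpFileEntry]:
    """Check 'hostname mac [keywords...]' lines before running arp -f."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), 1):
        fields = line.split()
        if not fields:
            continue                      # this checker skips blank lines
        if len(fields) < 2 or not valid_mac(fields[1]):
            raise ValueError(f"line {lineno}: expected 'hostname mac [pub]'")
        host, mac, *flags = fields
        entries.append(ArpFileEntry(host, mac.lower(), set(flags)))
    return entries

sample = """relax 08:00:20:73:3e:ec pub
stress 08:00:20:b9:18:3d pub
"""
for e in parse_arptable(sample):
    print(e.host, e.mac, "pub" in e.flags)
```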
As a diagnostic tool, arp is useful for
resolving esoteric point-to-point connectivity problems. If a
host's ARP table contains an incorrect entry, the machine that entry
describes will not be reachable from that host, since outgoing
packets will contain the wrong Ethernet address. ARP table entries
may contain incorrect Ethernet addresses for several reasons:
- Another host on the network is answering ARP requests for the same IP
address, or all IP addresses, emulating a duplicate IP address on the
network.
- A host with a published ARP entry contains the wrong Ethernet address
in its ARP table.
- Either of the above situations exists, and the incorrect ARP reply
arrives at the requesting host after the correct reply. When ARP
table entries are updated dynamically, the last response received is
the one that "wins." If the correct ARP response is
received from a host that is physically close to the requester, and a
duplicate ARP response arrives from a host that is located across
several Ethernet bridges, then the later, and probably
incorrect, response is the one that the machine uses for
future packet transmissions.
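The "last response wins" behavior, and a way to spot it, can be sketched as follows. `apply_replies` is a hypothetical helper that takes (IP, MAC) replies in arrival order. It overwrites the table entry the way a dynamic update does, and it also records every distinct MAC seen for an address so that duplicates stand out. The reply data is invented for the example:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def apply_replies(replies: List[Tuple[str, str]]):
    """Apply (ip, mac) ARP replies in arrival order.

    Returns (table, conflicts): table shows the last reply winning, and
    conflicts maps each IP to every distinct MAC that answered for it
    whenever there was more than one.
    """
    table: Dict[str, str] = {}
    seen: Dict[str, Set[str]] = defaultdict(set)
    for ip, mac in replies:
        table[ip] = mac        # a dynamic update overwrites the previous entry
        seen[ip].add(mac)
    conflicts = {ip: macs for ip, macs in seen.items() if len(macs) > 1}
    return table, conflicts

# Hypothetical: the correct reply arrives first, a distant impostor second.
table, conflicts = apply_replies([
    ("131.40.52.44", "08:00:20:79:61:eb"),
    ("131.40.52.44", "00:20:af:9d:7c:92"),
])
print(table["131.40.52.44"])      # 00:20:af:9d:7c:92  (the later, wrong one)
print(sorted(conflicts["131.40.52.44"]))
```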
Inspection of the ARP table can reveal some obvious problems; for
example, the three-octet prefix of a machine's Ethernet address may
not match the vendor whose label is on the front of that machine. If
you believe you are suffering from intermittent ARP
failures, you can delete specific ARP table entries and monitor the
table as it is repopulated dynamically. ARP table entries are deleted
with arp -d, and only the superuser can delete
entries. In the following example, we delete the ARP table entry for
fenwick, then force the local host to send an
ARP request for fenwick by attempting to connect
to it using telnet. By examining the ARP table
after the connection attempt, we can see if some other host has
responded incorrectly to the ARP request:
# arp -d fenwick
fenwick (131.40.52.44) deleted
# telnet fenwick
...Telnet times out...
# arp -a | grep fenwick
hme0 fenwick 255.255.255.255 08:00:20:79:61:eb
An example involving intermittent ARP failures is presented in Chapter 15, "Debugging Network Problems".
IPv6 nodes use the neighbor discovery mechanism to learn the
link-layer address (the MAC address, in the case of Ethernet) of the
other nodes connected to the link. IPv6 neighbor discovery delivers
the functionality previously provided by the combination of ARP, ICMP
router discovery, and ICMP redirect mechanisms. This is done by
defining special ICMPv6 message types, including neighbor solicitation
and neighbor advertisement. A node issues neighbor solicitations when
it needs to request the link-layer (MAC) address of a neighbor. Nodes
also issue neighbor advertisement messages in response to neighbor
solicitation messages, as well as when their link-layer address
changes.
13.2.5. Gauging Ethernet interface capacity
Even with a well-conditioned network
and proper host configuration
information, a server may have trouble communicating with its clients
because its network interface is overloaded. If an NFS server is hit
with more packets than it can receive through its network interface,
some client requests will be lost and eventually retransmitted. To
the NFS clients, the server appears painfully slow, when it's
really the server's network interface that is the problem.
The spray utility provides a
very coarse estimate of network interface capacity, both on
individual hosts and through network hardware between hosts.
spray showers a target host with consecutive
packets of a fixed length by making remote procedure calls to the
rpc.sprayd daemon on the remote host. After the
last packet is sent, the rpc.sprayd daemon is
queried for a count of the packets received; this value is compared
to the number of packets sent to determine the percentage dropped
between client and server.
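The percentage spray reports follows directly from the two counts. The short sketch below reproduces it using the numbers from the wahoo run later in this section (1162 packets sent, 675 dropped). The function name is illustrative:

```python
def spray_drop_stats(sent: int, received: int):
    """Return (dropped, percent_dropped) from spray's two counts."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    dropped = sent - received
    return dropped, 100.0 * dropped / sent

# Numbers from the wahoo run: 1162 sent, 675 dropped.
dropped, pct = spray_drop_stats(1162, 1162 - 675)
print(f"{dropped} packets ({pct:.3f}%) dropped")   # 675 packets (58.090%) dropped
```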
On its own, spray is of limited usefulness as a
measure of the packet handling capability of a machine. The packet
containing the RPC call may be lost by the client, due to other
activity on its network interface; it may be consumed by a collision
on the network; or it may be incident to the server but not copied
from the network by the server's network interface due to a
lack of buffer space or excessive server CPU loading. Many packets
are lost on the sending host, and spray has no
knowledge of where the packets vanish once they get past the
application layer. Due to these factors, spray
is best used to gauge the relative packet-handling speeds of two or
more machines.
Here are some examples of using spray to test
various network constraints. spray requires a
hostname and takes a packet count, delay value, and packet length as
optional arguments:
spray [-c count] [-d delay] [-l length] host
For example:
% spray wahoo
sending 1162 packets of length 86 to wahoo ...
675 packets (58.090%) dropped by wahoo
1197 packets/sec, 103007 bytes/sec
spray reports the number and percentage of packets dropped, as
well as the transfer rate. The packet drop rates are only meaningful
when used to compare the relative network input and output rates of
the two machines under test.
It's important to note that network interface speed depends
upon much more than CPU speed. A faster CPU helps a host process
network protocols faster, but the network interface and bus hardware
usually determine how quickly the host can pull packets from the
network. A fast network interface may be separated from the CPU by a
bus that has a high latency. Even a high-throughput I/O system may
exhibit poor network performance if there is a large time overhead
required to set up each packet transfer from the network interface to
the CPU. Similar hosts stress each other fairly, since their network
interfaces have the same input capacity.
Even on a well-conditioned, little-used network, a client machine
that has a significantly faster CPU than its server may perform worse
under the stress of spray than the same two
machines with the client and server roles reversed. With increased
CPU speed comes increased packet handling speed, so a faster machine
can transmit packets quickly enough to outpace a slower server. If
the disparity between client and server is great, then the client is
forced to retransmit requests and the server is additionally burdened
with the duplicate requests. Use spray to
exercise combinations of client and server with varying packet sizes
to identify cases in which a client may race ahead of its server.
When a fast NFS client is teamed with a slower server, the NFS mount
parameters require tuning as described in Section 18.1, "Slow server compensation".
Send various sized packets to an NFS server to see how it handles
"large" and "small" NFS requests. Disk write
operations are "large," usually filling several full-size
IP packets. Other operations, such as getting the attributes of a
file, fit into a packet of 150 bytes or less. Small packets are more
easily handled by all hosts, since there is less data to move around,
but NFS servers may be subject to bursts of large packets during
intense periods of client write operations. If no explicit arguments
are given, spray sends 1162 packets of 86 bytes.
In most implementations of spray, if either a packet count or a
packet length is given, the other argument is chosen so that 100
kbytes of data are transferred between client and server. Try using
spray with packet sizes of 1500 bytes to judge how well an NFS server
or the network handles write requests.
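The defaults can be reproduced with integer arithmetic: 100,000 // 86 = 1162, which matches the default count. The spray examples later in this section request -l 1160 but report a length of 1162, which suggests lengths are rounded up. The Solaris spray manual page attributes this rounding to XDR encoding in 32-bit units. The rounding rule below (86 bytes plus a whole number of 4-byte words, with 86 as the minimum) is an assumption that fits those observations and is not taken from spray's source. The helper names are hypothetical:

```python
TOTAL_BYTES = 100_000   # "100 kbytes", taken here as 100,000 bytes
DEFAULT_LENGTH = 86     # default packet length

def round_length(requested: int) -> int:
    """Assumed rounding rule: 86 bytes plus a whole number of 4-byte words."""
    if requested <= DEFAULT_LENGTH:
        return DEFAULT_LENGTH           # assumption: 86 is the minimum
    extra = requested - DEFAULT_LENGTH
    words = (extra + 3) // 4            # round up to whole 32-bit words
    return DEFAULT_LENGTH + 4 * words

def default_count(length: int) -> int:
    """Packets needed to move about TOTAL_BYTES at the given length."""
    return TOTAL_BYTES // length

print(default_count(DEFAULT_LENGTH))        # 1162, matching spray's default
print(round_length(1160))                   # 1162, as in the acadia runs
print(default_count(round_length(1500)))    # 66
```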
Normally, no delay is inserted between packets sent by spray,
although the -d option may be used to specify a delay in
microseconds. Insert delays between the packets to simulate realistic
packet arrival rates under "normal" conditions. Client requests may be
separated by several tens of microseconds, so including a delay
between packets may give you a more accurate picture of packet
handling rates.
In Figure 13-1, baxter and arches are identical machines and acadia
is a faster machine with a faster network interface. spray produces
the following output:
Fast machine to slow machine:
[acadia]% spray baxter -c 100 -l 1160
sending 100 packets of length 1162 to baxter ...
39 packets (39.000%) dropped by baxter
520 packets/sec, 605037 bytes/sec
Fast machine to slow machine, with delay:
[acadia]% spray baxter -c 100 -l 1160 -d 1
sending 100 packets of length 1162 to baxter ...
no packets dropped by baxter
99 packets/sec, 115680 bytes/sec
Slow machine to fast machine:
[baxter]% spray acadia -c 100 -l 1160
sending 100 packets of length 1162 to acadia ...
no packets dropped by acadia
769 packets/sec, 893846 bytes/sec
Slow machine to identical machine:
[baxter]% spray arches -c 100 -l 1160
sending 100 packets of length 1162 to arches ...
no packets dropped by arches
769 packets/sec, 893846 bytes/sec
Figure 13-1. Testing relative packet handling rates
When the fast machine sprays the slower one, a significant number of
packets are dropped; but adding a one-microsecond delay between the
packets allows the slow machine to keep pace and receive all incident
packets. The slow machine to fast machine test produces the same
packet handling rate as the slow machine showering an identical peer;
if the slow machine sprays the fast one, the network bandwidth used
is more than 30% greater than when the fast machine hammers the slow
one. Note that you cannot make NFS insert delays like this,
but performing the test with delays may indicate the location of a
bottleneck. Knowing your constraints, you can change other
configuration parameters, such as NFS client behavior, to avoid the
bottleneck. We'll look at these tuning procedures more in Chapter 18, "Client-Side Performance Tuning".
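The bandwidth comparison can be checked against the Figure 13-1 output. The sketch below parses spray's summary line with a regular expression. The pattern assumes the "N packets/sec, M bytes/sec" layout shown in the figure. It then compares the slow-to-fast run with the fast-to-slow run. The result is about 48 percent, comfortably above the 30 percent cited:

```python
import re

# Matches spray's summary line, e.g. "520 packets/sec, 605037 bytes/sec".
RATE_RE = re.compile(r"(\d+)\s+packets/sec,\s+(\d+)\s+bytes/sec")

def parse_rate(line: str):
    """Return (packets_per_sec, bytes_per_sec) from a spray summary line."""
    m = RATE_RE.search(line)
    if m is None:
        raise ValueError(f"no spray rate found in {line!r}")
    return int(m.group(1)), int(m.group(2))

fast_to_slow = parse_rate("520 packets/sec, 605037 bytes/sec")  # acadia -> baxter
slow_to_fast = parse_rate("769 packets/sec, 893846 bytes/sec")  # baxter -> acadia
gain = slow_to_fast[1] / fast_to_slow[1] - 1
print(f"slow->fast uses {gain:.1%} more bandwidth")
```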
The four tools discussed to this point (ifconfig, arp, ping, and
spray) focus on the issues of packet addressing and routing. If they
indicate a problem, all network services, such as telnet and rlogin,
will be affected. We now move up through the network and transport
layers in the network protocol stack, leaving the MAC and IP layers
for the session and application layers.