tcpp -- Parallel TCP Exercise Tool

This is a new tool, and is rife with bugs.  However, it appears to create
even more problems for device drivers and the kernel, so that's OK.

This tool generates large numbers of TCP connections and writes lots of
data into them.  One binary encapsulates both a client and a server.
Each of the client and the server generates a certain number of worker
processes, each of which in turn uses its own TCP port.  The number of
server processes must be >= the number of client processes, or some of the
ports required by the client won't have a listener.  The client then
proceeds to make connections and send data to the server.  Each worker
multiplexes many connections at once, up to a maximum parallelism limit.

Run the server as:

  ./tcpp -s -p <numprocs>

Run the client as:

  ./tcpp -c <serverIP> -p <numprocs> -t <numconnectionsperproc> -m
    <maxconnectionsatonceperproc> -b <bytesofdataperconnection>

All fields have default values; you'll probably want -p to be <= the
number of cores, and to vary -m up to -t.  A good trial might be:

  ./tcpp -s -p 4
  ./tcpp -c <serverIP> -p 4 -t 1000 -m 100

This creates 4000 TCP connections in total, of which up to 400 can be
running at a time due to the concurrency limit of 100 per process.  I
don't specifically implement fair handling of connections by each
process, so the effective concurrency is probably lower, as kqueue
likely isn't returning events round-robin.  A bandwidth estimate is
printed at the end if the run completed OK; it is probably wrong, but
not horribly so in the cases I've looked at.

Known Issues
------------

The bandwidth estimate doesn't handle failures well.  It also has serious
rounding errors and probably conceptual problems.

Kqueue is used to return one event in each work loop cycle, so it seems
likely TCP connections are not being serviced by workers fairly.  We should
pass in as many kevent structures as active connections, then walk the
entire list before going back for more work, in order to try to handle them
a bit more fairly.

Rather than passing the length for each connection, we might want to
pass it once up front over a control connection.  On the other hand, the
server is quite dumb right now, so we could take advantage of that
simplicity to test mixes of transfer sizes.

Configuration Notes
-------------------

In my testing, I use:

sysctl net.inet.ip.portrange.first=10000
sysctl kern.ipc.maxsockets=30000

# if running !multiq:
kenv hw.cxgb.singleq="1"

kldload if_cxgb
ifconfig cxgb0 -tso
ifconfig cxgb0 mtu 1500
