linux-2.6
16 years ago netns PF_KEY: per-netns /proc/pfkey
Alexey Dobriyan [Wed, 26 Nov 2008 01:59:00 +0000 (17:59 -0800)] 
netns PF_KEY: per-netns /proc/pfkey

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns PF_KEY: part 2
Alexey Dobriyan [Wed, 26 Nov 2008 01:58:31 +0000 (17:58 -0800)] 
netns PF_KEY: part 2

* interaction with userspace -- take netns from userspace socket.
* in ->notify hook take netns either from SA or explicitly passed --
we don't know if SA/SPD flush is coming.
* stub policy migration with init_net for now.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns PF_KEY: part 1
Alexey Dobriyan [Wed, 26 Nov 2008 01:58:07 +0000 (17:58 -0800)] 
netns PF_KEY: part 1

* netns boilerplate
* keep per-netns socket list
* keep per-netns number of sockets

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: flush SA/SPDs on netns stop
Alexey Dobriyan [Wed, 26 Nov 2008 01:57:44 +0000 (17:57 -0800)] 
netns xfrm: flush SA/SPDs on netns stop

SA/SPD doesn't pin netns (and it shouldn't), so get rid of them by hand.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: ->get_saddr in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:56:49 +0000 (17:56 -0800)] 
netns xfrm: ->get_saddr in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: ->dst_lookup in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:51:25 +0000 (17:51 -0800)] 
netns xfrm: ->dst_lookup in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: KM reporting in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:51:01 +0000 (17:51 -0800)] 
netns xfrm: KM reporting in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: pass netns with KM notifications
Alexey Dobriyan [Wed, 26 Nov 2008 01:50:36 +0000 (17:50 -0800)] 
netns xfrm: pass netns with KM notifications

SA and SPD flush are executed with NULL SA and SPD respectively, for
these cases pass netns explicitly from userspace socket.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
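The rule described in the commit above (take the netns from the SA when there is one, otherwise from a value passed explicitly) can be sketched in userspace C. This is only an illustration under assumed names: `struct km_event_sketch` and `notify_net()` are hypothetical and are not taken from the kernel source.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct net and struct xfrm_state. */
struct net { int id; };
struct xfrm_state { struct net *xs_net; };

/* Hypothetical event record: a flush carries no SA, so the namespace
 * has to travel with the event itself. */
struct km_event_sketch {
    int is_flush;
    struct net *net;   /* filled in from the requesting socket on flush */
};

/* Pick the namespace for a notification: from the SA if one exists,
 * otherwise from the explicitly passed event field. */
static struct net *notify_net(const struct xfrm_state *x,
                              const struct km_event_sketch *c)
{
    return x ? x->xs_net : c->net;
}
```

A flush notification would call `notify_net(NULL, &event)` and still end up in the right namespace.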
16 years ago netns xfrm: xfrm_user module in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:50:08 +0000 (17:50 -0800)] 
netns xfrm: xfrm_user module in netns

Grab netns either from netlink socket, state or policy.

SA and SPD flush are in init_net for now, this requires little
attention, see below.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns NETLINK_XFRM socket
Alexey Dobriyan [Wed, 26 Nov 2008 01:38:20 +0000 (17:38 -0800)] 
netns xfrm: per-netns NETLINK_XFRM socket

Stub senders to init_net's one temporarily.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: xfrm_input() fixup
Alexey Dobriyan [Wed, 26 Nov 2008 01:37:56 +0000 (17:37 -0800)] 
netns xfrm: xfrm_input() fixup

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: dst garbage-collecting in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:37:23 +0000 (17:37 -0800)] 
netns xfrm: dst garbage-collecting in netns

Pass netns pointer to struct xfrm_policy_afinfo::garbage_collect()

[This needs more thoughts on what to do with dst_ops]
[Currently stub to init_net]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: flushing/pruning bundles in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:36:51 +0000 (17:36 -0800)] 
netns xfrm: flushing/pruning bundles in netns

Allow netdevice notifier as result.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: xfrm_route_forward() in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:36:13 +0000 (17:36 -0800)] 
netns xfrm: xfrm_route_forward() in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: xfrm_policy_check in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:35:44 +0000 (17:35 -0800)] 
netns xfrm: xfrm_policy_check in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: lookup in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:35:18 +0000 (17:35 -0800)] 
netns xfrm: lookup in netns

Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
to flow_cache_lookup() and resolver callback.

Take it from socket or netdevice. Stub DECnet to init_net.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: policy walking in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:34:49 +0000 (17:34 -0800)] 
netns xfrm: policy walking in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: finding policy in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:34:20 +0000 (17:34 -0800)] 
netns xfrm: finding policy in netns

Add netns parameter to xfrm_policy_bysel_ctx(), xfrm_policy_byidx().

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: policy flushing in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:33:32 +0000 (17:33 -0800)] 
netns xfrm: policy flushing in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: policy insertion in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:33:06 +0000 (17:33 -0800)] 
netns xfrm: policy insertion in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: propagate netns into policy byidx hash
Alexey Dobriyan [Wed, 26 Nov 2008 01:32:41 +0000 (17:32 -0800)] 
netns xfrm: propagate netns into policy byidx hash

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: state walking in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:32:14 +0000 (17:32 -0800)] 
netns xfrm: state walking in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: finding states in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:31:51 +0000 (17:31 -0800)] 
netns xfrm: finding states in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: fixup xfrm_alloc_spi()
Alexey Dobriyan [Wed, 26 Nov 2008 01:31:18 +0000 (17:31 -0800)] 
netns xfrm: fixup xfrm_alloc_spi()

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: state lookup in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:30:50 +0000 (17:30 -0800)] 
netns xfrm: state lookup in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: state flush in netns
Alexey Dobriyan [Wed, 26 Nov 2008 01:30:18 +0000 (17:30 -0800)] 
netns xfrm: state flush in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: trivial netns propagations
Alexey Dobriyan [Wed, 26 Nov 2008 01:29:47 +0000 (17:29 -0800)] 
netns xfrm: trivial netns propagations

Take netns from xfrm_state or xfrm_policy.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: propagate netns into bydst/bysrc/byspi hash functions
Alexey Dobriyan [Wed, 26 Nov 2008 01:29:21 +0000 (17:29 -0800)] 
netns xfrm: propagate netns into bydst/bysrc/byspi hash functions

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns policy hash resizing work
Alexey Dobriyan [Wed, 26 Nov 2008 01:28:57 +0000 (17:28 -0800)] 
netns xfrm: per-netns policy hash resizing work

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns policy counts
Alexey Dobriyan [Wed, 26 Nov 2008 01:24:15 +0000 (17:24 -0800)] 
netns xfrm: per-netns policy counts

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns xfrm_policy_bydst hash
Alexey Dobriyan [Wed, 26 Nov 2008 01:23:48 +0000 (17:23 -0800)] 
netns xfrm: per-netns xfrm_policy_bydst hash

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns inexact policies
Alexey Dobriyan [Wed, 26 Nov 2008 01:23:26 +0000 (17:23 -0800)] 
netns xfrm: per-netns inexact policies

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns xfrm_policy_byidx hashmask
Alexey Dobriyan [Wed, 26 Nov 2008 01:22:58 +0000 (17:22 -0800)] 
netns xfrm: per-netns xfrm_policy_byidx hashmask

Per-netns hashes are independently resizeable.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns xfrm_policy_byidx hash
Alexey Dobriyan [Wed, 26 Nov 2008 01:22:35 +0000 (17:22 -0800)] 
netns xfrm: per-netns xfrm_policy_byidx hash

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns policy list
Alexey Dobriyan [Wed, 26 Nov 2008 01:22:11 +0000 (17:22 -0800)] 
netns xfrm: per-netns policy list

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: add struct xfrm_policy::xp_net
Alexey Dobriyan [Wed, 26 Nov 2008 01:21:45 +0000 (17:21 -0800)] 
netns xfrm: add struct xfrm_policy::xp_net

Again, to avoid complications with passing netns when not necessary.
Again, ->xp_net is set-once field, once set it never changes.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns km_waitq
Alexey Dobriyan [Wed, 26 Nov 2008 01:21:01 +0000 (17:21 -0800)] 
netns xfrm: per-netns km_waitq

Disallow spurious wakeups in __xfrm_lookup().

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns state GC work
Alexey Dobriyan [Wed, 26 Nov 2008 01:20:36 +0000 (17:20 -0800)] 
netns xfrm: per-netns state GC work

State GC is per-netns, and this is part of it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns state GC list
Alexey Dobriyan [Wed, 26 Nov 2008 01:20:11 +0000 (17:20 -0800)] 
netns xfrm: per-netns state GC list

km_waitq is going to be made per-netns to disallow spurious wakeups
in __xfrm_lookup().

To not wakeup after every garbage-collected xfrm_state (which potentially
can be from different netns) make state GC list per-netns.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns xfrm_hash_work
Alexey Dobriyan [Wed, 26 Nov 2008 01:19:07 +0000 (17:19 -0800)] 
netns xfrm: per-netns xfrm_hash_work

All of this implicitly passes which netns's hashes should be resized.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns xfrm_state counts
Alexey Dobriyan [Wed, 26 Nov 2008 01:18:39 +0000 (17:18 -0800)] 
netns xfrm: per-netns xfrm_state counts

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns xfrm_state_hmask
Alexey Dobriyan [Wed, 26 Nov 2008 01:18:12 +0000 (17:18 -0800)] 
netns xfrm: per-netns xfrm_state_hmask

Since hashtables are per-netns, they can be independently resized.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
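A minimal userspace sketch of why a per-netns hash mask lets each namespace resize on its own, as the commit above notes. The names `struct netns_tables`, `tables_init()`, `tables_grow()` and `bucket_of()` are hypothetical; a real resize would also rehash the existing entries, which is only marked by a comment here.

```c
#include <stdlib.h>

/* Hypothetical per-namespace container: bucket array plus its own mask. */
struct netns_tables {
    void **state_bydst;        /* bucket array owned by this namespace */
    unsigned int state_hmask;  /* bucket count minus one (power of two) */
};

static int tables_init(struct netns_tables *t, unsigned int nbuckets)
{
    t->state_bydst = calloc(nbuckets, sizeof(*t->state_bydst));
    if (!t->state_bydst)
        return -1;
    t->state_hmask = nbuckets - 1;
    return 0;
}

/* Double one namespace's table; other namespaces are untouched. */
static int tables_grow(struct netns_tables *t)
{
    unsigned int nbuckets = (t->state_hmask + 1) * 2;
    void **bigger = calloc(nbuckets, sizeof(*bigger));

    if (!bigger)
        return -1;
    /* A real resize would rehash every entry into 'bigger' here. */
    free(t->state_bydst);
    t->state_bydst = bigger;
    t->state_hmask = nbuckets - 1;
    return 0;
}

/* The bucket index depends only on the namespace's own mask. */
static unsigned int bucket_of(const struct netns_tables *t, unsigned int hash)
{
    return hash & t->state_hmask;
}
```

With a single global mask, growing one table would silently change the bucket index computed for every other namespace.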
16 years ago netns xfrm: per-netns xfrm_state_byspi hash
Alexey Dobriyan [Wed, 26 Nov 2008 01:17:47 +0000 (17:17 -0800)] 
netns xfrm: per-netns xfrm_state_byspi hash

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns xfrm_state_bysrc hash
Alexey Dobriyan [Wed, 26 Nov 2008 01:17:24 +0000 (17:17 -0800)] 
netns xfrm: per-netns xfrm_state_bysrc hash

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns xfrm_state_bydst hash
Alexey Dobriyan [Wed, 26 Nov 2008 01:16:58 +0000 (17:16 -0800)] 
netns xfrm: per-netns xfrm_state_bydst hash

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns xfrm: per-netns xfrm_state_all list
Alexey Dobriyan [Wed, 26 Nov 2008 01:16:11 +0000 (17:16 -0800)] 
netns xfrm: per-netns xfrm_state_all list

This is done to get
a) simple "something leaked" check
b) cover possible DoSes when other netns puts many, many xfrm_states
   onto a list.
c) not miss "alien xfrm_state" check in some of list iterators in future.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
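Point a) of the commit above, the simple "something leaked" check, can be sketched with a per-namespace singly linked list. All names below (`struct netns_sketch`, `state_add()`, `state_flush()`, `netns_clean()`) are hypothetical, and the kernel's real list handling differs; this only shows why a per-netns list makes the check trivial.

```c
#include <stdbool.h>
#include <stddef.h>

struct state_sketch { struct state_sketch *next; };

/* Hypothetical per-namespace bookkeeping: its own list and its own count. */
struct netns_sketch {
    struct state_sketch *state_all;
    unsigned int state_num;
};

static void state_add(struct netns_sketch *ns, struct state_sketch *x)
{
    x->next = ns->state_all;
    ns->state_all = x;
    ns->state_num++;
}

/* Drop every state owned by this namespace, and only this namespace. */
static void state_flush(struct netns_sketch *ns)
{
    while (ns->state_all) {
        struct state_sketch *x = ns->state_all;

        ns->state_all = x->next;
        x->next = NULL;
        ns->state_num--;
    }
}

/* Teardown check: an empty list means nothing leaked in this namespace. */
static bool netns_clean(const struct netns_sketch *ns)
{
    return ns->state_all == NULL && ns->state_num == 0;
}
```

Because each namespace walks only its own list, a busy neighbour cannot lengthen the walk, which is the concern in point b).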
16 years ago netns xfrm: add struct xfrm_state::xs_net
Alexey Dobriyan [Wed, 26 Nov 2008 01:15:16 +0000 (17:15 -0800)] 
netns xfrm: add struct xfrm_state::xs_net

To avoid unnecessary complications with passing netns around.

* set once, very early after allocating
* once set, never changes

For a while create every xfrm_state in init_net.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
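The set-once field described in the commit above can be sketched as follows. This is a userspace illustration under assumed names (`state_alloc()` and `state_net()` are hypothetical helpers), not the kernel's allocation path.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's struct net and struct xfrm_state. */
struct net { int id; };

struct xfrm_state {
    struct net *xs_net;   /* set once at allocation, never changed afterwards */
    unsigned int spi;
};

/* Allocate a state and bind it to its namespace before anything else uses it. */
static struct xfrm_state *state_alloc(struct net *net)
{
    struct xfrm_state *x = calloc(1, sizeof(*x));

    if (x)
        x->xs_net = net;
    return x;
}

/* Read-only accessor: callers derive the namespace from the object itself,
 * so it does not need to be threaded through every function signature. */
static struct net *state_net(const struct xfrm_state *x)
{
    return x->xs_net;
}
```

Since the field never changes after allocation, readers need no locking to use it.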
16 years ago netns xfrm: add netns boilerplate
Alexey Dobriyan [Wed, 26 Nov 2008 01:14:31 +0000 (17:14 -0800)] 
netns xfrm: add netns boilerplate

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago xfrm: initialise xfrm_policy_gc_work statically
Alexey Dobriyan [Wed, 26 Nov 2008 01:13:59 +0000 (17:13 -0800)] 
xfrm: initialise xfrm_policy_gc_work statically

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago 3c523: fix warning in drivers/net/3c523.c
Ingo Molnar [Wed, 26 Nov 2008 01:02:20 +0000 (17:02 -0800)] 
3c523: fix warning in drivers/net/3c523.c

fix warning:

  drivers/net/3c523.c:582: warning: ‘cleanup_card’ defined but not used

No code changed:

md5:
   ebe4a1b27d3f21b0b12a78c58463b0d7  3c523.o.before.asm
   ebe4a1b27d3f21b0b12a78c58463b0d7  3c523.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago depca: fix warning in drivers/net/depca.c
Ingo Molnar [Wed, 26 Nov 2008 01:00:39 +0000 (17:00 -0800)] 
depca: fix warning in drivers/net/depca.c

fix warning:

  drivers/net/depca.c: In function ‘depca_eisa_probe’:
  drivers/net/depca.c:1564: warning: ‘mem_start’ may be used uninitialized in this function

this seems to be a real bug - depca_eisa_probe() does not check
for failure. Add it, symmetric to depca_isa_probe().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago atlx: fix warning in drivers/net/atlx/atl2.c
Ingo Molnar [Wed, 26 Nov 2008 01:00:05 +0000 (17:00 -0800)] 
atlx: fix warning in drivers/net/atlx/atl2.c

fix this warning:

  drivers/net/atlx/atl2.c: In function ‘atl2_request_irq’:
  drivers/net/atlx/atl2.c:644: warning: unused variable ‘err’

'err' is unused in the !CONFIG_PCI_MSI case.

Instead of further increasing the #ifdeffery in this function,
restructure the code a bit and get rid of the #ifdef. This
relies on the fact that pci_enable_msi() will always fail in
the !CONFIG_PCI_MSI case.

There should be no change in driver behavior.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
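The restructuring idea in the commit above (drop the caller's #ifdef because the enable call always fails when the feature is compiled out) can be sketched like this. `SKETCH_HAVE_MSI`, `enable_msi_sketch()` and `request_irq_sketch()` are hypothetical names; the real driver code is not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical config switch standing in for CONFIG_PCI_MSI. */
#ifndef SKETCH_HAVE_MSI
#define SKETCH_HAVE_MSI 0
#endif

struct dev_sketch { bool have_msi; };

/* The only place that knows about the config option. When the feature
 * is compiled out, the stub always reports failure. */
static int enable_msi_sketch(struct dev_sketch *d)
{
    (void)d;
#if SKETCH_HAVE_MSI
    return 0;
#else
    return -1;   /* feature compiled out: always fail */
#endif
}

/* The caller needs no #ifdef: 'err' is used in every configuration, and
 * the legacy interrupt path is taken whenever MSI cannot be enabled. */
static int request_irq_sketch(struct dev_sketch *d)
{
    int err = enable_msi_sketch(d);

    d->have_msi = (err == 0);
    return 0;   /* pretend the IRQ request itself succeeded */
}
```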
16 years ago bluetooth: fix warning in net/bluetooth/rfcomm/sock.c
Ingo Molnar [Wed, 26 Nov 2008 00:59:21 +0000 (16:59 -0800)] 
bluetooth: fix warning in net/bluetooth/rfcomm/sock.c

fix this warning:

  net/bluetooth/rfcomm/sock.c: In function ‘rfcomm_sock_ioctl’:
  net/bluetooth/rfcomm/sock.c:795: warning: unused variable ‘sk’

perhaps BT_DEBUG() should be improved to do printf format checking
instead of the #ifdef, but that looks quite intrusive: each bluetooth
.c file undefines the macro.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago sunrpc: fix warning in net/sunrpc/xprtrdma/verbs.c
Ingo Molnar [Wed, 26 Nov 2008 00:58:42 +0000 (16:58 -0800)] 
sunrpc: fix warning in net/sunrpc/xprtrdma/verbs.c

fix this warning:

  net/sunrpc/xprtrdma/verbs.c: In function ‘rpcrdma_conn_upcall’:
  net/sunrpc/xprtrdma/verbs.c:279: warning: unused variable ‘addr’

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago ax25: fix warning in net/ax25/sysctl_net_ax25.c
Ingo Molnar [Wed, 26 Nov 2008 00:58:19 +0000 (16:58 -0800)] 
ax25: fix warning in net/ax25/sysctl_net_ax25.c

fix this warning:

  net/ax25/sysctl_net_ax25.c:27: warning: ‘min_ds_timeout’ defined but not used
  net/ax25/sysctl_net_ax25.c:27: warning: ‘max_ds_timeout’ defined but not used

These are only used in the CONFIG_AX25_DAMA_SLAVE case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago mlx4: fix warning in drivers/net/mlx4/mcg.c
Ingo Molnar [Wed, 26 Nov 2008 00:57:59 +0000 (16:57 -0800)] 
mlx4: fix warning in drivers/net/mlx4/mcg.c

fix warning:

  drivers/net/mlx4/mcg.c: In function ‘mlx4_multicast_attach’:
  drivers/net/mlx4/mcg.c:217: warning: integer overflow in expression

there was no real danger of overflow here though.

md5:
   db8eb55620f886c03854a2abb2ce6c3f  mcg.o.before.asm
   db8eb55620f886c03854a2abb2ce6c3f  mcg.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago dccp: fix warning in net/dccp/options.c
Ingo Molnar [Wed, 26 Nov 2008 00:57:30 +0000 (16:57 -0800)] 
dccp: fix warning in net/dccp/options.c

this warning:

  net/dccp/options.c: In function ‘dccp_parse_options’:
  net/dccp/options.c:67: warning: ‘value’ may be used uninitialized in this function

is a bogus GCC warning. The compiler does not recognize the relation
between "value" and "mandatory" variables: the code flow can only ever reach
the "out_invalid_option:" label if 'mandatory' is set to 1, and when
'mandatory' is non-zero, we'll always have 'value' initialized.

Help out the compiler by annotating the variable.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago cassini: fix warning in drivers/net/cassini.c
Ingo Molnar [Wed, 26 Nov 2008 00:57:05 +0000 (16:57 -0800)] 
cassini: fix warning in drivers/net/cassini.c

this warning:

  drivers/net/cassini.c: In function ‘cas_rx_ringN’:
  drivers/net/cassini.c:2350: warning: ‘skb’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between cas_rx_process_pkt() and 'skb'.

Annotate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago mlx4: fix error path in drivers/net/mlx4/en_rx.c
Ingo Molnar [Wed, 26 Nov 2008 00:53:32 +0000 (16:53 -0800)] 
mlx4: fix error path in drivers/net/mlx4/en_rx.c

this warning:

  drivers/net/mlx4/en_rx.c: In function ‘mlx4_en_activate_rx_rings’:
  drivers/net/mlx4/en_rx.c:412: warning: ‘err’ may be used uninitialized in this function

triggers because 'err' is uninitialized in the following input
conditions: priv->rx_ring_num is zero and mlx4_en_fill_rx_buffers()
fails.

But even if ->rx_ring_num is nonzero, 'err' will be zero if
mlx4_en_fill_rx_buffers() fails and mlx4_en_activate_rx_rings() returns
success - incorrectly.

So it's best to keep the error code up to date on mlx4_en_fill_rx_buffers()
calls as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
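The two failure modes described in the commit above (no rings at all, and a fill failure reported as success) can be reproduced and avoided in a small sketch. The helper names and the error value are hypothetical; whether the real fix also initializes 'err' is not stated in the message, so the initializer below is an assumption.

```c
/* Hypothetical stand-ins; each returns 0 on success, negative on failure. */
static int init_ring_sketch(int ring)
{
    (void)ring;
    return 0;
}

static int fill_buffers_sketch(int fail)
{
    return fail ? -12 : 0;
}

static int activate_rings_sketch(int ring_num, int fill_fails)
{
    int err = 0;   /* assumption: defined even when ring_num == 0 */
    int i;

    for (i = 0; i < ring_num; i++) {
        err = init_ring_sketch(i);
        if (err)
            goto out;
    }

    /* Capture the return value so a failure here is not reported as success. */
    err = fill_buffers_sketch(fill_fails);
    if (err)
        goto out;

    return 0;
out:
    return err;
}
```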
16 years ago z85230: fix warning in drivers/net/wan/z85230.c
Ingo Molnar [Wed, 26 Nov 2008 00:53:08 +0000 (16:53 -0800)] 
z85230: fix warning in drivers/net/wan/z85230.c

this warning:

  drivers/net/wan/z85230.c: In function ‘z8530_interrupt’:
  drivers/net/wan/z85230.c:713: warning: ‘intr’ may be used uninitialized in this function

is clearly bogus - annotate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago sis900: fix warning in drivers/net/sis900.c
Ingo Molnar [Wed, 26 Nov 2008 00:52:13 +0000 (16:52 -0800)] 
sis900: fix warning in drivers/net/sis900.c

this warning:

  drivers/net/sis900.c: In function ‘sis900_timer’:
  drivers/net/sis900.c:1280: warning: ‘speed’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between sis900_read_mode(), 'speed' and 'duplex'.

Annotate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago fix warning in fs/dlm/netlink.c
Ingo Molnar [Wed, 26 Nov 2008 00:51:45 +0000 (16:51 -0800)] 
fix warning in fs/dlm/netlink.c

this warning:

  fs/dlm/netlink.c: In function ‘dlm_timeout_warn’:
  fs/dlm/netlink.c:131: warning: ‘send_skb’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between prepare_data() and send_skb.

Annotate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
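The shape of code behind this family of "may be used uninitialized" warnings is a helper that writes its out-parameter only on success, paired with a caller that reads it only on success. A minimal sketch, assuming hypothetical names (`prepare_sketch()`, `caller_sketch()`); it uses an explicit initializer to stay warning-free, which is a substitute technique and not the annotation the commit applies.

```c
/* Hypothetical helper: writes *out only when it returns 0. */
static int prepare_sketch(int ok, int *out)
{
    if (!ok)
        return -1;
    *out = 42;
    return 0;
}

static int caller_sketch(int ok)
{
    int result = 0;   /* explicit initializer, used here instead of an annotation */

    if (prepare_sketch(ok, &result))
        return -1;    /* error path never reads 'result' */
    return result;    /* success path: 'result' was written by the helper */
}
```

The flow is correct either way; the compiler simply cannot always connect the helper's return value to the state of the out-parameter.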
16 years ago dsa: fix warning in net/dsa/mv88e6060.c
Ingo Molnar [Wed, 26 Nov 2008 00:51:13 +0000 (16:51 -0800)] 
dsa: fix warning in net/dsa/mv88e6060.c

this warning:

  net/dsa/mv88e6060.c: In function ‘mv88e6060_poll_link’:
  net/dsa/mv88e6060.c:225: warning: ‘port_status’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between 'link' and 'port_status'.

Annotate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago dsa: fix warning in net/dsa/mv88e6xxx.c
Ingo Molnar [Wed, 26 Nov 2008 00:50:49 +0000 (16:50 -0800)] 
dsa: fix warning in net/dsa/mv88e6xxx.c

this warning:

  net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_poll_link’:
  net/dsa/mv88e6xxx.c:361: warning: ‘port_status’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between 'link' and 'port_status'.

Annotate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago ipv6: fix warning in net/ipv6/ip6_flowlabel.c
Ingo Molnar [Wed, 26 Nov 2008 00:50:30 +0000 (16:50 -0800)] 
ipv6: fix warning in net/ipv6/ip6_flowlabel.c

this warning:

  net/ipv6/ip6_flowlabel.c: In function ‘ipv6_flowlabel_opt’:
  net/ipv6/ip6_flowlabel.c:467: warning: ‘err’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between fl_create() and 'err'.

Annotate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago pkt_sched: fix warning in net/sched/sch_hfsc.c
Ingo Molnar [Wed, 26 Nov 2008 00:50:02 +0000 (16:50 -0800)] 
pkt_sched: fix warning in net/sched/sch_hfsc.c

this warning:

  net/sched/sch_hfsc.c: In function ‘hfsc_enqueue’:
  net/sched/sch_hfsc.c:1577: warning: ‘err’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between hfsc_classify(), 'cl' and 'err'.

Annotate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago sunrpc: fix warning in net/sunrpc/xprtrdma/svc_rdma_transport.c
Ingo Molnar [Wed, 26 Nov 2008 00:49:37 +0000 (16:49 -0800)] 
sunrpc: fix warning in net/sunrpc/xprtrdma/svc_rdma_transport.c

this warning:

  net/sunrpc/xprtrdma/svc_rdma_transport.c: In function ‘svc_rdma_accept’:
  net/sunrpc/xprtrdma/svc_rdma_transport.c:830: warning: ‘dma_mr_acc’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) flow connection
between need_dma_mr and dma_mr_acc.

Annotate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago qla3xxx: fix warning in drivers/net/qla3xxx.c
Ingo Molnar [Wed, 26 Nov 2008 00:49:07 +0000 (16:49 -0800)] 
qla3xxx: fix warning in drivers/net/qla3xxx.c

this warning:

  drivers/net/qla3xxx.c: In function ‘ql3xxx_probe’:
  drivers/net/qla3xxx.c:3912: warning: ‘pci_using_dac’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between 'pci_using_dac' and 'err'.

Annotate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago niu: fix another warning in drivers/net/niu.c
Ingo Molnar [Wed, 26 Nov 2008 00:48:42 +0000 (16:48 -0800)] 
niu: fix another warning in drivers/net/niu.c

this warning:

  drivers/net/niu.c: In function ‘esr_reset’:
  drivers/net/niu.c:741: warning: ‘reset’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between:

 - esr_read_reset() and 'reset'

Annotate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago niu: fix warnings in drivers/net/niu.c
Ingo Molnar [Wed, 26 Nov 2008 00:48:12 +0000 (16:48 -0800)] 
niu: fix warnings in drivers/net/niu.c

these warnings:

  drivers/net/niu.c: In function ‘serdes_init_niu_1g_serdes’:
  drivers/net/niu.c:451: warning: ‘sig’ may be used uninitialized in this function
  drivers/net/niu.c: In function ‘serdes_init_niu_10g_serdes’:
  drivers/net/niu.c:550: warning: ‘sig’ may be used uninitialized in this function

trigger because GCC does not recognize that the max_retry loop
always initializes 'sig', due to max_retry != 0.

Annotate them.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago s2io: fix warning in drivers/net/s2io.c
Ingo Molnar [Wed, 26 Nov 2008 00:47:35 +0000 (16:47 -0800)] 
s2io: fix warning in drivers/net/s2io.c

this warning:

  drivers/net/s2io.c: In function ‘rx_intr_handler’:
  drivers/net/s2io.c:7369: warning: ‘lro’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between:

 - s2io_club_tcp_session() and 'lro'

Annotate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago netns: filter out uevent not belonging to init_net
Daniel Lezcano [Wed, 26 Nov 2008 00:46:37 +0000 (16:46 -0800)] 
netns: filter out uevent not belonging to init_net

This patch filters out uevents that do not belong to init_net.
Without it, if a network device is created in a network namespace
with the same name as a network device in the initial network
namespace (e.g. eth0), then when that namespace dies and its device
is destroyed, an event is sent and caught by the udevd daemon. As a
result the real network device is shut down, because udevd and
uevents are not namespace aware.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
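The filtering rule from the commit above reduces to one comparison: deliver the event only if the device lives in the initial namespace. A minimal sketch, assuming hypothetical names (`init_net_sketch`, `struct net_device_sketch`, `uevent_filter_sketch()`); where the kernel actually performs this check is not stated in the message.

```c
#include <stddef.h>

struct net { int id; };

/* Hypothetical stand-in for the initial network namespace. */
static struct net init_net_sketch = { 0 };

struct net_device_sketch {
    const char *name;
    struct net *nd_net;   /* namespace the device belongs to */
};

/* Returns 1 if the event should reach userspace, 0 if it should be dropped. */
static int uevent_filter_sketch(const struct net_device_sketch *dev)
{
    return dev->nd_net == &init_net_sketch;
}
```

Two devices named eth0 in different namespaces then produce different answers, so a namespace-unaware udevd only ever sees the host's device.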
16 years ago ixgbe: Naming interrupt vectors
Robert Olsson [Wed, 26 Nov 2008 00:43:52 +0000 (16:43 -0800)] 
ixgbe: Naming interrupt vectors

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago niu: Naming interrupt vectors.
Robert Olsson [Wed, 26 Nov 2008 00:41:57 +0000 (16:41 -0800)] 
niu: Naming interrupt vectors.

 A patch to put names on the niu interrupt vectors according to the syntax below.
 This is needed to assign correct affinity.

 > So on a multiqueue card with 2 RX queues and 2 TX queues we'd
 > have names like:
 >
 >  eth0-rx-0
 >  eth0-rx-1
 >  eth0-tx-0
 >  eth0-tx-1

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Tested-by: Jesper Dangaard Brouer <jdb@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
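The naming syntax quoted in the commit above (interface name, direction, queue index) can be generated with a single format string. `IRQ_NAME_LEN` and `name_vector_sketch()` are hypothetical; the driver's own buffer sizes and helper names are not shown in the message.

```c
#include <stdio.h>

#define IRQ_NAME_LEN 32   /* hypothetical buffer size */

/* Build a vector name of the form <ifname>-<rx|tx>-<queue>.
 * Returns the length snprintf() reports for the formatted name. */
static int name_vector_sketch(char *buf, size_t len, const char *ifname,
                              const char *dir, int queue)
{
    return snprintf(buf, len, "%s-%s-%d", ifname, dir, queue);
}
```

For example, `name_vector_sketch(buf, sizeof(buf), "eth0", "rx", 1)` fills `buf` with `eth0-rx-1`, which is what then shows up when assigning IRQ affinity per queue.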
16 years ago tcp: skb_shift cannot cache frag ptrs past pskb_expand_head
Ilpo Järvinen [Tue, 25 Nov 2008 21:57:01 +0000 (13:57 -0800)] 
tcp: skb_shift cannot cache frag ptrs past pskb_expand_head

Since pskb_expand_head creates a copy of the shared area, we
cannot keep any frag ptr past de-cloning. This fixes the
tcpdump recvfrom -EFAULT problem.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
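The hazard named in the commit above is the general one of holding a pointer into storage that a later call replaces. A userspace sketch, assuming hypothetical names (`struct shared_sketch`, `expand_sketch()`, `shrink_first_frag()`); it models the copy with malloc/free and does not reproduce the skb API.

```c
#include <stdlib.h>
#include <string.h>

struct frag { int size; };

struct shared_sketch {
    int nr_frags;
    struct frag *frags;   /* replaced by expand_sketch() */
};

/* Hypothetical stand-in for de-cloning: copies the frag array to new
 * storage and releases the old one. */
static int expand_sketch(struct shared_sketch *sh)
{
    struct frag *copy = malloc(sh->nr_frags * sizeof(*copy));

    if (!copy)
        return -1;
    memcpy(copy, sh->frags, sh->nr_frags * sizeof(*copy));
    free(sh->frags);
    sh->frags = copy;
    return 0;
}

/* Returns the new size of the first fragment, or -1 on allocation failure. */
static int shrink_first_frag(struct shared_sketch *sh, int by)
{
    struct frag *f;

    if (expand_sketch(sh))
        return -1;
    /* Look the fragment up only after the copy; a pointer taken before
     * expand_sketch() would refer to freed storage. */
    f = &sh->frags[0];
    f->size -= by;
    return f->size;
}
```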
16 years ago pkt_sched: sch_api: Remove qdisc_list_lock
Jarek Poplawski [Tue, 25 Nov 2008 21:56:06 +0000 (13:56 -0800)] 
pkt_sched: sch_api: Remove qdisc_list_lock

Now that qdisc->ops->peek() is implemented, qdisc_tree_decrease_qlen()
is no longer called without rtnl_lock(), so the qdisc_list_lock
added by commit f6e0b239a2657ea8cb67f0d83d0bfdbfd19a481b ("pkt_sched:
Fix qdisc list locking") can be removed.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago net: udp_unhash() can test if sk is hashed
Eric Dumazet [Tue, 25 Nov 2008 21:55:15 +0000 (13:55 -0800)] 
net: udp_unhash() can test if sk is hashed

Impact: Optimization

As in inet_unhash(), we can avoid taking a chain lock in
udp_unhash() if the socket is not hashed.

Triggered by close(socket(AF_INET, SOCK_DGRAM, 0));

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
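The optimization in the commit above is an early return before the lock is taken. A single-threaded sketch, assuming hypothetical names (`struct sock_sketch`, `struct chain_sketch`, `unhash_sketch()`); a counter stands in for the chain lock so the saving is observable.

```c
#include <stdbool.h>
#include <stddef.h>

struct sock_sketch {
    struct sock_sketch *next;
    bool hashed;
};

struct chain_sketch {
    struct sock_sketch *head;
    int lock_acquisitions;   /* counts how often the chain lock was taken */
};

static void unhash_sketch(struct chain_sketch *c, struct sock_sketch *sk)
{
    struct sock_sketch **pp;

    if (!sk->hashed)
        return;               /* never bound: nothing to remove, skip the lock */

    c->lock_acquisitions++;   /* stands in for taking the chain lock */
    for (pp = &c->head; *pp; pp = &(*pp)->next) {
        if (*pp == sk) {
            *pp = sk->next;
            break;
        }
    }
    sk->next = NULL;
    sk->hashed = false;
}
```

A socket that is created and closed without ever being bound, as in the close(socket(...)) case quoted in the message, takes the early return.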
16 years ago net: Make sure BHs are disabled in sock_prot_inuse_add()
Eric Dumazet [Tue, 25 Nov 2008 21:53:27 +0000 (13:53 -0800)] 
net: Make sure BHs are disabled in sock_prot_inuse_add()

prot->destroy is not called with BHs disabled, so we must explicitly
disable BHs around the call to sock_prot_inuse_add()
in sctp_destroy_sock().

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago tcp: tcp_limit_reno_sacked can become static
Ilpo Järvinen [Tue, 25 Nov 2008 21:45:29 +0000 (13:45 -0800)] 
tcp: tcp_limit_reno_sacked can become static

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago Revert "hso: Fix crashes on close."
David S. Miller [Tue, 25 Nov 2008 11:53:09 +0000 (03:53 -0800)] 
Revert "hso: Fix crashes on close."

This reverts commit 4a3e818181e1baf970e9232ca8b747e233176b87.

On request from Alan Cox.

Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago Revert "hso: Fix free of mutexes still in use."
David S. Miller [Tue, 25 Nov 2008 11:52:46 +0000 (03:52 -0800)] 
Revert "hso: Fix free of mutexes still in use."

This reverts commit 52429eb216385fdc6969c0112ba8b46cffefaaef.

On request from Alan Cox.

Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago Revert "hso: Add TIOCM ioctl handling."
David S. Miller [Tue, 25 Nov 2008 11:52:17 +0000 (03:52 -0800)] 
Revert "hso: Add TIOCM ioctl handling."

This reverts commit 7ea3a9ad9bf360f746a7ad6fa72511a5c359490d.

On request from Alan Cox.

Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoxfrm: remove useless forward declarations
Alexey Dobriyan [Tue, 25 Nov 2008 09:05:54 +0000 (01:05 -0800)] 
xfrm: remove useless forward declarations

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoah4/ah6: remove useless NULL assignments
Alexey Dobriyan [Tue, 25 Nov 2008 09:05:09 +0000 (01:05 -0800)] 
ah4/ah6: remove useless NULL assignments

struct will be kfreed in a moment, so...

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoigb: loopback bits not correctly cleared from RCTL register
Alexander Duyck [Tue, 25 Nov 2008 09:04:03 +0000 (01:04 -0800)] 
igb: loopback bits not correctly cleared from RCTL register

This change forces the bits to 0 by using an &= operation with an inverted
mask of all options instead of using an |= with a value of 0.
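
A minimal sketch of the difference between the two patterns. The
EX_RCTL_* values are assumptions chosen for illustration; the real bit
layout lives in the igb driver headers.

```c
#include <stdint.h>

/* Hypothetical loopback-mode field occupying bits 6-7 of the register. */
#define EX_RCTL_LBM_NO   0u          /* no loopback */
#define EX_RCTL_LBM_MAC  (1u << 6)   /* MAC loopback */
#define EX_RCTL_LBM_TCVR (3u << 6)   /* transceiver loopback */

/* Old pattern: OR-ing in zero is a no-op, so stale bits survive. */
uint32_t ex_clear_loopback_or(uint32_t rctl)
{
    rctl |= EX_RCTL_LBM_NO;
    return rctl;
}

/* New pattern: AND with the inverted mask of all loopback options. */
uint32_t ex_clear_loopback_and(uint32_t rctl)
{
    rctl &= ~(EX_RCTL_LBM_MAC | EX_RCTL_LBM_TCVR);
    return rctl;
}
```

With both loopback bits set on entry, the OR variant returns the value
unchanged, while the AND variant clears the field and leaves every other
bit alone.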

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoigb: remove unneeded bit reference when enabling jumbo frames
Alexander Duyck [Tue, 25 Nov 2008 09:03:26 +0000 (01:03 -0800)] 
igb: remove unneeded bit reference when enabling jumbo frames

There is a reference to a Buffer Size extension bit that is unneeded by
82575/82576 hardware.  Since it is not needed it should be removed from the
code.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoDCB: fix kconfig option
Jeff Kirsher [Tue, 25 Nov 2008 09:02:08 +0000 (01:02 -0800)] 
DCB: fix kconfig option

Since the netlink option is needed for DCB to be useful at all, simplify
the Kconfig option.  In addition, add useful help text for the
Kconfig option.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoaoe: remove private mac address format function
Harvey Harrison [Tue, 25 Nov 2008 08:40:37 +0000 (00:40 -0800)] 
aoe: remove private mac address format function

Add %pm to omit the colons when printing a mac address.
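
For illustration, a userspace stand-in that produces the same shape of
output as %pm (six bytes as lowercase hex, no separators).
ex_format_mac_plain() is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a 6-byte MAC address as 12 hex digits without colons.
 * The buffer needs room for 12 characters plus the terminating NUL. */
int ex_format_mac_plain(char *buf, size_t len, const unsigned char mac[6])
{
    return snprintf(buf, len, "%02x%02x%02x%02x%02x%02x",
                    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

In kernel code the equivalent would be a single printk-style format
specifier, which is what lets aoe drop its private formatting function.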

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agohso: Hook up ->reset_resume
Denis Joseph Barrow [Tue, 25 Nov 2008 08:36:10 +0000 (00:36 -0800)] 
hso: Hook up ->reset_resume

Make the usb_driver's reset_resume callback point to hso_resume. This
fixes problems where a USB reset is done when the network interface
is left idle for a few minutes. Possibly reset_resume should
initialise the hardware more, but this works in the common case.

Signed-off-by: Denis Joseph Barrow <D.Barow@option.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agohso: Add TIOCM ioctl handling.
Denis Joseph Barrow [Tue, 25 Nov 2008 08:35:26 +0000 (00:35 -0800)] 
hso: Add TIOCM ioctl handling.

Makes the TIOCM ioctls for Data Carrier Detect & related functions
work like those in drivers/serial/serial_core.c; these are potentially
needed for pppd & similar user programs.

Signed-off-by: Denis Joseph Barrow <D.Barow@option.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agohso: Fix free of mutexes still in use.
Denis Joseph Barrow [Tue, 25 Nov 2008 08:33:13 +0000 (00:33 -0800)] 
hso: Fix free of mutexes still in use.

A new structure, hso_mutex_table, had to be declared statically
& used, because the hso_device that holds the mutex
(mutex_lock(&serial->parent->mutex) etc.) is freed in hso_serial_open &
hso_serial_close by kref_put while the mutex is still in use.

This is a substantial change but should make the driver much more stable.

Signed-off-by: Denis Joseph Barrow <D.Barow@option.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agohso: Fix URB submission -EINVAL.
Denis Joseph Barrow [Tue, 25 Nov 2008 08:30:48 +0000 (00:30 -0800)] 
hso: Fix URB submission -EINVAL.

Added a check for IFF_UP in hso_resume. This should eliminate the -EINVAL
(-22) errors caused by URBs being submitted twice, once by hso_resume
& once in hso_net_open, if suspend/resume USB power saving mode is enabled.
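
A rough userspace model of the double-submission problem and the guard,
assuming resume should only resubmit when the interface is up. All
ex_* names are hypothetical and do not mirror the driver's real
structures.

```c
#include <stdbool.h>

#define EX_IFF_UP 0x1u          /* stand-in for the kernel's IFF_UP flag */
#define EX_EINVAL 22

struct ex_netdev {
    unsigned int flags;
    bool urbs_in_flight;
};

/* Submitting URBs that are already in flight fails with -EINVAL. */
int ex_submit_urbs(struct ex_netdev *dev)
{
    if (dev->urbs_in_flight)
        return -EX_EINVAL;
    dev->urbs_in_flight = true;
    return 0;
}

/* Resume path: only resubmit if the interface is actually up;
 * otherwise leave submission to the open path. */
int ex_resume(struct ex_netdev *dev)
{
    if (!(dev->flags & EX_IFF_UP))
        return 0;
    return ex_submit_urbs(dev);
}

/* Open path: submit URBs and mark the interface up. */
int ex_open(struct ex_netdev *dev)
{
    int err = ex_submit_urbs(dev);

    if (!err)
        dev->flags |= EX_IFF_UP;
    return err;
}
```

Without the flag test in ex_resume(), a resume followed by an open on a
down interface would hit the -EINVAL branch in ex_submit_urbs().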

Signed-off-by: Denis Joseph Barrow <D.Barow@option.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agohso: Fix crashes on close.
Denis Joseph Barrow [Tue, 25 Nov 2008 08:27:50 +0000 (00:27 -0800)] 
hso: Fix crashes on close.

Moved serial_open_count in hso_serial_open to prevent crashes owing
to the serial structure being made NULL when hso_serial_close is
called even though hso_serial_open returned -ENODEV. Alan Cox pointed
out that this happens.

Also put a sanity check in hso_serial_close to check for a valid
serial structure, which should prevent the most reproducible crash
in the driver, when the hso device is disconnected while in use.

Signed-off-by: Denis Joseph Barrow <D.Barow@option.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agohso: Add new usb device id's.
Denis Joseph Barrow [Tue, 25 Nov 2008 08:26:12 +0000 (00:26 -0800)] 
hso: Add new usb device id's.

Signed-off-by: Denis Joseph Barrow <D.Barow@option.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agonetdev: add HAVE_NET_DEVICE_OPS
Stephen Hemminger [Tue, 25 Nov 2008 08:20:43 +0000 (00:20 -0800)] 
netdev: add HAVE_NET_DEVICE_OPS

As a concession to vendors who have to deal with one source for different
kernel versions, add a HAVE_NET_DEVICE_OPS so they don't end up hard
coding ifdef against kernel version.
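
A small sketch of how an out-of-tree driver might key off the macro
instead of a kernel version number. The macro is defined locally here
only so the example compiles in userspace; ex_setup_style() is a
hypothetical function.

```c
/* Stand-in for the definition that the kernel header would provide on
 * kernels that have net_device_ops. Defined here for illustration only. */
#define HAVE_NET_DEVICE_OPS

/* Report which initialisation style the driver would compile in. */
const char *ex_setup_style(void)
{
#ifdef HAVE_NET_DEVICE_OPS
    /* Newer kernels: fill in a net_device_ops table. */
    return "net_device_ops";
#else
    /* Older kernels: assign the per-device function pointers directly. */
    return "legacy";
#endif
}
```

Testing for the feature macro keeps working if the feature is backported
to an older kernel, which a version comparison would miss.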

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agotcp: handle shift/merge of cloned skbs too
Ilpo Järvinen [Tue, 25 Nov 2008 05:30:21 +0000 (21:30 -0800)] 
tcp: handle shift/merge of cloned skbs too

This caused me to get repeatably:

  tcpdump: pcap_loop: recvfrom: Bad address

Happens occasionally when I tcpdump my for-looped test xfers:
  while [ : ]; do echo -n "$(date '+%s.%N') "; ./sendfile; sleep 20; done

Rest of the relevant commands:
  ethtool -K eth0 tso off
  tc qdisc add dev eth0 root netem drop 4%
  tcpdump -n -s0 -i eth0 -w sacklog.all

Running net-next under kvm, the connection goes to the same host
(basically just out of kvm). The connection itself works ok
and data gets sent without corruption even with a large
number of tests, while tcpdump usually fails within fewer than
5 tests.

Whether it only happens because of this change or not, I
don't know for sure, but it's the only thing with which
I've seen that error. The non-cloned variant works w/o it
for a much longer time. I have yet to debug where the error
actually comes from.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agotcp: add some mibs to track collapsing
Ilpo Järvinen [Tue, 25 Nov 2008 05:27:22 +0000 (21:27 -0800)] 
tcp: add some mibs to track collapsing

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agotcp: Make shifting not clear the hints
Ilpo Järvinen [Tue, 25 Nov 2008 05:26:56 +0000 (21:26 -0800)] 
tcp: Make shifting not clear the hints

The earlier version was just a very basic one which is "playing
safe" by always clearing the hints. However, clearing a hint
is an extremely costly operation with large windows, so it must be
avoided whenever possible; with shifting, too, there is a way
to avoid clearing.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agotcp: Try to restore large SKBs while SACK processing
Ilpo Järvinen [Tue, 25 Nov 2008 05:20:15 +0000 (21:20 -0800)] 
tcp: Try to restore large SKBs while SACK processing

During SACK processing, most of the benefits of TSO are eaten by
the SACK blocks that one-by-one fragment SKBs to MSS sized chunks.
Then we're in trouble when the cleanup work for them has to be done
once a large cumulative ACK comes. Try to return to the pre-split
state already while more and more SACK info gets discovered, by
combining newly discovered SACK areas with the previous skb if
that's SACKed as well.
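
A simplified model of the combining step, assuming the write queue can
be viewed as an array of sequence ranges. struct ex_seg and
ex_merge_sacked() are hypothetical; the real code shifts page frags
between skbs rather than editing an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* One queue entry covering sequence numbers [start, end). */
struct ex_seg {
    unsigned int start;
    unsigned int end;
    bool sacked;
};

/* Fold each SACKed segment into its predecessor when the predecessor is
 * SACKed too and the ranges are contiguous. Compacts the array in place
 * and returns the new number of entries. */
size_t ex_merge_sacked(struct ex_seg *q, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (out > 0 && q[i].sacked && q[out - 1].sacked &&
            q[out - 1].end == q[i].start) {
            q[out - 1].end = q[i].end;  /* grow the previous entry */
            continue;
        }
        q[out++] = q[i];                /* keep as a separate entry */
    }
    return out;
}
```

Five MSS-sized entries with a single un-SACKed hole in the middle
collapse to three entries, so later queue walks touch fewer elements.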

This approach has a number of benefits:

1) The processing overhead is spread more equally over the RTT
2) Write queue has fewer skbs to process (affects everything
   which has to walk in the queue past the sacked areas)
3) Write queue is consistent the whole time, so no other parts
   of TCP have to be aware of this (this was not the case with
   some other approach that was, well, quite intrusive all
   around).
4) Clean_rtx_queue can release most of the pages using single
   put_page instead of previous PAGE_SIZE/mss+1 calls
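
As a worked instance of the bound in item 4, a tiny helper evaluating
PAGE_SIZE/mss+1. The function name and the 4096-byte page and 1448-byte
MSS used in the note below are assumptions for illustration.

```c
#include <stddef.h>

/* Number of put_page calls per page when the page is shared by
 * MSS-sized skbs, using the bound quoted above. Assumes mss > 0. */
size_t ex_put_page_calls_split(size_t page_size, size_t mss)
{
    return page_size / mss + 1;
}
```

With a 4096-byte page and a 1448-byte MSS this gives 3 calls per page,
against a single call once the fragments have been merged back into one
skb.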

In case a hole is fully filled by the new SACK block, we attempt
to combine the next skb too, which allows construction of skbs
that are even larger than what TSO split them to, and it handles
the hole-on-every-nth patterns that often occur during slow start
overshoot pretty nicely. For this to be really useful, though,
a retransmission would also have to get lost, since cumulative ACKs
advance one hole at a time in the most typical case.

TODO: handle upwards-only merging. That should be rather easy
when a segment is fully sacked, but I'm leaving that as a future
work item (it won't make a very large difference anyway since
the current approach already covers quite a lot of normal
cases).

I was earlier thinking of some sophisticated way of tracking
timestamps of the first and the last segment but later on
realized that it won't be that necessary at all to store the
timestamp of the last segment. The cases that can occur are
basically either:
  1) ambiguous => no sensible measurement can be taken anyway
  2) non-ambiguous is due to reordering => having the timestamp
     of the last segment there just skews things further off
     rather than doing any good, since the ack got triggered by one of
     the holes (besides some subtle issues that would make
     determining the right hole/skb an even harder problem). Anyway,
     it has nothing to do with this change then.

I chose to route some abnormal-looking cases with goto noop;
some could be handled differently (e.g., by stopping the
walk at that skb). In general, they either
shouldn't happen at all or are rare enough to make no difference
in practice.

In theory this change (as a whole) could cause some macroscale
(global) regression because of cache misses that are taken over
the round-trip time, but it very likely gets better because of far
fewer (local) cache misses for other write queue walkers and for the
big recovery-clearing cumulative ACK.

It is worth noting that these benefits would be very easy to get also
without TSO/GSO being on, as long as the data is in pages so that
we can merge them. Currently I won't let that happen because
DSACK splitting at a fragment would mess up pcounts due to
sk_can_gso in tcp_set_skb_tso_segs. Once DSACK fragments get
avoided, we have some conditions that can be made less strict.

TODO: I will probably have to convert the excessive pointer
passing to struct sacktag_state... :-)

My testing revealed that a considerable number of skbs couldn't
be shifted because they were cloned (most likely still awaiting
tx reclaim)...

[The rest describes future work instead, since I repeatably got
EFAULT from tcpdump's recvfrom when I added
pskb_expand_head to deal with clones, so I separated that
into another, later patch]

...To counter that, I gave up on the fifth advantage:

5) When growing previous SACK block, less allocs for new skbs
   are done, basically a new alloc is needed only when new hole
   is detected and when the previous skb runs out of frags space

...which now only happens if reclaim is fast enough to dispose of
the clone before the SACK block comes in (the window is RTT long),
otherwise we'll have to alloc some.

With clones being handled I got these numbers (will be somewhat
worse without that), taken with fine-grained mibs:

                  TCPSackShifted 398
                   TCPSackMerged 877
            TCPSackShiftFallback 320
      TCPSACKCOLLAPSEFALLBACKGSO 0
  TCPSACKCOLLAPSEFALLBACKSKBBITS 0
  TCPSACKCOLLAPSEFALLBACKSKBDATA 0
    TCPSACKCOLLAPSEFALLBACKBELOW 0
    TCPSACKCOLLAPSEFALLBACKFIRST 1
 TCPSACKCOLLAPSEFALLBACKPREVBITS 318
      TCPSACKCOLLAPSEFALLBACKMSS 1
   TCPSACKCOLLAPSEFALLBACKNOHEAD 0
    TCPSACKCOLLAPSEFALLBACKSHIFT 0
          TCPSACKCOLLAPSENOOPSEQ 0
  TCPSACKCOLLAPSENOOPSMALLPCOUNT 0
     TCPSACKCOLLAPSENOOPSMALLLEN 0
             TCPSACKCOLLAPSEHOLE 12

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agotcp: make tcp_sacktag_one able to handle partial skb too
Ilpo Järvinen [Tue, 25 Nov 2008 05:14:43 +0000 (21:14 -0800)] 
tcp: make tcp_sacktag_one able to handle partial skb too

This is preparatory work for the SACK combiner patch, which may
have to count TCP state changes for only a part of the skb
because it will intentionally avoid splitting the skb into SACKed
and non-SACKed parts.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>