linux-2.6
17 years ago[NET]: Avoid copying TCP packets unnecessarily
Herbert Xu [Mon, 15 Oct 2007 08:47:15 +0000 (01:47 -0700)] 
[NET]: Avoid copying TCP packets unnecessarily

TCP packets all have writable heads: even when a packet is cloned, it is
writable up to the end of the TCP header.  This patch makes skb_checksum_help
aware of this fact by using skb_clone_writable and avoiding a copy for TCP.

I've also modified the BUG_ON tests to be unsigned.  The only case where this
makes a difference is if csum_start points to a location before skb->data.
Since skb->data should always include the header where the checksum field
is (and all current callers adhere to that), this change is safe and may
uncover bugs later.
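
A minimal userspace sketch of why the unsigned comparison matters. The
function names and parameters are hypothetical stand-ins, not the kernel's
code; "offset" plays the role of csum_start minus the start of the data.

```c
#include <stdbool.h>

/* Signed test: an offset that points before the data is negative and
 * therefore still compares as "less than the head length". */
bool offset_ok_signed(long offset, long headlen)
{
	return offset < headlen;
}

/* Unsigned test: the cast turns a negative offset into a very large
 * value, so a single comparison rejects both "before the data" and
 * "past the end of the head". */
bool offset_ok_unsigned(long offset, unsigned long headlen)
{
	return (unsigned long)offset < headlen;
}
```

With a head length of 64, an offset of -4 passes the signed test but fails
the unsigned one, which is the case the message says may uncover bugs.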

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Fix csum_start update in pskb_expand_head
Herbert Xu [Mon, 15 Oct 2007 08:46:08 +0000 (01:46 -0700)] 
[NET]: Fix csum_start update in pskb_expand_head

I got confused by the dual nature of the off variable in the
function pskb_expand_head.  The csum_start offset should use
nhead instead of off which can change depending on whether we
are using offsets or pointers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NIU]: getting rid of __ucmpdi2 in niu.o
Al Viro [Mon, 15 Oct 2007 08:42:31 +0000 (01:42 -0700)] 
[NIU]: getting rid of __ucmpdi2 in niu.o

By the time we get to that switch by PHY type, we have an 8-bit
value.  There is no need to keep it in a u64 when a u8 would do.
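
A rough sketch of the narrowing idea, assuming (as the subject line
suggests) that switching on a 64-bit value can make the compiler call the
libgcc helper __ucmpdi2 on some 32-bit targets. The PHY codes and the
function below are hypothetical and are not taken from the niu driver.

```c
#include <stdint.h>

/* Hypothetical PHY type codes, for illustration only. */
enum { DEMO_PHY_10G = 0, DEMO_PHY_1G = 1 };

/* Returns a link speed in Mbit/s for a demo PHY type, or -1 if unknown. */
int demo_phy_speed(uint64_t raw)
{
	/* The value is known to fit in 8 bits, so narrow it before the
	 * switch; the comparisons then happen on a u8, not a u64. */
	uint8_t type = (uint8_t)raw;

	switch (type) {
	case DEMO_PHY_10G:
		return 10000;
	case DEMO_PHY_1G:
		return 1000;
	default:
		return -1;
	}
}
```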

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETLINK]: Don't leak 'listeners' in netlink_kernel_create()
Jesper Juhl [Mon, 15 Oct 2007 08:39:12 +0000 (01:39 -0700)] 
[NETLINK]: Don't leak 'listeners' in netlink_kernel_create()

The Coverity checker spotted that we'll leak the storage allocated
to 'listeners' in netlink_kernel_create() when the
  if (!nl_table[unit].registered)
check is false.

This patch avoids the leak.
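
The shape of such a leak and its fix can be sketched in userspace as
follows. The struct and function are hypothetical stand-ins for
nl_table[unit] and netlink_kernel_create(); only the ownership pattern is
the point.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for one nl_table[] slot. */
struct demo_entry {
	bool registered;
	unsigned long *listeners;
};

/* Returns true if the entry took ownership of a freshly allocated buffer. */
bool demo_install_listeners(struct demo_entry *e, size_t nwords)
{
	unsigned long *listeners = calloc(nwords, sizeof(*listeners));

	if (!listeners)
		return false;

	if (!e->registered) {
		e->listeners = listeners;	/* ownership moves to the table */
		e->registered = true;
		return true;
	}

	/* Already registered: nobody else will free our buffer, so do it
	 * here instead of dropping the pointer on the floor. */
	free(listeners);
	return false;
}
```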

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV6] __inet6_csk_dst_store(): fix check-after-use
Adrian Bunk [Mon, 15 Oct 2007 08:37:55 +0000 (01:37 -0700)] 
[IPV6] __inet6_csk_dst_store(): fix check-after-use

The Coverity checker spotted that we have already oops'ed if "dst" was
NULL.

Since "dst" being NULL doesn't seem to be possible at this point this
patch removes the NULL check.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NIU]: Fix write past end of array in niu_pci_probe_sprom().
David S. Miller [Mon, 15 Oct 2007 08:36:24 +0000 (01:36 -0700)] 
[NIU]: Fix write past end of array in niu_pci_probe_sprom().

Noticed by Coverity checker and reported by Adrian Bunk.

Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV6]: Avoid skb_copy/pskb_copy/skb_realloc_headroom on input
Herbert Xu [Mon, 15 Oct 2007 08:29:10 +0000 (01:29 -0700)] 
[IPV6]: Avoid skb_copy/pskb_copy/skb_realloc_headroom on input

This patch replaces unnecessary uses of skb_copy by pskb_expand_head
on the IPv6 input path.

This allows us to remove the double pointers later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV6]: Make ipv6_frag_rcv return the same packet
Herbert Xu [Mon, 15 Oct 2007 08:28:47 +0000 (01:28 -0700)] 
[IPV6]: Make ipv6_frag_rcv return the same packet

This patch implements the same change that was done to ip_defrag.  It
makes ipv6_frag_rcv return the last packet received of a train of fragments
rather than the head of that sequence.

This allows us to get rid of the sk_buff ** argument later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: Replace sk_buff ** with sk_buff *
Herbert Xu [Mon, 15 Oct 2007 07:53:15 +0000 (00:53 -0700)] 
[NETFILTER]: Replace sk_buff ** with sk_buff *

With all the users of the double pointers removed, this patch mops up by
finally replacing all occurrences of sk_buff ** in the netfilter API by
sk_buff *.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: Avoid skb_copy/pskb_copy/skb_realloc_headroom
Herbert Xu [Sun, 14 Oct 2007 07:39:55 +0000 (00:39 -0700)] 
[NETFILTER]: Avoid skb_copy/pskb_copy/skb_realloc_headroom

This patch replaces unnecessary uses of skb_copy, pskb_copy and
skb_realloc_headroom by functions such as skb_make_writable and
pskb_expand_head.

This allows us to remove the double pointers later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPVS]: Replace local version of skb_make_writable
Herbert Xu [Sun, 14 Oct 2007 07:39:33 +0000 (00:39 -0700)] 
[IPVS]: Replace local version of skb_make_writable

This patch removes the IPVS-specific version of skb_make_writable and
replaces it with the netfilter one.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NETFILTER]: Do not copy skb in skb_make_writable
Herbert Xu [Sun, 14 Oct 2007 07:39:18 +0000 (00:39 -0700)] 
[NETFILTER]: Do not copy skb in skb_make_writable

Now that all callers of netfilter can guarantee that the skb is not shared,
we no longer have to copy the skb in skb_make_writable.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[BRIDGE]: Unshare skb upon entry
Herbert Xu [Sun, 14 Oct 2007 07:39:01 +0000 (00:39 -0700)] 
[BRIDGE]: Unshare skb upon entry

Due to the special location of the bridging hook, it should never see a
shared packet anyway (certainly not with any in-kernel code).  So it
makes sense to unshare the skb there if necessary as that will greatly
simplify the code below it (in particular, netfilter).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[NET]: Avoid unnecessary cloning for ingress filtering
Herbert Xu [Sun, 14 Oct 2007 07:38:47 +0000 (00:38 -0700)] 
[NET]: Avoid unnecessary cloning for ingress filtering

As it is we always invoke pt_prev before ing_filter, even if there are no
ingress filters attached.  This can cause unnecessary cloning in pt_prev.

This patch changes it so that we only invoke pt_prev if there are ingress
filters attached.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV4]: Change ip_defrag to return an integer
Herbert Xu [Sun, 14 Oct 2007 07:38:32 +0000 (00:38 -0700)] 
[IPV4]: Change ip_defrag to return an integer

Now that ip_defrag always returns the packet given to it on input, we can
change it to return an integer indicating error instead.  This patch does
that and updates all its callers accordingly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[IPV4]: Make ip_defrag return the same packet
Herbert Xu [Sun, 14 Oct 2007 07:38:15 +0000 (00:38 -0700)] 
[IPV4]: Make ip_defrag return the same packet

This patch is a bit of a hack.  However it is worth it if you consider that
this is the only reason why we have to carry around the struct sk_buff **
pointers in netfilter.

It makes ip_defrag always return the packet that was given to it on input.
It does this by cloning the packet and replacing its original contents with
the head fragment if necessary.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SKBUFF]: Add skb_morph
Herbert Xu [Sun, 14 Oct 2007 07:37:52 +0000 (00:37 -0700)] 
[SKBUFF]: Add skb_morph

This patch creates a new function skb_morph that's just like skb_clone
except that it lets the user provide the spare skb that will be overwritten
by the one that's to be cloned.

This will be used by IP fragment reassembly so that we get back the same
skb that went in last (rather than the head skb that we get now which
requires us to carry around double pointers all over the place).
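
A toy model of the idea, assuming a buffer handle that shares a
reference-counted payload. The types and the function are hypothetical and
far simpler than struct sk_buff; they only show "clone into a spare the
caller already holds" as opposed to allocating a new handle.

```c
#include <stddef.h>

/* Hypothetical shared payload with a reference count. */
struct demo_data {
	int refcnt;
};

/* Hypothetical buffer handle pointing at a shared payload. */
struct demo_buf {
	struct demo_data *data;
	size_t len;
};

/* Overwrite the caller-provided spare so that it becomes a clone of src. */
void demo_buf_morph(struct demo_buf *spare, const struct demo_buf *src)
{
	if (spare->data)
		spare->data->refcnt--;	/* a real version would free at zero */

	spare->data = src->data;
	spare->len = src->len;

	if (spare->data)
		spare->data->refcnt++;	/* payload is now shared with src */
}
```

Because the caller keeps using the same handle, no pointer to a pointer is
needed to hand back a different object.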

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years ago[SKBUFF]: Merge common code between copy_skb_header and skb_clone
Herbert Xu [Sun, 14 Oct 2007 07:37:30 +0000 (00:37 -0700)] 
[SKBUFF]: Merge common code between copy_skb_header and skb_clone

This patch creates a new function __copy_skb_header to merge the common
code between copy_skb_header and skb_clone.  Having two functions which
are largely the same is a source of wasted labour as well as confusion.

In fact the tc_verd stuff is almost certainly a bug since it's treated
differently in skb_clone compared to the callers of copy_skb_header
(skb_copy/pskb_copy/skb_copy_expand).

I've kept that difference intact with a comment added asking for
clarification.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agoMerge git://git.linux-nfs.org/pub/linux/nfs-2.6
Linus Torvalds [Mon, 15 Oct 2007 17:46:05 +0000 (10:46 -0700)] 
Merge git://git.linux-nfs.org/pub/linux/nfs-2.6

* git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
  NFSv4: Fix a typo in nfs_inode_reclaim_delegation
  NFS: Add a boot parameter to disable 64 bit inode numbers
  NFS: nfs_refresh_inode should clear cache_validity flags on success
  NFS: Fix a connectathon regression in NFSv3 and NFSv4
  NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
  SUNRPC: Don't call xprt_release in call refresh
  SUNRPC: Don't call xprt_release() if call_allocate fails
  SUNRPC: Fix buggy UDP transmission
  [23/37] Clean up duplicate includes in
  [2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
  SUNRPC: Use correct type in buffer length calculations
  SUNRPC: Fix default hostname created in rpc_create()
  nfs: add server port to rpc_pipe info file
  NFS: Get rid of some obsolete macros
  NFS: Simplify filehandle revalidation
  NFS: Ensure that nfs_link() returns a hashed dentry
  NFS: Be strict about dentry revalidation when doing exclusive create
  NFS: Don't zap the readdir caches upon error
  NFS: Remove the redundant nfs_reval_fsid()
  NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
  ...

Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
net/sunrpc/xprtsock.c

17 years agoMerge branch 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peter...
Linus Torvalds [Mon, 15 Oct 2007 17:40:41 +0000 (10:40 -0700)] 
Merge branch 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep

* 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep:
  lockdep: annotate dir vs file i_mutex
  lockdep: per filesystem inode lock class
  lockdep: annotate kprobes irq fiddling
  lockdep: annotate rcu_read_{,un}lock{,_bh}
  lockdep: annotate journal_start()
  lockdep: s390: connect the sysexit hook
  lockdep: x86_64: connect the sysexit hook
  lockdep: i386: connect the sysexit hook
  lockdep: syscall exit check
  lockdep: fixup mutex annotations
  lockdep: fix mismatched lockdep_depth/curr_chain_hash
  lockdep: Avoid /proc/lockdep & lock_stat infinite output
  lockdep: maintainers

17 years agoMerge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
Linus Torvalds [Mon, 15 Oct 2007 16:57:54 +0000 (09:57 -0700)] 
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] update sn2_defconfig
  [IA64] Fix kernel hangup in kdump on INIT
  [IA64] Fix kernel panic in kdump on INIT
  [IA64] Remove vector from ia64_machine_kexec()
  [IA64] Fix race when multiple cpus go through MCA
  [IA64] Remove needless delay in MCA rendezvous
  [IA64] add driver for ACPI methods to call native firmware
  [IA64] abstract SAL_CALL wrapper to allow other firmware entry points
  [IA64] perfmon: Remove exit_pfm_fs()
  [IA64] tree-wide: Misc __cpu{initdata, init, exit} annotations

17 years agoGet rid of unused variable warning in drivers/pci/hotplug/pci_hotplug_core.c
Linus Torvalds [Mon, 15 Oct 2007 16:07:58 +0000 (09:07 -0700)] 
Get rid of unused variable warning in drivers/pci/hotplug/pci_hotplug_core.c

Commit 5a7ad7f044941316dc98eda2a087a12a7a50649d removed all uses of
'retval', but didn't remove the variable itself.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years ago[IA64] update sn2_defconfig
Jes Sorensen [Wed, 19 Sep 2007 09:54:55 +0000 (11:54 +0200)] 
[IA64] update sn2_defconfig

Update the defconfig file for sn2 to match recent changes in config options.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
17 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
Linus Torvalds [Mon, 15 Oct 2007 15:22:16 +0000 (08:22 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched

* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (140 commits)
  sched: sync wakeups preempt too
  sched: affine sync wakeups
  sched: guest CPU accounting: maintain guest state in KVM
  sched: guest CPU accounting: maintain stats in account_system_time()
  sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
  sched: guest CPU accounting: add guest-CPU /proc/stat field
  sched: domain sysctl fixes: add terminator comment
  sched: domain sysctl fixes: do not crash on allocation failure
  sched: domain sysctl fixes: unregister the sysctl table before domains
  sched: domain sysctl fixes: use for_each_online_cpu()
  sched: domain sysctl fixes: use kcalloc()
  Make scheduler debug file operations const
  sched: enable wake-idle on CONFIG_SCHED_MC=y
  sched: reintroduce topology.h tunings
  sched: allow the immediate migration of cache-cold tasks
  sched: debug, improve migration statistics
  sched: debug: increase width of debug line
  sched: activate task_hot() only on fair-scheduled tasks
  sched: reintroduce cache-hot affinity
  sched: speed up context-switches a bit
  ...

17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
Linus Torvalds [Mon, 15 Oct 2007 15:19:33 +0000 (08:19 -0700)] 
Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (207 commits)
  [SCSI] gdth: fix CONFIG_ISA build failure
  [SCSI] esp_scsi: remove __dev{init,exit}
  [SCSI] gdth: !use_sg cleanup and use of scsi accessors
  [SCSI] gdth: Move members from SCp to gdth_cmndinfo, stage 2
  [SCSI] gdth: Setup proper per-command private data
  [SCSI] gdth: Remove gdth_ctr_tab[]
  [SCSI] gdth: switch to modern scsi host registration
  [SCSI] gdth: gdth_interrupt() gdth_get_status() & gdth_wait() fixes
  [SCSI] gdth: clean up host private data
  [SCSI] gdth: Remove virt hosts
  [SCSI] gdth: Reorder scsi_host_template intitializers
  [SCSI] gdth: kill gdth_{read,write}[bwl] wrappers
  [SCSI] gdth: Remove 2.4.x support, in-kernel changelog
  [SCSI] gdth: split out pci probing
  [SCSI] gdth: split out eisa probing
  [SCSI] gdth: split out isa probing
  gdth: Make one abuse of scsi_cmnd less obvious
  [SCSI] NCR5380: Use scsi_eh API for REQUEST_SENSE invocation
  [SCSI] usb storage: use scsi_eh API in REQUEST_SENSE execution
  [SCSI] scsi_error: Refactoring scsi_error to facilitate in synchronous REQUEST_SENSE
  ...

17 years agoMerge branch 'agp-patches' of master.kernel.org:/pub/scm/linux/kernel/git/airlied...
Linus Torvalds [Mon, 15 Oct 2007 15:18:44 +0000 (08:18 -0700)] 
Merge branch 'agp-patches' of master.kernel.org:/pub/scm/linux/kernel/git/airlied/agp-2.6

* 'agp-patches' of master.kernel.org:/pub/scm/linux/kernel/git/airlied/agp-2.6:
  fix use after free in amd create gatt pages
  AGP fix race condition between unmapping and freeing pages

17 years agoMerge branch 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlie...
Linus Torvalds [Mon, 15 Oct 2007 15:17:26 +0000 (08:17 -0700)] 
Merge branch 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6

* 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  via invalid device ids removal
  radeon: Commit the ring after each partial texture upload blit.
  i915: fix vbl swap allocation size.
  drm: Replace DRM_IOCTL_ARGS with (dev, data, file_priv) and remove DRM_DEVICE.
  drm: remove XFREE86_VERSION macros.
  drm: Replace filp in ioctl arguments with drm_file *file_priv.
  drm: Remove DRM_ERR OS macro.

17 years agoMerge branch 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Mon, 15 Oct 2007 15:16:53 +0000 (08:16 -0700)] 
Merge branch 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux

* 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux:
  knfsd: query filesystem for NFSv4 getattr of FATTR4_MAXNAME
  knfsd: nfsv4 delegation recall should take reference on client
  knfsd: don't shutdown callbacks until nfsv4 client is freed
  knfsd: let nfsd manage timing out its own leases
  knfsd: Add source address to sunrpc svc errors
  knfsd: 64 bit ino support for NFS server
  svcgss: move init code into separate function
  knfsd: remove code duplication in nfsd4_setclientid()
  nfsd warning fix
  knfsd: fix callback rpc cred
  knfsd: move nfsv4 slab creation/destruction to module init/exit
  knfsd: spawn kernel thread to probe callback channel
  knfsd: nfs4 name->id mapping not correctly parsing negative downcall
  knfsd: demote some printk()s to dprintk()s
  knfsd: cleanup of nfsd4 cmp_* functions
  knfsd: delete code made redundant by map_new_errors
  nfsd: fix horrible indentation in nfsd_setattr
  nfsd: remove unused cache_for_each macro
  nfsd: tone down inaccurate dprintk

17 years agoPS3 system bus add_uevent_var() fallout
Geert Uytterhoeven [Mon, 15 Oct 2007 09:51:03 +0000 (11:51 +0200)] 
PS3 system bus add_uevent_var() fallout

Kill unused variables

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoHID: fix HIDIOCGRDESC memory access in hidraw
Jiri Kosina [Mon, 15 Oct 2007 13:17:41 +0000 (15:17 +0200)] 
HID: fix HIDIOCGRDESC memory access in hidraw

Fix bogus copying of data into userspace when HIDIOCGRDESC is issued.
HID-transport layer makes sure that dev->hid->rdesc is not larger than
HID_MAX_DESCRIPTOR_SIZE.

Noticed-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agosched: sync wakeups preempt too
Ingo Molnar [Mon, 15 Oct 2007 15:00:20 +0000 (17:00 +0200)] 
sched: sync wakeups preempt too

make sure sync wakeups preempt too - the scheduler will not
overschedule as we've got various throttles against that.
As a result, sync wakeups can be used more widely in the kernel
(to signal wakeup affinity between tasks), and no arbitrary
latencies will be introduced either.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: affine sync wakeups
Ingo Molnar [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)] 
sched: affine sync wakeups

make sync wakeups affine for cache-cold tasks: if a cache-cold task
is woken up by a sync wakeup then use the opportunity to migrate it
straight away. (the two tasks are 'related' because they communicate)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: guest CPU accounting: maintain guest state in KVM
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)] 
sched: guest CPU accounting: maintain guest state in KVM

Modify KVM to update guest time accounting.

[ mingo@elte.hu: ported to 2.6.24 KVM. ]

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: guest CPU accounting: maintain stats in account_system_time()
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)] 
sched: guest CPU accounting: maintain stats in account_system_time()

modify account_system_time() to add cputime to cpustat->guest if we are
running a VCPU. We add this cputime to cpustat->user instead of
cpustat->system because this part of KVM code is in fact user code
although it is executed in the kernel. We duplicate VCPU time between
guest and user to allow an unmodified "top(1)" to display a correct value.
A modified "top(1)" can display separate cpu user time and cpu guest
time by subtracting cpu guest time from cpu user time. Update "gtime" in
task_struct accordingly.
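
The double accounting described above can be sketched as follows. The
struct and function names are hypothetical; the real counters live in
cpustat and task_struct and use kernel cputime types.

```c
/* Hypothetical, simplified counters in arbitrary time units. */
struct demo_cpustat {
	unsigned long long user;
	unsigned long long system;
	unsigned long long guest;
};

/* Time spent running a VCPU is charged to both user and guest, so an
 * unmodified tool that only reads "user" still sees the full amount. */
void demo_account_guest(struct demo_cpustat *st, unsigned long long delta)
{
	st->user += delta;
	st->guest += delta;
}

/* A guest-aware tool recovers pure user time by subtraction. */
unsigned long long demo_user_only(const struct demo_cpustat *st)
{
	return st->user - st->guest;
}
```

Starting from user=50, accounting 30 units of guest time gives user=80 and
guest=30, and the guest-aware view reports 50 units of pure user time.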

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)] 
sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields

As with cpustat, introduce the "gtime" (guest time of the task) and
"cgtime" (guest time of the task's children) fields for
tasks. Modify signal_struct and task_struct.

Modify /proc/<pid>/stat to display these new fields.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: guest CPU accounting: add guest-CPU /proc/stat field
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)] 
sched: guest CPU accounting: add guest-CPU /proc/stat field

as recent CPUs introduce a third running state, after "user" and
"system", we need a new field, "guest", in cpustat to store the time
used by the CPU to run a virtual CPU. Modify /proc/stat to display this
new field.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: domain sysctl fixes: add terminator comment
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)] 
sched: domain sysctl fixes: add terminator comment

we had an incorrect-terminator bug in sd_alloc_ctl_domain_table()
before, so add a comment that documents it.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: domain sysctl fixes: do not crash on allocation failure
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)] 
sched: domain sysctl fixes: do not crash on allocation failure

Now that we are calling this at runtime, a more relaxed error path is
suggested.  If an allocation fails, we just register the partial table,
which will show empty directories.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: domain sysctl fixes: unregister the sysctl table before domains
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)] 
sched: domain sysctl fixes: unregister the sysctl table before domains

Unregister and free the sysctl table before destroying domains, then
rebuild and register after creating the new domains.  This prevents the
sysctl table from pointing to freed memory for root to write.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: domain sysctl fixes: use for_each_online_cpu()
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)] 
sched: domain sysctl fixes: use for_each_online_cpu()

init_sched_domain_sysctl was walking cpus 0-n and referencing per_cpu
variables.  If the cpus_possible mask is not contiguous this will result
in a crash referencing unallocated data.  If the online mask is not
contiguous then we would show offline cpus and miss online ones.
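
A userspace sketch of the difference between walking ids 0..n and walking
the set bits of a mask. The 64-bit mask and the helper are assumptions for
illustration; the kernel uses cpumask types and for_each_online_cpu().

```c
#include <stdint.h>

/* Collect the ids of the bits set in a hypothetical 64-bit online mask.
 * Returns how many ids were written to out (at most max). */
int demo_online_cpus(uint64_t mask, int *out, int max)
{
	int n = 0;

	for (int cpu = 0; cpu < 64 && n < max; cpu++) {
		/* Only visit cpus whose bit is set, so holes are skipped. */
		if (mask & (UINT64_C(1) << cpu))
			out[n++] = cpu;
	}
	return n;
}
```

For a mask of 0x25 this visits cpus 0, 2 and 5, whereas a plain 0..2 walk
would touch the offline cpu 1 and miss cpu 5.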

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: domain sysctl fixes: use kcalloc()
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)] 
sched: domain sysctl fixes: use kcalloc()

kcalloc checks for n * sizeof(element) overflows and it zeros.
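
The two properties named above can be sketched in userspace like this. The
helper name is hypothetical and this is not the kernel's implementation; it
only shows an overflow-checked, zeroing array allocation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate n elements of the given size, zeroed, or return NULL if
 * n * size would overflow size_t or the allocation fails. */
void *demo_zalloc_array(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;		/* n * size would wrap around */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```

A plain malloc(n * size) would silently wrap for huge n and return a buffer
much smaller than the caller expects.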

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agoMake scheduler debug file operations const
Arjan van de Ven [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)] 
Make scheduler debug file operations const

In general, struct file_operations are const in the kernel, to not have
false cacheline sharing and to catch bugs at compiletime with accidental
writes to them. The new scheduler code introduces a new non-const one;
fix this up.
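
A small sketch of the const pattern, using a hypothetical operations table
in place of struct file_operations.

```c
/* Hypothetical operations table, standing in for struct file_operations. */
struct demo_ops {
	int (*open)(void);
	int (*release)(void);
};

int demo_open(void)
{
	return 1;
}

int demo_release(void)
{
	return 2;
}

/* const lets the compiler place the table in read-only data, and an
 * accidental write such as "demo_fops.open = 0;" fails to compile. */
const struct demo_ops demo_fops = {
	.open = demo_open,
	.release = demo_release,
};
```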

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: enable wake-idle on CONFIG_SCHED_MC=y
Ingo Molnar [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)] 
sched: enable wake-idle on CONFIG_SCHED_MC=y

most multicore CPUs today have shared L2 caches, so tune things so
that the spreading amongst cores is more aggressive.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: reintroduce topology.h tunings
Ingo Molnar [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)] 
sched: reintroduce topology.h tunings

reintroduce the 2.6.22 topology.h tunings again - they result in
slightly better balancing.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: allow the immediate migration of cache-cold tasks
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)] 
sched: allow the immediate migration of cache-cold tasks

allow the immediate migration of cache-cold tasks.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: debug, improve migration statistics
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)] 
sched: debug, improve migration statistics

add new migration statistics when SCHED_DEBUG and SCHEDSTATS
are enabled. Available in /proc/<PID>/sched.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: debug: increase width of debug line
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)] 
sched: debug: increase width of debug line

increase width of debug line - in preparation for more debugging info.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: activate task_hot() only on fair-scheduled tasks
Peter Zijlstra [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)] 
sched: activate task_hot() only on fair-scheduled tasks

activate task_hot() only for fair-scheduled tasks (i.e. disable it
for RT tasks).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: reintroduce cache-hot affinity
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)] 
sched: reintroduce cache-hot affinity

reintroduce a simplified version of cache-hot/cold scheduling
affinity. This improves performance with certain SMP workloads,
such as sysbench.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: speed up context-switches a bit
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)] 
sched: speed up context-switches a bit

speed up context-switches a bit by not clearing p->exec_start.

(as a side-effect, this also makes p->exec_start a universal timestamp
available to cache-hot estimations.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: do not wakeup-preempt with SCHED_BATCH tasks
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)] 
sched: do not wakeup-preempt with SCHED_BATCH tasks

do not wakeup-preempt with SCHED_BATCH tasks, their preemption
is batched too, driven by the tick.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: generate uevents for user creation/destruction
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)] 
sched: generate uevents for user creation/destruction

Generate uevents when a user is being created/destroyed. These events
can be used to configure cpu share of a new user.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: do not normalize kernel threads via SysRq-N
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)] 
sched: do not normalize kernel threads via SysRq-N

do not normalize kernel threads via SysRq-N: the migration threads,
softlockup threads, etc. might be essential for the system to
function properly. So only zap user tasks.

pointed out by Andi Kleen.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: remove stale comment from sched_group_set_shares()
Andi Kleen [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)] 
sched: remove stale comment from sched_group_set_shares()

remove stale comment from sched_group_set_shares().

Function never returns -EINVAL.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: clean up is_migration_thread()
Ingo Molnar [Mon, 15 Oct 2007 15:00:15 +0000 (17:00 +0200)] 
sched: clean up is_migration_thread()

clean up is_migration_thread() and turn it into an inline function.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: cleanup: refactor normalize_rt_tasks
Andi Kleen [Mon, 15 Oct 2007 15:00:15 +0000 (17:00 +0200)] 
sched: cleanup: refactor normalize_rt_tasks

Replace a particularly ugly ifdef with an inline and a new macro.
Also split up the function to be easier to read.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: cleanup: refactor common code of sleep_on / wait_for_completion
Andi Kleen [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: cleanup: refactor common code of sleep_on / wait_for_completion

Refactor common code of sleep_on / wait_for_completion

These functions were largely cut'n'pasted. This moves
the common code into single helpers instead.  Advantage
is about 1k less code on x86-64 and 91 lines of code removed.
It adds one function call to the non timeout version of
the functions; I don't expect this to be measurable.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: cleanup: remove unnecessary gotos
Andi Kleen [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: cleanup: remove unnecessary gotos

Replace loops implemented with gotos with real loops.
Replace err = ...; goto x; x: return err; with return ...;

No functional changes.
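
A before/after sketch of both rewrites on a made-up function (not code from
the scheduler): a loop built from gotos with a trailing "return err" label,
and the equivalent real loop with direct returns.

```c
/* Before: loop and exit path implemented with gotos. */
int demo_first_set_goto(const int *v, int n)
{
	int i = 0;
	int err;

again:
	if (i >= n) {
		err = -1;
		goto out;
	}
	if (v[i]) {
		err = i;
		goto out;
	}
	i++;
	goto again;
out:
	return err;
}

/* After: a real loop, returning the result directly. */
int demo_first_set_loop(const int *v, int n)
{
	for (int i = 0; i < n; i++) {
		if (v[i])
			return i;
	}
	return -1;
}
```

Both versions return the index of the first non-zero element, or -1 if
there is none.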

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: update comment
Ingo Molnar [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: update comment

update comment: clarify time-slices and remove obsolete tuning detail.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: prevent wakeup over-scheduling
Mike Galbraith [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: prevent wakeup over-scheduling

Prevent wakeup over-scheduling.  Once a task has been preempted by a
task of the same or lower priority, it becomes ineligible for repeated
preemption by same until it has been ticked, or slept.  Instead, the
task is marked for preemption at the next tick.  Tasks of higher
priority still preempt immediately.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: disable forced preemption by default
Peter Zijlstra [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: disable forced preemption by default

Implement feature bit to disable forced preemption. This way
it can be checked whether a workload is overscheduling or not.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: fix group scheduling for SCHED_BATCH
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: fix group scheduling for SCHED_BATCH

The following patch (sched: disable sleeper_fairness on SCHED_BATCH)
seems to break GROUP_SCHED, although it may be 'oops'-less because
'p' may always happen to be a valid address.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: some proc entries are missed in sched_domain sys_ctl debug code
Zou Nan hai [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: some proc entries are missed in sched_domain sys_ctl debug code

The cache_nice_tries and flags entries do not appear in the proc fs
sched_domain directory, because their ctl_table entries are skipped.

This patch fixes the issue.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: fix rt ptracer monopolizing CPU
Gautham R Shenoy [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: fix rt ptracer monopolizing CPU

yield() in wait_task_inactive() can cause a high priority thread to be
scheduled back in, and thereby loop forever while it is waiting for some
lower priority thread which is unfortunately still on the runqueue.

Use schedule_timeout_uninterruptible(1) instead.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Credit: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: group scheduling, sysfs tunables
Dhaval Giani [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: group scheduling, sysfs tunables

Add tunables in sysfs to modify a user's cpu share.

A directory is created in sysfs for each new user in the system.

/sys/kernel/uids/<uid>/cpu_share

Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.

Ex:
# cd /sys/kernel/uids/
# cat 512/cpu_share
1024
# echo 2048 > 512/cpu_share
# cat 512/cpu_share
2048
#
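
The same read and write can be done from a program. The sketch below
assumes only that cpu_share holds a single decimal number, as the shell
session above suggests; the helper names are made up for illustration
and writing still requires administrator rights:

```c
#include <stdio.h>

/*
 * Read a share value from a sysfs-style file such as
 * /sys/kernel/uids/512/cpu_share.
 * Returns 0 on success, -1 on failure. Helper name is illustrative.
 */
static int read_cpu_share(const char *path, unsigned long *shares)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", shares) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Write a new share value. Returns 0 on success, -1 on failure. */
static int write_cpu_share(const char *path, unsigned long shares)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = (fprintf(f, "%lu\n", shares) > 0);
	/* fclose() flushes the buffer, so a rejected write surfaces here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```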

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: disable sleeper_fairness on SCHED_BATCH
Peter Zijlstra [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: disable sleeper_fairness on SCHED_BATCH

disable sleeper fairness for batch tasks - they are about
batch processing after all.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: another wakeup_granularity fix
Peter Zijlstra [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: another wakeup_granularity fix

unit mis-match: wakeup_gran was used against a vruntime
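
To see why the units matter: vruntime is weighted time, so a granularity
given in wall-clock nanoseconds has to be scaled before it is compared
with a vruntime difference. The sketch below is a simplified userspace
model, not the patch itself. The helper names are hypothetical, and the
choice of which entity's weight to scale by is an assumption:

```c
#include <stdint.h>

#define NICE_0_LOAD 1024ULL	/* load weight of a nice-0 task */

/* Scale a wall-clock delta (ns) into virtual time for a given weight. */
static uint64_t wall_to_virtual(uint64_t delta_ns, uint64_t weight)
{
	return delta_ns * NICE_0_LOAD / weight;
}

/*
 * Preempt only if the running entity's vruntime lead over the woken
 * entity exceeds the granularity, with both expressed in virtual time.
 * Scaling by the running entity's weight is an assumption of this sketch.
 */
static int should_preempt(uint64_t curr_vruntime, uint64_t wakee_vruntime,
			  uint64_t gran_ns, uint64_t curr_weight)
{
	uint64_t vgran = wall_to_virtual(gran_ns, curr_weight);

	if (curr_vruntime <= wakee_vruntime)
		return 0;
	return curr_vruntime - wakee_vruntime > vgran;
}
```

For a nice-0 task the two units coincide, which is why a mismatch of
this kind only shows up for other weights.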

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: export cpu_clock()
Paul E. McKenney [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: export cpu_clock()

export cpu_clock() - the preferred API instead of sched_clock().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: fix: move the CPU check into ->task_new_fair()
Ingo Molnar [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: fix: move the CPU check into ->task_new_fair()

noticed by Peter Zijlstra:

fix: move the CPU check into ->task_new_fair(), this way we
can call place_entity() and get child ->vruntime right at
initial wakeup time.

(without this there can be large latencies)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
17 years agosched: cleanup: function prototype cleanups
Ingo Molnar [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: cleanup: function prototype cleanups

noticed by Thomas Gleixner:

cleanup: function prototype cleanups - move into single line
wherever possible.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: cleanup: rename task_grp to task_group
Ingo Molnar [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)] 
sched: cleanup: rename task_grp to task_group

cleanup: rename task_grp to task_group. No need to save two characters
and 'grp' is annoying to read.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG

cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG, to
make SCHED_FEAT_ names more consistent.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: kfree(NULL) is valid
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: kfree(NULL) is valid

kfree(NULL) is valid.

pointed out by checkpatch.pl.
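
A userspace analogue of this cleanup, using free() instead of kfree().
The C standard guarantees that free(NULL) is a no-op, so the guard can
go. The struct and function names are hypothetical:

```c
#include <stdlib.h>

struct buf {
	char *data;
};

/* Before: the NULL test is redundant. */
static void buf_release_checked(struct buf *b)
{
	if (b->data)
		free(b->data);
	b->data = NULL;
}

/* After: free(NULL) does nothing, so call it unconditionally. */
static void buf_release(struct buf *b)
{
	free(b->data);
	b->data = NULL;
}
```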

the fix shrinks the code a bit:

   text    data     bss     dec     hex filename
  40024    3842     100   43966    abbe sched.o.before
  40002    3842     100   43944    aba8 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: style cleanup
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: style cleanup

fix up __setup() style bug - noticed via checkpatch.pl.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: break out if printing a warning in sched_domain_debug()
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: break out if printing a warning in sched_domain_debug()

checkpatch.pl and Andy Whitcroft noticed the following bug: we did
not break out after printing an error.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: run sched_domain_debug() if CONFIG_SCHED_DEBUG=y
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: run sched_domain_debug() if CONFIG_SCHED_DEBUG=y

run sched_domain_debug() if CONFIG_SCHED_DEBUG=y, instead
of relying on the hand-crafted SCHED_DOMAIN_DEBUG switch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: cleanup, remove the TASK_NONINTERACTIVE flag
Mike Galbraith [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: cleanup, remove the TASK_NONINTERACTIVE flag

Here's another piece of low hanging obsolete fruit.

Remove obsolete TASK_NONINTERACTIVE.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: cleanup, make dequeue_entity() and update_stats_wait_end() similar
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: cleanup, make dequeue_entity() and update_stats_wait_end() similar

make dequeue_entity() / enqueue_entity() and update_stats_dequeue() /
update_stats_enqueue() look similar, structure-wise.

zero effect, functionality-wise:

   text    data     bss     dec     hex filename
  34550    3026     100   37676    932c sched.o.before
  34550    3026     100   37676    932c sched.o.after

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: cleanup, remove calc_weighted()
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: cleanup, remove calc_weighted()

remove obsolete code -- calc_weighted()

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: tidy up SCHED_RR
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: tidy up SCHED_RR

- make timeslices of SCHED_RR tasks constant and not
dependent on task's static_prio [1] ;
- remove obsolete code (timeslice related bits);
- make sched_rr_get_interval() return something more
meaningful [2] for SCHED_OTHER tasks.

[1] according to the following link, it's not compliant with SUSv3
(not sure though, what is the reference for us :-)
http://lkml.org/lkml/2007/3/7/656

[2] the interval is dynamic and can be depicted as follows "should a
task be one of the runnable tasks at this particular moment, it would
expect to run for this interval of time before being re-scheduled by the
scheduler tick".
(i.e. it's more precise if a task is runnable at the moment)

yeah, this seems to require task_rq_lock/unlock() but this is not a hot
path.

results:

(SCHED_FIFO)

dimm@earth:~/storage/prog$ sudo chrt -f 10 ./rr_interval
time_slice: 0 : 0

(SCHED_RR)

dimm@earth:~/storage/prog$ sudo chrt 10 ./rr_interval
time_slice: 0 : 99984800

(SCHED_NORMAL)

dimm@earth:~/storage/prog$ ./rr_interval
time_slice: 0 : 19996960

(SCHED_NORMAL + a cpu_hog of similar 'weight' on the same CPU --- so should be a half of the previous result)

dimm@earth:~/storage/prog$ taskset 1 ./rr_interval
time_slice: 0 : 9998480
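
The source of rr_interval is not part of this message. A program that
prints output in this format could look roughly like the sketch below,
which uses the POSIX sched_rr_get_interval() call; the function name
print_rr_interval() is made up for illustration:

```c
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/*
 * Print the interval reported for process 'pid' (0 means the caller)
 * as "time_slice: <seconds> : <nanoseconds>".
 * Returns 0 on success, -1 if the system call fails.
 */
static int print_rr_interval(pid_t pid)
{
	struct timespec ts;

	if (sched_rr_get_interval(pid, &ts) != 0) {
		perror("sched_rr_get_interval");
		return -1;
	}
	printf("time_slice: %ld : %ld\n", (long)ts.tv_sec, (long)ts.tv_nsec);
	return 0;
}
```

Calling print_rr_interval(0) from main() and running the binary under
chrt or taskset, as above, would then show the value for each policy.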

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: uninline scheduler
Alexey Dobriyan [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: uninline scheduler

* save ~300 bytes
* activate_idle_task() was moved to avoid a warning

bloat-o-meter output:

add/remove: 6/0 grow/shrink: 0/16 up/down: 438/-733 (-295) <===
function                                     old     new   delta
__enqueue_entity                               -     165    +165
finish_task_switch                             -     110    +110
update_curr_rt                                 -      79     +79
__load_balance_iterator                        -      32     +32
__task_rq_unlock                               -      28     +28
find_process_by_pid                            -      24     +24
do_sched_setscheduler                        133     123     -10
sys_sched_rr_get_interval                    176     165     -11
sys_sched_getparam                           156     145     -11
normalize_rt_tasks                           482     470     -12
sched_getaffinity                            112      99     -13
sys_sched_getscheduler                        86      72     -14
sched_setaffinity                            226     212     -14
sched_setscheduler                           666     642     -24
load_balance_start_fair                       33       9     -24
load_balance_next_fair                        33       9     -24
dequeue_task_rt                              133      67     -66
put_prev_task_rt                              97      28     -69
schedule_tail                                133      50     -83
schedule                                     682     594     -88
enqueue_entity                               499     366    -133
task_new_fair                                317     180    -137

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: tweak wakeup granularity
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: tweak wakeup granularity

tweak wakeup granularity.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: optimize schedule() a bit on SMP
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: optimize schedule() a bit on SMP

optimize schedule() a bit on SMP, by moving the rq-clock update
outside the rq lock.

code size is the same:

      text    data     bss     dec     hex filename
     25725    2666      96   28487    6f47 sched.o.before
     25725    2666      96   28487    6f47 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: fix __pick_next_entity()
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: fix __pick_next_entity()

The thing is that __pick_next_entity() must never be called when
first_fair(cfs_rq) == NULL. It wouldn't be a problem, should 'run_node'
be the very first field of 'struct sched_entity' (and it's the second).

The 'nr_running != 0' check is _not_ enough, due to the fact that
'current' is not within the tree. Generic paths are ok (e.g. schedule()
as put_prev_task() is called previously)... I'm more worried about e.g.
migration_call() -> CPU_DEAD_FROZEN -> migrate_dead_tasks()... if
'current' == rq->idle, no problems.. if it's one of the SCHED_NORMAL
tasks (or imagine, some other use-cases in the future -- i.e. we should
not make outer world dependent on internal details of sched_fair class)
-- it may be "Houston, we've got a problem" case.

it's +16 bytes to the ".text". Another variant is to make 'run_node' the
first data member of 'struct sched_entity' but an additional check
(se != NULL) is still needed in pick_next_entity().
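
The hazard can be shown outside the kernel. In the sketch below the
struct names are simplified stand-ins, not the real definitions:
converting a NULL node pointer container_of-style yields a non-NULL
"entity" address whenever the node is not the first member, so a later
NULL test on the entity would not catch it.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rb_node. */
struct rb_node_stub {
	struct rb_node_stub *left, *right;
};

/* Stand-in for struct sched_entity: the node is NOT the first member. */
struct entity_stub {
	unsigned long load;
	struct rb_node_stub run_node;
};

/*
 * container_of-style conversion done on integers, so the NULL case can
 * be inspected without forming or dereferencing an invalid pointer.
 */
static uintptr_t entity_addr_from_node(const struct rb_node_stub *node)
{
	return (uintptr_t)node - offsetof(struct entity_stub, run_node);
}

/* Guarded lookup: test the node itself before converting. */
static struct entity_stub *pick_entity(struct rb_node_stub *leftmost)
{
	if (!leftmost)
		return NULL;
	return (struct entity_stub *)((char *)leftmost -
			offsetof(struct entity_stub, run_node));
}
```

entity_addr_from_node(NULL) is non-zero here because the offset of
run_node is non-zero, while pick_entity(NULL) returns NULL as intended.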

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: vslice fixups for non-0 nice levels
Ingo Molnar [Mon, 15 Oct 2007 15:00:13 +0000 (17:00 +0200)] 
sched: vslice fixups for non-0 nice levels

Make vslice accurate wrt nice levels, and add some comments
while we're at it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: whitespace cleanups
Ingo Molnar [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)] 
sched: whitespace cleanups

more whitespace cleanups. No code changed:

      text    data     bss     dec     hex filename
     26553    2790     288   29631    73bf sched.o.before
     26553    2790     288   29631    73bf sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: mark scheduling classes as const
Ingo Molnar [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)] 
sched: mark scheduling classes as const

mark scheduling classes as const. This speeds up the code
a bit and shrinks it:

   text    data     bss     dec     hex filename
  40027    4018     292   44337    ad31 sched.o.before
  40190    3842     292   44324    ad24 sched.o.after
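
For readers unfamiliar with the pattern, a minimal sketch of a const
table of function pointers follows. The names are hypothetical and far
simpler than the real struct sched_class. A const object can be placed
in read-only data, which size typically counts under text; that would
be consistent with data shrinking and text growing in the table above.

```c
/* Hypothetical, much-reduced stand-in for a scheduling class. */
struct sched_class_stub {
	const char *name;
	int (*pick)(int nr_running);
};

/* Returns index 0 if anything is runnable, -1 otherwise. */
static int pick_first(int nr_running)
{
	return nr_running > 0 ? 0 : -1;
}

/* const: the table is never modified after initialisation. */
static const struct sched_class_stub fair_class_stub = {
	.name = "fair",
	.pick = pick_first,
};

/* Callers take a pointer-to-const and can only read the table. */
static int run_pick(const struct sched_class_stub *class, int nr_running)
{
	return class->pick(nr_running);
}
```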

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: group scheduler, fix latency
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)] 
sched: group scheduler, fix latency

There is a possibility that, because a task of a group moves from one
cpu to another, the group may gain more cpu time than desired. See
http://marc.info/?l=linux-kernel&m=119073197730334 for details.

This is an attempt to fix that problem. Basically it simulates dequeue
of higher level entities as if they are going to sleep. Similarly it
simulates wakeup of higher level entities as if they are waking up from
sleep.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: group scheduler, fix bloat
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)] 
sched: group scheduler, fix bloat

Recent fix to check_preempt_wakeup() to check for preemption at higher
levels caused a size bloat for !CONFIG_FAIR_GROUP_SCHED.

Fix the problem.

   text    data     bss     dec     hex filename
  42277   10598     320   53195    cfcb kernel/sched.o-before_this_patch
  42216   10598     320   53134    cf8e kernel/sched.o-after_this_patch

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: group scheduler, fix coding style issues
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)] 
sched: group scheduler, fix coding style issues

Fix coding style issues reported by Randy Dunlap and others

Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: cleanup, remove stale comment
Ingo Molnar [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)] 
sched: cleanup, remove stale comment

cleanup, remove stale comment.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: speed up and simplify vslice calculations
Peter Zijlstra [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)] 
sched: speed up and simplify vslice calculations

speed up and simplify vslice calculations.

[ From: Mike Galbraith <efault@gmx.de>: build fix ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: clean up min_vruntime use
Peter Zijlstra [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)] 
sched: clean up min_vruntime use

clean up min_vruntime use.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: group scheduler SMP migration fix
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)] 
sched: group scheduler SMP migration fix

group scheduler SMP migration fix: use task_cfs_rq(p) to get
to the relevant fair-scheduling runqueue of a task, rq->cfs
is not the right one.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: clean up schedstats, cnt -> count
Ingo Molnar [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)] 
sched: clean up schedstats, cnt -> count

rename all 'cnt' fields and variables to the less yucky 'count' name.

yuckage noticed by Andrew Morton.

no change in code, other than the /proc/sched_debug bkl_count string got
a bit larger:

   text    data     bss     dec     hex filename
  38236    3506      24   41766    a326 sched.o.before
  38240    3506      24   41770    a32a sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: yield fix
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)] 
sched: yield fix

fix yield bugs due to the current-not-in-rbtree changes: the task is
not in the rbtree so rbtree-removal is a no-no.

[ From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>: build fix. ]

also, nice code size reduction:

kernel/sched.o:
   text    data     bss     dec     hex filename
  38323    3506      24   41853    a37d sched.o.before
  38236    3506      24   41766    a326 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: group scheduler wakeup latency fix
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:12 +0000 (17:00 +0200)] 
sched: group scheduler wakeup latency fix

group scheduler wakeup latency fix: when checking for preemption
we must check cross-group too, not just intra-group.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: remove set_leftmost()
Ingo Molnar [Mon, 15 Oct 2007 15:00:11 +0000 (17:00 +0200)] 
sched: remove set_leftmost()

Lee Schermerhorn noticed that set_leftmost() contains dead code,
remove this.

Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: clean up sched_fork()
Hiroshi Shimamoto [Mon, 15 Oct 2007 15:00:11 +0000 (17:00 +0200)] 
sched: clean up sched_fork()

Adjusting sched_class is a missing part of the already existing "do
not leak PI boosting priority to the child" logic in sched_fork(). This
patch moves the sched_class adjustment from wake_up_new_task() to
sched_fork().

this also shrinks the code a bit:

   text    data     bss     dec     hex filename
  40111    4018     292   44421    ad85 sched.o.before
  40102    4018     292   44412    ad7c sched.o.after

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
17 years agosched: max_vruntime() simplification
Peter Zijlstra [Mon, 15 Oct 2007 15:00:11 +0000 (17:00 +0200)] 
sched: max_vruntime() simplification

max_vruntime() simplification.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>