git.oblomov.eu Git - linux-2.6/log

Merge branch 'for-linus' of git://git./linux/kernel/git/ieee1394/linux1394-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (26 commits)
  firewire: fw-sbp2: Use sbp2 device-provided mgt orb timeout for logins
  firewire: fw-sbp2: increase login orb reply timeout, fix "failed to login"
  firewire: replace subtraction with bitwise and
  firewire: fw-core: react on bus resets while the config ROM is being fetched
  firewire: enforce access order between generation and node ID, fix "giving up on config rom"
  firewire: fw-cdev: use device generation, not card generation
  firewire: fw-sbp2: use device generation, not card generation
  firewire: fw-sbp2: try to increase reconnect_hold (speed up reconnection)
  firewire: fw-sbp2: skip unnecessary logout
  firewire vs. ieee1394: clarify MAINTAINERS
  firewire: fw-ohci: Dynamically allocate buffers for DMA descriptors
  firewire: fw-ohci: CycleTooLong interrupt management
  firewire: Fix extraction of source node id
  firewire: fw-ohci: Bug fixes for packet-per-buffer support
  firewire: fw-ohci: Fix for dualbuffer three-or-more buffers
  firewire: fw-sbp2: remove unused misleading macro
  firewire: fw-sbp2: prepare for s/g chaining
  firewire: fw-sbp2: refactor workq and kref handling
  ieee1394: ohci1394: don't schedule IT tasklets on IR events
  ieee1394: sbp2: raise default transfer size limit
  ...

Merge git://git./linux/kernel/git/wim/linux-2.6-watchdog

* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
[WATCHDOG] use SGI_HAS_INDYDOG for INDYDOG depends

Merge git://git./linux/kernel/git/rusty/linux-2.6-for-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (27 commits)
  lguest: use __PAGE_KERNEL instead of _PAGE_KERNEL
  lguest: Use explicit includes rateher than indirect
  lguest: get rid of lg variable assignments
  lguest: change gpte_addr header
  lguest: move changed bitmap to lg_cpu
  lguest: move last_pages to lg_cpu
  lguest: change last_guest to last_cpu
  lguest: change spte_addr header
  lguest: per-vcpu lguest pgdir management
  lguest: make pending notifications per-vcpu
  lguest: makes special fields be per-vcpu
  lguest: per-vcpu lguest task management
  lguest: replace lguest_arch with lg_cpu_arch.
  lguest: make registers per-vcpu
  lguest: make emulate_insn receive a vcpu struct.
  lguest: map_switcher_in_guest() per-vcpu
  lguest: per-vcpu interrupt processing.
  lguest: per-vcpu lguest timers
  lguest: make hypercalls use the vcpu struct
  lguest: make write() operation smp aware
  ...

Manual conflict resolved (maybe even correctly, who knows) in
drivers/lguest/x86/core.c

Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/selinux-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  security: compile capabilities by default
  selinux: make selinux_set_mnt_opts() static
  SELinux: Add warning messages on network denial due to error
  SELinux: Add network ingress and egress control permission checks
  NetLabel: Add auditing to the static labeling mechanism
  NetLabel: Introduce static network labels for unlabeled connections
  SELinux: Allow NetLabel to directly cache SIDs
  SELinux: Enable dynamic enable/disable of the network access checks
  SELinux: Better integration between peer labeling subsystems
  SELinux: Add a new peer class and permissions to the Flask definitions
  SELinux: Add a capabilities bitmap to SELinux policy version 22
  SELinux: Add a network node caching mechanism similar to the sel_netif_*() functions
  SELinux: Only store the network interface's ifindex
  SELinux: Convert the netif code to use ifindex values
  NetLabel: Add IP address family information to the netlbl_skbuff_getattr() function
  NetLabel: Add secid token support to the NetLabel secattr struct
  NetLabel: Consolidate the LSM domain mapping/hashing locks
  NetLabel: Cleanup the LSM domain hash functions
  NetLabel: Remove unneeded RCU read locks

Merge git://git./linux/kernel/git/gregkh/driver-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
  PPC: Fix powerpc vio_find_name to not use devices_subsys
  Driver core: add bus_find_device_by_name function
  Module: check to see if we have a built in module with the same name
  x86: fix runtime error in arch/x86/kernel/cpu/mcheck/mce_amd_64.c
  Driver core: Fix up build when CONFIG_BLOCK=N

Merge branch 'for-linus' of git://git./linux/kernel/git/avi/kvm

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (249 commits)
  KVM: Move apic timer migration away from critical section
  KVM: Put kvm_para.h include outside __KERNEL__
  KVM: Fix unbounded preemption latency
  KVM: Initialize the mmu caches only after verifying cpu support
  KVM: MMU: Fix dirty page setting for pages removed from rmap
  KVM: Portability: Move kvm_fpu to asm-x86/kvm.h
  KVM: x86 emulator: Only allow VMCALL/VMMCALL trapped by #UD
  KVM: MMU: Merge shadow level check in FNAME(fetch)
  KVM: MMU: Move kvm_free_some_pages() into critical section
  KVM: MMU: Switch to mmu spinlock
  KVM: MMU: Avoid calling gfn_to_page() in mmu_set_spte()
  KVM: Add kvm_read_guest_atomic()
  KVM: MMU: Concurrent guest walkers
  KVM: Disable vapic support on Intel machines with FlexPriority
  KVM: Accelerated apic support
  KVM: local APIC TPR access reporting facility
  KVM: Print data for unimplemented wrmsr
  KVM: MMU: Add cache miss statistic
  KVM: MMU: Coalesce remote tlb flushes
  KVM: Expose ioapic to ia64 save/restore APIs
  ...

Merge branch 'for-linus' of git://git./linux/kernel/git/teigland/dlm

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: (21 commits)
  dlm: static initialization improvements
  dlm: clean ups
  dlm: Sanity check namelen before copying it
  dlm: keep cached master rsbs during recovery
  dlm: change error message to debug
  dlm: fix possible use-after-free
  dlm: limit dir lookup loop
  dlm: reject normal unlock when lock is waiting for lookup
  dlm: validate messages before processing
  dlm: reject messages from non-members
  dlm: another call to confirm_master in receive_request_reply
  dlm: recover locks waiting for overlap replies
  dlm: clear ast_type when removing from astqueue
  dlm: use fixed errno values in messages
  dlm: swap bytes for rcom lock reply
  dlm: align midcomms message buffer
  dlm: close othercons
  dlm: use dlm prefix on alloc and free functions
  dlm: don't print common non-errors
  dlm: proper prototypes
  ...

Merge git://git./linux/kernel/git/jejb/scsi-misc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (21 commits)
  [SCSI] Revert "[SCSI] aacraid: fib context lock for management ioctls"
  [SCSI] bsg: copy the cmd_type field to the subordinate request for bidi
  [SCSI] handle scsi_init_queue failure properly
  [SCSI] destroy scsi_bidi_sdb_cache in scsi_exit_queue
  [SCSI] scsi_debug: add XDWRITEREAD_10 support
  [SCSI] scsi_debug: add bidi data transfer support
  [SCSI] scsi_debug: add get_data_transfer_info helper function
  [SCSI] remove use_sg_chaining
  [SCSI] bidirectional: fix up for the new blk_end_request code
  [SCSI] bidirectional command support
  [SCSI] implement scsi_data_buffer
  [SCSI] tgt: use scsi_init_io instead of scsi_alloc_sgtable
  [SCSI] aic7xxx: fix warnings with CONFIG_PM disabled
  [SCSI] aic79xx: fix warnings with CONFIG_PM disabled
  [SCSI] aic7xxx: fix ahc_done check SCB_ACTIVE for tagged transactions
  [SCSI] sgiwd93: use cached memory access to make driver work on IP28
  [SCSI] zfcp: fix sense_buffer access bug
  [SCSI] ncr53c8xx: fix sense_buffer access bug
  [SCSI] aic79xx: fix sense_buffer access bug
  [SCSI] hptiop: fix sense_buffer access bug
  ...

docbook: fix block api fatal error

Fix docbook fatal error:
docproc: linux-2.6.24-git8/block/ll_rw_blk.c: No such file or directory

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

docbook: fix drivers/base/class warning

Fix kernel-doc empty line warning:
Warning(linux-2.6.24-git8//drivers/base/class.c:866): bad line:

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

firewire: fw-sbp2: Use sbp2 device-provided mgt orb timeout for logins

To be more compliant with section 7.4.8 of the SBP-2 specification,
use the mgt_ORB_timeout specified in the SBP-2 device's config rom
for login ORB attempts (though with some sanity checks). A happy
side-effect is that certain device and controller combinations that
sometimes take more than 20 seconds to get synced up (like my laptop
with just about any SBP-2 device) now function more reliably.

Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (silenced sparse)

firewire: fw-sbp2: increase login orb reply timeout, fix "failed to login"

Increase (and rename) the login orb reply timeout value to 20s
to match that of the old firewire stack. 2s simply didn't give
many devices enough time to spin up and reply.

Fixes inability to recognize some devices.
Failure mode was "orb reply timed out"/"failed to login".

Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (style, comments, changelog)

firewire: replace subtraction with bitwise and

Replace an unnecessary subtraction with a bitwise AND when determining the
value of ext_tcode in fw_fill_transaction() to save a cpu cycle or two in a
somewhat critical path.

Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: fw-core: react on bus resets while the config ROM is being fetched

read_rom() obtained a fresh new fw_device.generation for each read
transaction.  Hence it was able to continue reading in the middle of the
ROM even if a bus reset happened.  However the device may have modified
the ROM during the reset.  We would end up with a corrupt fetched ROM
image then.

Although all of this is quite unlikely, it is not impossible.
Therefore we now restart reading the ROM if the bus generation changed.

Note, the memory barrier in read_rom() is still necessary according to
tests by Jarod Wilson, despite of the ->generation access being moved up
in the call chain.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
This is essentially what I've been beating on locally, and I've yet to hit
another config rom read failure with it.

Signed-off-by: Jarod Wilson <jwilson@redhat.com>

firewire: enforce access order between generation and node ID, fix "giving up on config rom"

fw_device.node_id and fw_device.generation are accessed without mutexes.
We have to ensure that all readers will get to see node_id updates
before generation updates.

Fixes an inability to recognize devices after "giving up on config rom",
https://bugzilla.redhat.com/show_bug.cgi?id=429950

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Reviewed by Nick Piggin <nickpiggin@yahoo.com.au>.

Verified to fix 'giving up on config rom' issues on multiple system and
drive combinations that were previously affected.

Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>

firewire: fw-cdev: use device generation, not card generation

We have to use the fw_device.generation here, not the fw_card.generation,
because the generation must never be newer than the node ID when we emit
a transaction. This cannot be guaranteed with fw_card.generation.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Verified in concert with subsequent memory barriers patch to fix 'giving
up on config rom' issues on multiple system and drive combinations that
were previously affected.

Signed-off-by: Jarod Wilson <jwilson@redhat.com>

firewire: fw-sbp2: use device generation, not card generation

There was a small window where a login or reconnect job could use an
already updated card generation with an outdated node ID.  We have to
use the fw_device.generation here, not the fw_card.generation, because
the generation must never be newer than the node ID when we emit a
transaction.  This cannot be guaranteed with fw_card.generation.

Furthermore, the target's and initiator's node IDs can be obtained from
fw_device and fw_card.  Dereferencing their underlying topology objects
is not necessary.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Verified in concert with subsequent memory barriers patch to fix 'giving
up on config rom' issues on multiple system and drive combinations that
were previously affected.

Signed-off-by: Jarod Wilson <jwilson@redhat.com>

firewire: fw-sbp2: try to increase reconnect_hold (speed up reconnection)

Ask the target to grant 4 seconds instead of the standard and minimum of
1 second window after bus reset for reconnection.  This accelerates
reconnection if there are more than one targets on the bus:  If a login
and inquiry to one target blocks the fw-sbp2 workqueue for more than 1s
after bus reset, we now still can reconnect to the other target.

Before that, fw-sbp2's reconnect attempts would be rejected with "error
status: 0:9" (function rejected), and fw-sbp2 would finally re-login.
All those futile reconnect attemps cost extra time until the target
which needs re-login is ready for I/O again.

The reconnect timeout field in the login ORB doesn't have to be honored
by the target though.  I found that we could get up to
  - allegedly 32768s from an old OXFW911 firmware
  - 256s from LSI bridges
  - 4s from OXUF922 and OXFW912 bridges,
  - 2s from TI bridges,
  - only the standard 1s from Initio and Prolific bridges and from
    Apple OpenFirmware in target mode.

We just try to get 4 seconds which already covers the case of a few
HDDs on the same bus quite nicely.

A minor drawback occurs in the following (rare and impractical) border
case:
  - two initiators are there, initiator 1 holds an exclusive login to
    a target,
  - initiator 1 goes off the bus,
  - target refuses login attempts from initiator 2 until reconnect_hold
    seconds after bus reset.

An alternative approach to the issue at hand would be to parallelize
fw-sbp2's reconnect and login work.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Jarod Wilson <jwilson@redhat.com>

firewire: fw-sbp2: skip unnecessary logout

Don't attempt to send a logout ORB if the target was already unplugged
or had its link switched off. If two targets are attached, this
enhances the chance to quickly reconnect to the remaining target when
one target is plugged out.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Jarod Wilson <jwilson@redhat.com>

firewire vs. ieee1394: clarify MAINTAINERS

Maintainers like to receive less mail, and submitters like to have to Cc
less recipients.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: fw-ohci: Dynamically allocate buffers for DMA descriptors

Previously, the fw-ohci driver used fixed-length buffers for storing
descriptors for isochronous receive DMA programs.  If an application
(such as libdc1394) generated a DMA program that was too large, fw-ohci
would reach the limit of its fixed-sized buffer and return an error to
userspace.

This patch replaces the fixed-length ring-buffer with a linked-list of
page-sized buffers.  Additional buffers can be dynamically allocated and
appended to the list when necessary.  For a particular context, buffers
are kept around after use and reused as necessary, so there is no
allocation taking place after the DMA program is generated for the first
time.

In addition, the buffers it uses are coherent for DMA so there is no
syncing required before and after writes.  This syncing wasn't properly
done in the previous version of the code.

-

This is the fourth version of my patch that replaces a fixed-length
buffer for DMA descriptors with a dynamically allocated linked-list of
buffers.

As we discovered with the last attempt, new context programs are
sometimes queued from interrupt context, making it unacceptable to call
tasklet_disable() from context_get_descriptors().

This version of the patch uses ohci->lock for all locking needs instead
of tasklet_disable/enable.  There is a new requirement that
context_get_descriptors() be called while holding ohci->lock.  It was
already held for the AT context, so adding the requirement for the iso
context did not seem particularly onerous.  In addition, this has the
side benefit of allowing iso queue to be safely called from concurrent
user-space threads, which previously was not safe.

Signed-off-by: David Moore <dcm@acm.org>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
-

Fixes the following issues:
  - Isochronous reception stopped prematurely if an application used a
    larger buffer.  (Reproduced with coriander.)
  - Isochronous reception stopped after one or a few frames on VT630x
    in OHCI 1.0 mode.  (Fixes reception in coriander, but dvgrab still
    doesn't work with these chips.)

Patch update: struct member alignment, whitespace nits

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: fw-ohci: CycleTooLong interrupt management

The firewire-ohci driver so far lacked the ability to resume cycle
master duty after that condition happened, as added to ohci1394 in Linux
2.6.18 by commit 57fdb58fa5a140bdd52cf4c4ffc30df73676f0a5.  This ports
this patch to fw-ohci.

The "cycle too long" condition has been seen in practice
  - with IIDC cameras if a mode with packets too large for a speed is
    chosen,
  - sporadically when capturing DV on a VIA VT6306 card with ohci1394/
    ieee1394/ raw1394/ dvgrab 2.
    https://bugzilla.redhat.com/show_bug.cgi?id=415841#c7
(This does not fix Fedora bug 415841.)

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: Fix extraction of source node id

Fix extraction of the source node id from the packet header.

Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: fw-ohci: Bug fixes for packet-per-buffer support

This patch corrects a number of bugs in the current OHCI 1.0
packet-per-buffer support:

1. Correctly deal with payloads that cross a page boundary.  The
previous version would not split the descriptor at such a boundary,
potentially corrupting unrelated memory.

2. Allow user-space to specify multiple packets per struct
fw_cdev_iso_packet in the same way that dual-buffer allows.  This is
signaled by header_length being a multiple of header_size.  This
multiple determines the number of packets.  The payload size allocated
per packet is determined by dividing the total payload size by the
number of packets.

3. Make sync support work properly for packet-per-buffer.

I have tested this patch with libdc1394 by forcing my OHCI 1.1
controller to use the packet-per-buffer support instead of dual-buffer.

I would greatly appreciate testing by those who have a DV devices and
other types of iso streamers to make sure I didn't cause any
regressions.

Stefan, with this patch, I'm hoping that libdc1394 will work with all
your OHCI 1.0 controllers now.

The one bit of future work that remains for packet-per-buffer support is
the automatic compaction of short payloads that I discussed with
Kristian.

Signed-off-by: David Moore <dcm@acm.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: fw-ohci: Fix for dualbuffer three-or-more buffers

This patch fixes the problem where different OHCI 1.1 controllers behave
differently when a received iso packet straddles three or more buffers
when using the dual-buffer receive mode.  Two changes are made in order
to handle this situation:

1. The packet sync DMA descriptor is given a non-zero header length and
non-zero payload length.  This is because zero-payload descriptors are
not discussed in the OHCI 1.1 specs and their behavior is thus
undefined.  Instead we use a header size just large enough for a single
header and a payload length of 4 bytes for this first descriptor.

2. As we process received packets in the context's tasklet, read the
packet length out of the headers.  Keep track of the running total of
the packet length as "excess_bytes", so we can ignore any descriptors
where no packet starts or ends.  These descriptors may not have had
their first_res_count or second_res_count fields updated by the
controller so we cannot rely on those values.

The main drawback of this patch is that the excess_bytes value might get
"out of sync" with the packet descriptors if something strange happens
to the DMA program.  I'm not if such a thing could ever happen, but I
appreciate any suggestions in making it more robust.

Also, the packet-per-buffer support may need a similar fix to deal with
issue 1, but I haven't done any work on that yet.

Stefan, I'm hoping that with this patch, all your OHCI 1.1 controllers
will work properly with an unmodified version of libdc1394.

Signed-off-by: David Moore <dcm@acm.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: fw-sbp2: remove unused misleading macro

SBP2_MAX_SECTORS is nowhere used in fw-sbp2.
It merely got copied over from sbp2 where it played a role in the past.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: fw-sbp2: prepare for s/g chaining

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

firewire: fw-sbp2: refactor workq and kref handling

This somewhat reduces the size of firewire-sbp2.ko.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

ieee1394: ohci1394: don't schedule IT tasklets on IR events

Bug noted by Pieter Palmers: Isochronous transmit tasklets were
scheduled on isochronous receive events, in addition to the proper
isochronous receive tasklets.

http://marc.info/?l=linux1394-devel&m=119783196222802

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

ieee1394: sbp2: raise default transfer size limit

This patch speeds up sbp2 a little bit --- but more importantly, it
brings the behavior of sbp2 and fw-sbp2 closer to each other.  Like
fw-sbp2, sbp2 now does not limit the size of single transfers to 255
sectors anymore, unless told so by a blacklist flag or by module load
parameters.

Only very old bridge chips have been known to need the 255 sectors
limit, and we have got one such chip in our hardwired blacklist.  There
certainly is a danger that more bridges need that limit; but I prefer to
have this issue present in both fw-sbp2 and sbp2 rather than just one of
them.

An OXUF922 with 400GB 7200RPM disk on an S400 controller is sped up by
this patch from 22.9 to 23.5 MB/s according to hdparm.  The same effect
could be achieved before by setting a higher max_sectors module
parameter.  On buses which use 1394b beta mode, sbp2 and fw-sbp2 will
now achieve virtually the same bandwidth.  Fw-sbp2 only remains faster
on 1394a buses due to fw-core's gap count optimization.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

ieee1394: remove unused code

The code has been in "#if 0 - #endif" since Linux 2.6.12.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

ieee1394: small cleanup after "nopage"

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

ieee1394: nopage

Convert ieee1394 from nopage to fault.
Remove redundant vma range checks (correct resource range check is retained).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

ieee1394: Add missing "space"

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

ieee1394: sbp2: s/g list access cosmetics

Replace sg->length by sg_dma_len(sg). Rename a variable for shorter
line lengths and eliminate some superfluous local variables.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

ieee1394: sbp2: prepare for s/g chaining

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

[SCSI] Revert "[SCSI] aacraid: fib context lock for management ioctls"

This reverts commit a119ee8ee3045bf559d4cf02d72b112f3de2a15b.

Adaptec found this was causing system lockups.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] bsg: copy the cmd_type field to the subordinate request for bidi

This fixes a problem in SCSI where we use the (previously
uninitialised) cmd_type via blk_pc_request() to set up the transfer in
scsi_init_sgtable().

Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] handle scsi_init_queue failure properly

scsi_init_queue is expected to clean up allocated things when it
fails.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] destroy scsi_bidi_sdb_cache in scsi_exit_queue

Needs to call kmem_cache_destroy for scsi_bidi_sdb_cache in
scsi_exit_queue.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] scsi_debug: add XDWRITEREAD_10 support

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] scsi_debug: add bidi data transfer support

This enables fill_from_dev_buffer and fetch_to_dev_buffer to handle
bidi commands.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] scsi_debug: add get_data_transfer_info helper function

This adds get_data_transfer_info helper function that get lha and
sectors for READ_* and WRITE_* commands (and XDWRITEREAD_10 later).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] remove use_sg_chaining

With the sg table code, every SCSI driver is now either chain capable
or broken (or has sg_tablesize set so chaining is never activated), so
there's no need to have a check in the host template.

Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] bidirectional: fix up for the new blk_end_request code

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] bidirectional command support

At the block level bidi request uses req->next_rq pointer for a second
bidi_read request.
At Scsi-midlayer a second scsi_data_buffer structure is used for the
bidi_read part. This bidi scsi_data_buffer is put on
request->next_rq->special. Struct scsi_cmnd is not changed.

- Define scsi_bidi_cmnd() to return true if it is a bidi request and a
  second sgtable was allocated.

- Define scsi_in()/scsi_out() to return the in or out scsi_data_buffer
  from this command This API is to isolate users from the mechanics of
  bidi.

- Define scsi_end_bidi_request() to do what scsi_end_request() does but
  for a bidi request. This is necessary because bidi commands are a bit
  tricky here. (See comments in body)

- scsi_release_buffers() will also release the bidi_read scsi_data_buffer

- scsi_io_completion() on bidi commands will now call
  scsi_end_bidi_request() and return.

- The previous work done in scsi_init_io() is now done in a new
  scsi_init_sgtable() (which is 99% identical to old scsi_init_io())
  The new scsi_init_io() will call the above twice if needed also for
  the bidi_read command. Only at this point is a command bidi.

- In scsi_error.c at scsi_eh_prep/restore_cmnd() make sure bidi-lld is not
  confused by a get-sense command that looks like bidi. This is done
  by puting NULL at request->next_rq, and restoring.

[jejb: update to sg_table and resolve conflicts
also update to blk-end-request and resolve conflicts]

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] implement scsi_data_buffer

In preparation for bidi we abstract all IO members of scsi_cmnd,
that will need to duplicate, into a substructure.

- Group all IO members of scsi_cmnd into a scsi_data_buffer
  structure.
- Adjust accessors to new members.
- scsi_{alloc,free}_sgtable receive a scsi_data_buffer instead of
  scsi_cmnd. And work on it.
- Adjust scsi_init_io() and  scsi_release_buffers() for above
  change.
- Fix other parts of scsi_lib/scsi.c to members migration. Use
  accessors where appropriate.

- fix Documentation about scsi_cmnd in scsi_host.h

- scsi_error.c
  * Changed needed members of struct scsi_eh_save.
  * Careful considerations in scsi_eh_prep/restore_cmnd.

- sd.c and sr.c
  * sd and sr would adjust IO size to align on device's block
    size so code needs to change once we move to scsi_data_buff
    implementation.
  * Convert code to use scsi_for_each_sg
  * Use data accessors where appropriate.

- tgt: convert libsrp to use scsi_data_buffer

- isd200: This driver still bangs on scsi_cmnd IO members,
  so need changing

[jejb: rebased on top of sg_table patches fixed up conflicts
and used the synergy to eliminate use_sg and sg_count]

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] tgt: use scsi_init_io instead of scsi_alloc_sgtable

If we export scsi_init_io()/scsi_release_buffers() instead of
scsi_{alloc,free}_sgtable() from scsi_lib than tgt code is much more
insulated from scsi_lib changes. As a bonus it will also gain bidi
capability when it comes.

[jejb: rebase on to sg_table and fix up rejections]

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] aic7xxx: fix warnings with CONFIG_PM disabled

CC [M] drivers/scsi/aic7xxx/aic7xxx_osm_pci.o
drivers/scsi/aic7xxx/aic7xxx_osm_pci.c:148: warning: 'ahc_linux_pci_dev_suspend' defined but not used
drivers/scsi/aic7xxx/aic7xxx_osm_pci.c:166: warning: 'ahc_linux_pci_dev_resume' defined but not used

This moves aic7xxx_pci_driver struct, removes some forward declarations,
and adds some ifdef CONFIG_PM.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] aic79xx: fix warnings with CONFIG_PM disabled

CC [M] drivers/scsi/aic7xxx/aic79xx_osm_pci.o
drivers/scsi/aic7xxx/aic79xx_osm_pci.c:101: warning: 'ahd_linux_pci_dev_suspend' defined but not used
drivers/scsi/aic7xxx/aic79xx_osm_pci.c:121: warning: 'ahd_linux_pci_dev_resume' defined but not used

This moves aic79xx_pci_driver struct, removes some forward
declarations, and adds some ifdef CONFIG_PM.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] aic7xxx: fix ahc_done check SCB_ACTIVE for tagged transactions

The driver only needs to check the SCB_ACTIVE flag if the SCB is not
in the untagged queue.

If the driver is in error recovery, you may end panic'ing on a TUR
that is in the untagged queue.

Attempting to queue an ABORT message
CDB: 0x0 0x0 0x0 0x0 0x0 0x0
SCB 3 done'd twice

This patch is included in Adaptec's 6.3.11 driver on their website.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] sgiwd93: use cached memory access to make driver work on IP28

SGI IP28 machines would need special treatment (enable adding addtional
wait states) when accessing memory uncached. To avoid this pain I
changed the driver to use only cached access to memory.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] zfcp: fix sense_buffer access bug

The commit de25deb18016f66dcdede165d07654559bb332bc changed
scsi_cmnd.sense_buffer from a static array to a dynamically allocated
buffer. We can't access to sense_buffer in '&cmd->sense_buffer' way.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] ncr53c8xx: fix sense_buffer access bug

The commit de25deb18016f66dcdede165d07654559bb332bc changed
scsi_cmnd.sense_buffer from a static array to a dynamically allocated
buffer. We can't access to sense_buffer in '&cmd->sense_buffer' way.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] aic79xx: fix sense_buffer access bug

The commit de25deb18016f66dcdede165d07654559bb332bc changed
scsi_cmnd.sense_buffer from a static array to a dynamically allocated
buffer. We can't access to sense_buffer in '&cmd->sense_buffer' way.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] hptiop: fix sense_buffer access bug

&cmnd->sense_buffer now zeroes the wrong thing.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

[SCSI] sym53c8xx: fix bad memset argument in sym_set_cam_result_error

On a big powerpc box I got the following oops with 2.6.24-git2:

sym0: <1010-66> rev 0x1 at pci 0000:d0:01.0 irq 215
sym0: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.3
target0:0:8: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
scsi 0:0:8:0: Direct-Access     IBM      ST318305LC       C509 PQ: 0
ANSI: 3
target0:0:8: tagged command queuing enabled, command queue depth 16.
target0:0:8: Beginning Domain Validation
target0:0:8: asynchronous
target0:0:8: wide asynchronous
target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000000000038460
cpu 0x25: Vector: 300 (Data Access) at [c00000000f567840]
    pc: c000000000038460: .memcpy+0x60/0x280
    lr: d000000000050280: .sym_set_cam_result_error+0xfc/0x1e0 [sym53c8xx]
    sp: c00000000f567ac0
   msr: 8000000000009032
   dar: 0
dsisr: 42000000
  current = 0xc000006d1e0af0a0
  paca    = 0xc0000000004afc00
    pid   = 0, comm = swapper
enter ? for help
[link register   ] d000000000050280
.sym_set_cam_result_error+0xfc/0x1e0 [sym53c8xx]
[c00000000f567ac0] c00000000f567b80 (unreliable)
[c00000000f567b80] d0000000000552b8 .sym_complete_error+0x12c/0x1bc [sym53c8xx]
[c00000000f567c20] d0000000000561a4 .sym_int_sir+0xaa4/0x1718 [sym53c8xx]
[c00000000f567d00] d000000000057e8c .sym_interrupt+0x4e4/0x6ec [sym53c8xx]
[c00000000f567dc0] d00000000004fdf4 .sym53c8xx_intr+0x6c/0xdc [sym53c8xx]
[c00000000f567e50] c0000000000a83e0 .handle_IRQ_event+0x7c/0xec
[c00000000f567ef0] c0000000000aa344 .handle_fasteoi_irq+0x130/0x1f0
[c00000000f567f90] c00000000002a538 .call_handle_irq+0x1c/0x2c
[c000004d5e0b3a90] c00000000000c320 .do_IRQ+0x108/0x1d0
[c000004d5e0b3b20] c000000000004790 hardware_interrupt_entry+0x18/0x1c

The memset() in sym_set_cam_result_error() would appear to be trashing
the scsi_cmnd struct instead of clearing sense_buffer.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

dlm: static initialization improvements

also change name_prefix from char pointer to char array.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>

dlm: clean ups

A couple small clean-ups. Remove unnecessary wrapper-functions in
rcom.c, and remove unnecessary casting and an unnecessary ASSERT in
util.c.

Signed-off-by: David Teigland <teigland@redhat.com>

dlm: Sanity check namelen before copying it

The 32/64 compatibility code in the DLM does not check the validity of
the lock name length passed into it, so it can easily overwrite memory
if the value is rubbish (as early versions of libdlm can cause with
unlock calls, it doesn't zero the field).

This patch restricts the length of the name to the amount of data
actually passed into the call.

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>

dlm: keep cached master rsbs during recovery

To prevent the master of an rsb from changing rapidly, an unused rsb is kept
on the "toss list" for a period of time to be reused.  The toss list was
being cleared completely for each recovery, which is unnecessary.  Much of
the benefit of the toss list can be maintained if nodes keep rsb's in their
toss list that they are the master of.  These rsb's need to be included
when the resource directory is rebuilt during recovery.

Signed-off-by: David Teigland <teigland@redhat.com>

dlm: change error message to debug

The invalid lockspace messages are normal and can appear relatively
often. They should be suppressed without debugging enabled.

Signed-off-by: David Teigland <teigland@redhat.com>

dlm: fix possible use-after-free

The dlm_put_lkb() can free the lkb and its associated ua structure,
so we can't depend on using the ua struct after the put.

Signed-off-by: David Teigland <teigland@redhat.com>

dlm: limit dir lookup loop

In a rare case we may need to repeat a local resource directory lookup
due to a race with removing the rsb and removing the resdir record.
We'll never need to do more than a single additional lookup, though,
so the infinite loop around the lookup can be removed. In addition
to being unnecessary, the infinite loop is dangerous since some other
unknown condition may appear causing the loop to never break.

Signed-off-by: David Teigland <teigland@redhat.com>

dlm: reject normal unlock when lock is waiting for lookup

Non-forced unlocks should be rejected if the lock is waiting on the
rsb_lookup list for another lock to establish the master node.

Signed-off-by: David Teigland <teigland@redhat.com>

dlm: validate messages before processing

There was some hit and miss validation of messages that has now been
cleaned up and unified.  Before processing a message, the new
validate_message() function checks that the lkb is the appropriate type,
process-copy or master-copy, and that the message is from the correct
nodeid for the the given lkb.  Other checks and assertions on the
lkb type and nodeid have been removed.  The assertions were particularly
bad since they would panic the machine instead of just ignoring the bad
message.

Although other recent patches have made processing old message unlikely,
it still may be possible for an old message to be processed and caught
by these checks.

Signed-off-by: David Teigland <teigland@redhat.com>

dlm: reject messages from non-members

Messages from nodes that are no longer members of the lockspace should be
ignored. When nodes are removed from the lockspace, recovery can
sometimes complete quickly enough that messages arrive from a removed node
after recovery has completed. When processed, these messages would often
cause an error message, and could in some cases change some state, causing
problems.

Signed-off-by: David Teigland <teigland@redhat.com>

dlm: another call to confirm_master in receive_request_reply

When a failed request (EBADR or ENOTBLK) is unlocked/canceled instead of
retried, there may be other lkb's waiting on the rsb_lookup list for it
to complete. A call to confirm_master() is needed to move on to the next
waiting lkb since the current one won't be retried.

Signed-off-by: David Teigland <teigland@redhat.com>

dlm: recover locks waiting for overlap replies

When recovery looks at locks waiting for replies, it fails to consider
locks that have already received a reply for their first remote operation,
but not received a reply for secondary, overlapping unlock/cancel. The
appropriate stub reply needs to be called for these waiters.

Appears when we start doing recovery in the presence of a many overlapping
unlock/cancel ops.

Signed-off-by: David Teigland <teigland@redhat.com>

dlm: clear ast_type when removing from astqueue

The lkb_ast_type field indicates whether the lkb is on the astqueue list.
When clearing locks for a process, lkb's were being removed from the astqueue
list without clearing the field. If release_lockspace then happened
immediately afterward, it could try to remove the lkb from the list a second
time.

Appears when process calls libdlm dlm_release_lockspace() which first
closes the ls dev triggering clear_proc_locks, and then removes the ls
(a write to control dev) causing release_lockspace().

Signed-off-by: David Teigland <teigland@redhat.com>

dlm: use fixed errno values in messages

Some errno values differ across platforms. So if we return things like
-EINPROGRESS from one node it can get misinterpreted or rejected on
another one.

This patch fixes up the errno values passed on the wire so that they
match the x86 ones (so as not to break the protocol), and re-instates
the platform-specific ones at the other end.

Many thanks to Fabio for testing this patch.
Initial patch from Patrick.

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: David Teigland <teigland@redhat.com>

dlm: swap bytes for rcom lock reply

DLM_RCOM_LOCK_REPLY messages need byte swapping.

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: David Teigland <teigland@redhat.com>

dlm: align midcomms message buffer

gcc does not guarantee that an auto buffer is 64bit aligned.
This change allows sparc64 to work.

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: David Teigland <teigland@redhat.com>

KVM: Move apic timer migration away from critical section

Migrating the apic timer in the critical section is not very nice, and is
absolutely horrible with the real-time port. Move migration to the regular
vcpu execution path, triggered by a new bitflag.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Put kvm_para.h include outside __KERNEL__

kvm_para.h potentially contains definitions that are to be used by userspace,
so it should not be included inside the __KERNEL__ block. To protect its own
data structures, kvm_para.h already includes its own __KERNEL__ block.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Acked-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Fix unbounded preemption latency

When preparing to enter the guest, if an interrupt comes in while
preemption is disabled but interrupts are still enabled, we miss a
preemption point. Fix by explicitly checking whether we need to
reschedule.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Initialize the mmu caches only after verifying cpu support

Otherwise we re-initialize the mmu caches, which will fail since the
caches are already registered, which will cause us to deinitialize said caches.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Fix dirty page setting for pages removed from rmap

Right now rmap_remove won't set the page as dirty if the shadow pte
pointed to this page had write access and then it became readonly.
This patches fixes that, by setting the page as dirty for spte changes from
write to readonly access.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Portability: Move kvm_fpu to asm-x86/kvm.h

This patch moves kvm_fpu asm-x86/kvm.h to allow every architecture to
define an own representation used for KVM_GET_FPU/KVM_SET_FPU.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: x86 emulator: Only allow VMCALL/VMMCALL trapped by #UD

When executing a test program called "crashme", we found the KVM guest cannot
survive more than ten seconds, then encounterd kernel panic. The basic concept
of "crashme" is generating random assembly code and trying to execute it.

After some fixes on emulator insn validity judgment, we found it's hard to
get the current emulator handle the invalid instructions correctly, for the
#UD trap for hypercall patching caused troubles. The problem is, if the opcode
itself was OK, but combination of opcode and modrm_reg was invalid, and one
operand of the opcode was memory (SrcMem or DstMem), the emulator will fetch
the memory operand first rather than checking the validity, and may encounter
an error there. For example, ".byte 0xfe, 0x34, 0xcd" has this problem.

In the patch, we simply check that if the invalid opcode wasn't vmcall/vmmcall,
then return from emulate_instruction() and inject a #UD to guest. With the
patch, the guest had been running for more than 12 hours.

Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Merge shadow level check in FNAME(fetch)

Remove the redundant level check when fetching
shadow pte for present & non-present spte.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Move kvm_free_some_pages() into critical section

If some other cpu steals mmu pages between our check and an attempt to
allocate, we can run out of mmu pages. Fix by moving the check into the
same critical section as the allocation.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Switch to mmu spinlock

Convert the synchronization of the shadow handling to a separate mmu_lock
spinlock.

Also guard fetch() by mmap_sem in read-mode to protect against alias
and memslot changes.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Avoid calling gfn_to_page() in mmu_set_spte()

Since gfn_to_page() is a sleeping function, and we want to make the core mmu
spinlocked, we need to pass the page from the walker context (which can sleep)
to the shadow context (which cannot).

[marcelo: avoid recursive locking of mmap_sem]

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Add kvm_read_guest_atomic()

In preparation for a mmu spinlock, add kvm_read_guest_atomic()
and use it in fetch() and prefetch_page().

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Concurrent guest walkers

Do not hold kvm->lock mutex across the entire pagefault code,
only acquire it in places where it is necessary, such as mmu
hash list, active list, rmap and parent pte handling.

Allow concurrent guest walkers by switching walk_addr() to use
mmap_sem in read-mode.

And get rid of the lockless __gfn_to_page.

[avi: move kvm_mmu_pte_write() locking inside the function]
[avi: add locking for real mode]
[avi: fix cmpxchg locking]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Disable vapic support on Intel machines with FlexPriority

FlexPriority accelerates the tpr without any patching.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Accelerated apic support

This adds a mechanism for exposing the virtual apic tpr to the guest, and a
protocol for letting the guest update the tpr without causing a vmexit if
conditions allow (e.g. there is no interrupt pending with a higher priority
than the new tpr).

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: local APIC TPR access reporting facility

Add a facility to report on accesses to the local apic tpr even if the
local apic is emulated in the kernel. This is basically a hack that
allows userspace to patch Windows which tends to bang on the tpr a lot.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Print data for unimplemented wrmsr

This can help diagnosing what the guest is trying to do. In many cases
we can get away with partial emulation of msrs.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Add cache miss statistic

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Coalesce remote tlb flushes

Host side TLB flush can be merged together if multiple
spte need to be write-protected.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Expose ioapic to ia64 save/restore APIs

IA64 also needs to see ioapic structure in irqchip.

Signed-off-by: xiantao.zhang@intel.com <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Move kvm_vcpu_kick() to x86.c

Moving kvm_vcpu_kick() to x86.c. Since it should be
common for all archs, put its declarations in <linux/kvm_host.h>

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Move ioapic code to common directory.

Move ioapic code to common, since IA64 also needs it.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Move irqchip declarations into new ioapic.h and lapic.h

This allows reuse of ioapic in ia64.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Move drivers/kvm/* to virt/kvm/

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Move arch dependent files to new directory arch/x86/kvm/

This paves the way for multiple architecture support. Note that while
ioapic.c could potentially be shared with ia64, it is also moved.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: VMX: Add printk_ratelimit in vmx_intr_assist

Add printk_ratelimit check in front of printk. This prevents spamming
of the message during 32-bit ubuntu 6.06server install. Previously, it
would hang during the partition formatting stage.

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Portability: Move kvm_vm_stat to x86.h

This patch moves kvm_vm_stat to x86.h, and every arch
can define its own kvm_vm_stat in $arch.h

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>