Mingming Cao [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup
While "every 5 seconds" doesn't sound as a problem, there can be many
of these (and these timers do add up over all the kernel). The "5
second" wakeup isn't really timing sensitive; in addition even with
rounding it'll still happen every 5 seconds (with the exception of the
very first time, which is likely to be rounded up to somewhere closer
to 6 seconds)
(Ported from similar JBD patch made by Arjan van de Ven to
fs/jbd/transaction.c)
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Mingming Cao [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
jbd2: Mark jbd2 slabs as SLAB_TEMPORARY
This patch marks slab allocations by jbd2 as short-lived in support of
Mel Gorman's "Group short-lived and reclaimable kernel allocations"
patch. (Ported from similar changes made to fs/jbd/journal.c and
fs/jbd/revoke.c in Mel's patch.)
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Mingming Cao [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
jbd2: add lockdep support
Ported from similar patch for the jbd layer.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Use the ext4_ext_actual_len() helper function
ext4 uses the high bit of the extent length to encode whether the extent
is intialized or not. The helper function ext4_ext_get_actual_len should
be used to get the actual length of the extent.
This addresses the kernel bug documented here:
http://bugzilla.kernel.org/show_bug.cgi?id=9732
kernel BUG at fs/ext4/extents.c:1056!
....
Call Trace:
[<
ffffffff88366073>] :ext4dev:ext4_ext_get_blocks+0x5ba/0x8c1
[<
ffffffff81053c91>] lock_release_holdtime+0x27/0x49
[<
ffffffff812748f6>] _spin_unlock+0x17/0x20
[<
ffffffff883400a6>] :jbd2:start_this_handle+0x4e0/0x4fe
[<
ffffffff88366564>] :ext4dev:ext4_fallocate+0x175/0x39a
[<
ffffffff81053c91>] lock_release_holdtime+0x27/0x49
[<
ffffffff81056480>] __lock_acquire+0x4e7/0xc4d
[<
ffffffff81053c91>] lock_release_holdtime+0x27/0x49
[<
ffffffff810a8de7>] sys_fallocate+0xe4/0x10d
[<
ffffffff8100c043>] tracesys+0xd5/0xda
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Dmitry Monakhov [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: fix uniniatilized extent splitting error
Fix bug reported by Dmitry Monakhov caused by lost error code
Testcase:
blksize = 0x1000;
fd = open(argv[1], O_RDWR|O_CREAT, 0700);
unsigned long long sz = 0x10000000UL;
/* allocating big blocks chunk */
syscall(__NR_fallocate, fd, 0, 0UL, sz)
/* grab all other available filesystem space */
tfd = open("tmp", O_RDWR|O_CREAT|O_DIRECT, 0700);
while( write(tfd, buf, 4096) > 0); /* loop untill ENOSPC */
fsync(fd); /* just in case */
while (pos < sz) {
/* each seek+ write operation result in splits uninitialized extent
in three extents. Splitting may result in new extent allocation
which probably will fail because of ENOSPC*/
lseek(fd, blksize*2 -1, SEEK_CUR);
if ((ret = write(fd, 'a', 1)) != 1)
exit(1);
pos += blksize * 2;
}
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Check for return value from sb_set_blocksize
sb_set_blocksize validates whether the specfied block size can be used by
the file system. Make sure we fail mounting the file system if the
blocksize specfied cannot be used.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Miklos Szeredi [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Add stripe= option to /proc/mounts
Add stripe= option to /proc/mounts for ext4 filesystems.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:26 +0000 (23:58 -0500)]
ext4: Enable the multiblock allocator by default
Enable the multiblock allocator by default.
Fix ext4_show_options() so if it is not enabled, the nomballoc option
included in /proc/mounts.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Alex Tomas [Tue, 29 Jan 2008 05:19:52 +0000 (00:19 -0500)]
ext4: Add multi block allocator for ext4
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Alex Tomas [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Add new functions for searching extent tree
Add the functions ext4_ext_search_left() and ext4_ext_search_right(),
which are used by mballoc during ext4_ext_get_blocks to decided whether
to merge extent information.
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Johann Lombardi <johann@clusterfs.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Add ext4_find_next_bit()
This function is used by the ext4 multi block allocator patches.
Also add generic_find_next_le_bit
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Eric Sandeen [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: fix up EXT4FS_DEBUG builds
Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail
without these changes. Clean up some format warnings too.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Fix ext4_show_options to show the correct mount options.
We need to look at the default value and make sure
the mount options are not set via default value
before showing them via ext4_show_options
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:26 +0000 (23:58 -0500)]
ext4: Add EXT4_IOC_MIGRATE ioctl
The below patch add ioctl for migrating ext3 indirect block mapped inode
to ext4 extent mapped inode.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Jean Noel Cordenner [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Add inode version support in ext4
This patch adds 64-bit inode version support to ext4. The lower 32 bits
are stored in the osd1.linux1.l_i_version field while the high 32 bits
are stored in the i_version_hi field newly created in the ext4_inode.
This field is incremented in case the ext4_inode is large enough. A
i_version mount option has been added to enable the feature.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>
Jean Noel Cordenner [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
vfs: Add 64 bit i_version support
The i_version field of the inode is changed to be a 64-bit counter that
is set on every inode creation and that is incremented every time the
inode data is modified (similarly to the "ctime" time-stamp).
The aim is to fulfill a NFSv4 requirement for rfc3530.
This first part concerns the vfs, it converts the 32-bit i_version in
the generic inode to a 64-bit, a flag is added in the super block in
order to check if the feature is enabled and the i_version is
incremented in the vfs.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Girish Shilamkar [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Add the journal checksum feature
The journal checksum feature adds two new flags i.e
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM.
JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the
checksum for the blocks described by the descriptor blocks.
Due to checksums, writing of the commit record no longer needs to be
synchronous. Now commit record can be sent to disk without waiting for
descriptor blocks to be written to disk. This behavior is controlled
using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be
able to recover the journal with _ASYNC_COMMIT hence it is made
incompat.
The commit header has been extended to hold the checksum along with the
type of the checksum.
For recovery in pass scan checksums are verified to ensure the sanity
and completeness(in case of _ASYNC_COMMIT) of every transaction.
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Girish Shilamkar <girish@clusterfs.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Johann Lombardi [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
jbd2: jbd2 stats through procfs
The patch below updates the jbd stats patch to 2.6.20/jbd2.
The initial patch was posted by Alex Tomas in December 2005
(http://marc.info/?l=linux-ext4&m=
113538565128617&w=2).
It provides statistics via procfs such as transaction lifetime and size.
Sometimes, investigating performance problems, i find useful to have
stats from jbd about transaction's lifetime, size, etc. here is a
patch for review and inclusion probably.
for example, stats after creation of 3M files in htree directory:
[root@bob ~]# cat /proc/fs/jbd/sda/history
R/C tid wait run lock flush log hndls block inlog ctime write drop close
R 261 8260 2720 0 0 750 9892 8170 8187
C 259 750 0 4885 1
R 262 20 2200 10 0 770 9836 8170 8187
R 263 30 2200 10 0 3070 9812 8170 8187
R 264 0 5000 10 0 1340 0 0 0
C 261 8240 3212 4957 0
R 265 8260 1470 0 0 4640 9854 8170 8187
R 266 0 5000 10 0 1460 0 0 0
C 262 8210 2989 4868 0
R 267 8230 1490 10 0 4440 9875 8171 8188
R 268 0 5000 10 0 1260 0 0 0
C 263 7710 2937 4908 0
R 269 7730 1470 10 0 3330 9841 8170 8187
R 270 0 5000 10 0 830 0 0 0
C 265 8140 3234 4898 0
C 267 720 0 4849 1
R 271 8630 2740 20 0 740 9819 8170 8187
C 269 800 0 4214 1
R 272 40 2170 10 0 830 9716 8170 8187
R 273 40 2280 0 0 3530 9799 8170 8187
R 274 0 5000 10 0 990 0 0 0
where,
R - line for transaction's life from T_RUNNING to T_FINISHED
C - line for transaction's checkpointing
tid - transaction's id
wait - for how long we were waiting for new transaction to start
(the longest period journal_start() took in this transaction)
run - real transaction's lifetime (from T_RUNNING to T_LOCKED
lock - how long we were waiting for all handles to close
(time the transaction was in T_LOCKED)
flush - how long it took to flush all data (data=ordered)
log - how long it took to write the transaction to the log
hndls - how many handles got to the transaction
block - how many blocks got to the transaction
inlog - how many blocks are written to the log (block + descriptors)
ctime - how long it took to checkpoint the transaction
write - how many blocks have been written during checkpointing
drop - how many blocks have been dropped during checkpointing
close - how many running transactions have been closed to checkpoint this one
all times are in msec.
[root@bob ~]# cat /proc/fs/jbd/sda/info
280 transaction, each upto 8192 blocks
average:
1633ms waiting for transaction
3616ms running transaction
5ms transaction was being locked
1ms flushing data (in ordered mode)
1799ms logging transaction
11781 handles per transaction
5629 blocks per transaction
5641 logged blocks per transaction
Signed-off-by: Johann Lombardi <johann.lombardi@bull.net>
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:29 +0000 (23:58 -0500)]
ext4: Take read lock during overwrite case.
When we are overwriting a file and not actually allocating new file system
blocks we need to take only the read lock on i_data_sem.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:26 +0000 (23:58 -0500)]
ext4: Convert truncate_mutex to read write semaphore.
We are currently taking the truncate_mutex for every read. This would have
performance impact on large CPU configuration. Convert the lock to read write
semaphore and take read lock when we are trying to read the file.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Make ext4_get_blocks_wrap take the truncate_mutex early.
When doing a migrate from ext3 to ext4 inode we need to make sure the test
for inode type and walking inode data happens inside lock. To make this
happen move truncate_mutex early before checking the i_flags.
This actually should enable us to remove the verify_chain().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Mariusz Kozlowski [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: remove unused code from ext4_find_entry()
The unused code found in ext3_find_entry() is also present (and still
unused) in the ext4_find_entry() code. This patch removes it.
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Check for the correct error return from
ext4_ext_get_blocks returns negative values on error. We should
check for <= 0
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Jan Kara [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
jbd2: Fix assertion failure in fs/jbd2/checkpoint.c
Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to cleanup transaction's written-back
buffers.
If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).
We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state. The locking there is subtle though (as
everywhere in JBD ;(). We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.
Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either. Better be safe if something
changes in future...
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chris Snook [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
jbd2: Remove printk from J_ASSERT to preserve registers during BUG
Signed-off-by: Chris Snook <csnook@redhat.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: add block bitmap validation
When a new block bitmap is read from disk in read_block_bitmap()
there are a few bits that should ALWAYS be set. In particular,
the blocks given corresponding to block bitmap, inode bitmap and inode tables.
Validate the block bitmap against these blocks.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:26 +0000 (23:58 -0500)]
Add buffer head related helper functions
Add buffer head related helper function bh_uptodate_or_lock and
bh_submit_read which can be used by file system
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:26 +0000 (23:58 -0500)]
ext4: Change the default behaviour on error
ext4 file system was by default ignoring errors and continuing. This
is not a good default as continuing on error could lead to file system
corruption. Change the default to mark the file system
readonly. Debian and ubuntu already does this as the default in their
fstab.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Eric Sandeen [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: fix oops on corrupted ext4 mount
When mounting an ext4 filesystem with corrupted s_first_data_block, things
can go very wrong and oops.
Because blocks_count in ext4_fill_super is a u64, and we must use do_div,
the calculation of db_count is done differently than on ext4. If
first_data_block is corrupted such that it is larger than ext4_blocks_count,
for example, then the intermediate blocks_count value may go negative,
but sign-extend to a very large value:
blocks_count = (ext4_blocks_count(es) -
le32_to_cpu(es->s_first_data_block) +
EXT4_BLOCKS_PER_GROUP(sb) - 1);
This is then assigned to s_groups_count which is an unsigned long:
sbi->s_groups_count = blocks_count;
This may result in a value of 0xFFFFFFFF which is then used to compute
db_count:
db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
EXT4_DESC_PER_BLOCK(sb);
and in this case db_count will wind up as 0 because the addition overflows
32 bits. This in turn causes the kmalloc for group_desc to be of 0 size:
sbi->s_group_desc = kmalloc(db_count * sizeof (struct buffer_head *),
GFP_KERNEL);
and eventually in ext4_check_descriptors, dereferencing
sbi->s_group_desc[desc_block] will result in a NULL pointer dereference.
The simplest test seems to be to sanity check s_first_data_block,
EXT4_BLOCKS_PER_GROUP, and ext4_blocks_count values to be sure
their combination won't result in a bad intermediate value for
blocks_count. We could just check for db_count == 0, but
catching it at the root cause seems like it provides more info.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>
Adrian Bunk [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4/super.c: fix #ifdef's (CONFIG_EXT4_* -> CONFIG_EXT4DEV_*)
Based on a report by Robert P. J. Day.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Return after ext4_error in case of failures
This fix some instances where we were continuing after calling
ext4_error. ext4_error call panic only if errors=panic mount option is
set. So we need to make sure we return correctly after ext4_error call
Reported by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Coly Li [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: sync up block group descriptor with e2fsprogs.
This patch extends bg_itable_unused of ext4 group descriptor
from 16bit into 32bit. In order to add bg_itable_unused_hi into
struct ext4_group_desc, some extra fields which are already introduced into
e2fsprogs are also added in for consistency.
Signed-off-by: Coly Li <coyli@suse.de>
Cc: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:26 +0000 (23:58 -0500)]
ext3: Fix the max file size for ext3 file system.
The max file size for ext3 file system is now calculated
with hardcoded 4K block size. The patch fixes it to be
calculated with the right block size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:26 +0000 (23:58 -0500)]
ext2: Fix the max file size for ext2 file system.
The max file size for ext2 file system is now calculated
with hardcoded 4K block size. The patch fixes it to be
calculated with the right block size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Eric Sandeen [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: store maxbytes for bitmapped files and return EFBIG as appropriate
Calculate & store the max offset for bitmapped files, and
catch too-large seeks, truncates, and writes in ext4, shortening
or rejecting as appropriate.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Eric Sandeen [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: export iov_shorten from kernel for ext4's use
Export iov_shorten() from kernel so that ext4 can
truncate too-large writes to bitmapped files.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Eric Sandeen [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: different maxbytes functions for bitmap & extent files
use 2 different maxbytes functions for bitmapped & extent-based
files.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Support large files
This patch converts ext4_inode i_blocks to represent total
blocks occupied by the inode in file system block size.
Earlier the variable used to represent this in 512 byte
block size. This actually limited the total size of the file.
The feature is enabled transparently when we write an inode
whose i_blocks cannot be represnted as 512 byte units in a
48 bit variable.
inode flag EXT4_HUGE_FILE_FL
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:26 +0000 (23:58 -0500)]
ext4: Add support for 48 bit inode i_blocks.
Use the __le16 l_i_reserved1 field of the linux2 struct of ext4_inode
to represet the higher 16 bits for i_blocks. With this change max_file
size becomes (2**48 -1 )* 512 bytes.
We add a RO_COMPAT feature to the super block to indicate that inode
have i_blocks represented as a split 48 bits. Super block with this
feature set cannot be mounted read write on a kernel with CONFIG_LSF
disabled.
Super block flag EXT4_FEATURE_RO_COMPAT_HUGE_FILE
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Rename i_dir_acl to i_size_high
Rename ext4_inode.i_dir_acl to i_size_high
drop ext4_inode_info.i_dir_acl as it is not used
Rename ext4_inode.i_size to ext4_inode.i_size_lo
Add helper function for accessing the ext4_inode combined i_size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Rename i_file_acl to i_file_acl_lo
Rename i_file_acl to i_file_acl_lo. This helps
in finding bugs where we use i_file_acl instead
of the combined i_file_acl_lo and i_file_acl_high
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Fix sparse warnings.
Fix sparse warnings related to static functions
and local variables.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Introduce ext4_update_*_feature
Introduce ext4_update_*_feature and use them instead
of opencoding.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Avantika Mathur [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: fixes block group number being set to a negative value
This patch fixes various places where the group number is set to a negative
value.
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Avantika Mathur [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: add ext4_group_t, and change all group variables to this type.
In many places variables for block group are of type int, which limits the
maximum number of block groups to 2^31. Each block group can have up to
2^15 blocks, with a 4K block size, and the max filesystem size is limited to
2^31 * (2^15 * 2^12) = 2^58 -- or 256 PB
This patch introduces a new type ext4_group_t, of type unsigned long, to
represent block group numbers in ext4.
All occurrences of block group variables are converted to type ext4_group_t.
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
Eric Sandeen [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4 extents: remove unneeded casts
There are many casts in extents.c which are not needed,
as the variables are already the type of the cast, or
are being promoted for no particular reason in printk's.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Aneesh Kumar K.V [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Introduce ext4_lblk_t
This patch adds a new data type ext4_lblk_t to represent
the logical file blocks.
This is the preparatory patch to support large files in ext4
The follow up patch with convert the ext4_inode i_blocks to
represent the number of blocks in file system block size. This
changes makes it possible to have a block number 2**32 -1 which
will result in overflow if the block number is represented by
signed long. This patch convert all the block number to type
ext4_lblk_t which is typedef to __u32
Also remove dead code ext4_ext_walk_space
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Jan Kara [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Avoid rec_len overflow with 64KB block size
With 64KB blocksize, a directory entry can have size 64KB which does not fit
into 16 bits we have for entry lenght. So we store 0xffff instead and convert
value when read from / written to disk. The patch also converts some places
to use ext4_next_entry() when we are changing them anyway.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Takashi Sato [Tue, 29 Jan 2008 04:58:27 +0000 (23:58 -0500)]
ext4: Support large blocksize up to PAGESIZE
This patch set supports large block size(>4k, <=64k) in ext4,
just enlarging the block size limit. But it is NOT possible to have 64kB
blocksize on ext4 without some changes to the directory handling
code. The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem. The proposed solution is treat 64k rec_len
with a an impossible value like rec_len = 0xffff to handle this.
The Patch-set consists of the following 2 patches.
[1/2] ext4: enlarge blocksize
- Allow blocksize up to pagesize
[2/2] ext4: fix rec_len overflow
- prevent rec_len from overflow with 64KB blocksize
Now on 64k page ppc64 box runs with this patch set we could create a 64k
block size ext4dev, and able to handle empty directory block.
Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Joel Becker [Tue, 29 Jan 2008 02:52:04 +0000 (18:52 -0800)]
ocfs2: Fix userspace ABI breakage in sysfs
The userspace ABI of ocfs2's internal cluster stack (o2cb) was broken by
commit
c60b71787982cefcf9fa09aa281fa8c4c685d557 "kset: convert ocfs2 to
use kset_create". Specifically, the '/sys/o2cb' kset was moved to
'/sys/fs/o2cb'. This breaks all ocfs2 tools and renders the
filesystem unmountable.
This fix moves '/sys/o2cb' back where it belongs.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
YOSHIFUJI Hideaki [Mon, 28 Jan 2008 23:46:02 +0000 (15:46 -0800)]
[IPV6] ADDRLABEL: Fix double free on label deletion.
If an entry is being deleted because it has only one reference,
we immediately delete it and blindly register the rcu handler for it,
This results in oops by double freeing that object.
This patch fixes it by consolidating the code paths for the deletion;
let its rcu handler delete the object if it has no more reference.
Bug was found by Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Thu, 24 Jan 2008 04:54:07 +0000 (20:54 -0800)]
[PPP]: Sparse warning fixes.
Fix a bunch of warnings in PPP and related drivers. Mostly because
sparse doesn't like it when the the function is only marked private in
the forward declaration.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Thu, 24 Jan 2008 04:38:24 +0000 (20:38 -0800)]
[IPV4] fib_trie: remove unneeded NULL check
Since fib_route_seq_show now uses hlist_for_each_entry(), the leaf
info can not be NULL.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Thu, 24 Jan 2008 04:37:50 +0000 (20:37 -0800)]
[IPV4] fib_trie: More whitespace cleanup.
Remove extra blank lines.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:36:45 +0000 (20:36 -0800)]
[NET_SCHED]: Use nla_policy for attribute validation in ematches
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:36:30 +0000 (20:36 -0800)]
[NET_SCHED]: Use nla_policy for attribute validation in actions
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:36:12 +0000 (20:36 -0800)]
[NET_SCHED]: Use nla_policy for attribute validation in classifiers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:35:39 +0000 (20:35 -0800)]
[NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:35:19 +0000 (20:35 -0800)]
[NET_SCHED]: sch_api: introduce constant for rate table size
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:35:03 +0000 (20:35 -0800)]
[NET_SCHED]: Use typeful attribute parsing helpers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:34:48 +0000 (20:34 -0800)]
[NET_SCHED]: Use typeful attribute construction helpers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:34:28 +0000 (20:34 -0800)]
[NET_SCHED]: Use NLA_PUT_STRING for string dumping
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:34:11 +0000 (20:34 -0800)]
[NET_SCHED]: Use nla_nest_start/nla_nest_end
Use nla_nest_start/nla_nest_end for dumping nested attributes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:33:32 +0000 (20:33 -0800)]
[NET_SCHED]: Propagate nla_parse return value
nla_parse() returns more detailed errno codes, propagate them back on
error.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:33:13 +0000 (20:33 -0800)]
[NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:32:58 +0000 (20:32 -0800)]
[NET_SCHED]: act_api: use nlmsg_parse
Convert open-coded nlmsg_parse to use the real function.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:32:42 +0000 (20:32 -0800)]
[NET_SCHED]: act_api: fix netlink API conversion bug
Fix two invalid attribute accesses, indices start at 1 with the new
netlink API.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:32:21 +0000 (20:32 -0800)]
[NET_SCHED]: sch_netem: use nla_parse_nested_compat
Replace open coded equivalent of nla_parse_nested_compat().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 24 Jan 2008 04:32:06 +0000 (20:32 -0800)]
[NET_SCHED]: sch_atm: fix format string warning
Fix format string warning introduces by the netlink API conversion:
net/sched/sch_atm.c:250: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Wed, 23 Jan 2008 07:50:57 +0000 (23:50 -0800)]
[NETNS]: Add namespace for ICMP replying code.
All needed API is done, the namespace is available when required from
the device on the DST entry from the incoming packet. So, just replace
init_net with proper namespace.
Other protocols will follow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Wed, 23 Jan 2008 07:50:25 +0000 (23:50 -0800)]
[NETNS]: Routing cache virtualization.
Basically, this piece looks relatively easy. Namespace is already
available on the dst entry via device and the device is safe to
dereferrence. Compare it with one of a searcher and skip entry if
appropriate.
The only exception is ip_rt_frag_needed. So, add namespace parameter to it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Wed, 23 Jan 2008 07:49:35 +0000 (23:49 -0800)]
[NETNS]: Correct namespace for connect-time routing.
ip_route_connect and ip_route_newports are a part of routing API
presented to the socket layer. The namespace is available inside them
through a socket.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 23 Jan 2008 06:11:50 +0000 (22:11 -0800)]
[NET_SCHED]: Convert actions from rtnetlink to new netlink API
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 23 Jan 2008 06:11:33 +0000 (22:11 -0800)]
[NET_SCHED]: Convert classifiers from rtnetlink to new netlink API
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 23 Jan 2008 06:11:17 +0000 (22:11 -0800)]
[NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink API
Convert packet schedulers to use the netlink API. Unfortunately a gradual
conversion is not possible without breaking compilation in the middle or
adding lots of casts, so this patch converts them all in one step. The
patch has been mostly generated automatically with some minor edits to
at least allow seperate conversion of classifiers and actions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 23 Jan 2008 06:10:59 +0000 (22:10 -0800)]
[NETLINK]: Add nla_append()
Used to append data to a message without a header or padding.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 23 Jan 2008 06:10:42 +0000 (22:10 -0800)]
[NET_SCHED]: mark classifier ops __read_mostly
Additionally remove unnecessary NULL initilizations of the next pointer.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 23 Jan 2008 06:10:23 +0000 (22:10 -0800)]
[NET_SCHED]: Move EXPORT_SYMBOL next to exported symbol
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Wed, 23 Jan 2008 06:07:34 +0000 (22:07 -0800)]
[NETNS]: Add namespace parameter to ip_route_output_key.
Needed to propagate it down to the ip_route_output_flow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Wed, 23 Jan 2008 06:07:10 +0000 (22:07 -0800)]
[NETNS]: Add namespace parameter to ip_route_output_flow.
Needed to propagate it down to the __ip_route_output_key.
Signed_off_by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Wed, 23 Jan 2008 06:06:48 +0000 (22:06 -0800)]
[NETNS]: Add namespace parameter to __ip_route_output_key.
This is only required to propagate it down to the
ip_route_output_slow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Wed, 23 Jan 2008 06:06:19 +0000 (22:06 -0800)]
[NETNS]: Add namespace parameter to ip_route_output_slow.
This function needs a net namespace to lookup devices, fib tables,
etc. in, so pass it there.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Wed, 23 Jan 2008 06:04:30 +0000 (22:04 -0800)]
[NETNS]: Add namespace parameter to ip_dev_find.
in_dev_find() need a namespace to pass it to fib_get_table(), so add
an argument.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Wed, 23 Jan 2008 06:04:04 +0000 (22:04 -0800)]
[NETNS]: Add netns parameter to fib_select_default.
Currently fib_select_default calls fib_get_table() with the
init_net. Prepare it to provide a correct namespace to lookup default
route.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Wed, 23 Jan 2008 06:03:33 +0000 (22:03 -0800)]
[IPV4]: Consolidate fib_select_default.
The difference in the implementation of the fib_select_default when
CONFIG_IP_MULTIPLE_TABLES is (not) defined looks
negligible. Consolidate it and place into fib_frontend.c.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denis V. Lunev [Wed, 23 Jan 2008 06:03:03 +0000 (22:03 -0800)]
[IPV4]: Declarations cleanup in ip_fib.h.
Two small issues fixed:
- fib_select_multipath is exported from fib_semantics.c rather than from
fib_frontend.c. So, move the declaration below appropriate comment.
- struct rt_entry declaration is not used. Drop it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Wed, 23 Jan 2008 05:57:22 +0000 (21:57 -0800)]
[IPV4] fib_trie: avoid rescan on dump
This converts dumping (and flushing) of large route tables form O(N^2)
to O(N). If the route dump took multiple pages then the dump routine
gets called again. The old code kept track of location by counter, the
new code instead uses the last key.
This is a really big win ( 0.3 sec vs 12 sec) for big route tables.
One side effect is that if the table changes during the dump, then the
last key will not be found, and we will return -EBUSY.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Wed, 23 Jan 2008 05:56:34 +0000 (21:56 -0800)]
[IPV4] fib_trie: avoid extra search on delete
Get rid of extra search that made route deletion O(n).
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Wed, 23 Jan 2008 05:56:11 +0000 (21:56 -0800)]
[IPV4] fib_trie: dump table in sorted order
It is easier with TRIE to dump the data traversal rather than
interating over every possible prefix. This saves some time and makes
the dump come out in sorted order.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Wed, 23 Jan 2008 05:55:32 +0000 (21:55 -0800)]
[IPV4] fib_trie: iterator recode
Remove the complex loop structure of nextleaf() and replace it with a
simpler tree walker. This improves the performance and is much
cleaner.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Wed, 23 Jan 2008 05:55:01 +0000 (21:55 -0800)]
[IPV4] fib_trie: dump message multiple part flag
Match fib_hash, and set NLM_F_MULTI to handle multiple part messages.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Wed, 23 Jan 2008 05:54:37 +0000 (21:54 -0800)]
[IPV4] fib_trie: use hash list
The code to dump can use the existing hash chain rather than doing
repeated lookup.
Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Wed, 23 Jan 2008 05:54:05 +0000 (21:54 -0800)]
[IPV4] fib_trie: compute size when needed
Compute the number of prefixes when needed, rather than doing bookeeping.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Wed, 23 Jan 2008 05:53:36 +0000 (21:53 -0800)]
[IPV4] fib_trie: style cleanup
Style cleanups:
* make check_leaf return -1 or plen, rather than by reference
* Get rid of #ifdef that is always set
* split out embedded function calls in if statements.
* checkpatch warnings
Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Wed, 23 Jan 2008 05:51:50 +0000 (21:51 -0800)]
[IPV4] fib_trie: put leaf nodes in a slab cache
This improves locality for operations that touch all the leaves. Save
space since these entries don't need to be hardware cache aligned.
Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Engelhardt [Wed, 23 Jan 2008 02:30:06 +0000 (18:30 -0800)]
[AF_X25]: constify function pointer tables
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ross Burton [Wed, 23 Jan 2008 02:27:53 +0000 (18:27 -0800)]
[IrDA]: LMP discovery timer not started by default
By default, LMP sets up a 3 seconds timer for discovery.
We don't need it until discovery is set to 1.
Signed-off-by: Ross Burton <ross@openedhand.com>
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruno Randolf [Mon, 21 Jan 2008 02:09:46 +0000 (11:09 +0900)]
ath5k: always extend rx timestamp with tsf
always extend the rx timestamp with the local TSF, since this information is
also needed for proper IBSS merging. this is done in the tasklet for now, maybe
has to be moved to the interrupt handler like in madwifi.
drivers/net/wireless/ath5k/base.c: Changes-licensed-under: 3-Clause-BSD
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Bruno Randolf [Sat, 19 Jan 2008 09:18:41 +0000 (18:18 +0900)]
ath5k: configure backoff for IBSS beacon queue
in "11.1.2.2 Beacon generation in an IBSS" the IEEE802.11 standard says, each
STA should... "b) Calculate a random delay uniformly distributed in the range
between zero and twice aCWmin × aSlotTime,".
configure cwmin and cwmax of the beacon queue in IBSS mode according to this.
unfortunately beacon backoff does not work reliably yet, so i suspect we have a
problem somewhere else, since the same settings (and similar beacon timer
configuration) work for madwifi.
drivers/net/wireless/ath5k/base.c: Changes-licensed-under: 3-Clause-BSD
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Bruno Randolf [Sat, 19 Jan 2008 09:18:21 +0000 (18:18 +0900)]
ath5k: use SWBA to detect IBSS HW merges
use SWBA (software beacon alert) interrupts to keep track of the next beacon
time und check if a HW merge (automatic TSF update) has happened on every
received beacon with the same BSSID.
this is necessary because the atheros hardware will silently update the local
TSF in IBSS mode, but not its beacon timers. if the TSF is ahead of the beacon
timers no beacons are sent until the timers wrap around (typically after about
1 minute).
this solution is not very nice, since we have to look into every beacon, but
there is apparently no other way to detect HW merges.
drivers/net/wireless/ath5k/base.c: Changes-licensed-under: 3-Clause-BSD
drivers/net/wireless/ath5k/base.h: Changes-licensed-under: 3-Clause-BSD
Signed-off-by: Bruno Randolf <bruno@thinktube.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>