From d1659fcc59b21ec442564fedb67a5ad371f82380 Mon Sep 17 00:00:00 2001 From: Adrian Bunk Date: Tue, 20 May 2008 12:17:39 -0400 Subject: Input: remove CVS keywords This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk Signed-off-by: Dmitry Torokhov --- Documentation/input/gameport-programming.txt | 2 -- Documentation/input/input.txt | 1 - Documentation/input/joystick-api.txt | 2 -- Documentation/input/joystick-parport.txt | 1 - Documentation/input/joystick.txt | 1 - 5 files changed, 7 deletions(-) (limited to 'Documentation') diff --git a/Documentation/input/gameport-programming.txt b/Documentation/input/gameport-programming.txt index 14e0a8b70225..03a74fc3b496 100644 --- a/Documentation/input/gameport-programming.txt +++ b/Documentation/input/gameport-programming.txt @@ -1,5 +1,3 @@ -$Id: gameport-programming.txt,v 1.3 2001/04/24 13:51:37 vojtech Exp $ - Programming gameport drivers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/input/input.txt b/Documentation/input/input.txt index ff8cea0225f9..686ee9932dff 100644 --- a/Documentation/input/input.txt +++ b/Documentation/input/input.txt @@ -1,7 +1,6 @@ Linux Input drivers v1.0 (c) 1999-2001 Vojtech Pavlik Sponsored by SuSE - $Id: input.txt,v 1.8 2002/05/29 03:15:01 bradleym Exp $ ---------------------------------------------------------------------------- 0. Disclaimer diff --git a/Documentation/input/joystick-api.txt b/Documentation/input/joystick-api.txt index acbd32b88454..c507330740cd 100644 --- a/Documentation/input/joystick-api.txt +++ b/Documentation/input/joystick-api.txt @@ -5,8 +5,6 @@ 7 Aug 1998 - $Id: joystick-api.txt,v 1.2 2001/05/08 21:21:23 vojtech Exp $ - 1. Initialization ~~~~~~~~~~~~~~~~~ diff --git a/Documentation/input/joystick-parport.txt b/Documentation/input/joystick-parport.txt index ede5f33daad3..1c856f32ff2c 100644 --- a/Documentation/input/joystick-parport.txt +++ b/Documentation/input/joystick-parport.txt @@ -2,7 +2,6 @@ (c) 1998-2000 Vojtech Pavlik (c) 1998 Andree Borrmann Sponsored by SuSE - $Id: joystick-parport.txt,v 1.6 2001/09/25 09:31:32 vojtech Exp $ ---------------------------------------------------------------------------- 0. Disclaimer diff --git a/Documentation/input/joystick.txt b/Documentation/input/joystick.txt index 389de9bd9878..154d767b2acb 100644 --- a/Documentation/input/joystick.txt +++ b/Documentation/input/joystick.txt @@ -1,7 +1,6 @@ Linux Joystick driver v2.0.0 (c) 1996-2000 Vojtech Pavlik Sponsored by SuSE - $Id: joystick.txt,v 1.12 2002/03/03 12:13:07 jdeneux Exp $ ---------------------------------------------------------------------------- 0. Disclaimer -- cgit v1.2.3-58-ga151 From 3915c1e8634a321d9680e5cd80a53053b642dc0c Mon Sep 17 00:00:00 2001 From: Jay Vosburgh Date: Sat, 17 May 2008 21:10:14 -0700 Subject: bonding: Add "follow" option to fail_over_mac Add a "follow" selection for fail_over_mac. This option causes the MAC address to move from slave to slave as the active slave changes. This is in addition to the existing fail_over_mac option that causes the bond's MAC address to change during failover. This new option is useful for devices that cannot tolerate multiple ports using the same MAC address simultaneously, either because it confuses them or incurs a performance penalty (as is the case with some LPAR-aware multiport devices). Because the MAC of the bond itself does not change, the "follow" option is slightly more reliable during failover and doesn't change the MAC of the bond during operation. This patch requires a previous ARP monitor change to properly handle RTNL during failovers. Signed-off-by: Jay Vosburgh Signed-off-by: Jeff Garzik --- Documentation/networking/bonding.txt | 96 ++++++++++++------ drivers/net/bonding/bond_main.c | 185 +++++++++++++++++++++++++++-------- drivers/net/bonding/bond_sysfs.c | 36 +++---- drivers/net/bonding/bonding.h | 4 + 4 files changed, 232 insertions(+), 89 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index a0cda062bc33..8e6b8d3c7410 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -289,35 +289,73 @@ downdelay fail_over_mac Specifies whether active-backup mode should set all slaves to - the same MAC address (the traditional behavior), or, when - enabled, change the bond's MAC address when changing the - active interface (i.e., fail over the MAC address itself). - - Fail over MAC is useful for devices that cannot ever alter - their MAC address, or for devices that refuse incoming - broadcasts with their own source MAC (which interferes with - the ARP monitor). - - The down side of fail over MAC is that every device on the - network must be updated via gratuitous ARP, vs. just updating - a switch or set of switches (which often takes place for any - traffic, not just ARP traffic, if the switch snoops incoming - traffic to update its tables) for the traditional method. If - the gratuitous ARP is lost, communication may be disrupted. - - When fail over MAC is used in conjuction with the mii monitor, - devices which assert link up prior to being able to actually - transmit and receive are particularly susecptible to loss of - the gratuitous ARP, and an appropriate updelay setting may be - required. - - A value of 0 disables fail over MAC, and is the default. A - value of 1 enables fail over MAC. This option is enabled - automatically if the first slave added cannot change its MAC - address. This option may be modified via sysfs only when no - slaves are present in the bond. - - This option was added in bonding version 3.2.0. + the same MAC address at enslavement (the traditional + behavior), or, when enabled, perform special handling of the + bond's MAC address in accordance with the selected policy. + + Possible values are: + + none or 0 + + This setting disables fail_over_mac, and causes + bonding to set all slaves of an active-backup bond to + the same MAC address at enslavement time. This is the + default. + + active or 1 + + The "active" fail_over_mac policy indicates that the + MAC address of the bond should always be the MAC + address of the currently active slave. The MAC + address of the slaves is not changed; instead, the MAC + address of the bond changes during a failover. + + This policy is useful for devices that cannot ever + alter their MAC address, or for devices that refuse + incoming broadcasts with their own source MAC (which + interferes with the ARP monitor). + + The down side of this policy is that every device on + the network must be updated via gratuitous ARP, + vs. just updating a switch or set of switches (which + often takes place for any traffic, not just ARP + traffic, if the switch snoops incoming traffic to + update its tables) for the traditional method. If the + gratuitous ARP is lost, communication may be + disrupted. + + When this policy is used in conjuction with the mii + monitor, devices which assert link up prior to being + able to actually transmit and receive are particularly + susecptible to loss of the gratuitous ARP, and an + appropriate updelay setting may be required. + + follow or 2 + + The "follow" fail_over_mac policy causes the MAC + address of the bond to be selected normally (normally + the MAC address of the first slave added to the bond). + However, the second and subsequent slaves are not set + to this MAC address while they are in a backup role; a + slave is programmed with the bond's MAC address at + failover time (and the formerly active slave receives + the newly active slave's MAC address). + + This policy is useful for multiport devices that + either become confused or incur a performance penalty + when multiple ports are programmed with the same MAC + address. + + + The default policy is none, unless the first slave cannot + change its MAC address, in which case the active policy is + selected by default. + + This option may be modified via sysfs only when no slaves are + present in the bond. + + This option was added in bonding version 3.2.0. The "follow" + policy was added in bonding version 3.3.0. lacp_rate diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 51e0f2de42c6..5b4af3cc2a44 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -100,7 +100,7 @@ static char *xmit_hash_policy = NULL; static int arp_interval = BOND_LINK_ARP_INTERV; static char *arp_ip_target[BOND_MAX_ARP_TARGETS] = { NULL, }; static char *arp_validate = NULL; -static int fail_over_mac = 0; +static char *fail_over_mac = NULL; struct bond_params bonding_defaults; module_param(max_bonds, int, 0); @@ -136,8 +136,8 @@ module_param_array(arp_ip_target, charp, NULL, 0); MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form"); module_param(arp_validate, charp, 0); MODULE_PARM_DESC(arp_validate, "validate src/dst of ARP probes: none (default), active, backup or all"); -module_param(fail_over_mac, int, 0); -MODULE_PARM_DESC(fail_over_mac, "For active-backup, do not set all slaves to the same MAC. 0 of off (default), 1 for on."); +module_param(fail_over_mac, charp, 0); +MODULE_PARM_DESC(fail_over_mac, "For active-backup, do not set all slaves to the same MAC. none (default), active or follow"); /*----------------------------- Global variables ----------------------------*/ @@ -190,6 +190,13 @@ struct bond_parm_tbl arp_validate_tbl[] = { { NULL, -1}, }; +struct bond_parm_tbl fail_over_mac_tbl[] = { +{ "none", BOND_FOM_NONE}, +{ "active", BOND_FOM_ACTIVE}, +{ "follow", BOND_FOM_FOLLOW}, +{ NULL, -1}, +}; + /*-------------------------- Forward declarations ---------------------------*/ static void bond_send_gratuitous_arp(struct bonding *bond); @@ -973,6 +980,82 @@ static void bond_mc_swap(struct bonding *bond, struct slave *new_active, struct } } +/* + * bond_do_fail_over_mac + * + * Perform special MAC address swapping for fail_over_mac settings + * + * Called with RTNL, bond->lock for read, curr_slave_lock for write_bh. + */ +static void bond_do_fail_over_mac(struct bonding *bond, + struct slave *new_active, + struct slave *old_active) +{ + u8 tmp_mac[ETH_ALEN]; + struct sockaddr saddr; + int rv; + + switch (bond->params.fail_over_mac) { + case BOND_FOM_ACTIVE: + if (new_active) + memcpy(bond->dev->dev_addr, new_active->dev->dev_addr, + new_active->dev->addr_len); + break; + case BOND_FOM_FOLLOW: + /* + * if new_active && old_active, swap them + * if just old_active, do nothing (going to no active slave) + * if just new_active, set new_active to bond's MAC + */ + if (!new_active) + return; + + write_unlock_bh(&bond->curr_slave_lock); + read_unlock(&bond->lock); + + if (old_active) { + memcpy(tmp_mac, new_active->dev->dev_addr, ETH_ALEN); + memcpy(saddr.sa_data, old_active->dev->dev_addr, + ETH_ALEN); + saddr.sa_family = new_active->dev->type; + } else { + memcpy(saddr.sa_data, bond->dev->dev_addr, ETH_ALEN); + saddr.sa_family = bond->dev->type; + } + + rv = dev_set_mac_address(new_active->dev, &saddr); + if (rv) { + printk(KERN_ERR DRV_NAME + ": %s: Error %d setting MAC of slave %s\n", + bond->dev->name, -rv, new_active->dev->name); + goto out; + } + + if (!old_active) + goto out; + + memcpy(saddr.sa_data, tmp_mac, ETH_ALEN); + saddr.sa_family = old_active->dev->type; + + rv = dev_set_mac_address(old_active->dev, &saddr); + if (rv) + printk(KERN_ERR DRV_NAME + ": %s: Error %d setting MAC of slave %s\n", + bond->dev->name, -rv, new_active->dev->name); +out: + read_lock(&bond->lock); + write_lock_bh(&bond->curr_slave_lock); + break; + default: + printk(KERN_ERR DRV_NAME + ": %s: bond_do_fail_over_mac impossible: bad policy %d\n", + bond->dev->name, bond->params.fail_over_mac); + break; + } + +} + + /** * find_best_interface - select the best available slave to be the active one * @bond: our bonding struct @@ -1040,7 +1123,8 @@ static struct slave *bond_find_best_slave(struct bonding *bond) * because it is apparently the best available slave we have, even though its * updelay hasn't timed out yet. * - * Warning: Caller must hold curr_slave_lock for writing. + * If new_active is not NULL, caller must hold bond->lock for read and + * curr_slave_lock for write_bh. */ void bond_change_active_slave(struct bonding *bond, struct slave *new_active) { @@ -1107,12 +1191,9 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active) bond_set_slave_active_flags(new_active); } - /* when bonding does not set the slave MAC address, the bond MAC - * address is the one of the active slave. - */ if (new_active && bond->params.fail_over_mac) - memcpy(bond->dev->dev_addr, new_active->dev->dev_addr, - new_active->dev->addr_len); + bond_do_fail_over_mac(bond, new_active, old_active); + bond->send_grat_arp = bond->params.num_grat_arp; if (bond->curr_active_slave && test_bit(__LINK_STATE_LINKWATCH_PENDING, @@ -1137,7 +1218,7 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active) * - The primary_slave has got its link back. * - A slave has got its link back and there's no old curr_active_slave. * - * Warning: Caller must hold curr_slave_lock for writing. + * Caller must hold bond->lock for read and curr_slave_lock for write_bh. */ void bond_select_active_slave(struct bonding *bond) { @@ -1384,14 +1465,14 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) printk(KERN_WARNING DRV_NAME ": %s: Warning: The first slave device " "specified does not support setting the MAC " - "address. Enabling the fail_over_mac option.", + "address. Setting fail_over_mac to active.", bond_dev->name); - bond->params.fail_over_mac = 1; - } else if (!bond->params.fail_over_mac) { + bond->params.fail_over_mac = BOND_FOM_ACTIVE; + } else if (bond->params.fail_over_mac != BOND_FOM_ACTIVE) { printk(KERN_ERR DRV_NAME ": %s: Error: The slave device specified " "does not support setting the MAC address, " - "but fail_over_mac is not enabled.\n" + "but fail_over_mac is not set to active.\n" , bond_dev->name); res = -EOPNOTSUPP; goto err_undo_flags; @@ -1498,6 +1579,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) bond_compute_features(bond); + write_unlock_bh(&bond->lock); + + read_lock(&bond->lock); + new_slave->last_arp_rx = jiffies; if (bond->params.miimon && !bond->params.use_carrier) { @@ -1574,6 +1659,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) } } + write_lock_bh(&bond->curr_slave_lock); + switch (bond->params.mode) { case BOND_MODE_ACTIVEBACKUP: bond_set_slave_inactive_flags(new_slave); @@ -1621,9 +1708,11 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) break; } /* switch(bond_mode) */ + write_unlock_bh(&bond->curr_slave_lock); + bond_set_carrier(bond); - write_unlock_bh(&bond->lock); + read_unlock(&bond->lock); res = bond_create_slave_symlinks(bond_dev, slave_dev); if (res) @@ -1647,6 +1736,10 @@ err_unset_master: err_restore_mac: if (!bond->params.fail_over_mac) { + /* XXX TODO - fom follow mode needs to change master's + * MAC if this slave's MAC is in use by the bond, or at + * least print a warning. + */ memcpy(addr.sa_data, new_slave->perm_hwaddr, ETH_ALEN); addr.sa_family = slave_dev->type; dev_set_mac_address(slave_dev, &addr); @@ -1701,20 +1794,18 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev) return -EINVAL; } - mac_addr_differ = memcmp(bond_dev->dev_addr, - slave->perm_hwaddr, - ETH_ALEN); - if (!mac_addr_differ && (bond->slave_cnt > 1)) { - printk(KERN_WARNING DRV_NAME - ": %s: Warning: the permanent HWaddr of %s - " - "%s - is still in use by %s. " - "Set the HWaddr of %s to a different address " - "to avoid conflicts.\n", - bond_dev->name, - slave_dev->name, - print_mac(mac, slave->perm_hwaddr), - bond_dev->name, - slave_dev->name); + if (!bond->params.fail_over_mac) { + mac_addr_differ = memcmp(bond_dev->dev_addr, slave->perm_hwaddr, + ETH_ALEN); + if (!mac_addr_differ && (bond->slave_cnt > 1)) + printk(KERN_WARNING DRV_NAME + ": %s: Warning: the permanent HWaddr of %s - " + "%s - is still in use by %s. " + "Set the HWaddr of %s to a different address " + "to avoid conflicts.\n", + bond_dev->name, slave_dev->name, + print_mac(mac, slave->perm_hwaddr), + bond_dev->name, slave_dev->name); } /* Inform AD package of unbinding of slave. */ @@ -1841,7 +1932,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev) /* close slave before restoring its mac address */ dev_close(slave_dev); - if (!bond->params.fail_over_mac) { + if (bond->params.fail_over_mac != BOND_FOM_ACTIVE) { /* restore original ("permanent") mac address */ memcpy(addr.sa_data, slave->perm_hwaddr, ETH_ALEN); addr.sa_family = slave_dev->type; @@ -3164,7 +3255,8 @@ static void bond_info_show_master(struct seq_file *seq) if (bond->params.mode == BOND_MODE_ACTIVEBACKUP && bond->params.fail_over_mac) - seq_printf(seq, " (fail_over_mac)"); + seq_printf(seq, " (fail_over_mac %s)", + fail_over_mac_tbl[bond->params.fail_over_mac].modename); seq_printf(seq, "\n"); @@ -4092,10 +4184,10 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr) dprintk("bond=%p, name=%s\n", bond, (bond_dev ? bond_dev->name : "None")); /* - * If fail_over_mac is enabled, do nothing and return success. - * Returning an error causes ifenslave to fail. + * If fail_over_mac is set to active, do nothing and return + * success. Returning an error causes ifenslave to fail. */ - if (bond->params.fail_over_mac) + if (bond->params.fail_over_mac == BOND_FOM_ACTIVE) return 0; if (!is_valid_ether_addr(sa->sa_data)) { @@ -4600,7 +4692,7 @@ int bond_parse_parm(const char *buf, struct bond_parm_tbl *tbl) static int bond_check_params(struct bond_params *params) { - int arp_validate_value; + int arp_validate_value, fail_over_mac_value; /* * Convert string parameters. @@ -4875,10 +4967,23 @@ static int bond_check_params(struct bond_params *params) primary = NULL; } - if (fail_over_mac && (bond_mode != BOND_MODE_ACTIVEBACKUP)) - printk(KERN_WARNING DRV_NAME - ": Warning: fail_over_mac only affects " - "active-backup mode.\n"); + if (fail_over_mac) { + fail_over_mac_value = bond_parse_parm(fail_over_mac, + fail_over_mac_tbl); + if (fail_over_mac_value == -1) { + printk(KERN_ERR DRV_NAME + ": Error: invalid fail_over_mac \"%s\"\n", + arp_validate == NULL ? "NULL" : arp_validate); + return -EINVAL; + } + + if (bond_mode != BOND_MODE_ACTIVEBACKUP) + printk(KERN_WARNING DRV_NAME + ": Warning: fail_over_mac only affects " + "active-backup mode.\n"); + } else { + fail_over_mac_value = BOND_FOM_NONE; + } /* fill params struct with the proper values */ params->mode = bond_mode; @@ -4892,7 +4997,7 @@ static int bond_check_params(struct bond_params *params) params->use_carrier = use_carrier; params->lacp_fast = lacp_fast; params->primary[0] = 0; - params->fail_over_mac = fail_over_mac; + params->fail_over_mac = fail_over_mac_value; if (primary) { strncpy(params->primary, primary, IFNAMSIZ); diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c index 7a61e9a14386..dd265c69b0df 100644 --- a/drivers/net/bonding/bond_sysfs.c +++ b/drivers/net/bonding/bond_sysfs.c @@ -50,6 +50,7 @@ extern struct bond_parm_tbl bond_mode_tbl[]; extern struct bond_parm_tbl bond_lacp_tbl[]; extern struct bond_parm_tbl xmit_hashtype_tbl[]; extern struct bond_parm_tbl arp_validate_tbl[]; +extern struct bond_parm_tbl fail_over_mac_tbl[]; static int expected_refcount = -1; static struct class *netdev_class; @@ -547,42 +548,37 @@ static ssize_t bonding_show_fail_over_mac(struct device *d, struct device_attrib { struct bonding *bond = to_bond(d); - return sprintf(buf, "%d\n", bond->params.fail_over_mac) + 1; + return sprintf(buf, "%s %d\n", + fail_over_mac_tbl[bond->params.fail_over_mac].modename, + bond->params.fail_over_mac); } static ssize_t bonding_store_fail_over_mac(struct device *d, struct device_attribute *attr, const char *buf, size_t count) { int new_value; - int ret = count; struct bonding *bond = to_bond(d); if (bond->slave_cnt != 0) { printk(KERN_ERR DRV_NAME ": %s: Can't alter fail_over_mac with slaves in bond.\n", bond->dev->name); - ret = -EPERM; - goto out; + return -EPERM; } - if (sscanf(buf, "%d", &new_value) != 1) { + new_value = bond_parse_parm(buf, fail_over_mac_tbl); + if (new_value < 0) { printk(KERN_ERR DRV_NAME - ": %s: no fail_over_mac value specified.\n", - bond->dev->name); - ret = -EINVAL; - goto out; + ": %s: Ignoring invalid fail_over_mac value %s.\n", + bond->dev->name, buf); + return -EINVAL; } - if ((new_value == 0) || (new_value == 1)) { - bond->params.fail_over_mac = new_value; - printk(KERN_INFO DRV_NAME ": %s: Setting fail_over_mac to %d.\n", - bond->dev->name, new_value); - } else { - printk(KERN_INFO DRV_NAME - ": %s: Ignoring invalid fail_over_mac value %d.\n", - bond->dev->name, new_value); - } -out: - return ret; + bond->params.fail_over_mac = new_value; + printk(KERN_INFO DRV_NAME ": %s: Setting fail_over_mac to %s (%d).\n", + bond->dev->name, fail_over_mac_tbl[new_value].modename, + new_value); + + return count; } static DEVICE_ATTR(fail_over_mac, S_IRUGO | S_IWUSR, bonding_show_fail_over_mac, bonding_store_fail_over_mac); diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h index 8766b1213690..89fd9963db7a 100644 --- a/drivers/net/bonding/bonding.h +++ b/drivers/net/bonding/bonding.h @@ -248,6 +248,10 @@ static inline struct bonding *bond_get_bond_by_slave(struct slave *slave) return (struct bonding *)slave->dev->master->priv; } +#define BOND_FOM_NONE 0 +#define BOND_FOM_ACTIVE 1 +#define BOND_FOM_FOLLOW 2 + #define BOND_ARP_VALIDATE_NONE 0 #define BOND_ARP_VALIDATE_ACTIVE (1 << BOND_STATE_ACTIVE) #define BOND_ARP_VALIDATE_BACKUP (1 << BOND_STATE_BACKUP) -- cgit v1.2.3-58-ga151 From 8a2f2ccc7a6bfbdb8b484198e190d6805d979700 Mon Sep 17 00:00:00 2001 From: Isaku Yamahata Date: Tue, 27 May 2008 15:16:47 -0700 Subject: [IA64] pvops: documentation on ia64/pv_ops Documentation on ia64/pv_ops which describes its strategy and implementation. Signed-off-by: Isaku Yamahata Cc: Gerald Pfeifer Signed-off-by: Tony Luck --- Documentation/ia64/paravirt_ops.txt | 137 ++++++++++++++++++++++++++++++++++++ 1 file changed, 137 insertions(+) create mode 100644 Documentation/ia64/paravirt_ops.txt (limited to 'Documentation') diff --git a/Documentation/ia64/paravirt_ops.txt b/Documentation/ia64/paravirt_ops.txt new file mode 100644 index 000000000000..39ded02ec33f --- /dev/null +++ b/Documentation/ia64/paravirt_ops.txt @@ -0,0 +1,137 @@ +Paravirt_ops on IA64 +==================== + 21 May 2008, Isaku Yamahata + + +Introduction +------------ +The aim of this documentation is to help with maintainability and/or to +encourage people to use paravirt_ops/IA64. + +paravirt_ops (pv_ops in short) is a way for virtualization support of +Linux kernel on x86. Several ways for virtualization support were +proposed, paravirt_ops is the winner. +On the other hand, now there are also several IA64 virtualization +technologies like kvm/IA64, xen/IA64 and many other academic IA64 +hypervisors so that it is good to add generic virtualization +infrastructure on Linux/IA64. + + +What is paravirt_ops? +--------------------- +It has been developed on x86 as virtualization support via API, not ABI. +It allows each hypervisor to override operations which are important for +hypervisors at API level. And it allows a single kernel binary to run on +all supported execution environments including native machine. +Essentially paravirt_ops is a set of function pointers which represent +operations corresponding to low level sensitive instructions and high +level functionalities in various area. But one significant difference +from usual function pointer table is that it allows optimization with +binary patch. It is because some of these operations are very +performance sensitive and indirect call overhead is not negligible. +With binary patch, indirect C function call can be transformed into +direct C function call or in-place execution to eliminate the overhead. + +Thus, operations of paravirt_ops are classified into three categories. +- simple indirect call + These operations correspond to high level functionality so that the + overhead of indirect call isn't very important. + +- indirect call which allows optimization with binary patch + Usually these operations correspond to low level instructions. They + are called frequently and performance critical. So the overhead is + very important. + +- a set of macros for hand written assembly code + Hand written assembly codes (.S files) also need paravirtualization + because they include sensitive instructions or some of code paths in + them are very performance critical. + + +The relation to the IA64 machine vector +--------------------------------------- +Linux/IA64 has the IA64 machine vector functionality which allows the +kernel to switch implementations (e.g. initialization, ipi, dma api...) +depending on executing platform. +We can replace some implementations very easily defining a new machine +vector. Thus another approach for virtualization support would be +enhancing the machine vector functionality. +But paravirt_ops approach was taken because +- virtualization support needs wider support than machine vector does. + e.g. low level instruction paravirtualization. It must be + initialized very early before platform detection. + +- virtualization support needs more functionality like binary patch. + Probably the calling overhead might not be very large compared to the + emulation overhead of virtualization. However in the native case, the + overhead should be eliminated completely. + A single kernel binary should run on each environment including native, + and the overhead of paravirt_ops on native environment should be as + small as possible. + +- for full virtualization technology, e.g. KVM/IA64 or + Xen/IA64 HVM domain, the result would be + (the emulated platform machine vector. probably dig) + (pv_ops). + This means that the virtualization support layer should be under + the machine vector layer. + +Possibly it might be better to move some function pointers from +paravirt_ops to machine vector. In fact, Xen domU case utilizes both +pv_ops and machine vector. + + +IA64 paravirt_ops +----------------- +In this section, the concrete paravirt_ops will be discussed. +Because of the architecture difference between ia64 and x86, the +resulting set of functions is very different from x86 pv_ops. + +- C function pointer tables +They are not very performance critical so that simple C indirect +function call is acceptable. The following structures are defined at +this moment. For details see linux/include/asm-ia64/paravirt.h + - struct pv_info + This structure describes the execution environment. + - struct pv_init_ops + This structure describes the various initialization hooks. + - struct pv_iosapic_ops + This structure describes hooks to iosapic operations. + - struct pv_irq_ops + This structure describes hooks to irq related operations + - struct pv_time_op + This structure describes hooks to steal time accounting. + +- a set of indirect calls which need optimization +Currently this class of functions correspond to a subset of IA64 +intrinsics. At this moment the optimization with binary patch isn't +implemented yet. +struct pv_cpu_op is defined. For details see +linux/include/asm-ia64/paravirt_privop.h +Mostly they correspond to ia64 intrinsics 1-to-1. +Caveat: Now they are defined as C indirect function pointers, but in +order to support binary patch optimization, they will be changed +using GCC extended inline assembly code. + +- a set of macros for hand written assembly code (.S files) +For maintenance purpose, the taken approach for .S files is single +source code and compile multiple times with different macros definitions. +Each pv_ops instance must define those macros to compile. +The important thing here is that sensitive, but non-privileged +instructions must be paravirtualized and that some privileged +instructions also need paravirtualization for reasonable performance. +Developers who modify .S files must be aware of that. At this moment +an easy checker is implemented to detect paravirtualization breakage. +But it doesn't cover all the cases. + +Sometimes this set of macros is called pv_cpu_asm_op. But there is no +corresponding structure in the source code. +Those macros mostly 1:1 correspond to a subset of privileged +instructions. See linux/include/asm-ia64/native/inst.h. +And some functions written in assembly also need to be overrided so +that each pv_ops instance have to define some macros. Again see +linux/include/asm-ia64/native/inst.h. + + +Those structures must be initialized very early before start_kernel. +Probably initialized in head.S using multi entry point or some other trick. +For native case implementation see linux/arch/ia64/kernel/paravirt.c. -- cgit v1.2.3-58-ga151 From a5edeccb1a8432ae5d9fb9bccea5a4b64c565017 Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Mon, 26 May 2008 11:53:21 +0200 Subject: net: OpenFirmware GPIO based MDIO bitbang driver This patch adds an MDIO bitbang driver that uses the GPIO library and its OF bindings to access the bus I/Os. Signed-off-by: Laurent Pinchart Signed-off-by: Jeff Garzik --- Documentation/powerpc/booting-without-of.txt | 21 +++ drivers/net/phy/Kconfig | 6 + drivers/net/phy/Makefile | 1 + drivers/net/phy/mdio-ofgpio.c | 205 +++++++++++++++++++++++++++ 4 files changed, 233 insertions(+) create mode 100644 drivers/net/phy/mdio-ofgpio.c (limited to 'Documentation') diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt index 1d2a772506cf..46a9dba11f2f 100644 --- a/Documentation/powerpc/booting-without-of.txt +++ b/Documentation/powerpc/booting-without-of.txt @@ -58,6 +58,7 @@ Table of Contents o) Xilinx IP cores p) Freescale Synchronous Serial Interface q) USB EHCI controllers + r) MDIO on GPIOs VII - Marvell Discovery mv64[345]6x System Controller chips 1) The /system-controller node @@ -2870,6 +2871,26 @@ platforms are moved over to use the flattened-device-tree model. reg = <0xe8000000 32>; }; + r) MDIO on GPIOs + + Currently defined compatibles: + - virtual,gpio-mdio + + MDC and MDIO lines connected to GPIO controllers are listed in the + gpios property as described in section VIII.1 in the following order: + + MDC, MDIO. + + Example: + + mdio { + compatible = "virtual,mdio-gpio"; + #address-cells = <1>; + #size-cells = <0>; + gpios = <&qe_pio_a 11 + &qe_pio_c 6>; + }; + VII - Marvell Discovery mv64[345]6x System Controller chips =========================================================== diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig index 284217c12839..d55932acd887 100644 --- a/drivers/net/phy/Kconfig +++ b/drivers/net/phy/Kconfig @@ -84,4 +84,10 @@ config MDIO_BITBANG If in doubt, say N. +config MDIO_OF_GPIO + tristate "Support for GPIO lib-based bitbanged MDIO buses" + depends on MDIO_BITBANG && OF_GPIO + ---help--- + Supports GPIO lib-based MDIO busses. + endif # PHYLIB diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile index 5997d6ef702b..eee329fa6f53 100644 --- a/drivers/net/phy/Makefile +++ b/drivers/net/phy/Makefile @@ -15,3 +15,4 @@ obj-$(CONFIG_ICPLUS_PHY) += icplus.o obj-$(CONFIG_REALTEK_PHY) += realtek.o obj-$(CONFIG_FIXED_PHY) += fixed.o obj-$(CONFIG_MDIO_BITBANG) += mdio-bitbang.o +obj-$(CONFIG_MDIO_OF_GPIO) += mdio-ofgpio.o diff --git a/drivers/net/phy/mdio-ofgpio.c b/drivers/net/phy/mdio-ofgpio.c new file mode 100644 index 000000000000..7edfc0c34835 --- /dev/null +++ b/drivers/net/phy/mdio-ofgpio.c @@ -0,0 +1,205 @@ +/* + * OpenFirmware GPIO based MDIO bitbang driver. + * + * Copyright (c) 2008 CSE Semaphore Belgium. + * by Laurent Pinchart + * + * Based on earlier work by + * + * Copyright (c) 2003 Intracom S.A. + * by Pantelis Antoniou + * + * 2005 (c) MontaVista Software, Inc. + * Vitaly Bordug + * + * This file is licensed under the terms of the GNU General Public License + * version 2. This program is licensed "as is" without any warranty of any + * kind, whether express or implied. + */ + +#include +#include +#include +#include +#include +#include +#include + +struct mdio_gpio_info { + struct mdiobb_ctrl ctrl; + int mdc, mdio; +}; + +static void mdio_dir(struct mdiobb_ctrl *ctrl, int dir) +{ + struct mdio_gpio_info *bitbang = + container_of(ctrl, struct mdio_gpio_info, ctrl); + + if (dir) + gpio_direction_output(bitbang->mdio, 1); + else + gpio_direction_input(bitbang->mdio); +} + +static int mdio_read(struct mdiobb_ctrl *ctrl) +{ + struct mdio_gpio_info *bitbang = + container_of(ctrl, struct mdio_gpio_info, ctrl); + + return gpio_get_value(bitbang->mdio); +} + +static void mdio(struct mdiobb_ctrl *ctrl, int what) +{ + struct mdio_gpio_info *bitbang = + container_of(ctrl, struct mdio_gpio_info, ctrl); + + gpio_set_value(bitbang->mdio, what); +} + +static void mdc(struct mdiobb_ctrl *ctrl, int what) +{ + struct mdio_gpio_info *bitbang = + container_of(ctrl, struct mdio_gpio_info, ctrl); + + gpio_set_value(bitbang->mdc, what); +} + +static struct mdiobb_ops mdio_gpio_ops = { + .owner = THIS_MODULE, + .set_mdc = mdc, + .set_mdio_dir = mdio_dir, + .set_mdio_data = mdio, + .get_mdio_data = mdio_read, +}; + +static int __devinit mdio_ofgpio_bitbang_init(struct mii_bus *bus, + struct device_node *np) +{ + struct mdio_gpio_info *bitbang = bus->priv; + + bitbang->mdc = of_get_gpio(np, 0); + bitbang->mdio = of_get_gpio(np, 1); + + if (bitbang->mdc < 0 || bitbang->mdio < 0) + return -ENODEV; + + snprintf(bus->id, MII_BUS_ID_SIZE, "%x", bitbang->mdc); + return 0; +} + +static void __devinit add_phy(struct mii_bus *bus, struct device_node *np) +{ + const u32 *data; + int len, id, irq; + + data = of_get_property(np, "reg", &len); + if (!data || len != 4) + return; + + id = *data; + bus->phy_mask &= ~(1 << id); + + irq = of_irq_to_resource(np, 0, NULL); + if (irq != NO_IRQ) + bus->irq[id] = irq; +} + +static int __devinit mdio_ofgpio_probe(struct of_device *ofdev, + const struct of_device_id *match) +{ + struct device_node *np = NULL; + struct mii_bus *new_bus; + struct mdio_gpio_info *bitbang; + int ret = -ENOMEM; + int i; + + bitbang = kzalloc(sizeof(struct mdio_gpio_info), GFP_KERNEL); + if (!bitbang) + goto out; + + bitbang->ctrl.ops = &mdio_gpio_ops; + + new_bus = alloc_mdio_bitbang(&bitbang->ctrl); + if (!new_bus) + goto out_free_priv; + + new_bus->name = "GPIO Bitbanged MII", + + ret = mdio_ofgpio_bitbang_init(new_bus, ofdev->node); + if (ret) + goto out_free_bus; + + new_bus->phy_mask = ~0; + new_bus->irq = kmalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL); + if (!new_bus->irq) + goto out_free_bus; + + for (i = 0; i < PHY_MAX_ADDR; i++) + new_bus->irq[i] = -1; + + while ((np = of_get_next_child(ofdev->node, np))) + if (!strcmp(np->type, "ethernet-phy")) + add_phy(new_bus, np); + + new_bus->dev = &ofdev->dev; + dev_set_drvdata(&ofdev->dev, new_bus); + + ret = mdiobus_register(new_bus); + if (ret) + goto out_free_irqs; + + return 0; + +out_free_irqs: + dev_set_drvdata(&ofdev->dev, NULL); + kfree(new_bus->irq); +out_free_bus: + kfree(new_bus); +out_free_priv: + free_mdio_bitbang(new_bus); +out: + return ret; +} + +static int mdio_ofgpio_remove(struct of_device *ofdev) +{ + struct mii_bus *bus = dev_get_drvdata(&ofdev->dev); + struct mdio_gpio_info *bitbang = bus->priv; + + mdiobus_unregister(bus); + free_mdio_bitbang(bus); + dev_set_drvdata(&ofdev->dev, NULL); + kfree(bus->irq); + kfree(bitbang); + kfree(bus); + + return 0; +} + +static struct of_device_id mdio_ofgpio_match[] = { + { + .compatible = "virtual,mdio-gpio", + }, + {}, +}; + +static struct of_platform_driver mdio_ofgpio_driver = { + .name = "mdio-gpio", + .match_table = mdio_ofgpio_match, + .probe = mdio_ofgpio_probe, + .remove = mdio_ofgpio_remove, +}; + +static int mdio_ofgpio_init(void) +{ + return of_register_platform_driver(&mdio_ofgpio_driver); +} + +static void mdio_ofgpio_exit(void) +{ + of_unregister_platform_driver(&mdio_ofgpio_driver); +} + +module_init(mdio_ofgpio_init); +module_exit(mdio_ofgpio_exit); -- cgit v1.2.3-58-ga151 From bb71ad880204b79d60331d3384103976e086cb9f Mon Sep 17 00:00:00 2001 From: Gary Hade Date: Mon, 12 May 2008 13:57:46 -0700 Subject: PCI: boot parameter to avoid expansion ROM memory allocation Contention for scarce PCI memory resources has been growing due to an increasing number of PCI slots in large multi-node systems. The kernel currently attempts by default to allocate memory for all PCI expansion ROMs so there has also been an increasing number of PCI memory allocation failures seen on these systems. This occurs because the BIOS either (1) provides insufficient PCI memory resource for all the expansion ROMs or (2) provides adequate PCI memory resource for expansion ROMs but provides the space in kernel unexpected BIOS assigned P2P non-prefetch windows. The resulting PCI memory allocation failures may be benign when related to memory requests for expansion ROMs themselves but in some cases they can occur when attempting to allocate space for more critical BARs. This can happen when a successful expansion ROM allocation request consumes memory resource that was intended for a non-ROM BAR. We have seen this happen during PCI hotplug of an adapter that contains a P2P bridge where successful memory allocation for an expansion ROM BAR on device behind the bridge consumed memory that was intended for a non-ROM BAR on the P2P bridge. In all cases the allocation failure messages can be very confusing for users. This patch provides a new 'pci=norom' kernel boot parameter that can be used to disable the default PCI expansion ROM memory resource allocation. This provides a way to avoid the above described issues on systems that do not contain PCI devices for which drivers or user-level applications depend on the default PCI expansion ROM memory resource allocation behavior. Signed-off-by: Gary Hade Signed-off-by: Jesse Barnes --- Documentation/kernel-parameters.txt | 3 +++ arch/x86/pci/common.c | 22 ++++++++++++++++++++++ arch/x86/pci/pci.h | 1 + 3 files changed, 26 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index e07c432c731f..9cf7b34f2db0 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1496,6 +1496,9 @@ and is between 256 and 4096 characters. It is defined in the file Use with caution as certain devices share address decoders between ROMs and other resources. + norom [X86-32,X86_64] Do not assign address space to + expansion ROMs that do not already have + BIOS assigned address ranges. irqmask=0xMMMM [X86-32] Set a bit mask of IRQs allowed to be assigned automatically to PCI devices. You can make the kernel exclude IRQs of your ISA cards diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c index 6e64aaf00d1d..3a5261bdff5d 100644 --- a/arch/x86/pci/common.c +++ b/arch/x86/pci/common.c @@ -121,6 +121,21 @@ void __init dmi_check_skip_isa_align(void) dmi_check_system(can_skip_pciprobe_dmi_table); } +static void __devinit pcibios_fixup_device_resources(struct pci_dev *dev) +{ + struct resource *rom_r = &dev->resource[PCI_ROM_RESOURCE]; + + if (pci_probe & PCI_NOASSIGN_ROMS) { + if (rom_r->parent) + return; + if (rom_r->start) { + /* we deal with BIOS assigned ROM later */ + return; + } + rom_r->start = rom_r->end = rom_r->flags = 0; + } +} + /* * Called after each bus is probed, but before its children * are examined. @@ -128,7 +143,11 @@ void __init dmi_check_skip_isa_align(void) void __devinit pcibios_fixup_bus(struct pci_bus *b) { + struct pci_dev *dev; + pci_read_bridge_bases(b); + list_for_each_entry(dev, &b->devices, bus_list) + pcibios_fixup_device_resources(dev); } /* @@ -483,6 +502,9 @@ char * __devinit pcibios_setup(char *str) else if (!strcmp(str, "rom")) { pci_probe |= PCI_ASSIGN_ROMS; return NULL; + } else if (!strcmp(str, "norom")) { + pci_probe |= PCI_NOASSIGN_ROMS; + return NULL; } else if (!strcmp(str, "assign-busses")) { pci_probe |= PCI_ASSIGN_ALL_BUSSES; return NULL; diff --git a/arch/x86/pci/pci.h b/arch/x86/pci/pci.h index 720c4c554534..291dafec07b7 100644 --- a/arch/x86/pci/pci.h +++ b/arch/x86/pci/pci.h @@ -27,6 +27,7 @@ #define PCI_CAN_SKIP_ISA_ALIGN 0x8000 #define PCI_USE__CRS 0x10000 #define PCI_CHECK_ENABLE_AMD_MMCONF 0x20000 +#define PCI_NOASSIGN_ROMS 0x40000 extern unsigned int pci_probe; extern unsigned long pirq_table_addr; -- cgit v1.2.3-58-ga151 From d8f3de0d2412bb91639cfefc5b3c79dbf3812212 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 12 Jun 2008 23:24:06 +0200 Subject: Suspend-related patches for 2.6.27 ACPI PM: Add possibility to change suspend sequence There are some systems out there that don't work correctly with our current suspend/hibernation code ordering. Provide a workaround for these systems allowing them to pass 'acpi_sleep=old_ordering' in the kernel command line so that it will use the pre-ACPI 2.0 ("old") suspend code ordering. Unfortunately, this requires us to add a platform hook to the resuming of devices for recovering the platform in case one of the device drivers' .suspend() routines returns error code. Namely, ACPI 1.0 specifies that _PTS should be called before suspending devices, but _WAK still should be called before resuming them in order to undo the changes made by _PTS. However, if there is an error during suspending devices, they are automatically resumed without returning control to the PM core, so the _WAK has to be called from within device_resume() in that cases. The patch also reorders and refactors the ACPI suspend/hibernation code to avoid duplication as far as reasonably possible. Signed-off-by: Rafael J. Wysocki Acked-by: Pavel Machek Signed-off-by: Jesse Barnes --- Documentation/kernel-parameters.txt | 6 +- arch/x86/kernel/acpi/sleep.c | 2 + drivers/acpi/sleep/main.c | 276 +++++++++++++++++++++--------------- drivers/base/power/main.c | 2 - include/linux/acpi.h | 3 + include/linux/suspend.h | 14 +- kernel/power/disk.c | 28 +++- kernel/power/main.c | 10 +- 8 files changed, 215 insertions(+), 126 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 9cf7b34f2db0..18d793ea0dd3 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -147,10 +147,14 @@ and is between 256 and 4096 characters. It is defined in the file default: 0 acpi_sleep= [HW,ACPI] Sleep options - Format: { s3_bios, s3_mode, s3_beep } + Format: { s3_bios, s3_mode, s3_beep, old_ordering } See Documentation/power/video.txt for s3_bios and s3_mode. s3_beep is for debugging; it makes the PC's speaker beep as soon as the kernel's real-mode entry point is called. + old_ordering causes the ACPI 1.0 ordering of the _PTS + control method, wrt putting devices into low power + states, to be enforced (the ACPI 2.0 ordering of _PTS is + used by default). acpi_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode Format: { level | edge | high | low } diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c index afc25ee9964b..882e970032d5 100644 --- a/arch/x86/kernel/acpi/sleep.c +++ b/arch/x86/kernel/acpi/sleep.c @@ -124,6 +124,8 @@ static int __init acpi_sleep_setup(char *str) acpi_realmode_flags |= 2; if (strncmp(str, "s3_beep", 7) == 0) acpi_realmode_flags |= 4; + if (strncmp(str, "old_ordering", 12) == 0) + acpi_old_suspend_ordering(); str = strchr(str, ','); if (str != NULL) str += strspn(str, ", \t"); diff --git a/drivers/acpi/sleep/main.c b/drivers/acpi/sleep/main.c index 0f2caea2fc83..4addf8ad50ae 100644 --- a/drivers/acpi/sleep/main.c +++ b/drivers/acpi/sleep/main.c @@ -24,10 +24,6 @@ u8 sleep_states[ACPI_S_STATE_COUNT]; -#ifdef CONFIG_PM_SLEEP -static u32 acpi_target_sleep_state = ACPI_STATE_S0; -#endif - static int acpi_sleep_prepare(u32 acpi_state) { #ifdef CONFIG_ACPI_SLEEP @@ -50,9 +46,96 @@ static int acpi_sleep_prepare(u32 acpi_state) return 0; } -#ifdef CONFIG_SUSPEND -static struct platform_suspend_ops acpi_suspend_ops; +#ifdef CONFIG_PM_SLEEP +static u32 acpi_target_sleep_state = ACPI_STATE_S0; + +/* + * ACPI 1.0 wants us to execute _PTS before suspending devices, so we allow the + * user to request that behavior by using the 'acpi_old_suspend_ordering' + * kernel command line option that causes the following variable to be set. + */ +static bool old_suspend_ordering; + +void __init acpi_old_suspend_ordering(void) +{ + old_suspend_ordering = true; +} + +/** + * acpi_pm_disable_gpes - Disable the GPEs. + */ +static int acpi_pm_disable_gpes(void) +{ + acpi_hw_disable_all_gpes(); + return 0; +} + +/** + * __acpi_pm_prepare - Prepare the platform to enter the target state. + * + * If necessary, set the firmware waking vector and do arch-specific + * nastiness to get the wakeup code to the waking vector. + */ +static int __acpi_pm_prepare(void) +{ + int error = acpi_sleep_prepare(acpi_target_sleep_state); + + if (error) + acpi_target_sleep_state = ACPI_STATE_S0; + return error; +} + +/** + * acpi_pm_prepare - Prepare the platform to enter the target sleep + * state and disable the GPEs. + */ +static int acpi_pm_prepare(void) +{ + int error = __acpi_pm_prepare(); + + if (!error) + acpi_hw_disable_all_gpes(); + return error; +} + +/** + * acpi_pm_finish - Instruct the platform to leave a sleep state. + * + * This is called after we wake back up (or if entering the sleep state + * failed). + */ +static void acpi_pm_finish(void) +{ + u32 acpi_state = acpi_target_sleep_state; + + if (acpi_state == ACPI_STATE_S0) + return; + printk(KERN_INFO PREFIX "Waking up from system sleep state S%d\n", + acpi_state); + acpi_disable_wakeup_device(acpi_state); + acpi_leave_sleep_state(acpi_state); + + /* reset firmware waking vector */ + acpi_set_firmware_waking_vector((acpi_physical_address) 0); + + acpi_target_sleep_state = ACPI_STATE_S0; +} + +/** + * acpi_pm_end - Finish up suspend sequence. + */ +static void acpi_pm_end(void) +{ + /* + * This is necessary in case acpi_pm_finish() is not called during a + * failing transition to a sleep state. + */ + acpi_target_sleep_state = ACPI_STATE_S0; +} +#endif /* CONFIG_PM_SLEEP */ + +#ifdef CONFIG_SUSPEND extern void do_suspend_lowlevel(void); static u32 acpi_suspend_states[] = { @@ -66,7 +149,6 @@ static u32 acpi_suspend_states[] = { * acpi_suspend_begin - Set the target system sleep state to the state * associated with given @pm_state, if supported. */ - static int acpi_suspend_begin(suspend_state_t pm_state) { u32 acpi_state = acpi_suspend_states[pm_state]; @@ -82,25 +164,6 @@ static int acpi_suspend_begin(suspend_state_t pm_state) return error; } -/** - * acpi_suspend_prepare - Do preliminary suspend work. - * - * If necessary, set the firmware waking vector and do arch-specific - * nastiness to get the wakeup code to the waking vector. - */ - -static int acpi_suspend_prepare(void) -{ - int error = acpi_sleep_prepare(acpi_target_sleep_state); - - if (error) { - acpi_target_sleep_state = ACPI_STATE_S0; - return error; - } - - return ACPI_SUCCESS(acpi_hw_disable_all_gpes()) ? 0 : -EFAULT; -} - /** * acpi_suspend_enter - Actually enter a sleep state. * @pm_state: ignored @@ -109,7 +172,6 @@ static int acpi_suspend_prepare(void) * assembly, which in turn call acpi_enter_sleep_state(). * It's unfortunate, but it works. Please fix if you're feeling frisky. */ - static int acpi_suspend_enter(suspend_state_t pm_state) { acpi_status status = AE_OK; @@ -166,39 +228,6 @@ static int acpi_suspend_enter(suspend_state_t pm_state) return ACPI_SUCCESS(status) ? 0 : -EFAULT; } -/** - * acpi_suspend_finish - Instruct the platform to leave a sleep state. - * - * This is called after we wake back up (or if entering the sleep state - * failed). - */ - -static void acpi_suspend_finish(void) -{ - u32 acpi_state = acpi_target_sleep_state; - - acpi_disable_wakeup_device(acpi_state); - acpi_leave_sleep_state(acpi_state); - - /* reset firmware waking vector */ - acpi_set_firmware_waking_vector((acpi_physical_address) 0); - - acpi_target_sleep_state = ACPI_STATE_S0; -} - -/** - * acpi_suspend_end - Finish up suspend sequence. - */ - -static void acpi_suspend_end(void) -{ - /* - * This is necessary in case acpi_suspend_finish() is not called during a - * failing transition to a sleep state. - */ - acpi_target_sleep_state = ACPI_STATE_S0; -} - static int acpi_suspend_state_valid(suspend_state_t pm_state) { u32 acpi_state; @@ -218,10 +247,39 @@ static int acpi_suspend_state_valid(suspend_state_t pm_state) static struct platform_suspend_ops acpi_suspend_ops = { .valid = acpi_suspend_state_valid, .begin = acpi_suspend_begin, - .prepare = acpi_suspend_prepare, + .prepare = acpi_pm_prepare, + .enter = acpi_suspend_enter, + .finish = acpi_pm_finish, + .end = acpi_pm_end, +}; + +/** + * acpi_suspend_begin_old - Set the target system sleep state to the + * state associated with given @pm_state, if supported, and + * execute the _PTS control method. This function is used if the + * pre-ACPI 2.0 suspend ordering has been requested. + */ +static int acpi_suspend_begin_old(suspend_state_t pm_state) +{ + int error = acpi_suspend_begin(pm_state); + + if (!error) + error = __acpi_pm_prepare(); + return error; +} + +/* + * The following callbacks are used if the pre-ACPI 2.0 suspend ordering has + * been requested. + */ +static struct platform_suspend_ops acpi_suspend_ops_old = { + .valid = acpi_suspend_state_valid, + .begin = acpi_suspend_begin_old, + .prepare = acpi_pm_disable_gpes, .enter = acpi_suspend_enter, - .finish = acpi_suspend_finish, - .end = acpi_suspend_end, + .finish = acpi_pm_finish, + .end = acpi_pm_end, + .recover = acpi_pm_finish, }; #endif /* CONFIG_SUSPEND */ @@ -229,22 +287,9 @@ static struct platform_suspend_ops acpi_suspend_ops = { static int acpi_hibernation_begin(void) { acpi_target_sleep_state = ACPI_STATE_S4; - return 0; } -static int acpi_hibernation_prepare(void) -{ - int error = acpi_sleep_prepare(ACPI_STATE_S4); - - if (error) { - acpi_target_sleep_state = ACPI_STATE_S0; - return error; - } - - return ACPI_SUCCESS(acpi_hw_disable_all_gpes()) ? 0 : -EFAULT; -} - static int acpi_hibernation_enter(void) { acpi_status status = AE_OK; @@ -274,52 +319,55 @@ static void acpi_hibernation_leave(void) acpi_leave_sleep_state_prep(ACPI_STATE_S4); } -static void acpi_hibernation_finish(void) +static void acpi_pm_enable_gpes(void) { - acpi_disable_wakeup_device(ACPI_STATE_S4); - acpi_leave_sleep_state(ACPI_STATE_S4); - - /* reset firmware waking vector */ - acpi_set_firmware_waking_vector((acpi_physical_address) 0); - - acpi_target_sleep_state = ACPI_STATE_S0; + acpi_hw_enable_all_runtime_gpes(); } -static void acpi_hibernation_end(void) -{ - /* - * This is necessary in case acpi_hibernation_finish() is not called - * during a failing transition to the sleep state. - */ - acpi_target_sleep_state = ACPI_STATE_S0; -} +static struct platform_hibernation_ops acpi_hibernation_ops = { + .begin = acpi_hibernation_begin, + .end = acpi_pm_end, + .pre_snapshot = acpi_pm_prepare, + .finish = acpi_pm_finish, + .prepare = acpi_pm_prepare, + .enter = acpi_hibernation_enter, + .leave = acpi_hibernation_leave, + .pre_restore = acpi_pm_disable_gpes, + .restore_cleanup = acpi_pm_enable_gpes, +}; -static int acpi_hibernation_pre_restore(void) +/** + * acpi_hibernation_begin_old - Set the target system sleep state to + * ACPI_STATE_S4 and execute the _PTS control method. This + * function is used if the pre-ACPI 2.0 suspend ordering has been + * requested. + */ +static int acpi_hibernation_begin_old(void) { - acpi_status status; - - status = acpi_hw_disable_all_gpes(); - - return ACPI_SUCCESS(status) ? 0 : -EFAULT; -} + int error = acpi_sleep_prepare(ACPI_STATE_S4); -static void acpi_hibernation_restore_cleanup(void) -{ - acpi_hw_enable_all_runtime_gpes(); + if (!error) + acpi_target_sleep_state = ACPI_STATE_S4; + return error; } -static struct platform_hibernation_ops acpi_hibernation_ops = { - .begin = acpi_hibernation_begin, - .end = acpi_hibernation_end, - .pre_snapshot = acpi_hibernation_prepare, - .finish = acpi_hibernation_finish, - .prepare = acpi_hibernation_prepare, +/* + * The following callbacks are used if the pre-ACPI 2.0 suspend ordering has + * been requested. + */ +static struct platform_hibernation_ops acpi_hibernation_ops_old = { + .begin = acpi_hibernation_begin_old, + .end = acpi_pm_end, + .pre_snapshot = acpi_pm_disable_gpes, + .finish = acpi_pm_finish, + .prepare = acpi_pm_disable_gpes, .enter = acpi_hibernation_enter, .leave = acpi_hibernation_leave, - .pre_restore = acpi_hibernation_pre_restore, - .restore_cleanup = acpi_hibernation_restore_cleanup, + .pre_restore = acpi_pm_disable_gpes, + .restore_cleanup = acpi_pm_enable_gpes, + .recover = acpi_pm_finish, }; -#endif /* CONFIG_HIBERNATION */ +#endif /* CONFIG_HIBERNATION */ int acpi_suspend(u32 acpi_state) { @@ -461,13 +509,15 @@ int __init acpi_sleep_init(void) } } - suspend_set_ops(&acpi_suspend_ops); + suspend_set_ops(old_suspend_ordering ? + &acpi_suspend_ops_old : &acpi_suspend_ops); #endif #ifdef CONFIG_HIBERNATION status = acpi_get_sleep_type_data(ACPI_STATE_S4, &type_a, &type_b); if (ACPI_SUCCESS(status)) { - hibernation_set_ops(&acpi_hibernation_ops); + hibernation_set_ops(old_suspend_ordering ? + &acpi_hibernation_ops_old : &acpi_hibernation_ops); sleep_states[ACPI_STATE_S4] = 1; printk(" S4"); } diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index d571204aaff7..3250c5257b74 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -781,8 +781,6 @@ int device_suspend(pm_message_t state) error = dpm_prepare(state); if (!error) error = dpm_suspend(state); - if (error) - device_resume(resume_event(state)); return error; } EXPORT_SYMBOL_GPL(device_suspend); diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 41f7ce7edd7a..33adcf91ef41 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -234,6 +234,9 @@ int acpi_check_region(resource_size_t start, resource_size_t n, int acpi_check_mem_region(resource_size_t start, resource_size_t n, const char *name); +#ifdef CONFIG_PM_SLEEP +void __init acpi_old_suspend_ordering(void); +#endif /* CONFIG_PM_SLEEP */ #else /* CONFIG_ACPI */ static inline int early_acpi_boot_init(void) diff --git a/include/linux/suspend.h b/include/linux/suspend.h index a6977423baf7..e8e69159af71 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -86,6 +86,11 @@ typedef int __bitwise suspend_state_t; * that implement @begin(), but platforms implementing @begin() should * also provide a @end() which cleans up transitions aborted before * @enter(). + * + * @recover: Recover the platform from a suspend failure. + * Called by the PM core if the suspending of devices fails. + * This callback is optional and should only be implemented by platforms + * which require special recovery actions in that situation. */ struct platform_suspend_ops { int (*valid)(suspend_state_t state); @@ -94,6 +99,7 @@ struct platform_suspend_ops { int (*enter)(suspend_state_t state); void (*finish)(void); void (*end)(void); + void (*recover)(void); }; #ifdef CONFIG_SUSPEND @@ -149,7 +155,7 @@ extern void mark_free_pages(struct zone *zone); * The methods in this structure allow a platform to carry out special * operations required by it during a hibernation transition. * - * All the methods below must be implemented. + * All the methods below, except for @recover(), must be implemented. * * @begin: Tell the platform driver that we're starting hibernation. * Called right after shrinking memory and before freezing devices. @@ -189,6 +195,11 @@ extern void mark_free_pages(struct zone *zone); * @restore_cleanup: Clean up after a failing image restoration. * Called right after the nonboot CPUs have been enabled and before * thawing devices (runs with IRQs on). + * + * @recover: Recover the platform from a failure to suspend devices. + * Called by the PM core if the suspending of devices during hibernation + * fails. This callback is optional and should only be implemented by + * platforms which require special recovery actions in that situation. */ struct platform_hibernation_ops { int (*begin)(void); @@ -200,6 +211,7 @@ struct platform_hibernation_ops { void (*leave)(void); int (*pre_restore)(void); void (*restore_cleanup)(void); + void (*recover)(void); }; #ifdef CONFIG_HIBERNATION diff --git a/kernel/power/disk.c b/kernel/power/disk.c index d416be0efa8a..f011e0870b52 100644 --- a/kernel/power/disk.c +++ b/kernel/power/disk.c @@ -179,6 +179,17 @@ static void platform_restore_cleanup(int platform_mode) hibernation_ops->restore_cleanup(); } +/** + * platform_recover - recover the platform from a failure to suspend + * devices. + */ + +static void platform_recover(int platform_mode) +{ + if (platform_mode && hibernation_ops && hibernation_ops->recover) + hibernation_ops->recover(); +} + /** * create_image - freeze devices that need to be frozen with interrupts * off, create the hibernation image and thaw those devices. Control @@ -258,10 +269,10 @@ int hibernation_snapshot(int platform_mode) suspend_console(); error = device_suspend(PMSG_FREEZE); if (error) - goto Resume_console; + goto Recover_platform; if (hibernation_test(TEST_DEVICES)) - goto Resume_devices; + goto Recover_platform; error = platform_pre_snapshot(platform_mode); if (error || hibernation_test(TEST_PLATFORM)) @@ -285,11 +296,14 @@ int hibernation_snapshot(int platform_mode) Resume_devices: device_resume(in_suspend ? (error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE); - Resume_console: resume_console(); Close: platform_end(platform_mode); return error; + + Recover_platform: + platform_recover(platform_mode); + goto Resume_devices; } /** @@ -398,8 +412,11 @@ int hibernation_platform_enter(void) suspend_console(); error = device_suspend(PMSG_HIBERNATE); - if (error) - goto Resume_console; + if (error) { + if (hibernation_ops->recover) + hibernation_ops->recover(); + goto Resume_devices; + } error = hibernation_ops->prepare(); if (error) @@ -428,7 +445,6 @@ int hibernation_platform_enter(void) hibernation_ops->finish(); Resume_devices: device_resume(PMSG_RESTORE); - Resume_console: resume_console(); Close: hibernation_ops->end(); diff --git a/kernel/power/main.c b/kernel/power/main.c index d023b6b584e5..3398f4651aa1 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -269,11 +269,11 @@ int suspend_devices_and_enter(suspend_state_t state) error = device_suspend(PMSG_SUSPEND); if (error) { printk(KERN_ERR "PM: Some devices failed to suspend\n"); - goto Resume_console; + goto Recover_platform; } if (suspend_test(TEST_DEVICES)) - goto Resume_devices; + goto Recover_platform; if (suspend_ops->prepare) { error = suspend_ops->prepare(); @@ -294,12 +294,16 @@ int suspend_devices_and_enter(suspend_state_t state) suspend_ops->finish(); Resume_devices: device_resume(PMSG_RESUME); - Resume_console: resume_console(); Close: if (suspend_ops->end) suspend_ops->end(); return error; + + Recover_platform: + if (suspend_ops->recover) + suspend_ops->recover(); + goto Resume_devices; } /** -- cgit v1.2.3-58-ga151 From acc1e7a3007ec1940374206a84465c1e0cfcda09 Mon Sep 17 00:00:00 2001 From: Jouni Malinen Date: Wed, 11 Jun 2008 10:42:31 +0300 Subject: mac80211_hwsim: 802.11 radio simulator for mac80211 mac80211_hwsim is a Linux kernel module that can be used to simulate arbitrary number of IEEE 802.11 radios for mac80211 on a single device. It can be used to test most of the mac80211 functionality and user space tools (e.g., hostapd and wpa_supplicant) in a way that matches very closely with the normal case of using real WLAN hardware. From the mac80211 view point, mac80211_hwsim is yet another hardware driver, i.e., no changes to mac80211 are needed to use this testing tool. Signed-off-by: Jouni Malinen Acked-by: Johannes Berg Signed-off-by: John W. Linville --- Documentation/networking/mac80211_hwsim/README | 68 +++ .../networking/mac80211_hwsim/hostapd.conf | 11 + .../networking/mac80211_hwsim/wpa_supplicant.conf | 10 + drivers/net/wireless/Kconfig | 13 + drivers/net/wireless/Makefile | 2 + drivers/net/wireless/mac80211_hwsim.c | 533 +++++++++++++++++++++ 6 files changed, 637 insertions(+) create mode 100644 Documentation/networking/mac80211_hwsim/README create mode 100644 Documentation/networking/mac80211_hwsim/hostapd.conf create mode 100644 Documentation/networking/mac80211_hwsim/wpa_supplicant.conf create mode 100644 drivers/net/wireless/mac80211_hwsim.c (limited to 'Documentation') diff --git a/Documentation/networking/mac80211_hwsim/README b/Documentation/networking/mac80211_hwsim/README new file mode 100644 index 000000000000..2f6e90fcb5a7 --- /dev/null +++ b/Documentation/networking/mac80211_hwsim/README @@ -0,0 +1,68 @@ +mac80211_hwsim - software simulator of 802.11 radio(s) for mac80211 +Copyright (c) 2008, Jouni Malinen + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License version 2 as +published by the Free Software Foundation. + + +Introduction + +mac80211_hwsim is a Linux kernel module that can be used to simulate +arbitrary number of IEEE 802.11 radios for mac80211 on a single +device. It can be used to test most of the mac80211 functionality and +user space tools (e.g., hostapd and wpa_supplicant) in a way that +matches very closely with the normal case of using real WLAN +hardware. From the mac80211 view point, mac80211_hwsim is yet another +hardware driver, i.e., no changes to mac80211 are needed to use this +testing tool. + +The main goal for mac80211_hwsim is to make it easier for developers +to test their code and work with new features to mac80211, hostapd, +and wpa_supplicant. The simulated radios do not have the limitations +of real hardware, so it is easy to generate an arbitrary test setup +and always reproduce the same setup for future tests. In addition, +since all radio operation is simulated, any channel can be used in +tests regardless of regulatory rules. + +mac80211_hwsim kernel module has a parameter 'radios' that can be used +to select how many radios are simulates (default 2). This allows +configuration of both very simply setups (e.g., just a single access +point and a station) or large scale tests (multiple access points with +hundreds of stations). + +mac80211_hwsim works by tracking the current channel of each virtual +radio and copying all transmitted frames to all other radios that are +currently enabled and on the same channel as the transmitting +radio. Software encryption in mac80211 is used so that the frames are +actually encrypted over the virtual air interface to allow more +complete testing of encryption. + +A global monitoring netdev, hwsim#, is created independent of +mac80211. This interface can be used to monitor all transmitted frames +regardless of channel. + + +Simple example + +This example shows how to use mac80211_hwsim to simulate two radios: +one to act as an access point and the other as a station that +associates with the AP. hostapd and wpa_supplicant are used to take +care of WPA2-PSK authentication. In addition, hostapd is also +processing access point side of association. + +Please note that the current Linux kernel does not enable AP mode, so a +simple patch is needed to enable AP mode selection: +http://johannes.sipsolutions.net/patches/kernel/all/LATEST/006-allow-ap-vlan-modes.patch + + +# Build mac80211_hwsim as part of kernel configuration + +# Load the module +modprobe mac80211_hwsim + +# Run hostapd (AP) for wlan0 +hostapd hostapd.conf + +# Run wpa_supplicant (station) for wlan1 +wpa_supplicant -Dwext -iwlan1 -c wpa_supplicant.conf diff --git a/Documentation/networking/mac80211_hwsim/hostapd.conf b/Documentation/networking/mac80211_hwsim/hostapd.conf new file mode 100644 index 000000000000..08cde7e35f2e --- /dev/null +++ b/Documentation/networking/mac80211_hwsim/hostapd.conf @@ -0,0 +1,11 @@ +interface=wlan0 +driver=nl80211 + +hw_mode=g +channel=1 +ssid=mac80211 test + +wpa=2 +wpa_key_mgmt=WPA-PSK +wpa_pairwise=CCMP +wpa_passphrase=12345678 diff --git a/Documentation/networking/mac80211_hwsim/wpa_supplicant.conf b/Documentation/networking/mac80211_hwsim/wpa_supplicant.conf new file mode 100644 index 000000000000..299128cff035 --- /dev/null +++ b/Documentation/networking/mac80211_hwsim/wpa_supplicant.conf @@ -0,0 +1,10 @@ +ctrl_interface=/var/run/wpa_supplicant + +network={ + ssid="mac80211 test" + psk="12345678" + key_mgmt=WPA-PSK + proto=WPA2 + pairwise=CCMP + group=CCMP +} diff --git a/drivers/net/wireless/Kconfig b/drivers/net/wireless/Kconfig index fdf5aa8b8429..22e1e9a1fb73 100644 --- a/drivers/net/wireless/Kconfig +++ b/drivers/net/wireless/Kconfig @@ -673,6 +673,19 @@ config ADM8211 Thanks to Infineon-ADMtek for their support of this driver. +config MAC80211_HWSIM + tristate "Simulated radio testing tool for mac80211" + depends on MAC80211 && WLAN_80211 + ---help--- + This driver is a developer testing tool that can be used to test + IEEE 802.11 networking stack (mac80211) functionality. This is not + needed for normal wireless LAN usage and is only for testing. See + Documentation/networking/mac80211_hwsim for more information on how + to use this tool. + + To compile this driver as a module, choose M here: the module will be + called mac80211_hwsim. If unsure, say N. + source "drivers/net/wireless/p54/Kconfig" source "drivers/net/wireless/ath5k/Kconfig" source "drivers/net/wireless/iwlwifi/Kconfig" diff --git a/drivers/net/wireless/Makefile b/drivers/net/wireless/Makefile index 2c343aae38d4..54a4f6f1db67 100644 --- a/drivers/net/wireless/Makefile +++ b/drivers/net/wireless/Makefile @@ -62,3 +62,5 @@ obj-$(CONFIG_RT2X00) += rt2x00/ obj-$(CONFIG_P54_COMMON) += p54/ obj-$(CONFIG_ATH5K) += ath5k/ + +obj-$(CONFIG_MAC80211_HWSIM) += mac80211_hwsim.o diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c new file mode 100644 index 000000000000..3e33b29a4fc9 --- /dev/null +++ b/drivers/net/wireless/mac80211_hwsim.c @@ -0,0 +1,533 @@ +/* + * mac80211_hwsim - software simulator of 802.11 radio(s) for mac80211 + * Copyright (c) 2008, Jouni Malinen + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +/* + * TODO: + * - IBSS mode simulation (Beacon transmission with competition for "air time") + * - IEEE 802.11a and 802.11n modes + * - RX filtering based on filter configuration (data->rx_filter) + */ + +#include +#include +#include +#include +#include + +MODULE_AUTHOR("Jouni Malinen"); +MODULE_DESCRIPTION("Software simulator of 802.11 radio(s) for mac80211"); +MODULE_LICENSE("GPL"); + +static int radios = 2; +module_param(radios, int, 0444); +MODULE_PARM_DESC(radios, "Number of simulated radios"); + + +static struct class *hwsim_class; + +static struct ieee80211_hw **hwsim_radios; +static int hwsim_radio_count; +static struct net_device *hwsim_mon; /* global monitor netdev */ + + +static const struct ieee80211_channel hwsim_channels[] = { + { .center_freq = 2412 }, + { .center_freq = 2417 }, + { .center_freq = 2422 }, + { .center_freq = 2427 }, + { .center_freq = 2432 }, + { .center_freq = 2437 }, + { .center_freq = 2442 }, + { .center_freq = 2447 }, + { .center_freq = 2452 }, + { .center_freq = 2457 }, + { .center_freq = 2462 }, + { .center_freq = 2467 }, + { .center_freq = 2472 }, + { .center_freq = 2484 }, +}; + +static const struct ieee80211_rate hwsim_rates[] = { + { .bitrate = 10 }, + { .bitrate = 20, .flags = IEEE80211_RATE_SHORT_PREAMBLE }, + { .bitrate = 55, .flags = IEEE80211_RATE_SHORT_PREAMBLE }, + { .bitrate = 110, .flags = IEEE80211_RATE_SHORT_PREAMBLE }, + { .bitrate = 60 }, + { .bitrate = 90 }, + { .bitrate = 120 }, + { .bitrate = 180 }, + { .bitrate = 240 }, + { .bitrate = 360 }, + { .bitrate = 480 }, + { .bitrate = 540 } +}; + +struct mac80211_hwsim_data { + struct device *dev; + struct ieee80211_supported_band band; + struct ieee80211_channel channels[ARRAY_SIZE(hwsim_channels)]; + struct ieee80211_rate rates[ARRAY_SIZE(hwsim_rates)]; + + struct ieee80211_channel *channel; + int radio_enabled; + unsigned long beacon_int; /* in jiffies unit */ + unsigned int rx_filter; + int started; + struct timer_list beacon_timer; +}; + + +struct hwsim_radiotap_hdr { + struct ieee80211_radiotap_header hdr; + u8 rt_flags; + u8 rt_rate; + __le16 rt_channel; + __le16 rt_chbitmask; +} __attribute__ ((packed)); + + +static int hwsim_mon_xmit(struct sk_buff *skb, struct net_device *dev) +{ + /* TODO: allow packet injection */ + dev_kfree_skb(skb); + return 0; +} + + +static void mac80211_hwsim_monitor_rx(struct ieee80211_hw *hw, + struct sk_buff *tx_skb) +{ + struct mac80211_hwsim_data *data = hw->priv; + struct sk_buff *skb; + struct hwsim_radiotap_hdr *hdr; + u16 flags; + struct ieee80211_tx_info *info = IEEE80211_SKB_CB(tx_skb); + struct ieee80211_rate *txrate = ieee80211_get_tx_rate(hw, info); + + if (!netif_running(hwsim_mon)) + return; + + skb = skb_copy_expand(tx_skb, sizeof(*hdr), 0, GFP_ATOMIC); + if (skb == NULL) + return; + + hdr = (struct hwsim_radiotap_hdr *) skb_push(skb, sizeof(*hdr)); + hdr->hdr.it_version = PKTHDR_RADIOTAP_VERSION; + hdr->hdr.it_pad = 0; + hdr->hdr.it_len = cpu_to_le16(sizeof(*hdr)); + hdr->hdr.it_present = __constant_cpu_to_le32( + (1 << IEEE80211_RADIOTAP_FLAGS) | + (1 << IEEE80211_RADIOTAP_RATE) | + (1 << IEEE80211_RADIOTAP_CHANNEL)); + hdr->rt_flags = 0; + hdr->rt_rate = txrate->bitrate / 5; + hdr->rt_channel = data->channel->center_freq; + flags = IEEE80211_CHAN_2GHZ; + if (txrate->flags & IEEE80211_RATE_ERP_G) + flags |= IEEE80211_CHAN_OFDM; + else + flags |= IEEE80211_CHAN_CCK; + hdr->rt_chbitmask = cpu_to_le16(flags); + + skb->dev = hwsim_mon; + skb_set_mac_header(skb, 0); + skb->ip_summed = CHECKSUM_UNNECESSARY; + skb->pkt_type = PACKET_OTHERHOST; + skb->protocol = __constant_htons(ETH_P_802_2); + memset(skb->cb, 0, sizeof(skb->cb)); + netif_rx(skb); +} + + +static int mac80211_hwsim_tx(struct ieee80211_hw *hw, struct sk_buff *skb) +{ + struct mac80211_hwsim_data *data = hw->priv; + struct ieee80211_rx_status rx_status; + int i, ack = 0; + struct ieee80211_hdr *hdr; + struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); + struct ieee80211_tx_info *txi; + + mac80211_hwsim_monitor_rx(hw, skb); + + if (skb->len < 10) { + /* Should not happen; just a sanity check for addr1 use */ + dev_kfree_skb(skb); + return NETDEV_TX_OK; + } + + if (!data->radio_enabled) { + printk(KERN_DEBUG "%s: dropped TX frame since radio " + "disabled\n", wiphy_name(hw->wiphy)); + dev_kfree_skb(skb); + return NETDEV_TX_OK; + } + + hdr = (struct ieee80211_hdr *) skb->data; + + memset(&rx_status, 0, sizeof(rx_status)); + /* TODO: set mactime */ + rx_status.freq = data->channel->center_freq; + rx_status.band = data->channel->band; + rx_status.rate_idx = info->tx_rate_idx; + /* TODO: simulate signal strength (and optional packet drop) */ + + /* Copy skb to all enabled radios that are on the current frequency */ + for (i = 0; i < hwsim_radio_count; i++) { + struct mac80211_hwsim_data *data2; + struct sk_buff *nskb; + + if (hwsim_radios[i] == NULL || hwsim_radios[i] == hw) + continue; + data2 = hwsim_radios[i]->priv; + if (!data2->started || !data2->radio_enabled || + data->channel->center_freq != data2->channel->center_freq) + continue; + + nskb = skb_copy(skb, GFP_ATOMIC); + if (nskb == NULL) + continue; + + if (memcmp(hdr->addr1, hwsim_radios[i]->wiphy->perm_addr, + ETH_ALEN) == 0) + ack = 1; + ieee80211_rx_irqsafe(hwsim_radios[i], nskb, &rx_status); + } + + txi = IEEE80211_SKB_CB(skb); + memset(&txi->status, 0, sizeof(txi->status)); + if (!(txi->flags & IEEE80211_TX_CTL_NO_ACK)) { + if (ack) + txi->flags |= IEEE80211_TX_STAT_ACK; + else + txi->status.excessive_retries = 1; + } + ieee80211_tx_status_irqsafe(hw, skb); + return NETDEV_TX_OK; +} + + +static int mac80211_hwsim_start(struct ieee80211_hw *hw) +{ + struct mac80211_hwsim_data *data = hw->priv; + printk(KERN_DEBUG "%s:%s\n", wiphy_name(hw->wiphy), __func__); + data->started = 1; + return 0; +} + + +static void mac80211_hwsim_stop(struct ieee80211_hw *hw) +{ + struct mac80211_hwsim_data *data = hw->priv; + data->started = 0; + printk(KERN_DEBUG "%s:%s\n", wiphy_name(hw->wiphy), __func__); +} + + +static int mac80211_hwsim_add_interface(struct ieee80211_hw *hw, + struct ieee80211_if_init_conf *conf) +{ + DECLARE_MAC_BUF(mac); + printk(KERN_DEBUG "%s:%s (type=%d mac_addr=%s)\n", + wiphy_name(hw->wiphy), __func__, conf->type, + print_mac(mac, conf->mac_addr)); + return 0; +} + + +static void mac80211_hwsim_remove_interface( + struct ieee80211_hw *hw, struct ieee80211_if_init_conf *conf) +{ + DECLARE_MAC_BUF(mac); + printk(KERN_DEBUG "%s:%s (type=%d mac_addr=%s)\n", + wiphy_name(hw->wiphy), __func__, conf->type, + print_mac(mac, conf->mac_addr)); +} + + +static void mac80211_hwsim_beacon_tx(void *arg, u8 *mac, + struct ieee80211_vif *vif) +{ + struct ieee80211_hw *hw = arg; + struct mac80211_hwsim_data *data = hw->priv; + struct sk_buff *skb; + struct ieee80211_rx_status rx_status; + int i; + struct ieee80211_tx_info *info; + + if (vif->type != IEEE80211_IF_TYPE_AP) + return; + + skb = ieee80211_beacon_get(hw, vif); + if (skb == NULL) + return; + info = IEEE80211_SKB_CB(skb); + + mac80211_hwsim_monitor_rx(hw, skb); + + memset(&rx_status, 0, sizeof(rx_status)); + /* TODO: set mactime */ + rx_status.freq = data->channel->center_freq; + rx_status.band = data->channel->band; + rx_status.rate_idx = info->tx_rate_idx; + /* TODO: simulate signal strength (and optional packet drop) */ + + /* Copy skb to all enabled radios that are on the current frequency */ + for (i = 0; i < hwsim_radio_count; i++) { + struct mac80211_hwsim_data *data2; + struct sk_buff *nskb; + + if (hwsim_radios[i] == NULL || hwsim_radios[i] == hw) + continue; + data2 = hwsim_radios[i]->priv; + if (!data2->started || !data2->radio_enabled || + data->channel->center_freq != data2->channel->center_freq) + continue; + + nskb = skb_copy(skb, GFP_ATOMIC); + if (nskb == NULL) + continue; + + ieee80211_rx_irqsafe(hwsim_radios[i], nskb, &rx_status); + } + + dev_kfree_skb(skb); +} + + +static void mac80211_hwsim_beacon(unsigned long arg) +{ + struct ieee80211_hw *hw = (struct ieee80211_hw *) arg; + struct mac80211_hwsim_data *data = hw->priv; + + if (!data->started || !data->radio_enabled) + return; + + ieee80211_iterate_active_interfaces(hw, mac80211_hwsim_beacon_tx, hw); + + data->beacon_timer.expires = jiffies + data->beacon_int; + add_timer(&data->beacon_timer); +} + + +static int mac80211_hwsim_config(struct ieee80211_hw *hw, + struct ieee80211_conf *conf) +{ + struct mac80211_hwsim_data *data = hw->priv; + + printk(KERN_DEBUG "%s:%s (freq=%d radio_enabled=%d beacon_int=%d)\n", + wiphy_name(hw->wiphy), __func__, + conf->channel->center_freq, conf->radio_enabled, + conf->beacon_int); + + data->channel = conf->channel; + data->radio_enabled = conf->radio_enabled; + data->beacon_int = 1024 * conf->beacon_int / 1000 * HZ / 1000; + if (data->beacon_int < 1) + data->beacon_int = 1; + + if (!data->started || !data->radio_enabled) + del_timer(&data->beacon_timer); + else + mod_timer(&data->beacon_timer, jiffies + data->beacon_int); + + return 0; +} + + +static void mac80211_hwsim_configure_filter(struct ieee80211_hw *hw, + unsigned int changed_flags, + unsigned int *total_flags, + int mc_count, + struct dev_addr_list *mc_list) +{ + struct mac80211_hwsim_data *data = hw->priv; + + printk(KERN_DEBUG "%s:%s\n", wiphy_name(hw->wiphy), __func__); + + data->rx_filter = 0; + if (*total_flags & FIF_PROMISC_IN_BSS) + data->rx_filter |= FIF_PROMISC_IN_BSS; + if (*total_flags & FIF_ALLMULTI) + data->rx_filter |= FIF_ALLMULTI; + + *total_flags = data->rx_filter; +} + + + +static const struct ieee80211_ops mac80211_hwsim_ops = +{ + .tx = mac80211_hwsim_tx, + .start = mac80211_hwsim_start, + .stop = mac80211_hwsim_stop, + .add_interface = mac80211_hwsim_add_interface, + .remove_interface = mac80211_hwsim_remove_interface, + .config = mac80211_hwsim_config, + .configure_filter = mac80211_hwsim_configure_filter, +}; + + +static void mac80211_hwsim_free(void) +{ + int i; + + for (i = 0; i < hwsim_radio_count; i++) { + if (hwsim_radios[i]) { + struct mac80211_hwsim_data *data; + data = hwsim_radios[i]->priv; + ieee80211_unregister_hw(hwsim_radios[i]); + if (!IS_ERR(data->dev)) + device_unregister(data->dev); + ieee80211_free_hw(hwsim_radios[i]); + } + } + kfree(hwsim_radios); + class_destroy(hwsim_class); +} + + +static struct device_driver mac80211_hwsim_driver = { + .name = "mac80211_hwsim" +}; + + +static void hwsim_mon_setup(struct net_device *dev) +{ + dev->hard_start_xmit = hwsim_mon_xmit; + dev->destructor = free_netdev; + ether_setup(dev); + dev->tx_queue_len = 0; + dev->type = ARPHRD_IEEE80211_RADIOTAP; + memset(dev->dev_addr, 0, ETH_ALEN); + dev->dev_addr[0] = 0x12; +} + + +static int __init init_mac80211_hwsim(void) +{ + int i, err = 0; + u8 addr[ETH_ALEN]; + struct mac80211_hwsim_data *data; + struct ieee80211_hw *hw; + DECLARE_MAC_BUF(mac); + + if (radios < 1 || radios > 65535) + return -EINVAL; + + hwsim_radio_count = radios; + hwsim_radios = kcalloc(hwsim_radio_count, + sizeof(struct ieee80211_hw *), GFP_KERNEL); + if (hwsim_radios == NULL) + return -ENOMEM; + + hwsim_class = class_create(THIS_MODULE, "mac80211_hwsim"); + if (IS_ERR(hwsim_class)) { + kfree(hwsim_radios); + return PTR_ERR(hwsim_class); + } + + memset(addr, 0, ETH_ALEN); + addr[0] = 0x02; + + for (i = 0; i < hwsim_radio_count; i++) { + printk(KERN_DEBUG "mac80211_hwsim: Initializing radio %d\n", + i); + hw = ieee80211_alloc_hw(sizeof(*data), &mac80211_hwsim_ops); + if (hw == NULL) { + printk(KERN_DEBUG "mac80211_hwsim: ieee80211_alloc_hw " + "failed\n"); + err = -ENOMEM; + goto failed; + } + hwsim_radios[i] = hw; + + data = hw->priv; + data->dev = device_create(hwsim_class, NULL, 0, "hwsim%d", i); + if (IS_ERR(data->dev)) { + printk(KERN_DEBUG "mac80211_hwsim: device_create " + "failed (%ld)\n", PTR_ERR(data->dev)); + err = -ENOMEM; + goto failed; + } + data->dev->driver = &mac80211_hwsim_driver; + dev_set_drvdata(data->dev, hw); + + SET_IEEE80211_DEV(hw, data->dev); + addr[3] = i >> 8; + addr[4] = i; + SET_IEEE80211_PERM_ADDR(hw, addr); + + hw->channel_change_time = 1; + hw->queues = 1; + + memcpy(data->channels, hwsim_channels, sizeof(hwsim_channels)); + memcpy(data->rates, hwsim_rates, sizeof(hwsim_rates)); + data->band.channels = data->channels; + data->band.n_channels = ARRAY_SIZE(hwsim_channels); + data->band.bitrates = data->rates; + data->band.n_bitrates = ARRAY_SIZE(hwsim_rates); + hw->wiphy->bands[IEEE80211_BAND_2GHZ] = &data->band; + + err = ieee80211_register_hw(hw); + if (err < 0) { + printk(KERN_DEBUG "mac80211_hwsim: " + "ieee80211_register_hw failed (%d)\n", err); + goto failed; + } + + printk(KERN_DEBUG "%s: hwaddr %s registered\n", + wiphy_name(hw->wiphy), + print_mac(mac, hw->wiphy->perm_addr)); + + setup_timer(&data->beacon_timer, mac80211_hwsim_beacon, + (unsigned long) hw); + } + + hwsim_mon = alloc_netdev(0, "hwsim%d", hwsim_mon_setup); + if (hwsim_mon == NULL) + goto failed; + + rtnl_lock(); + + err = dev_alloc_name(hwsim_mon, hwsim_mon->name); + if (err < 0) { + goto failed_mon; + } + + err = register_netdevice(hwsim_mon); + if (err < 0) + goto failed_mon; + + rtnl_unlock(); + + return 0; + +failed_mon: + rtnl_unlock(); + free_netdev(hwsim_mon); + +failed: + mac80211_hwsim_free(); + return err; +} + + +static void __exit exit_mac80211_hwsim(void) +{ + printk(KERN_DEBUG "mac80211_hwsim: unregister %d radios\n", + hwsim_radio_count); + + unregister_netdev(hwsim_mon); + mac80211_hwsim_free(); +} + + +module_init(init_mac80211_hwsim); +module_exit(exit_mac80211_hwsim); -- cgit v1.2.3-58-ga151 From ba77f1abde3999e45d92c0ba4e0356f7498e959f Mon Sep 17 00:00:00 2001 From: Jouni Malinen Date: Fri, 13 Jun 2008 19:44:46 +0300 Subject: mac80211_hwsim: Clean up documentation Clean up the introduction and fix a typo. Signed-off-by: Jouni Malinen Signed-off-by: John W. Linville --- Documentation/networking/mac80211_hwsim/README | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/mac80211_hwsim/README b/Documentation/networking/mac80211_hwsim/README index 2f6e90fcb5a7..2ff8ccb8dc37 100644 --- a/Documentation/networking/mac80211_hwsim/README +++ b/Documentation/networking/mac80211_hwsim/README @@ -9,13 +9,12 @@ published by the Free Software Foundation. Introduction mac80211_hwsim is a Linux kernel module that can be used to simulate -arbitrary number of IEEE 802.11 radios for mac80211 on a single -device. It can be used to test most of the mac80211 functionality and -user space tools (e.g., hostapd and wpa_supplicant) in a way that -matches very closely with the normal case of using real WLAN -hardware. From the mac80211 view point, mac80211_hwsim is yet another -hardware driver, i.e., no changes to mac80211 are needed to use this -testing tool. +arbitrary number of IEEE 802.11 radios for mac80211. It can be used to +test most of the mac80211 functionality and user space tools (e.g., +hostapd and wpa_supplicant) in a way that matches very closely with +the normal case of using real WLAN hardware. From the mac80211 view +point, mac80211_hwsim is yet another hardware driver, i.e., no changes +to mac80211 are needed to use this testing tool. The main goal for mac80211_hwsim is to make it easier for developers to test their code and work with new features to mac80211, hostapd, @@ -26,7 +25,7 @@ since all radio operation is simulated, any channel can be used in tests regardless of regulatory rules. mac80211_hwsim kernel module has a parameter 'radios' that can be used -to select how many radios are simulates (default 2). This allows +to select how many radios are simulated (default 2). This allows configuration of both very simply setups (e.g., just a single access point and a station) or large scale tests (multiple access points with hundreds of stations). -- cgit v1.2.3-58-ga151 From b59f9f74c4c0a569398f08c34a877f1b7b457496 Mon Sep 17 00:00:00 2001 From: Jay Vosburgh Date: Fri, 13 Jun 2008 18:12:03 -0700 Subject: bonding: Rework / fix multiple gratuitous ARP support Support for sending multiple gratuitous ARPs during failovers was added by commit: commit 7893b2491a2d5f716540ac5643d78d37a7f6628b Author: Moni Shoua Date: Sat May 17 21:10:12 2008 -0700 bonding: Send more than one gratuitous ARP when slave takes over This change modifies that support to remove duplicated code, add support for ARP monitor (the original only supported miimon), clear the grat ARP counter in bond_close (lest a later "ifconfig up" immediately start spewing ARPs), and add documentation for the module parameter. Also updated driver version to 3.3.0. Signed-off-by: Jay Vosburgh Signed-off-by: Jeff Garzik --- Documentation/networking/bonding.txt | 11 ++++++++++ drivers/net/bonding/bond_main.c | 42 +++++++++++++++++++----------------- drivers/net/bonding/bonding.h | 4 ++-- 3 files changed, 35 insertions(+), 22 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 8e6b8d3c7410..370b7da73ab4 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -539,6 +539,17 @@ mode swapped with the new curr_active_slave that was chosen. +num_grat_arp + + Specifies the number of gratuitous ARPs to be issued after a + failover event. One gratuitous ARP is issued immediately after + the failover, subsequent ARPs are sent at a rate of one per link + monitor interval (arp_interval or miimon, whichever is active). + + The valid range is 0 - 255; the default value is 1. This option + affects only the active-backup mode. This option was added for + bonding version 3.3.0. + primary A string (eth0, eth2, etc) specifying which slave is the diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 925402bcdf4d..3b6d66a8ab98 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1195,14 +1195,7 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active) old_active); bond->send_grat_arp = bond->params.num_grat_arp; - if (!test_bit(__LINK_STATE_LINKWATCH_PENDING, - &bond->curr_active_slave->dev->state)) { - bond_send_gratuitous_arp(bond); - bond->send_grat_arp--; - } else { - dprintk("delaying gratuitous arp on %s\n", - bond->curr_active_slave->dev->name); - } + bond_send_gratuitous_arp(bond); write_unlock_bh(&bond->curr_slave_lock); read_unlock(&bond->lock); @@ -2241,17 +2234,6 @@ static int __bond_mii_monitor(struct bonding *bond, int have_locks) * program could monitor the link itself if needed. */ - if (bond->send_grat_arp) { - if (bond->curr_active_slave && test_bit(__LINK_STATE_LINKWATCH_PENDING, - &bond->curr_active_slave->dev->state)) - dprintk("Needs to send gratuitous arp but not yet\n"); - else { - dprintk("sending delayed gratuitous arp on on %s\n", - bond->curr_active_slave->dev->name); - bond_send_gratuitous_arp(bond); - bond->send_grat_arp--; - } - } read_lock(&bond->curr_slave_lock); oldcurrent = bond->curr_active_slave; read_unlock(&bond->curr_slave_lock); @@ -2492,6 +2474,13 @@ void bond_mii_monitor(struct work_struct *work) read_unlock(&bond->lock); return; } + + if (bond->send_grat_arp) { + read_lock(&bond->curr_slave_lock); + bond_send_gratuitous_arp(bond); + read_unlock(&bond->curr_slave_lock); + } + if (__bond_mii_monitor(bond, 0)) { read_unlock(&bond->lock); rtnl_lock(); @@ -2657,6 +2646,8 @@ static void bond_arp_send_all(struct bonding *bond, struct slave *slave) /* * Kick out a gratuitous ARP for an IP on the bonding master plus one * for each VLAN above us. + * + * Caller must hold curr_slave_lock for read or better */ static void bond_send_gratuitous_arp(struct bonding *bond) { @@ -2666,9 +2657,13 @@ static void bond_send_gratuitous_arp(struct bonding *bond) dprintk("bond_send_grat_arp: bond %s slave %s\n", bond->dev->name, slave ? slave->dev->name : "NULL"); - if (!slave) + + if (!slave || !bond->send_grat_arp || + test_bit(__LINK_STATE_LINKWATCH_PENDING, &slave->dev->state)) return; + bond->send_grat_arp--; + if (bond->master_ip) { bond_arp_send(slave->dev, ARPOP_REPLY, bond->master_ip, bond->master_ip, 0); @@ -3172,6 +3167,12 @@ void bond_activebackup_arp_mon(struct work_struct *work) if (bond->slave_cnt == 0) goto re_arm; + if (bond->send_grat_arp) { + read_lock(&bond->curr_slave_lock); + bond_send_gratuitous_arp(bond); + read_unlock(&bond->curr_slave_lock); + } + if (bond_ab_arp_inspect(bond, delta_in_ticks)) { read_unlock(&bond->lock); rtnl_lock(); @@ -3846,6 +3847,7 @@ static int bond_close(struct net_device *bond_dev) write_lock_bh(&bond->lock); + bond->send_grat_arp = 0; /* signal timers not to re-arm */ bond->kill_timers = 1; diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h index 89fd9963db7a..fb730ec0396f 100644 --- a/drivers/net/bonding/bonding.h +++ b/drivers/net/bonding/bonding.h @@ -22,8 +22,8 @@ #include "bond_3ad.h" #include "bond_alb.h" -#define DRV_VERSION "3.2.5" -#define DRV_RELDATE "March 21, 2008" +#define DRV_VERSION "3.3.0" +#define DRV_RELDATE "June 10, 2008" #define DRV_NAME "bonding" #define DRV_DESCRIPTION "Ethernet Channel Bonding Driver" -- cgit v1.2.3-58-ga151 From b8a9787eddb0e4665f31dd1d64584732b2b5d051 Mon Sep 17 00:00:00 2001 From: Jay Vosburgh Date: Fri, 13 Jun 2008 18:12:04 -0700 Subject: bonding: Allow setting max_bonds to zero Permit bonding to function rationally if max_bonds is set to zero. This will load the module, but create no master devices (which can be created via sysfs). Requires some change to bond_create_sysfs; currently, the netdev sysfs directory is determined from the first bonding device created, but this is no longer possible. Instead, an interface from net/core is created to create and destroy files in net_class. Based on a patch submitted by Phil Oester . Modified by Jay Vosburgh to fix the sysfs issue mentioned above and to update the documentation. Signed-off-by: Phil Oester Signed-off-by: Jay Vosburgh Signed-off-by: Jeff Garzik --- Documentation/networking/bonding.txt | 3 ++- drivers/net/bonding/bond_main.c | 6 +++--- drivers/net/bonding/bond_sysfs.c | 22 +++------------------- include/linux/netdevice.h | 3 +++ net/core/net-sysfs.c | 13 +++++++++++++ 5 files changed, 24 insertions(+), 23 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index 370b7da73ab4..7fa7fe71d7a8 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt @@ -376,7 +376,8 @@ max_bonds Specifies the number of bonding devices to create for this instance of the bonding driver. E.g., if max_bonds is 3, and the bonding driver is not already loaded, then bond0, bond1 - and bond2 will be created. The default value is 1. + and bond2 will be created. The default value is 1. Specifying + a value of 0 will load bonding, but will not create any devices. miimon diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 3b6d66a8ab98..d57b65dc2c72 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4750,11 +4750,11 @@ static int bond_check_params(struct bond_params *params) } } - if (max_bonds < 1 || max_bonds > INT_MAX) { + if (max_bonds < 0 || max_bonds > INT_MAX) { printk(KERN_WARNING DRV_NAME ": Warning: max_bonds (%d) not in range %d-%d, so it " "was reset to BOND_DEFAULT_MAX_BONDS (%d)\n", - max_bonds, 1, INT_MAX, BOND_DEFAULT_MAX_BONDS); + max_bonds, 0, INT_MAX, BOND_DEFAULT_MAX_BONDS); max_bonds = BOND_DEFAULT_MAX_BONDS; } @@ -4953,7 +4953,7 @@ static int bond_check_params(struct bond_params *params) printk("\n"); - } else { + } else if (max_bonds) { /* miimon and arp_interval not set, we need one so things * work as expected, see bonding.txt for details */ diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c index dd265c69b0df..6caac0ffb2f2 100644 --- a/drivers/net/bonding/bond_sysfs.c +++ b/drivers/net/bonding/bond_sysfs.c @@ -53,7 +53,6 @@ extern struct bond_parm_tbl arp_validate_tbl[]; extern struct bond_parm_tbl fail_over_mac_tbl[]; static int expected_refcount = -1; -static struct class *netdev_class; /*--------------------------- Data Structures -----------------------------*/ /* Bonding sysfs lock. Why can't we just use the subsystem lock? @@ -1447,19 +1446,9 @@ static struct attribute_group bonding_group = { */ int bond_create_sysfs(void) { - int ret = 0; - struct bonding *firstbond; - - /* get the netdev class pointer */ - firstbond = container_of(bond_dev_list.next, struct bonding, bond_list); - if (!firstbond) - return -ENODEV; - - netdev_class = firstbond->dev->dev.class; - if (!netdev_class) - return -ENODEV; + int ret; - ret = class_create_file(netdev_class, &class_attr_bonding_masters); + ret = netdev_class_create_file(&class_attr_bonding_masters); /* * Permit multiple loads of the module by ignoring failures to * create the bonding_masters sysfs file. Bonding devices @@ -1478,10 +1467,6 @@ int bond_create_sysfs(void) printk(KERN_ERR "network device named %s already exists in sysfs", class_attr_bonding_masters.attr.name); - else { - netdev_class = NULL; - return 0; - } } return ret; @@ -1493,8 +1478,7 @@ int bond_create_sysfs(void) */ void bond_destroy_sysfs(void) { - if (netdev_class) - class_remove_file(netdev_class, &class_attr_bonding_masters); + netdev_class_remove_file(&class_attr_bonding_masters); } /* diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e92fc839ab1d..9ccbfac3fd95 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1506,6 +1506,9 @@ extern void *dev_seq_next(struct seq_file *seq, void *v, loff_t *pos); extern void dev_seq_stop(struct seq_file *seq, void *v); #endif +extern int netdev_class_create_file(struct class_attribute *class_attr); +extern void netdev_class_remove_file(struct class_attribute *class_attr); + extern void linkwatch_run_queue(void); extern int netdev_compute_features(unsigned long all, unsigned long one); diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index dccd737ea2e3..3f7941319217 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -468,6 +468,19 @@ int netdev_register_kobject(struct net_device *net) return device_add(dev); } +int netdev_class_create_file(struct class_attribute *class_attr) +{ + return class_create_file(&net_class, class_attr); +} + +void netdev_class_remove_file(struct class_attribute *class_attr) +{ + class_remove_file(&net_class, class_attr); +} + +EXPORT_SYMBOL(netdev_class_create_file); +EXPORT_SYMBOL(netdev_class_remove_file); + void netdev_initialize_kobject(struct net_device *net) { struct device *device = &(net->dev); -- cgit v1.2.3-58-ga151 From 2bdf06c047d0bf7baa629b9074086e5338bd2b60 Mon Sep 17 00:00:00 2001 From: Ben Dooks Date: Tue, 24 Jun 2008 22:16:08 +0100 Subject: DM9000: Add documentation for the driver. Add Documentation/networking/dm9000.txt for the DM9000 network driver. Signed-off-by: Ben Dooks Signed-off-by: Jeff Garzik --- Documentation/networking/dm9000.txt | 167 ++++++++++++++++++++++++++++++++++++ 1 file changed, 167 insertions(+) create mode 100644 Documentation/networking/dm9000.txt (limited to 'Documentation') diff --git a/Documentation/networking/dm9000.txt b/Documentation/networking/dm9000.txt new file mode 100644 index 000000000000..65df3dea5561 --- /dev/null +++ b/Documentation/networking/dm9000.txt @@ -0,0 +1,167 @@ +DM9000 Network driver +===================== + +Copyright 2008 Simtec Electronics, + Ben Dooks + + +Introduction +------------ + +This file describes how to use the DM9000 platform-device based network driver +that is contained in the files drivers/net/dm9000.c and drivers/net/dm9000.h. + +The driver supports three DM9000 variants, the DM9000E which is the first chip +supported as well as the newer DM9000A and DM9000B devices. It is currently +maintained and tested by Ben Dooks, who should be CC: to any patches for this +driver. + + +Defining the platform device +---------------------------- + +The minimum set of resources attached to the platform device are as follows: + + 1) The physical address of the address register + 2) The physical address of the data register + 3) The IRQ line the device's interrupt pin is connected to. + +These resources should be specified in that order, as the ordering of the +two address regions is important (the driver expects these to be address +and then data). + +An example from arch/arm/mach-s3c2410/mach-bast.c is: + +static struct resource bast_dm9k_resource[] = { + [0] = { + .start = S3C2410_CS5 + BAST_PA_DM9000, + .end = S3C2410_CS5 + BAST_PA_DM9000 + 3, + .flags = IORESOURCE_MEM, + }, + [1] = { + .start = S3C2410_CS5 + BAST_PA_DM9000 + 0x40, + .end = S3C2410_CS5 + BAST_PA_DM9000 + 0x40 + 0x3f, + .flags = IORESOURCE_MEM, + }, + [2] = { + .start = IRQ_DM9000, + .end = IRQ_DM9000, + .flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL, + } +}; + +static struct platform_device bast_device_dm9k = { + .name = "dm9000", + .id = 0, + .num_resources = ARRAY_SIZE(bast_dm9k_resource), + .resource = bast_dm9k_resource, +}; + +Note the setting of the IRQ trigger flag in bast_dm9k_resource[2].flags, +as this will generate a warning if it is not present. The trigger from +the flags field will be passed to request_irq() when registering the IRQ +handler to ensure that the IRQ is setup correctly. + +This shows a typical platform device, without the optional configuration +platform data supplied. The next example uses the same resources, but adds +the optional platform data to pass extra configuration data: + +static struct dm9000_plat_data bast_dm9k_platdata = { + .flags = DM9000_PLATF_16BITONLY, +}; + +static struct platform_device bast_device_dm9k = { + .name = "dm9000", + .id = 0, + .num_resources = ARRAY_SIZE(bast_dm9k_resource), + .resource = bast_dm9k_resource, + .dev = { + .platform_data = &bast_dm9k_platdata, + } +}; + +The platform data is defined in include/linux/dm9000.h and described below. + + +Platform data +------------- + +Extra platform data for the DM9000 can describe the IO bus width to the +device, whether or not an external PHY is attached to the device and +the availability of an external configuration EEPROM. + +The flags for the platform data .flags field are as follows: + +DM9000_PLATF_8BITONLY + + The IO should be done with 8bit operations. + +DM9000_PLATF_16BITONLY + + The IO should be done with 16bit operations. + +DM9000_PLATF_32BITONLY + + The IO should be done with 32bit operations. + +DM9000_PLATF_EXT_PHY + + The chip is connected to an external PHY. + +DM9000_PLATF_NO_EEPROM + + This can be used to signify that the board does not have an + EEPROM, or that the EEPROM should be hidden from the user. + +DM9000_PLATF_SIMPLE_PHY + + Switch to using the simpler PHY polling method which does not + try and read the MII PHY state regularly. This is only available + when using the internal PHY. See the section on link state polling + for more information. + + The config symbol DM9000_FORCE_SIMPLE_PHY_POLL, Kconfig entry + "Force simple NSR based PHY polling" allows this flag to be + forced on at build time. + + +PHY Link state polling +---------------------- + +The driver keeps track of the link state and informs the network core +about link (carrier) availablilty. This is managed by several methods +depending on the version of the chip and on which PHY is being used. + +For the internal PHY, the original (and currently default) method is +to read the MII state, either when the status changes if we have the +necessary interrupt support in the chip or every two seconds via a +periodic timer. + +To reduce the overhead for the internal PHY, there is now the option +of using the DM9000_FORCE_SIMPLE_PHY_POLL config, or DM9000_PLATF_SIMPLE_PHY +platform data option to read the summary information without the +expensive MII accesses. This method is faster, but does not print +as much information. + +When using an external PHY, the driver currently has to poll the MII +link status as there is no method for getting an interrupt on link change. + + +DM9000A / DM9000B +----------------- + +These chips are functionally similar to the DM9000E and are supported easily +by the same driver. The features are: + + 1) Interrupt on internal PHY state change. This means that the periodic + polling of the PHY status may be disabled on these devices when using + the internal PHY. + + 2) TCP/UDP checksum offloading, which the driver does not currently support. + + +ethtool +------- + +The driver supports the ethtool interface for access to the driver +state information, the PHY state and the EEPROM. -- cgit v1.2.3-58-ga151 From f3146aff7f283c8699e0c97df6307a705786eeba Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Mon, 23 Jun 2008 17:22:56 -0300 Subject: rfkill: clarify meaning of rfkill states rfkill really should have been named rfswitch. As it is, one can get confused whether RFKILL_STATE_ON means the KILL switch is on (and therefore, the radio is being *blocked* from operating), or whether it means the RADIO rf output is on. Clearly state that RFKILL_STATE_ON means the radio is *unblocked* from operating (i.e. there is no rf killing going on). Signed-off-by: Henrique de Moraes Holschuh Acked-by: Ivo van Doorn Cc: Dmitry Torokhov Signed-off-by: John W. Linville --- Documentation/rfkill.txt | 7 +++++++ include/linux/rfkill.h | 6 +++--- 2 files changed, 10 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt index a83ff23cd68c..ec75d6d34785 100644 --- a/Documentation/rfkill.txt +++ b/Documentation/rfkill.txt @@ -8,6 +8,13 @@ rfkill - RF switch subsystem support =============================================================================== 1: Implementation details +The rfkill switch subsystem exists to add a generic interface to circuitry that +can enable or disable the RF output of a radio *transmitter* of any type. + +When a rfkill switch is in the RFKILL_STATE_ON, the radio transmitter is +*enabled*. When the rfkill switch is in the RFKILL_STATE_OFF, the radio +transmitter is *disabled*. + The rfkill switch subsystem offers support for keys often found on laptops to enable wireless devices like WiFi and Bluetooth. diff --git a/include/linux/rfkill.h b/include/linux/rfkill.h index e3ab21d7fc7f..ca89ae1b0219 100644 --- a/include/linux/rfkill.h +++ b/include/linux/rfkill.h @@ -44,8 +44,8 @@ enum rfkill_type { }; enum rfkill_state { - RFKILL_STATE_OFF = 0, - RFKILL_STATE_ON = 1, + RFKILL_STATE_OFF = 0, /* Radio output blocked */ + RFKILL_STATE_ON = 1, /* Radio output active */ }; /** @@ -53,7 +53,7 @@ enum rfkill_state { * @name: Name of the switch. * @type: Radio type which the button controls, the value stored * here should be a value from enum rfkill_type. - * @state: State of the switch (on/off). + * @state: State of the switch, "ON" means radio can operate. * @user_claim_unsupported: Whether the hardware supports exclusive * RF-kill control by userspace. Set this before registering. * @user_claim: Set when the switch is controlled exlusively by userspace. -- cgit v1.2.3-58-ga151 From dc288520a21879c6540f3249e9532c5e032da4e8 Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Mon, 23 Jun 2008 17:23:08 -0300 Subject: rfkill: document rw rfkill switches and clarify input subsystem interactions Rework the documentation so as to make sure driver writers understand exactly where the boundaries are for input drivers related to rfkill switches, buttons and keys, and rfkill class drivers. Also fix a small error in the documentation: setting the state of a normal instance of the rfkill class does not affect the state of any other devices (unless they are tied by firmware/hardware somehow). Signed-off-by: Henrique de Moraes Holschuh Acked-by: Ivo van Doorn Cc: Dmitry Torokhov Signed-off-by: John W. Linville --- Documentation/rfkill.txt | 363 ++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 308 insertions(+), 55 deletions(-) (limited to 'Documentation') diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt index ec75d6d34785..cf230c1ad9ef 100644 --- a/Documentation/rfkill.txt +++ b/Documentation/rfkill.txt @@ -1,83 +1,328 @@ rfkill - RF switch subsystem support ==================================== -1 Implementation details -2 Driver support -3 Userspace support +1 Introduction +2 Implementation details +3 Kernel driver guidelines +4 Kernel API +5 Userspace support -=============================================================================== -1: Implementation details + +1. Introduction: The rfkill switch subsystem exists to add a generic interface to circuitry that -can enable or disable the RF output of a radio *transmitter* of any type. +can enable or disable the signal output of a wireless *transmitter* of any +type. By far, the most common use is to disable radio-frequency transmitters. -When a rfkill switch is in the RFKILL_STATE_ON, the radio transmitter is -*enabled*. When the rfkill switch is in the RFKILL_STATE_OFF, the radio -transmitter is *disabled*. +The rfkill switch subsystem offers support for keys and switches often found on +laptops to enable wireless devices like WiFi and Bluetooth to actually perform +an action. -The rfkill switch subsystem offers support for keys often found on laptops -to enable wireless devices like WiFi and Bluetooth. +The buttons to enable and disable the wireless transmitters are important in +situations where the user is for example using his laptop on a location where +radio-frequency transmitters _must_ be disabled (e.g. airplanes). -This is done by providing the user 3 possibilities: - 1 - The rfkill system handles all events; userspace is not aware of events. - 2 - The rfkill system handles all events; userspace is informed about the events. - 3 - The rfkill system does not handle events; userspace handles all events. +Because of this requirement, userspace support for the keys should not be made +mandatory. Because userspace might want to perform some additional smarter +tasks when the key is pressed, rfkill provides userspace the possibility to +take over the task to handle the key events. -The buttons to enable and disable the wireless radios are important in -situations where the user is for example using his laptop on a location where -wireless radios _must_ be disabled (e.g. airplanes). -Because of this requirement, userspace support for the keys should not be -made mandatory. Because userspace might want to perform some additional smarter -tasks when the key is pressed, rfkill still provides userspace the possibility -to take over the task to handle the key events. +=============================================================================== +2: Implementation details + +The rfkill class provides kernel drivers with an interface that allows them to +know when they should enable or disable a wireless network device transmitter. + +The rfkill-input module provides the kernel with the ability to implement a +basic response when the user presses a key or button (or toggles a switch) +related to rfkill functionality. It is an in-kernel implementation of default +policy of reacting to rfkill-related input events and neither mandatory nor +required for wireless drivers to operate. + +The rfkill-input module also provides EPO (emergency power-off) functionality +for all wireless transmitters. This function cannot be overriden, and it is +always active. rfkill EPO is related to *_RFKILL_ALL input events. + +All state changes on rfkill devices are propagated by the rfkill class to a +notification chain and also to userspace through uevents. The system inside the kernel has been split into 2 separate sections: 1 - RFKILL 2 - RFKILL_INPUT -The first option enables rfkill support and will make sure userspace will -be notified of any events through the input device. It also creates several -sysfs entries which can be used by userspace. See section "Userspace support". +The first option enables rfkill support and will make sure userspace will be +notified of any events through uevents. It provides a notification chain for +interested parties in the kernel to also get notified of rfkill state changes +in other drivers. It creates several sysfs entries which can be used by +userspace. See section "Userspace support". + +The second option provides an rfkill input handler. This handler will listen to +all rfkill key events and will toggle the radio accordingly. With this option +enabled userspace could either do nothing or simply perform monitoring tasks. + +When a rfkill switch is in the RFKILL_STATE_ON, the wireless transmitter (radio +TX circuit for example) is *enabled*. When the rfkill switch is in the +RFKILL_STATE_OFF, the wireless transmitter is to be *blocked* from operating. + +Full rfkill functionality requires two different subsystems to cooperate: the +input layer and the rfkill class. The input layer issues *commands* to the +entire system requesting that devices registered to the rfkill class change +state. The way this interaction happens is not complex, but it is not obvious +either: + +Kernel Input layer: + + * Generates KEY_WWAN, KEY_WLAN, KEY_BLUETOOTH, SW_RFKILL_ALL, and + other such events when the user presses certain keys, buttons, or + toggles certain physical switches. + + THE INPUT LAYER IS NEVER USED TO PROPAGATE STATUS, NOTIFICATIONS OR THE + KIND OF STUFF AN ON-SCREEN-DISPLAY APPLICATION WOULD REPORT. It is + used to issue *commands* for the system to change behaviour, and these + commands may or may not be carried out by some kernel driver or + userspace application. It follows that doing user feedback based only + on input events is broken, there is no guarantee that an input event + will be acted upon. + + Most wireless communication device drivers implementing rfkill + functionality MUST NOT generate these events, and have no reason to + register themselves with the input layer. This is a common + misconception. There is an API to propagate rfkill status change + information, and it is NOT the input layer. + +rfkill class: + + * Calls a hook in a driver to effectively change the wireless + transmitter state; + * Keeps track of the wireless transmitter state (with help from + the driver); + * Generates userspace notifications (uevents) and a call to a + notification chain (kernel) when there is a wireless transmitter + state change; + * Connects a wireless communications driver with the common rfkill + control system, which, for example, allows actions such as + "switch all bluetooth devices offline" to be carried out by + userspace or by rfkill-input. + + THE RFKILL CLASS NEVER ISSUES INPUT EVENTS. THE RFKILL CLASS DOES + NOT LISTEN TO INPUT EVENTS. NO DRIVER USING THE RFKILL CLASS SHALL + EVER LISTEN TO, OR ACT ON RFKILL INPUT EVENTS. + + Most wireless data communication drivers in the kernel have just to + implement the rfkill class API to work properly. Interfacing to the + input layer is not often required (and is very often a *bug*). + +Userspace input handlers (uevents) or kernel input handlers (rfkill-input): + + * Implements the policy of what should happen when one of the input + layer events related to rfkill operation is received. + * Uses the sysfs interface (userspace) or private rfkill API calls + to tell the devices registered with the rfkill class to change + their state (i.e. translates the input layer event into real + action). + * rfkill-input implements EPO by handling EV_SW SW_RFKILL_ALL 0 + (power off all transmitters) in a special way: it ignores any + overrides and local state cache and forces all transmitters to + the OFF state (including those which are already supposed to be + OFF). Note that the opposite event (power on all transmitters) + is handled normally. + +Userspace uevent handler or kernel platform-specific drivers hooked to the +rfkill notifier chain: + + * Taps into the rfkill notifier chain or to KOBJ_CHANGE uevents, + in order to know when a device that is registered with the rfkill + class changes state; + * Issues feedback notifications to the user; + * In the rare platforms where this is required, synthesizes an input + event to command all *OTHER* rfkill devices to also change their + statues when a specific rfkill device changes state. + + +=============================================================================== +3: Kernel driver guidelines + +The first thing one needs to know is whether his driver should be talking to +the rfkill class or to the input layer. + +Do not mistake input devices for rfkill devices. The only type of "rfkill +switch" device that is to be registered with the rfkill class are those +directly controlling the circuits that cause a wireless transmitter to stop +working (or the software equivalent of them). Every other kind of "rfkill +switch" is just an input device and MUST NOT be registered with the rfkill +class. + +A driver should register a device with the rfkill class when ALL of the +following conditions are met: + +1. The device is/controls a data communications wireless transmitter; + +2. The kernel can interact with the hardware/firmware to CHANGE the wireless + transmitter state (block/unblock TX operation); + +A driver should register a device with the input subsystem to issue +rfkill-related events (KEY_WLAN, KEY_BLUETOOTH, KEY_WWAN, KEY_WIMAX, +SW_RFKILL_ALL, etc) when ALL of the folowing conditions are met: + +1. It is directly related to some physical device the user interacts with, to + command the O.S./firmware/hardware to enable/disable a data communications + wireless transmitter. + + Examples of the physical device are: buttons, keys and switches the user + will press/touch/slide/switch to enable or disable the wireless + communication device. + +2. It is NOT slaved to another device, i.e. there is no other device that + issues rfkill-related input events in preference to this one. + + Typically, the ACPI "radio kill" switch of a laptop is the master input + device to issue rfkill events, and, e.g., the WLAN card is just a slave + device that gets disabled by its hardware radio-kill input pin. -The second option provides an rfkill input handler. This handler will -listen to all rfkill key events and will toggle the radio accordingly. -With this option enabled userspace could either do nothing or simply -perform monitoring tasks. +When in doubt, do not issue input events. For drivers that should generate +input events in some platforms, but not in others (e.g. b43), the best solution +is to NEVER generate input events in the first place. That work should be +deferred to a platform-specific kernel module (which will know when to generate +events through the rfkill notifier chain) or to userspace. This avoids the +usual maintenance problems with DMI whitelisting. + +Corner cases and examples: ==================================== -2: Driver support -To build a driver with rfkill subsystem support, the driver should -depend on the Kconfig symbol RFKILL; it should _not_ depend on -RKFILL_INPUT. +1. If the device is an input device that, because of hardware or firmware, +causes wireless transmitters to be blocked regardless of the kernel's will, it +is still just an input device, and NOT to be registered with the rfkill class. -Unless key events trigger an interrupt to which the driver listens, polling -will be required to determine the key state changes. For this the input -layer providers the input-polldev handler. +2. If the wireless transmitter switch control is read-only, it is an input +device and not to be registered with the rfkill class (and maybe not to be made +an input layer event source either, see below). -A driver should implement a few steps to correctly make use of the -rfkill subsystem. First for non-polling drivers: +3. If there is some other device driver *closer* to the actual hardware the +user interacted with (the button/switch/key) to issue an input event, THAT is +the device driver that should be issuing input events. - - rfkill_allocate() - - input_allocate_device() - - rfkill_register() - - input_register_device() +E.g: + [RFKILL slider switch] -- [GPIO hardware] -- [WLAN card rf-kill input] + (platform driver) (wireless card driver) + +The user is closer to the RFKILL slide switch plaform driver, so the driver +which must issue input events is the platform driver looking at the GPIO +hardware, and NEVER the wireless card driver (which is just a slave). It is +very likely that there are other leaves than just the WLAN card rf-kill input +(e.g. a bluetooth card, etc)... + +On the other hand, some embedded devices do this: + + [RFKILL slider switch] -- [WLAN card rf-kill input] + (wireless card driver) + +In this situation, the wireless card driver *could* register itself as an input +device and issue rf-kill related input events... but in order to AVOID the need +for DMI whitelisting, the wireless card driver does NOT do it. Userspace (HAL) +or a platform driver (that exists only on these embedded devices) will do the +dirty job of issuing the input events. + + +COMMON MISTAKES in kernel drivers, related to rfkill: +==================================== + +1. NEVER confuse input device keys and buttons with input device switches. + + 1a. Switches are always set or reset. They report the current state + (on position or off position). + + 1b. Keys and buttons are either in the pressed or not-pressed state, and + that's it. A "button" that latches down when you press it, and + unlatches when you press it again is in fact a switch as far as input + devices go. + +Add the SW_* events you need for switches, do NOT try to emulate a button using +KEY_* events just because there is no such SW_* event yet. Do NOT try to use, +for example, KEY_BLUETOOTH when you should be using SW_BLUETOOTH instead. + +2. Input device switches (sources of EV_SW events) DO store their current +state, and that state CAN be queried from userspace through IOCTLs. There is +no sysfs interface for this, but that doesn't mean you should break things +trying to hook it to the rfkill class to get a sysfs interface :-) + +3. Do not issue *_RFKILL_ALL events, unless you are sure it is the correct +event for your switch/button. These events are emergency power-off events when +they are trying to turn the transmitters off. An example of an input device +which SHOULD generate *_RFKILL_ALL events is the wireless-kill switch in a +laptop which is NOT a hotkey, but a real switch that kills radios in hardware, +even if the O.S. has gone to lunch. An example of an input device which SHOULD +NOT generate *_RFKILL_ALL events is any sort of hot key that does nothing by +itself, as well as any hot key that is type-specific (e.g. the one for WLAN). + + +=============================================================================== +4: Kernel API + +To build a driver with rfkill subsystem support, the driver should depend on +the Kconfig symbol RFKILL; it should _not_ depend on RKFILL_INPUT. + +The hardware the driver talks to may be write-only (where the current state +of the hardware is unknown), or read-write (where the hardware can be queried +about its current state). + +The rfkill class will call the get_state hook of a device every time it needs +to know the *real* current state of the hardware. This can happen often. + +Some hardware provides events when its status changes. In these cases, it is +best for the driver to not provide a get_state hook, and instead register the +rfkill class *already* with the correct status, and keep it updated using +rfkill_force_state() when it gets an event from the hardware. -For polling drivers: +There is no provision for a statically-allocated rfkill struct. You must +use rfkill_allocate() to allocate one. +You should: - rfkill_allocate() - - input_allocate_polled_device() + - modify rfkill fields (flags, name) + - modify state to the current hardware state (THIS IS THE ONLY TIME + YOU CAN ACCESS state DIRECTLY) - rfkill_register() - - input_register_polled_device() -When a key event has been detected, the correct event should be -sent over the input device which has been registered by the driver. +Please refer to the source for more documentation. -==================================== -3: Userspace support +=============================================================================== +5: Userspace support + +rfkill devices issue uevents (with an action of "change"), with the following +environment variables set: + +RFKILL_NAME +RFKILL_STATE +RFKILL_TYPE -For each key an input device will be created which will send out the correct -key event when the rfkill key has been pressed. +The ABI for these variables is defined by the sysfs attributes. It is best +to take a quick look at the source to make sure of the possible values. + +It is expected that HAL will trap those, and bridge them to DBUS, etc. These +events CAN and SHOULD be used to give feedback to the user about the rfkill +status of the system. + +Input devices may issue events that are related to rfkill. These are the +various KEY_* events and SW_* events supported by rfkill-input.c. + +******IMPORTANT****** +When rfkill-input is ACTIVE, userspace is NOT TO CHANGE THE STATE OF AN RFKILL +SWITCH IN RESPONSE TO AN INPUT EVENT also handled by rfkill-input, unless it +has set to true the user_claim attribute for that particular switch. This rule +is *absolute*; do NOT violate it. +******IMPORTANT****** + +Userspace must not assume it is the only source of control for rfkill switches. +Their state CAN and WILL change on its own, due to firmware actions, direct +user actions, and the rfkill-input EPO override for *_RFKILL_ALL. + +When rfkill-input is not active, userspace must initiate an rfkill status +change by writing to the "state" attribute in order for anything to happen. + +Take particular care to implement EV_SW SW_RFKILL_ALL properly. When that +switch is set to OFF, *every* rfkill device *MUST* be immediately put into the +OFF state, no questions asked. The following sysfs entries will be created: @@ -87,10 +332,18 @@ The following sysfs entries will be created: claim: 1: Userspace handles events, 0: Kernel handles events Both the "state" and "claim" entries are also writable. For the "state" entry -this means that when 1 or 0 is written all radios, not yet in the requested -state, will be will be toggled accordingly. +this means that when 1 or 0 is written, the device rfkill state (if not yet in +the requested state), will be will be toggled accordingly. + For the "claim" entry writing 1 to it means that the kernel no longer handles key events even though RFKILL_INPUT input was enabled. When "claim" has been set to 0, userspace should make sure that it listens for the input events or -check the sysfs "state" entry regularly to correctly perform the required -tasks when the rkfill key is pressed. +check the sysfs "state" entry regularly to correctly perform the required tasks +when the rkfill key is pressed. + +A note about input devices and EV_SW events: + +In order to know the current state of an input device switch (like +SW_RFKILL_ALL), you will need to use an IOCTL. That information is not +available through sysfs in a generic way at this time, and it is not available +through the rfkill class AT ALL. -- cgit v1.2.3-58-ga151 From 5005657cbd0fd6f277f807c0612a6b6d4396a02c Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Mon, 23 Jun 2008 17:46:42 -0300 Subject: rfkill: rename the rfkill_state states and add block-locked state The current naming of rfkill_state causes a lot of confusion: not only the "kill" in rfkill suggests negative logic, but also the fact that rfkill cannot turn anything on (it can just force something off or stop forcing something off) is often forgotten. Rename RFKILL_STATE_OFF to RFKILL_STATE_SOFT_BLOCKED (transmitter is blocked and will not operate; state can be changed by a toggle_radio request), and RFKILL_STATE_ON to RFKILL_STATE_UNBLOCKED (transmitter is not blocked, and may operate). Also, add a new third state, RFKILL_STATE_HARD_BLOCKED (transmitter is blocked and will not operate; state cannot be changed through a toggle_radio request), which is used by drivers to indicate a wireless transmiter was blocked by a hardware rfkill line that accepts no overrides. Keep the old names as #defines, but document them as deprecated. This way, drivers can be converted to the new names *and* verified to actually use rfkill correctly one by one. Signed-off-by: Henrique de Moraes Holschuh Acked-by: Ivo van Doorn Signed-off-by: John W. Linville --- Documentation/rfkill.txt | 56 +++++++++++++++++++++++++++++------ include/linux/rfkill.h | 32 ++++++++++++++++++-- net/rfkill/rfkill-input.c | 29 +++++++++--------- net/rfkill/rfkill.c | 75 ++++++++++++++++++++++++++++++++++++++--------- 4 files changed, 152 insertions(+), 40 deletions(-) (limited to 'Documentation') diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt index cf230c1ad9ef..5316cea95ce0 100644 --- a/Documentation/rfkill.txt +++ b/Documentation/rfkill.txt @@ -60,9 +60,20 @@ The second option provides an rfkill input handler. This handler will listen to all rfkill key events and will toggle the radio accordingly. With this option enabled userspace could either do nothing or simply perform monitoring tasks. -When a rfkill switch is in the RFKILL_STATE_ON, the wireless transmitter (radio -TX circuit for example) is *enabled*. When the rfkill switch is in the -RFKILL_STATE_OFF, the wireless transmitter is to be *blocked* from operating. +When a rfkill switch is in the RFKILL_STATE_UNBLOCKED, the wireless transmitter +(radio TX circuit for example) is *enabled*. When the rfkill switch is in the +RFKILL_STATE_SOFT_BLOCKED or RFKILL_STATE_HARD_BLOCKED, the wireless +transmitter is to be *blocked* from operating. + +RFKILL_STATE_SOFT_BLOCKED indicates that a call to toggle_radio() can change +that state. RFKILL_STATE_HARD_BLOCKED indicates that a call to toggle_radio() +will not be able to change the state and will return with a suitable error if +attempts are made to set the state to RFKILL_STATE_UNBLOCKED. + +RFKILL_STATE_HARD_BLOCKED is used by drivers to signal that the device is +locked in the BLOCKED state by a hardwire rfkill line (typically an input pin +that, when active, forces the transmitter to be disabled) which the driver +CANNOT override. Full rfkill functionality requires two different subsystems to cooperate: the input layer and the rfkill class. The input layer issues *commands* to the @@ -122,10 +133,10 @@ Userspace input handlers (uevents) or kernel input handlers (rfkill-input): action). * rfkill-input implements EPO by handling EV_SW SW_RFKILL_ALL 0 (power off all transmitters) in a special way: it ignores any - overrides and local state cache and forces all transmitters to - the OFF state (including those which are already supposed to be - OFF). Note that the opposite event (power on all transmitters) - is handled normally. + overrides and local state cache and forces all transmitters to the + RFKILL_STATE_SOFT_BLOCKED state (including those which are already + supposed to be BLOCKED). Note that the opposite event (power on all + transmitters) is handled normally. Userspace uevent handler or kernel platform-specific drivers hooked to the rfkill notifier chain: @@ -284,6 +295,19 @@ You should: YOU CAN ACCESS state DIRECTLY) - rfkill_register() +The only way to set a device to the RFKILL_STATE_HARD_BLOCKED state is through +a suitable return of get_state() or through rfkill_force_state(). + +When a device is in the RFKILL_STATE_HARD_BLOCKED state, the only way to switch +it to a different state is through a suitable return of get_state() or through +rfkill_force_state(). + +If toggle_radio() is called to set a device to state RFKILL_STATE_SOFT_BLOCKED +when that device is already at the RFKILL_STATE_HARD_BLOCKED state, it should +not return an error. Instead, it should try to double-block the transmitter, +so that its state will change from RFKILL_STATE_HARD_BLOCKED to +RFKILL_STATE_SOFT_BLOCKED should the hardware blocking cease. + Please refer to the source for more documentation. =============================================================================== @@ -322,13 +346,27 @@ change by writing to the "state" attribute in order for anything to happen. Take particular care to implement EV_SW SW_RFKILL_ALL properly. When that switch is set to OFF, *every* rfkill device *MUST* be immediately put into the -OFF state, no questions asked. +RFKILL_STATE_SOFT_BLOCKED state, no questions asked. The following sysfs entries will be created: name: Name assigned by driver to this key (interface or driver name). type: Name of the key type ("wlan", "bluetooth", etc). - state: Current state of the key. 1: On, 0: Off. + state: Current state of the transmitter + 0: RFKILL_STATE_SOFT_BLOCKED + transmitter is forced off, but you can override it + by a write to the state attribute, or through input + events (if rfkill-input is loaded). + 1: RFKILL_STATE_UNBLOCKED + transmiter is NOT forced off, and may operate if + all other conditions for such operation are met + (such as interface is up and configured, etc). + 2: RFKILL_STATE_HARD_BLOCKED + transmitter is forced off by something outside of + the driver's control. + + You cannot set a device to this state through + writes to the state attribute. claim: 1: Userspace handles events, 0: Kernel handles events Both the "state" and "claim" entries are also writable. For the "state" entry diff --git a/include/linux/rfkill.h b/include/linux/rfkill.h index 98667becdee4..c5f6e54ec6ae 100644 --- a/include/linux/rfkill.h +++ b/include/linux/rfkill.h @@ -46,16 +46,25 @@ enum rfkill_type { }; enum rfkill_state { - RFKILL_STATE_OFF = 0, /* Radio output blocked */ - RFKILL_STATE_ON = 1, /* Radio output active */ + RFKILL_STATE_SOFT_BLOCKED = 0, /* Radio output blocked */ + RFKILL_STATE_UNBLOCKED = 1, /* Radio output allowed */ + RFKILL_STATE_HARD_BLOCKED = 2, /* Output blocked, non-overrideable */ }; +/* + * These are DEPRECATED, drivers using them should be verified to + * comply with the rfkill usage guidelines in Documentation/rfkill.txt + * and then converted to use the new names for rfkill_state + */ +#define RFKILL_STATE_OFF RFKILL_STATE_SOFT_BLOCKED +#define RFKILL_STATE_ON RFKILL_STATE_UNBLOCKED + /** * struct rfkill - rfkill control structure. * @name: Name of the switch. * @type: Radio type which the button controls, the value stored * here should be a value from enum rfkill_type. - * @state: State of the switch, "ON" means radio can operate. + * @state: State of the switch, "UNBLOCKED" means radio can operate. * @user_claim_unsupported: Whether the hardware supports exclusive * RF-kill control by userspace. Set this before registering. * @user_claim: Set when the switch is controlled exlusively by userspace. @@ -63,8 +72,12 @@ enum rfkill_state { * @data: Pointer to the RF button drivers private data which will be * passed along when toggling radio state. * @toggle_radio(): Mandatory handler to control state of the radio. + * only RFKILL_STATE_SOFT_BLOCKED and RFKILL_STATE_UNBLOCKED are + * valid parameters. * @get_state(): handler to read current radio state from hardware, * may be called from atomic context, should return 0 on success. + * Either this handler OR judicious use of rfkill_force_state() is + * MANDATORY for any driver capable of RFKILL_STATE_HARD_BLOCKED. * @led_trigger: A LED trigger for this button's LED. * @dev: Device structure integrating the switch into device tree. * @node: Used to place switch into list of all switches known to the @@ -102,6 +115,19 @@ void rfkill_unregister(struct rfkill *rfkill); int rfkill_force_state(struct rfkill *rfkill, enum rfkill_state state); +/** + * rfkill_state_complement - return complementar state + * @state: state to return the complement of + * + * Returns RFKILL_STATE_SOFT_BLOCKED if @state is RFKILL_STATE_UNBLOCKED, + * returns RFKILL_STATE_UNBLOCKED otherwise. + */ +static inline enum rfkill_state rfkill_state_complement(enum rfkill_state state) +{ + return (state == RFKILL_STATE_UNBLOCKED) ? + RFKILL_STATE_SOFT_BLOCKED : RFKILL_STATE_UNBLOCKED; +} + /** * rfkill_get_led_name - Get the LED trigger name for the button's LED. * This function might return a NULL pointer if registering of the diff --git a/net/rfkill/rfkill-input.c b/net/rfkill/rfkill-input.c index 5d4c8b2446f7..8aa822730145 100644 --- a/net/rfkill/rfkill-input.c +++ b/net/rfkill/rfkill-input.c @@ -84,7 +84,8 @@ static void rfkill_schedule_toggle(struct rfkill_task *task) spin_lock_irqsave(&task->lock, flags); if (time_after(jiffies, task->last + msecs_to_jiffies(200))) { - task->desired_state = !task->desired_state; + task->desired_state = + rfkill_state_complement(task->desired_state); task->last = jiffies; schedule_work(&task->work); } @@ -92,14 +93,14 @@ static void rfkill_schedule_toggle(struct rfkill_task *task) spin_unlock_irqrestore(&task->lock, flags); } -#define DEFINE_RFKILL_TASK(n, t) \ - struct rfkill_task n = { \ - .work = __WORK_INITIALIZER(n.work, \ - rfkill_task_handler), \ - .type = t, \ - .mutex = __MUTEX_INITIALIZER(n.mutex), \ - .lock = __SPIN_LOCK_UNLOCKED(n.lock), \ - .desired_state = RFKILL_STATE_ON, \ +#define DEFINE_RFKILL_TASK(n, t) \ + struct rfkill_task n = { \ + .work = __WORK_INITIALIZER(n.work, \ + rfkill_task_handler), \ + .type = t, \ + .mutex = __MUTEX_INITIALIZER(n.mutex), \ + .lock = __SPIN_LOCK_UNLOCKED(n.lock), \ + .desired_state = RFKILL_STATE_UNBLOCKED, \ } static DEFINE_RFKILL_TASK(rfkill_wlan, RFKILL_TYPE_WLAN); @@ -135,15 +136,15 @@ static void rfkill_event(struct input_handle *handle, unsigned int type, /* handle EPO (emergency power off) through shortcut */ if (data) { rfkill_schedule_set(&rfkill_wwan, - RFKILL_STATE_ON); + RFKILL_STATE_UNBLOCKED); rfkill_schedule_set(&rfkill_wimax, - RFKILL_STATE_ON); + RFKILL_STATE_UNBLOCKED); rfkill_schedule_set(&rfkill_uwb, - RFKILL_STATE_ON); + RFKILL_STATE_UNBLOCKED); rfkill_schedule_set(&rfkill_bt, - RFKILL_STATE_ON); + RFKILL_STATE_UNBLOCKED); rfkill_schedule_set(&rfkill_wlan, - RFKILL_STATE_ON); + RFKILL_STATE_UNBLOCKED); } else rfkill_schedule_epo(); break; diff --git a/net/rfkill/rfkill.c b/net/rfkill/rfkill.c index 7d07175c407f..ce0e23148cdd 100644 --- a/net/rfkill/rfkill.c +++ b/net/rfkill/rfkill.c @@ -39,7 +39,7 @@ MODULE_LICENSE("GPL"); static LIST_HEAD(rfkill_list); /* list of registered rf switches */ static DEFINE_MUTEX(rfkill_mutex); -static unsigned int rfkill_default_state = RFKILL_STATE_ON; +static unsigned int rfkill_default_state = RFKILL_STATE_UNBLOCKED; module_param_named(default_state, rfkill_default_state, uint, 0444); MODULE_PARM_DESC(default_state, "Default initial state for all radio types, 0 = radio off"); @@ -98,7 +98,7 @@ static void rfkill_led_trigger(struct rfkill *rfkill, if (!led->name) return; - if (state == RFKILL_STATE_OFF) + if (state != RFKILL_STATE_UNBLOCKED) led_trigger_event(led, LED_OFF); else led_trigger_event(led, LED_FULL); @@ -128,6 +128,28 @@ static void update_rfkill_state(struct rfkill *rfkill) } } +/** + * rfkill_toggle_radio - wrapper for toggle_radio hook + * calls toggle_radio taking into account a lot of "small" + * details. + * @rfkill: the rfkill struct to use + * @force: calls toggle_radio even if cache says it is not needed, + * and also makes sure notifications of the state will be + * sent even if it didn't change + * @state: the new state to call toggle_radio() with + * + * This wrappen protects and enforces the API for toggle_radio + * calls. Note that @force cannot override a (possibly cached) + * state of RFKILL_STATE_HARD_BLOCKED. Any device making use of + * RFKILL_STATE_HARD_BLOCKED implements either get_state() or + * rfkill_force_state(), so the cache either is bypassed or valid. + * + * Note that we do call toggle_radio for RFKILL_STATE_SOFT_BLOCKED + * even if the radio is in RFKILL_STATE_HARD_BLOCKED state, so as to + * give the driver a hint that it should double-BLOCK the transmitter. + * + * Caller must have aquired rfkill_mutex. + */ static int rfkill_toggle_radio(struct rfkill *rfkill, enum rfkill_state state, int force) @@ -141,9 +163,28 @@ static int rfkill_toggle_radio(struct rfkill *rfkill, !rfkill->get_state(rfkill->data, &newstate)) rfkill->state = newstate; + switch (state) { + case RFKILL_STATE_HARD_BLOCKED: + /* typically happens when refreshing hardware state, + * such as on resume */ + state = RFKILL_STATE_SOFT_BLOCKED; + break; + case RFKILL_STATE_UNBLOCKED: + /* force can't override this, only rfkill_force_state() can */ + if (rfkill->state == RFKILL_STATE_HARD_BLOCKED) + return -EPERM; + break; + case RFKILL_STATE_SOFT_BLOCKED: + /* nothing to do, we want to give drivers the hint to double + * BLOCK even a transmitter that is already in state + * RFKILL_STATE_HARD_BLOCKED */ + break; + } + if (force || state != rfkill->state) { retval = rfkill->toggle_radio(rfkill->data, state); - if (!retval) + /* never allow a HARD->SOFT downgrade! */ + if (!retval && rfkill->state != RFKILL_STATE_HARD_BLOCKED) rfkill->state = state; } @@ -184,7 +225,7 @@ EXPORT_SYMBOL(rfkill_switch_all); /** * rfkill_epo - emergency power off all transmitters * - * This kicks all rfkill devices to RFKILL_STATE_OFF, ignoring + * This kicks all rfkill devices to RFKILL_STATE_SOFT_BLOCKED, ignoring * everything in its path but rfkill_mutex. */ void rfkill_epo(void) @@ -193,7 +234,7 @@ void rfkill_epo(void) mutex_lock(&rfkill_mutex); list_for_each_entry(rfkill, &rfkill_list, node) { - rfkill_toggle_radio(rfkill, RFKILL_STATE_OFF, 1); + rfkill_toggle_radio(rfkill, RFKILL_STATE_SOFT_BLOCKED, 1); } mutex_unlock(&rfkill_mutex); } @@ -215,8 +256,9 @@ int rfkill_force_state(struct rfkill *rfkill, enum rfkill_state state) { enum rfkill_state oldstate; - if (state != RFKILL_STATE_OFF && - state != RFKILL_STATE_ON) + if (state != RFKILL_STATE_SOFT_BLOCKED && + state != RFKILL_STATE_UNBLOCKED && + state != RFKILL_STATE_HARD_BLOCKED) return -EINVAL; mutex_lock(&rfkill->mutex); @@ -290,11 +332,14 @@ static ssize_t rfkill_state_store(struct device *dev, if (!capable(CAP_NET_ADMIN)) return -EPERM; + /* RFKILL_STATE_HARD_BLOCKED is illegal here... */ + if (state != RFKILL_STATE_UNBLOCKED && + state != RFKILL_STATE_SOFT_BLOCKED) + return -EINVAL; + if (mutex_lock_interruptible(&rfkill->mutex)) return -ERESTARTSYS; - error = rfkill_toggle_radio(rfkill, - state ? RFKILL_STATE_ON : RFKILL_STATE_OFF, - 0); + error = rfkill_toggle_radio(rfkill, state, 0); mutex_unlock(&rfkill->mutex); return error ? error : count; @@ -373,7 +418,8 @@ static int rfkill_suspend(struct device *dev, pm_message_t state) update_rfkill_state(rfkill); mutex_lock(&rfkill->mutex); - rfkill->toggle_radio(rfkill->data, RFKILL_STATE_OFF); + rfkill->toggle_radio(rfkill->data, + RFKILL_STATE_SOFT_BLOCKED); mutex_unlock(&rfkill->mutex); } @@ -470,7 +516,7 @@ static void rfkill_remove_switch(struct rfkill *rfkill) { mutex_lock(&rfkill_mutex); list_del_init(&rfkill->node); - rfkill_toggle_radio(rfkill, RFKILL_STATE_OFF, 1); + rfkill_toggle_radio(rfkill, RFKILL_STATE_SOFT_BLOCKED, 1); mutex_unlock(&rfkill_mutex); } @@ -610,8 +656,9 @@ static int __init rfkill_init(void) int error; int i; - if (rfkill_default_state != RFKILL_STATE_OFF && - rfkill_default_state != RFKILL_STATE_ON) + /* RFKILL_STATE_HARD_BLOCKED is illegal here... */ + if (rfkill_default_state != RFKILL_STATE_SOFT_BLOCKED && + rfkill_default_state != RFKILL_STATE_UNBLOCKED) return -EINVAL; for (i = 0; i < ARRAY_SIZE(rfkill_states); i++) -- cgit v1.2.3-58-ga151 From f7983f7301d530cb3f3a02618c944f6d108d11df Mon Sep 17 00:00:00 2001 From: Henrique de Moraes Holschuh Date: Mon, 23 Jun 2008 17:46:43 -0300 Subject: rfkill: improve documentation for kernel drivers Improve the documentation of how to use the rfkill class in kernel drivers, based on the doubts that came up in a thread in linux-wireless. Signed-off-by: Henrique de Moraes Holschuh Acked-by: Ivo van Doorn Signed-off-by: John W. Linville --- Documentation/rfkill.txt | 261 ++++++++++++++++++++++++++++++++++++----------- 1 file changed, 201 insertions(+), 60 deletions(-) (limited to 'Documentation') diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt index 5316cea95ce0..0843ed0163a5 100644 --- a/Documentation/rfkill.txt +++ b/Documentation/rfkill.txt @@ -4,6 +4,9 @@ rfkill - RF switch subsystem support 1 Introduction 2 Implementation details 3 Kernel driver guidelines +3.1 wireless device drivers +3.2 platform/switch drivers +3.3 input device drivers 4 Kernel API 5 Userspace support @@ -14,9 +17,14 @@ The rfkill switch subsystem exists to add a generic interface to circuitry that can enable or disable the signal output of a wireless *transmitter* of any type. By far, the most common use is to disable radio-frequency transmitters. -The rfkill switch subsystem offers support for keys and switches often found on -laptops to enable wireless devices like WiFi and Bluetooth to actually perform -an action. +Note that disabling the signal output means that the the transmitter is to be +made to not emit any energy when "blocked". rfkill is not about blocking data +transmissions, it is about blocking energy emission. + +The rfkill subsystem offers support for keys and switches often found on +laptops to enable wireless devices like WiFi and Bluetooth, so that these keys +and switches actually perform an action in all wireless devices of a given type +attached to the system. The buttons to enable and disable the wireless transmitters are important in situations where the user is for example using his laptop on a location where @@ -30,40 +38,81 @@ take over the task to handle the key events. =============================================================================== 2: Implementation details +The rfkill subsystem is composed of various components: the rfkill class, the +rfkill-input module (an input layer handler), and some specific input layer +events. + The rfkill class provides kernel drivers with an interface that allows them to know when they should enable or disable a wireless network device transmitter. +This is enabled by the CONFIG_RFKILL Kconfig option. + +The rfkill class support makes sure userspace will be notified of all state +changes on rfkill devices through uevents. It provides a notification chain +for interested parties in the kernel to also get notified of rfkill state +changes in other drivers. It creates several sysfs entries which can be used +by userspace. See section "Userspace support". The rfkill-input module provides the kernel with the ability to implement a basic response when the user presses a key or button (or toggles a switch) related to rfkill functionality. It is an in-kernel implementation of default policy of reacting to rfkill-related input events and neither mandatory nor -required for wireless drivers to operate. +required for wireless drivers to operate. It is enabled by the +CONFIG_RFKILL_INPUT Kconfig option. + +rfkill-input is a rfkill-related events input layer handler. This handler will +listen to all rfkill key events and will change the rfkill state of the +wireless devices accordingly. With this option enabled userspace could either +do nothing or simply perform monitoring tasks. The rfkill-input module also provides EPO (emergency power-off) functionality -for all wireless transmitters. This function cannot be overriden, and it is -always active. rfkill EPO is related to *_RFKILL_ALL input events. +for all wireless transmitters. This function cannot be overridden, and it is +always active. rfkill EPO is related to *_RFKILL_ALL input layer events. + + +Important terms for the rfkill subsystem: + +In order to avoid confusion, we avoid the term "switch" in rfkill when it is +referring to an electronic control circuit that enables or disables a +transmitter. We reserve it for the physical device a human manipulates +(which is an input device, by the way): + +rfkill switch: + + A physical device a human manipulates. Its state can be perceived by + the kernel either directly (through a GPIO pin, ACPI GPE) or by its + effect on a rfkill line of a wireless device. + +rfkill controller: -All state changes on rfkill devices are propagated by the rfkill class to a -notification chain and also to userspace through uevents. + A hardware circuit that controls the state of a rfkill line, which a + kernel driver can interact with *to modify* that state (i.e. it has + either write-only or read/write access). -The system inside the kernel has been split into 2 separate sections: - 1 - RFKILL - 2 - RFKILL_INPUT +rfkill line: -The first option enables rfkill support and will make sure userspace will be -notified of any events through uevents. It provides a notification chain for -interested parties in the kernel to also get notified of rfkill state changes -in other drivers. It creates several sysfs entries which can be used by -userspace. See section "Userspace support". + An input channel (hardware or software) of a wireless device, which + causes a wireless transmitter to stop emitting energy (BLOCK) when it + is active. Point of view is extremely important here: rfkill lines are + always seen from the PoV of a wireless device (and its driver). -The second option provides an rfkill input handler. This handler will listen to -all rfkill key events and will toggle the radio accordingly. With this option -enabled userspace could either do nothing or simply perform monitoring tasks. +soft rfkill line/software rfkill line: -When a rfkill switch is in the RFKILL_STATE_UNBLOCKED, the wireless transmitter -(radio TX circuit for example) is *enabled*. When the rfkill switch is in the -RFKILL_STATE_SOFT_BLOCKED or RFKILL_STATE_HARD_BLOCKED, the wireless -transmitter is to be *blocked* from operating. + A rfkill line the wireless device driver can directly change the state + of. Related to rfkill_state RFKILL_STATE_SOFT_BLOCKED. + +hard rfkill line/hardware rfkill line: + + A rfkill line that works fully in hardware or firmware, and that cannot + be overridden by the kernel driver. The hardware device or the + firmware just exports its status to the driver, but it is read-only. + Related to rfkill_state RFKILL_STATE_HARD_BLOCKED. + +The enum rfkill_state describes the rfkill state of a transmitter: + +When a rfkill line or rfkill controller is in the RFKILL_STATE_UNBLOCKED state, +the wireless transmitter (radio TX circuit for example) is *enabled*. When the +it is in the RFKILL_STATE_SOFT_BLOCKED or RFKILL_STATE_HARD_BLOCKED, the +wireless transmitter is to be *blocked* from operating. RFKILL_STATE_SOFT_BLOCKED indicates that a call to toggle_radio() can change that state. RFKILL_STATE_HARD_BLOCKED indicates that a call to toggle_radio() @@ -92,12 +141,12 @@ Kernel Input layer: used to issue *commands* for the system to change behaviour, and these commands may or may not be carried out by some kernel driver or userspace application. It follows that doing user feedback based only - on input events is broken, there is no guarantee that an input event + on input events is broken, as there is no guarantee that an input event will be acted upon. Most wireless communication device drivers implementing rfkill functionality MUST NOT generate these events, and have no reason to - register themselves with the input layer. This is a common + register themselves with the input layer. Doing otherwise is a common misconception. There is an API to propagate rfkill status change information, and it is NOT the input layer. @@ -117,11 +166,22 @@ rfkill class: THE RFKILL CLASS NEVER ISSUES INPUT EVENTS. THE RFKILL CLASS DOES NOT LISTEN TO INPUT EVENTS. NO DRIVER USING THE RFKILL CLASS SHALL - EVER LISTEN TO, OR ACT ON RFKILL INPUT EVENTS. + EVER LISTEN TO, OR ACT ON RFKILL INPUT EVENTS. Doing otherwise is + a layering violation. Most wireless data communication drivers in the kernel have just to implement the rfkill class API to work properly. Interfacing to the - input layer is not often required (and is very often a *bug*). + input layer is not often required (and is very often a *bug*) on + wireless drivers. + + Platform drivers often have to attach to the input layer to *issue* + (but never to listen to) rfkill events for rfkill switches, and also to + the rfkill class to export a control interface for the platform rfkill + controllers to the rfkill subsystem. This does NOT mean the rfkill + switch is attached to a rfkill class (doing so is almost always wrong). + It just means the same kernel module is the driver for different + devices (rfkill switches and rfkill controllers). + Userspace input handlers (uevents) or kernel input handlers (rfkill-input): @@ -153,24 +213,34 @@ rfkill notifier chain: =============================================================================== 3: Kernel driver guidelines +Remember: point-of-view is everything for a driver that connects to the rfkill +subsystem. All the details below must be measured/perceived from the point of +view of the specific driver being modified. + The first thing one needs to know is whether his driver should be talking to -the rfkill class or to the input layer. +the rfkill class or to the input layer. In rare cases (platform drivers), it +could happen that you need to do both, as platform drivers often handle a +variety of devices in the same driver. -Do not mistake input devices for rfkill devices. The only type of "rfkill +Do not mistake input devices for rfkill controllers. The only type of "rfkill switch" device that is to be registered with the rfkill class are those directly controlling the circuits that cause a wireless transmitter to stop -working (or the software equivalent of them). Every other kind of "rfkill -switch" is just an input device and MUST NOT be registered with the rfkill -class. +working (or the software equivalent of them), i.e. what we call a rfkill +controller. Every other kind of "rfkill switch" is just an input device and +MUST NOT be registered with the rfkill class. A driver should register a device with the rfkill class when ALL of the -following conditions are met: +following conditions are met (they define a rfkill controller): 1. The device is/controls a data communications wireless transmitter; 2. The kernel can interact with the hardware/firmware to CHANGE the wireless transmitter state (block/unblock TX operation); +3. The transmitter can be made to not emit any energy when "blocked": + rfkill is not about blocking data transmissions, it is about blocking + energy emission; + A driver should register a device with the input subsystem to issue rfkill-related events (KEY_WLAN, KEY_BLUETOOTH, KEY_WWAN, KEY_WIMAX, SW_RFKILL_ALL, etc) when ALL of the folowing conditions are met: @@ -186,9 +256,7 @@ SW_RFKILL_ALL, etc) when ALL of the folowing conditions are met: 2. It is NOT slaved to another device, i.e. there is no other device that issues rfkill-related input events in preference to this one. - Typically, the ACPI "radio kill" switch of a laptop is the master input - device to issue rfkill events, and, e.g., the WLAN card is just a slave - device that gets disabled by its hardware radio-kill input pin. + Please refer to the corner cases and examples section for more details. When in doubt, do not issue input events. For drivers that should generate input events in some platforms, but not in others (e.g. b43), the best solution @@ -252,26 +320,102 @@ Add the SW_* events you need for switches, do NOT try to emulate a button using KEY_* events just because there is no such SW_* event yet. Do NOT try to use, for example, KEY_BLUETOOTH when you should be using SW_BLUETOOTH instead. -2. Input device switches (sources of EV_SW events) DO store their current -state, and that state CAN be queried from userspace through IOCTLs. There is -no sysfs interface for this, but that doesn't mean you should break things -trying to hook it to the rfkill class to get a sysfs interface :-) +2. Input device switches (sources of EV_SW events) DO store their current state +(so you *must* initialize it by issuing a gratuitous input layer event on +driver start-up and also when resuming from sleep), and that state CAN be +queried from userspace through IOCTLs. There is no sysfs interface for this, +but that doesn't mean you should break things trying to hook it to the rfkill +class to get a sysfs interface :-) + +3. Do not issue *_RFKILL_ALL events by default, unless you are sure it is the +correct event for your switch/button. These events are emergency power-off +events when they are trying to turn the transmitters off. An example of an +input device which SHOULD generate *_RFKILL_ALL events is the wireless-kill +switch in a laptop which is NOT a hotkey, but a real switch that kills radios +in hardware, even if the O.S. has gone to lunch. An example of an input device +which SHOULD NOT generate *_RFKILL_ALL events by default, is any sort of hot +key that does nothing by itself, as well as any hot key that is type-specific +(e.g. the one for WLAN). -3. Do not issue *_RFKILL_ALL events, unless you are sure it is the correct -event for your switch/button. These events are emergency power-off events when -they are trying to turn the transmitters off. An example of an input device -which SHOULD generate *_RFKILL_ALL events is the wireless-kill switch in a -laptop which is NOT a hotkey, but a real switch that kills radios in hardware, -even if the O.S. has gone to lunch. An example of an input device which SHOULD -NOT generate *_RFKILL_ALL events is any sort of hot key that does nothing by -itself, as well as any hot key that is type-specific (e.g. the one for WLAN). +3.1 Guidelines for wireless device drivers +------------------------------------------ + +1. Each independent transmitter in a wireless device (usually there is only one +transmitter per device) should have a SINGLE rfkill class attached to it. + +2. If the device does not have any sort of hardware assistance to allow the +driver to rfkill the device, the driver should emulate it by taking all actions +required to silence the transmitter. + +3. If it is impossible to silence the transmitter (i.e. it still emits energy, +even if it is just in brief pulses, when there is no data to transmit and there +is no hardware support to turn it off) do NOT lie to the users. Do not attach +it to a rfkill class. The rfkill subsystem does not deal with data +transmission, it deals with energy emission. If the transmitter is emitting +energy, it is not blocked in rfkill terms. + +4. It doesn't matter if the device has multiple rfkill input lines affecting +the same transmitter, their combined state is to be exported as a single state +per transmitter (see rule 1). + +This rule exists because users of the rfkill subsystem expect to get (and set, +when possible) the overall transmitter rfkill state, not of a particular rfkill +line. + +Example of a WLAN wireless driver connected to the rfkill subsystem: +-------------------------------------------------------------------- + +A certain WLAN card has one input pin that causes it to block the transmitter +and makes the status of that input pin available (only for reading!) to the +kernel driver. This is a hard rfkill input line (it cannot be overridden by +the kernel driver). + +The card also has one PCI register that, if manipulated by the driver, causes +it to block the transmitter. This is a soft rfkill input line. + +It has also a thermal protection circuitry that shuts down its transmitter if +the card overheats, and makes the status of that protection available (only for +reading!) to the kernel driver. This is also a hard rfkill input line. + +If either one of these rfkill lines are active, the transmitter is blocked by +the hardware and forced offline. + +The driver should allocate and attach to its struct device *ONE* instance of +the rfkill class (there is only one transmitter). + +It can implement the get_state() hook, and return RFKILL_STATE_HARD_BLOCKED if +either one of its two hard rfkill input lines are active. If the two hard +rfkill lines are inactive, it must return RFKILL_STATE_SOFT_BLOCKED if its soft +rfkill input line is active. Only if none of the rfkill input lines are +active, will it return RFKILL_STATE_UNBLOCKED. + +If it doesn't implement the get_state() hook, it must make sure that its calls +to rfkill_force_state() are enough to keep the status always up-to-date, and it +must do a rfkill_force_state() on resume from sleep. + +Every time the driver gets a notification from the card that one of its rfkill +lines changed state (polling might be needed on badly designed cards that don't +generate interrupts for such events), it recomputes the rfkill state as per +above, and calls rfkill_force_state() to update it. + +The driver should implement the toggle_radio() hook, that: + +1. Returns an error if one of the hardware rfkill lines are active, and the +caller asked for RFKILL_STATE_UNBLOCKED. + +2. Activates the soft rfkill line if the caller asked for state +RFKILL_STATE_SOFT_BLOCKED. It should do this even if one of the hard rfkill +lines are active, effectively double-blocking the transmitter. + +3. Deactivates the soft rfkill line if none of the hardware rfkill lines are +active and the caller asked for RFKILL_STATE_UNBLOCKED. =============================================================================== 4: Kernel API To build a driver with rfkill subsystem support, the driver should depend on -the Kconfig symbol RFKILL; it should _not_ depend on RKFILL_INPUT. +(or select) the Kconfig symbol RFKILL; it should _not_ depend on RKFILL_INPUT. The hardware the driver talks to may be write-only (where the current state of the hardware is unknown), or read-write (where the hardware can be queried @@ -338,10 +482,10 @@ is *absolute*; do NOT violate it. ******IMPORTANT****** Userspace must not assume it is the only source of control for rfkill switches. -Their state CAN and WILL change on its own, due to firmware actions, direct -user actions, and the rfkill-input EPO override for *_RFKILL_ALL. +Their state CAN and WILL change due to firmware actions, direct user actions, +and the rfkill-input EPO override for *_RFKILL_ALL. -When rfkill-input is not active, userspace must initiate an rfkill status +When rfkill-input is not active, userspace must initiate a rfkill status change by writing to the "state" attribute in order for anything to happen. Take particular care to implement EV_SW SW_RFKILL_ALL properly. When that @@ -354,19 +498,16 @@ The following sysfs entries will be created: type: Name of the key type ("wlan", "bluetooth", etc). state: Current state of the transmitter 0: RFKILL_STATE_SOFT_BLOCKED - transmitter is forced off, but you can override it - by a write to the state attribute, or through input - events (if rfkill-input is loaded). + transmitter is forced off, but one can override it + by a write to the state attribute; 1: RFKILL_STATE_UNBLOCKED transmiter is NOT forced off, and may operate if all other conditions for such operation are met - (such as interface is up and configured, etc). + (such as interface is up and configured, etc); 2: RFKILL_STATE_HARD_BLOCKED transmitter is forced off by something outside of - the driver's control. - - You cannot set a device to this state through - writes to the state attribute. + the driver's control. One cannot set a device to + this state through writes to the state attribute; claim: 1: Userspace handles events, 0: Kernel handles events Both the "state" and "claim" entries are also writable. For the "state" entry -- cgit v1.2.3-58-ga151 From 0fd62b861eac7d2dea9b7e939953b20f37186ea1 Mon Sep 17 00:00:00 2001 From: Neil Brown Date: Sat, 28 Jun 2008 08:31:36 +1000 Subject: Make sure all changes to md/array_state are notified. Changes in md/array_state could be of interest to a monitoring program. So make sure all changes trigger a notification. Exceptions: changing active_idle to active is not reported because it is frequent and not interesting. changing active to active_idle is only reported on arrays with externally managed metadata, as it is not interesting otherwise. Signed-off-by: Neil Brown --- Documentation/md.txt | 5 +++++ drivers/md/md.c | 29 ++++++++++++++++++++++++----- 2 files changed, 29 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/md.txt b/Documentation/md.txt index a8b430627473..dca97ba4944a 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -236,6 +236,11 @@ All md devices contain: writing the word for the desired state, however some states cannot be explicitly set, and some transitions are not allowed. + Select/poll works on this file. All changes except between + active_idle and active (which can be frequent and are not + very interesting) are notified. active->active_idle is + reported if the metadata is externally managed. + clear No devices, no size, no level Writing is equivalent to STOP_ARRAY ioctl diff --git a/drivers/md/md.c b/drivers/md/md.c index 1442761ac98e..5b9d4fe4e6e4 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -2716,8 +2716,10 @@ array_state_store(mddev_t *mddev, const char *buf, size_t len) } if (err) return err; - else + else { + sysfs_notify(&mddev->kobj, NULL, "array_state"); return len; + } } static struct md_sysfs_entry md_array_state = __ATTR(array_state, S_IRUGO|S_IWUSR, array_state_show, array_state_store); @@ -3408,7 +3410,11 @@ static void md_safemode_timeout(unsigned long data) { mddev_t *mddev = (mddev_t *) data; - mddev->safemode = 1; + if (!atomic_read(&mddev->writes_pending)) { + mddev->safemode = 1; + if (mddev->external) + sysfs_notify(&mddev->kobj, NULL, "array_state"); + } md_wakeup_thread(mddev->thread); } @@ -3675,6 +3681,7 @@ static int do_md_run(mddev_t * mddev) mddev->changed = 1; md_new_event(mddev); + sysfs_notify(&mddev->kobj, NULL, "array_state"); kobject_uevent(&mddev->gendisk->dev.kobj, KOBJ_CHANGE); return 0; } @@ -3709,6 +3716,8 @@ static int restart_array(mddev_t *mddev) md_wakeup_thread(mddev->thread); md_wakeup_thread(mddev->sync_thread); err = 0; + sysfs_notify(&mddev->kobj, NULL, "array_state"); + } else err = -EINVAL; @@ -3879,6 +3888,7 @@ static int do_md_stop(mddev_t * mddev, int mode) mdname(mddev)); err = 0; md_new_event(mddev); + sysfs_notify(&mddev->kobj, NULL, "array_state"); out: return err; } @@ -4876,8 +4886,9 @@ static int md_ioctl(struct inode *inode, struct file *file, mddev->ro && mddev->pers) { if (mddev->ro == 2) { mddev->ro = 0; - set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - md_wakeup_thread(mddev->thread); + sysfs_notify(&mddev->kobj, NULL, "array_state"); + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); + md_wakeup_thread(mddev->thread); } else { err = -EROFS; @@ -5516,6 +5527,7 @@ void md_done_sync(mddev_t *mddev, int blocks, int ok) */ void md_write_start(mddev_t *mddev, struct bio *bi) { + int did_change = 0; if (bio_data_dir(bi) != WRITE) return; @@ -5526,6 +5538,7 @@ void md_write_start(mddev_t *mddev, struct bio *bi) set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); md_wakeup_thread(mddev->sync_thread); + did_change = 1; } atomic_inc(&mddev->writes_pending); if (mddev->safemode == 1) @@ -5536,10 +5549,12 @@ void md_write_start(mddev_t *mddev, struct bio *bi) mddev->in_sync = 0; set_bit(MD_CHANGE_CLEAN, &mddev->flags); md_wakeup_thread(mddev->thread); + did_change = 1; } spin_unlock_irq(&mddev->write_lock); - sysfs_notify(&mddev->kobj, NULL, "array_state"); } + if (did_change) + sysfs_notify(&mddev->kobj, NULL, "array_state"); wait_event(mddev->sb_wait, !test_bit(MD_CHANGE_CLEAN, &mddev->flags) && !test_bit(MD_CHANGE_PENDING, &mddev->flags)); @@ -5991,18 +6006,22 @@ void md_check_recovery(mddev_t *mddev) int spares = 0; if (!mddev->external) { + int did_change = 0; spin_lock_irq(&mddev->write_lock); if (mddev->safemode && !atomic_read(&mddev->writes_pending) && !mddev->in_sync && mddev->recovery_cp == MaxSector) { mddev->in_sync = 1; + did_change = 1; if (mddev->persistent) set_bit(MD_CHANGE_CLEAN, &mddev->flags); } if (mddev->safemode == 1) mddev->safemode = 0; spin_unlock_irq(&mddev->write_lock); + if (did_change) + sysfs_notify(&mddev->kobj, NULL, "array_state"); } if (mddev->flags) -- cgit v1.2.3-58-ga151 From 72a23c211e4587859d5bf61ac4962d76e593fb02 Mon Sep 17 00:00:00 2001 From: Neil Brown Date: Sat, 28 Jun 2008 08:31:41 +1000 Subject: Make sure all changes to md/sync_action are notified. When the 'resync' thread starts or stops, when we explicitly set sync_action, or when we determine that there is definitely nothing to do, we notify sync_action. To stop "sync_action" from occasionally showing the wrong value, we introduce a new flags - MD_RECOVERY_RECOVER - to say that a recovery is probably needed or happening, and we make sure that we set MD_RECOVERY_RUNNING before clearing MD_RECOVERY_NEEDED. Signed-off-by: Neil Brown --- Documentation/md.txt | 6 ++++++ drivers/md/md.c | 34 ++++++++++++++++++++++++++++------ include/linux/raid/md_k.h | 2 ++ 3 files changed, 36 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/md.txt b/Documentation/md.txt index dca97ba4944a..c05bfb55659e 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -386,6 +386,12 @@ also have 'check' and 'repair' will start the appropriate process providing the current state is 'idle'. + This file responds to select/poll. Any important change in the value + triggers a poll event. Sometimes the value will briefly be + "recover" if a recovery seems to be needed, but cannot be + achieved. In that case, the transition to "recover" isn't + notified, but the transition away is. + mismatch_count When performing 'check' and 'repair', and possibly when performing 'resync', md will count the number of errors that are diff --git a/drivers/md/md.c b/drivers/md/md.c index 5b9d4fe4e6e4..c26dcad8a3ac 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -169,7 +169,6 @@ void md_new_event(mddev_t *mddev) { atomic_inc(&md_event_count); wake_up(&md_event_waiters); - sysfs_notify(&mddev->kobj, NULL, "sync_action"); } EXPORT_SYMBOL_GPL(md_new_event); @@ -2936,7 +2935,7 @@ action_show(mddev_t *mddev, char *page) type = "check"; else type = "repair"; - } else + } else if (test_bit(MD_RECOVERY_RECOVER, &mddev->recovery)) type = "recover"; } return sprintf(page, "%s\n", type); @@ -2958,9 +2957,12 @@ action_store(mddev_t *mddev, const char *page, size_t len) } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) || test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) return -EBUSY; - else if (cmd_match(page, "resync") || cmd_match(page, "recover")) + else if (cmd_match(page, "resync")) + set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); + else if (cmd_match(page, "recover")) { + set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); - else if (cmd_match(page, "reshape")) { + } else if (cmd_match(page, "reshape")) { int err; if (mddev->pers->start_reshape == NULL) return -EINVAL; @@ -2977,6 +2979,7 @@ action_store(mddev_t *mddev, const char *page, size_t len) } set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); + sysfs_notify(&mddev->kobj, NULL, "sync_action"); return len; } @@ -3682,6 +3685,7 @@ static int do_md_run(mddev_t * mddev) mddev->changed = 1; md_new_event(mddev); sysfs_notify(&mddev->kobj, NULL, "array_state"); + sysfs_notify(&mddev->kobj, NULL, "sync_action"); kobject_uevent(&mddev->gendisk->dev.kobj, KOBJ_CHANGE); return 0; } @@ -4252,6 +4256,8 @@ static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info) export_rdev(rdev); md_update_sb(mddev, 1); + if (mddev->degraded) + set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); return err; @@ -5105,6 +5111,8 @@ void md_error(mddev_t *mddev, mdk_rdev_t *rdev) if (!mddev->pers->error_handler) return; mddev->pers->error_handler(mddev,rdev); + if (mddev->degraded) + set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); set_bit(MD_RECOVERY_INTR, &mddev->recovery); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); @@ -6055,13 +6063,18 @@ void md_check_recovery(mddev_t *mddev) mddev->recovery = 0; /* flag recovery needed just to double check */ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); + sysfs_notify(&mddev->kobj, NULL, "sync_action"); md_new_event(mddev); goto unlock; } + /* Set RUNNING before clearing NEEDED to avoid + * any transients in the value of "sync_action". + */ + set_bit(MD_RECOVERY_RUNNING, &mddev->recovery); + clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); /* Clear some bits that don't mean anything, but * might be left set */ - clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); clear_bit(MD_RECOVERY_INTR, &mddev->recovery); clear_bit(MD_RECOVERY_DONE, &mddev->recovery); @@ -6079,17 +6092,19 @@ void md_check_recovery(mddev_t *mddev) /* Cannot proceed */ goto unlock; set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery); + clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); } else if ((spares = remove_and_add_spares(mddev))) { clear_bit(MD_RECOVERY_SYNC, &mddev->recovery); clear_bit(MD_RECOVERY_CHECK, &mddev->recovery); + set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); } else if (mddev->recovery_cp < MaxSector) { set_bit(MD_RECOVERY_SYNC, &mddev->recovery); + clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); } else if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) /* nothing to be done ... */ goto unlock; if (mddev->pers->sync_request) { - set_bit(MD_RECOVERY_RUNNING, &mddev->recovery); if (spares && mddev->bitmap && ! mddev->bitmap->file) { /* We are adding a device or devices to an array * which has the bitmap stored on all devices. @@ -6108,9 +6123,16 @@ void md_check_recovery(mddev_t *mddev) mddev->recovery = 0; } else md_wakeup_thread(mddev->sync_thread); + sysfs_notify(&mddev->kobj, NULL, "sync_action"); md_new_event(mddev); } unlock: + if (!mddev->sync_thread) { + clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery); + if (test_and_clear_bit(MD_RECOVERY_RECOVER, + &mddev->recovery)) + sysfs_notify(&mddev->kobj, NULL, "sync_action"); + } mddev_unlock(mddev); } } diff --git a/include/linux/raid/md_k.h b/include/linux/raid/md_k.h index 780e0613e6d5..62aa9c9a6ddc 100644 --- a/include/linux/raid/md_k.h +++ b/include/linux/raid/md_k.h @@ -188,6 +188,7 @@ struct mddev_s * NEEDED: we might need to start a resync/recover * RUNNING: a thread is running, or about to be started * SYNC: actually doing a resync, not a recovery + * RECOVER: doing recovery, or need to try it. * INTR: resync needs to be aborted for some reason * DONE: thread is done and is waiting to be reaped * REQUEST: user-space has requested a sync (used with SYNC) @@ -198,6 +199,7 @@ struct mddev_s */ #define MD_RECOVERY_RUNNING 0 #define MD_RECOVERY_SYNC 1 +#define MD_RECOVERY_RECOVER 2 #define MD_RECOVERY_INTR 3 #define MD_RECOVERY_DONE 4 #define MD_RECOVERY_NEEDED 5 -- cgit v1.2.3-58-ga151 From a99ac97113d5bc25ddc4d17f404c2024ac6c57f9 Mon Sep 17 00:00:00 2001 From: Neil Brown Date: Sat, 28 Jun 2008 08:31:43 +1000 Subject: Make sure all changes to md/degraded are notified. When a device fails, when a spare is activated, when an array is reshaped, or when an array is started, the extent to which the array is degraded can change. Signed-off-by: Neil Brown --- Documentation/md.txt | 7 +++++++ drivers/md/md.c | 6 +++++- 2 files changed, 12 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/md.txt b/Documentation/md.txt index c05bfb55659e..eb6e69e3732e 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -392,6 +392,13 @@ also have achieved. In that case, the transition to "recover" isn't notified, but the transition away is. + degraded + This contains a count of the number of devices by which the + arrays is degraded. So an optimal array with show '0'. A + single failed/missing drive will show '1', etc. + This file responds to select/poll, any increase or decrease + in the count of missing devices will trigger an event. + mismatch_count When performing 'check' and 'repair', and possibly when performing 'resync', md will count the number of errors that are diff --git a/drivers/md/md.c b/drivers/md/md.c index c26dcad8a3ac..60d4cad88c20 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -2969,6 +2969,7 @@ action_store(mddev_t *mddev, const char *page, size_t len) err = mddev->pers->start_reshape(mddev); if (err) return err; + sysfs_notify(&mddev->kobj, NULL, "degraded"); } else { if (cmd_match(page, "check")) set_bit(MD_RECOVERY_CHECK, &mddev->recovery); @@ -3686,6 +3687,7 @@ static int do_md_run(mddev_t * mddev) md_new_event(mddev); sysfs_notify(&mddev->kobj, NULL, "array_state"); sysfs_notify(&mddev->kobj, NULL, "sync_action"); + sysfs_notify(&mddev->kobj, NULL, "degraded"); kobject_uevent(&mddev->gendisk->dev.kobj, KOBJ_CHANGE); return 0; } @@ -6049,7 +6051,9 @@ void md_check_recovery(mddev_t *mddev) if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) { /* success...*/ /* activate any spares */ - mddev->pers->spare_active(mddev); + if (mddev->pers->spare_active(mddev)) + sysfs_notify(&mddev->kobj, NULL, + "degraded"); } md_update_sb(mddev, 1); -- cgit v1.2.3-58-ga151 From 526647320e696f434647f38421a6ecf65b859c43 Mon Sep 17 00:00:00 2001 From: Neil Brown Date: Sat, 28 Jun 2008 08:31:44 +1000 Subject: Make sure all changes to md/dev-XX/state are notified The important state change happens during an interrupt in md_error. So just set a flag there and call sysfs_notify later in process context. Signed-off-by: Neil Brown --- Documentation/md.txt | 10 ++++++++++ drivers/md/md.c | 14 +++++++++++++- include/linux/raid/md_k.h | 3 +++ 3 files changed, 26 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/md.txt b/Documentation/md.txt index eb6e69e3732e..e06cc59437e4 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt @@ -297,6 +297,10 @@ Each directory contains: writemostly - device will only be subject to read requests if there are no other options. This applies only to raid1 arrays. + blocked - device has failed, metadata is "external", + and the failure hasn't been acknowledged yet. + Writes that would write to this device if + it were not faulty are blocked. spare - device is working, but not a full member. This includes spares that are in the process of being recovered to @@ -306,6 +310,12 @@ Each directory contains: Writing "remove" removes the device from the array. Writing "writemostly" sets the writemostly flag. Writing "-writemostly" clears the writemostly flag. + Writing "blocked" sets the "blocked" flag. + Writing "-blocked" clear the "blocked" flag and allows writes + to complete. + + This file responds to select/poll. Any change to 'faulty' + or 'blocked' causes an event. errors An approximate count of read errors that have been detected on diff --git a/drivers/md/md.c b/drivers/md/md.c index 60d4cad88c20..dc99d95a1b6d 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -1886,6 +1886,8 @@ state_store(mdk_rdev_t *rdev, const char *buf, size_t len) err = 0; } + if (!err) + sysfs_notify(&rdev->kobj, NULL, "state"); return err ? err : len; } static struct rdev_sysfs_entry rdev_state = @@ -1979,7 +1981,8 @@ slot_store(mdk_rdev_t *rdev, const char *buf, size_t len) if (err) { rdev->raid_disk = -1; return err; - } + } else + sysfs_notify(&rdev->kobj, NULL, "state"); sprintf(nm, "rd%d", rdev->raid_disk); if (sysfs_create_link(&rdev->mddev->kobj, &rdev->kobj, nm)) printk(KERN_WARNING @@ -1996,6 +1999,7 @@ slot_store(mdk_rdev_t *rdev, const char *buf, size_t len) clear_bit(Faulty, &rdev->flags); clear_bit(WriteMostly, &rdev->flags); set_bit(In_sync, &rdev->flags); + sysfs_notify(&rdev->kobj, NULL, "state"); } return len; } @@ -3525,6 +3529,7 @@ static int do_md_run(mddev_t * mddev) return -EINVAL; } } + sysfs_notify(&rdev->kobj, NULL, "state"); } md_probe(mddev->unit, NULL, NULL); @@ -4256,6 +4261,8 @@ static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info) } if (err) export_rdev(rdev); + else + sysfs_notify(&rdev->kobj, NULL, "state"); md_update_sb(mddev, 1); if (mddev->degraded) @@ -5115,6 +5122,7 @@ void md_error(mddev_t *mddev, mdk_rdev_t *rdev) mddev->pers->error_handler(mddev,rdev); if (mddev->degraded) set_bit(MD_RECOVERY_RECOVER, &mddev->recovery); + set_bit(StateChanged, &rdev->flags); set_bit(MD_RECOVERY_INTR, &mddev->recovery); set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread); @@ -6037,6 +6045,10 @@ void md_check_recovery(mddev_t *mddev) if (mddev->flags) md_update_sb(mddev, 0); + rdev_for_each(rdev, rtmp, mddev) + if (test_and_clear_bit(StateChanged, &rdev->flags)) + sysfs_notify(&rdev->kobj, NULL, "state"); + if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) && !test_bit(MD_RECOVERY_DONE, &mddev->recovery)) { diff --git a/include/linux/raid/md_k.h b/include/linux/raid/md_k.h index 62aa9c9a6ddc..df30c4395875 100644 --- a/include/linux/raid/md_k.h +++ b/include/linux/raid/md_k.h @@ -87,6 +87,9 @@ struct mdk_rdev_s #define Blocked 8 /* An error occured on an externally * managed array, don't allow writes * until it is cleared */ +#define StateChanged 9 /* Faulty or Blocked has changed during + * interrupt, so it needs to be + * notified by the thread */ wait_queue_head_t blocked_wait; int desc_nr; /* descriptor index in the superblock */ -- cgit v1.2.3-58-ga151 From 007de8b4fdd4f3f8ef9891f20b5dc03cf693bb5f Mon Sep 17 00:00:00 2001 From: James Lentini Date: Mon, 2 Jun 2008 15:33:59 -0400 Subject: update NFS/RDMA documentation Update the NFS/RDMA documentation to clarify how to run mount.nfs. Signed-off-by: James Lentini Signed-off-by: J. Bruce Fields --- Documentation/filesystems/nfs-rdma.txt | 75 ++++++++++++++++++++-------------- 1 file changed, 44 insertions(+), 31 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/nfs-rdma.txt b/Documentation/filesystems/nfs-rdma.txt index d0ec45ae4e7d..9ad453d4891a 100644 --- a/Documentation/filesystems/nfs-rdma.txt +++ b/Documentation/filesystems/nfs-rdma.txt @@ -5,7 +5,7 @@ ################################################################################ Author: NetApp and Open Grid Computing - Date: April 15, 2008 + Date: May 29, 2008 Table of Contents ~~~~~~~~~~~~~~~~~ @@ -60,16 +60,18 @@ Installation The procedures described in this document have been tested with distributions from Red Hat's Fedora Project (http://fedora.redhat.com/). - - Install nfs-utils-1.1.1 or greater on the client + - Install nfs-utils-1.1.2 or greater on the client - An NFS/RDMA mount point can only be obtained by using the mount.nfs - command in nfs-utils-1.1.1 or greater. To see which version of mount.nfs - you are using, type: + An NFS/RDMA mount point can be obtained by using the mount.nfs command in + nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils version + with support for NFS/RDMA mounts, but for various reasons we recommend using + nfs-utils-1.1.2 or greater). To see which version of mount.nfs you are + using, type: - > /sbin/mount.nfs -V + $ /sbin/mount.nfs -V - If the version is less than 1.1.1 or the command does not exist, - then you will need to install the latest version of nfs-utils. + If the version is less than 1.1.2 or the command does not exist, + you should install the latest version of nfs-utils. Download the latest package from: @@ -77,22 +79,32 @@ Installation Uncompress the package and follow the installation instructions. - If you will not be using GSS and NFSv4, the installation process - can be simplified by disabling these features when running configure: + If you will not need the idmapper and gssd executables (you do not need + these to create an NFS/RDMA enabled mount command), the installation + process can be simplified by disabling these features when running + configure: - > ./configure --disable-gss --disable-nfsv4 + $ ./configure --disable-gss --disable-nfsv4 - For more information on this see the package's README and INSTALL files. + To build nfs-utils you will need the tcp_wrappers package installed. For + more information on this see the package's README and INSTALL files. After building the nfs-utils package, there will be a mount.nfs binary in the utils/mount directory. This binary can be used to initiate NFS v2, v3, or v4 mounts. To initiate a v4 mount, the binary must be called mount.nfs4. The standard technique is to create a symlink called mount.nfs4 to mount.nfs. - NOTE: mount.nfs and therefore nfs-utils-1.1.1 or greater is only needed + This mount.nfs binary should be installed at /sbin/mount.nfs as follows: + + $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs + + In this location, mount.nfs will be invoked automatically for NFS mounts + by the system mount commmand. + + NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed on the NFS client machine. You do not need this specific version of nfs-utils on the server. Furthermore, only the mount.nfs command from - nfs-utils-1.1.1 is needed on the client. + nfs-utils-1.1.2 is needed on the client. - Install a Linux kernel with NFS/RDMA @@ -156,8 +168,8 @@ Check RDMA and NFS Setup this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel card: - > modprobe ib_mthca - > modprobe ib_ipoib + $ modprobe ib_mthca + $ modprobe ib_ipoib If you are using InfiniBand, make sure there is a Subnet Manager (SM) running on the network. If your IB switch has an embedded SM, you can @@ -166,7 +178,7 @@ Check RDMA and NFS Setup If an SM is running on your network, you should see the following: - > cat /sys/class/infiniband/driverX/ports/1/state + $ cat /sys/class/infiniband/driverX/ports/1/state 4: ACTIVE where driverX is mthca0, ipath5, ehca3, etc. @@ -174,10 +186,10 @@ Check RDMA and NFS Setup To further test the InfiniBand software stack, use IPoIB (this assumes you have two IB hosts named host1 and host2): - host1> ifconfig ib0 a.b.c.x - host2> ifconfig ib0 a.b.c.y - host1> ping a.b.c.y - host2> ping a.b.c.x + host1$ ifconfig ib0 a.b.c.x + host2$ ifconfig ib0 a.b.c.y + host1$ ping a.b.c.y + host2$ ping a.b.c.x For other device types, follow the appropriate procedures. @@ -214,9 +226,9 @@ NFS/RDMA Setup For InfiniBand using a Mellanox adapter: - > modprobe ib_mthca - > modprobe ib_ipoib - > ifconfig ib0 a.b.c.d + $ modprobe ib_mthca + $ modprobe ib_ipoib + $ ifconfig ib0 a.b.c.d NOTE: use unique addresses for the client and server @@ -225,30 +237,31 @@ NFS/RDMA Setup If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config), load the RDMA transport module: - > modprobe svcrdma + $ modprobe svcrdma Regardless of how the server was built (module or built-in), start the server: - > /etc/init.d/nfs start + $ /etc/init.d/nfs start or - > service nfs start + $ service nfs start Instruct the server to listen on the RDMA transport: - > echo rdma 2050 > /proc/fs/nfsd/portlist + $ echo rdma 2050 > /proc/fs/nfsd/portlist - On the client system If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config), load the RDMA client module: - > modprobe xprtrdma.ko + $ modprobe xprtrdma.ko - Regardless of how the client was built (module or built-in), issue the mount.nfs command: + Regardless of how the client was built (module or built-in), use this command to + mount the NFS/RDMA server: - > /path/to/your/mount.nfs :/ /mnt -i -o rdma,port=2050 + $ mount -o rdma,port=2050 :/ /mnt To verify that the mount is using RDMA, run "cat /proc/mounts" and check the "proto" field for the given mount. -- cgit v1.2.3-58-ga151 From 3cd2cfeae187fb754f9530e3f919256f350e89ca Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Mon, 2 Jun 2008 16:01:51 -0400 Subject: nfs: rewrap NFS/RDMA documentation to 80 lines Wrap long lines. Signed-off-by: J. Bruce Fields --- Documentation/filesystems/nfs-rdma.txt | 40 ++++++++++++++++++---------------- 1 file changed, 21 insertions(+), 19 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/nfs-rdma.txt b/Documentation/filesystems/nfs-rdma.txt index 9ad453d4891a..44bd766f2e5d 100644 --- a/Documentation/filesystems/nfs-rdma.txt +++ b/Documentation/filesystems/nfs-rdma.txt @@ -63,10 +63,10 @@ Installation - Install nfs-utils-1.1.2 or greater on the client An NFS/RDMA mount point can be obtained by using the mount.nfs command in - nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils version - with support for NFS/RDMA mounts, but for various reasons we recommend using - nfs-utils-1.1.2 or greater). To see which version of mount.nfs you are - using, type: + nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils + version with support for NFS/RDMA mounts, but for various reasons we + recommend using nfs-utils-1.1.2 or greater). To see which version of + mount.nfs you are using, type: $ /sbin/mount.nfs -V @@ -91,8 +91,9 @@ Installation After building the nfs-utils package, there will be a mount.nfs binary in the utils/mount directory. This binary can be used to initiate NFS v2, v3, - or v4 mounts. To initiate a v4 mount, the binary must be called mount.nfs4. - The standard technique is to create a symlink called mount.nfs4 to mount.nfs. + or v4 mounts. To initiate a v4 mount, the binary must be called + mount.nfs4. The standard technique is to create a symlink called + mount.nfs4 to mount.nfs. This mount.nfs binary should be installed at /sbin/mount.nfs as follows: @@ -214,11 +215,11 @@ NFS/RDMA Setup /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash) /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash) - The IP address(es) is(are) the client's IPoIB address for an InfiniBand HCA or the - cleint's iWARP address(es) for an RNIC. + The IP address(es) is(are) the client's IPoIB address for an InfiniBand + HCA or the cleint's iWARP address(es) for an RNIC. - NOTE: The "insecure" option must be used because the NFS/RDMA client does not - use a reserved port. + NOTE: The "insecure" option must be used because the NFS/RDMA client does + not use a reserved port. Each time a machine boots: @@ -234,12 +235,13 @@ NFS/RDMA Setup - Start the NFS server - If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config), - load the RDMA transport module: + If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in + kernel config), load the RDMA transport module: $ modprobe svcrdma - Regardless of how the server was built (module or built-in), start the server: + Regardless of how the server was built (module or built-in), start the + server: $ /etc/init.d/nfs start @@ -253,17 +255,17 @@ NFS/RDMA Setup - On the client system - If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in kernel config), - load the RDMA client module: + If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in + kernel config), load the RDMA client module: $ modprobe xprtrdma.ko - Regardless of how the client was built (module or built-in), use this command to - mount the NFS/RDMA server: + Regardless of how the client was built (module or built-in), use this + command to mount the NFS/RDMA server: $ mount -o rdma,port=2050 :/ /mnt - To verify that the mount is using RDMA, run "cat /proc/mounts" and check the - "proto" field for the given mount. + To verify that the mount is using RDMA, run "cat /proc/mounts" and check + the "proto" field for the given mount. Congratulations! You're using NFS/RDMA! -- cgit v1.2.3-58-ga151 From 6dbf4bcac98bbc76ef425b3a2b4169f31199f6c7 Mon Sep 17 00:00:00 2001 From: Stephen Hemminger Date: Tue, 1 Jul 2008 19:29:07 -0700 Subject: icmp: fix units for ratelimit Convert the sysctl values for icmp ratelimit to use milliseconds instead of jiffies which is based on kernel configured HZ. Internal kernel jiffies are not a proper unit for any userspace API. Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 10 ++++++---- net/ipv4/sysctl_net_ipv4.c | 3 ++- net/ipv6/icmp.c | 3 ++- 3 files changed, 10 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 17a6e46fbd43..71c7bea97160 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -548,8 +548,9 @@ icmp_echo_ignore_broadcasts - BOOLEAN icmp_ratelimit - INTEGER Limit the maximal rates for sending ICMP packets whose type matches icmp_ratemask (see below) to specific targets. - 0 to disable any limiting, otherwise the maximal rate in jiffies(1) - Default: 100 + 0 to disable any limiting, + otherwise the minimal space between responses in milliseconds. + Default: 1000 icmp_ratemask - INTEGER Mask made of ICMP types for which rates are being limited. @@ -1027,8 +1028,9 @@ max_addresses - INTEGER icmp/*: ratelimit - INTEGER Limit the maximal rates for sending ICMPv6 packets. - 0 to disable any limiting, otherwise the maximal rate in jiffies(1) - Default: 100 + 0 to disable any limiting, + otherwise the minimal space between responses in milliseconds. + Default: 1000 IPv6 Update by: diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 901607003205..14ef202a2254 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -793,7 +793,8 @@ static struct ctl_table ipv4_net_table[] = { .data = &init_net.ipv4.sysctl_icmp_ratelimit, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = &proc_dointvec + .proc_handler = &proc_dointvec_ms_jiffies, + .strategy = &sysctl_ms_jiffies }, { .ctl_name = NET_IPV4_ICMP_RATEMASK, diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index 399d41f65437..abedf95fdf2d 100644 --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -954,7 +954,8 @@ ctl_table ipv6_icmp_table_template[] = { .data = &init_net.ipv6.sysctl.icmpv6_time, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = &proc_dointvec + .proc_handler = &proc_dointvec_ms_jiffies, + .strategy = &sysctl_ms_jiffies }, { .ctl_name = 0 }, }; -- cgit v1.2.3-58-ga151 From ecbed6a41900126e7b9509e12a8d0cc22176e3eb Mon Sep 17 00:00:00 2001 From: Vlad Yasevich Date: Tue, 1 Jul 2008 20:06:22 -0700 Subject: sctp: Mark GET_PEER|LOCAL_ADDR_OLD deprecated. Socket options SCTP_GET_PEER_ADDR_OLD, SCTP_GET_PEER_ADDR_NUM_OLD, SCTP_GET_LOCAL_ADDR_OLD, and SCTP_GET_PEER_LOCAL_ADDR_NUM_OLD have been replaced by newer versions a since 2005. It's time to officially deprecate them and schedule them for removal. Signed-off-by: Vlad Yasevich Signed-off-by: David S. Miller --- Documentation/feature-removal-schedule.txt | 12 ++++++++++++ net/sctp/socket.c | 12 ++++++++++++ 2 files changed, 24 insertions(+) (limited to 'Documentation') diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 5b3f31faed56..5378511a5f9f 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -312,3 +312,15 @@ When: 2.6.26 Why: Implementation became generic; users should now include linux/semaphore.h instead. Who: Matthew Wilcox + +--------------------------- + +What: SCTP_GET_PEER_ADDRS_NUM_OLD, SCTP_GET_PEER_ADDRS_OLD, + SCTP_GET_LOCAL_ADDRS_NUM_OLD, SCTP_GET_LOCAL_ADDRS_OLD +When: June 2009 +Why: A newer version of the options have been introduced in 2005 that + removes the limitions of the old API. The sctp library has been + converted to use these new options at the same time. Any user + space app that directly uses the old options should convert to using + the new options. +Who: Vlad Yasevich diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 43460a1cb6d0..df5572c39f0c 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4223,6 +4223,8 @@ static int sctp_getsockopt_peer_addrs_num_old(struct sock *sk, int len, if (copy_from_user(&id, optval, sizeof(sctp_assoc_t))) return -EFAULT; + printk(KERN_WARNING "SCTP: Use of SCTP_GET_PEER_ADDRS_NUM_OLD " + "socket option deprecated\n"); /* For UDP-style sockets, id specifies the association to query. */ asoc = sctp_id2assoc(sk, id); if (!asoc) @@ -4262,6 +4264,9 @@ static int sctp_getsockopt_peer_addrs_old(struct sock *sk, int len, if (getaddrs.addr_num <= 0) return -EINVAL; + printk(KERN_WARNING "SCTP: Use of SCTP_GET_PEER_ADDRS_OLD " + "socket option deprecated\n"); + /* For UDP-style sockets, id specifies the association to query. */ asoc = sctp_id2assoc(sk, getaddrs.assoc_id); if (!asoc) @@ -4355,6 +4360,9 @@ static int sctp_getsockopt_local_addrs_num_old(struct sock *sk, int len, if (copy_from_user(&id, optval, sizeof(sctp_assoc_t))) return -EFAULT; + printk(KERN_WARNING "SCTP: Use of SCTP_GET_LOCAL_ADDRS_NUM_OLD " + "socket option deprecated\n"); + /* * For UDP-style sockets, id specifies the association to query. * If the id field is set to the value '0' then the locally bound @@ -4515,6 +4523,10 @@ static int sctp_getsockopt_local_addrs_old(struct sock *sk, int len, if (getaddrs.addr_num <= 0 || getaddrs.addr_num >= (INT_MAX / sizeof(union sctp_addr))) return -EINVAL; + + printk(KERN_WARNING "SCTP: Use of SCTP_GET_LOCAL_ADDRS_OLD " + "socket option deprecated\n"); + /* * For UDP-style sockets, id specifies the association to query. * If the id field is set to the value '0' then the locally bound -- cgit v1.2.3-58-ga151 From 778d80be52699596bf70e0eb0761cf5e1e46088d Mon Sep 17 00:00:00 2001 From: YOSHIFUJI Hideaki Date: Sat, 28 Jun 2008 14:17:11 +0900 Subject: ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface. Signed-off-by: YOSHIFUJI Hideaki --- Documentation/networking/ip-sysctl.txt | 4 ++++ include/linux/ipv6.h | 2 ++ net/ipv6/addrconf.c | 11 +++++++++++ net/ipv6/ip6_input.c | 3 ++- net/ipv6/ip6_output.c | 7 +++++++ 5 files changed, 26 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 71c7bea97160..dae980e8f1b9 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1025,6 +1025,10 @@ max_addresses - INTEGER autoconfigured addresses. Default: 16 +disable_ipv6 - BOOLEAN + Disable IPv6 operation. + Default: FALSE (enable IPv6 operation) + icmp/*: ratelimit - INTEGER Limit the maximal rates for sending ICMPv6 packets. diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index cde056e08181..d9d7f9b69eb4 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -163,6 +163,7 @@ struct ipv6_devconf { #ifdef CONFIG_IPV6_MROUTE __s32 mc_forwarding; #endif + __s32 disable_ipv6; void *sysctl; }; @@ -194,6 +195,7 @@ enum { DEVCONF_OPTIMISTIC_DAD, DEVCONF_ACCEPT_SOURCE_ROUTE, DEVCONF_MC_FORWARDING, + DEVCONF_DISABLE_IPV6, DEVCONF_MAX }; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 8b6875f02039..8c5cff50bbed 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -183,6 +183,7 @@ struct ipv6_devconf ipv6_devconf __read_mostly = { #endif .proxy_ndp = 0, .accept_source_route = 0, /* we do not accept RH0 by default. */ + .disable_ipv6 = 0, }; static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { @@ -215,6 +216,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { #endif .proxy_ndp = 0, .accept_source_route = 0, /* we do not accept RH0 by default. */ + .disable_ipv6 = 0, }; /* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */ @@ -3657,6 +3659,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf, #ifdef CONFIG_IPV6_MROUTE array[DEVCONF_MC_FORWARDING] = cnf->mc_forwarding; #endif + array[DEVCONF_DISABLE_IPV6] = cnf->disable_ipv6; } static inline size_t inet6_if_nlmsg_size(void) @@ -4215,6 +4218,14 @@ static struct addrconf_sysctl_table .proc_handler = &proc_dointvec, }, #endif + { + .ctl_name = CTL_UNNUMBERED, + .procname = "disable_ipv6", + .data = &ipv6_devconf.disable_ipv6, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, { .ctl_name = 0, /* sentinel */ } diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c index 34e5a96623ae..ea81c614dde2 100644 --- a/net/ipv6/ip6_input.c +++ b/net/ipv6/ip6_input.c @@ -71,7 +71,8 @@ int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt IP6_INC_STATS_BH(idev, IPSTATS_MIB_INRECEIVES); - if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL) { + if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL || + !idev || unlikely(idev->cnf.disable_ipv6)) { IP6_INC_STATS_BH(idev, IPSTATS_MIB_INDISCARDS); rcu_read_unlock(); goto out; diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 871bdec09edb..0981c1ef3057 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -173,6 +173,13 @@ static inline int ip6_skb_dst_mtu(struct sk_buff *skb) int ip6_output(struct sk_buff *skb) { + struct inet6_dev *idev = ip6_dst_idev(skb->dst); + if (unlikely(idev->cnf.disable_ipv6)) { + IP6_INC_STATS(idev, IPSTATS_MIB_OUTDISCARDS); + kfree_skb(skb); + return 0; + } + if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) || dst_allfrag(skb->dst)) return ip6_fragment(skb, ip6_output2); -- cgit v1.2.3-58-ga151 From 1b34be74cbf18f5d58cc85c7c4afcd9f7d74accd Mon Sep 17 00:00:00 2001 From: YOSHIFUJI Hideaki Date: Sat, 28 Jun 2008 14:18:38 +0900 Subject: ipv6 addrconf: add accept_dad sysctl to control DAD operation. - If 0, disable DAD. - If 1, perform DAD (default). - If >1, perform DAD and disable IPv6 operation if DAD for MAC-based link-local address has been failed (RFC4862 5.4.5). We do not follow RFC4862 by default. Refer to the netdev thread entitled "Linux IPv6 DAD not full conform to RFC 4862 ?" http://www.spinics.net/lists/netdev/msg52027.html Signed-off-by: YOSHIFUJI Hideaki --- Documentation/networking/ip-sysctl.txt | 7 +++++++ include/linux/ipv6.h | 2 ++ net/ipv6/addrconf.c | 35 ++++++++++++++++++++++++++++++++++ 3 files changed, 44 insertions(+) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index dae980e8f1b9..72f6d52e52e6 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1029,6 +1029,13 @@ disable_ipv6 - BOOLEAN Disable IPv6 operation. Default: FALSE (enable IPv6 operation) +accept_dad - INTEGER + Whether to accept DAD (Duplicate Address Detection). + 0: Disable DAD + 1: Enable DAD (default) + 2: Enable DAD, and disable IPv6 operation if MAC-based duplicate + link-local address has been found. + icmp/*: ratelimit - INTEGER Limit the maximal rates for sending ICMPv6 packets. diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index d9d7f9b69eb4..391ad0843a46 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -164,6 +164,7 @@ struct ipv6_devconf { __s32 mc_forwarding; #endif __s32 disable_ipv6; + __s32 accept_dad; void *sysctl; }; @@ -196,6 +197,7 @@ enum { DEVCONF_ACCEPT_SOURCE_ROUTE, DEVCONF_MC_FORWARDING, DEVCONF_DISABLE_IPV6, + DEVCONF_ACCEPT_DAD, DEVCONF_MAX }; diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 8c5cff50bbed..2ec73e62202c 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -119,6 +119,7 @@ static void ipv6_regen_rndid(unsigned long data); static int desync_factor = MAX_DESYNC_FACTOR * HZ; #endif +static int ipv6_generate_eui64(u8 *eui, struct net_device *dev); static int ipv6_count_addresses(struct inet6_dev *idev); /* @@ -184,6 +185,7 @@ struct ipv6_devconf ipv6_devconf __read_mostly = { .proxy_ndp = 0, .accept_source_route = 0, /* we do not accept RH0 by default. */ .disable_ipv6 = 0, + .accept_dad = 1, }; static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { @@ -217,6 +219,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { .proxy_ndp = 0, .accept_source_route = 0, /* we do not accept RH0 by default. */ .disable_ipv6 = 0, + .accept_dad = 1, }; /* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */ @@ -380,6 +383,9 @@ static struct inet6_dev * ipv6_add_dev(struct net_device *dev) */ in6_dev_hold(ndev); + if (dev->flags & (IFF_NOARP | IFF_LOOPBACK)) + ndev->cnf.accept_dad = -1; + #if defined(CONFIG_IPV6_SIT) || defined(CONFIG_IPV6_SIT_MODULE) if (dev->type == ARPHRD_SIT && (dev->priv_flags & IFF_ISATAP)) { printk(KERN_INFO @@ -1421,6 +1427,20 @@ static void addrconf_dad_stop(struct inet6_ifaddr *ifp) void addrconf_dad_failure(struct inet6_ifaddr *ifp) { + struct inet6_dev *idev = ifp->idev; + if (idev->cnf.accept_dad > 1 && !idev->cnf.disable_ipv6) { + struct in6_addr addr; + + addr.s6_addr32[0] = htonl(0xfe800000); + addr.s6_addr32[1] = 0; + + if (!ipv6_generate_eui64(addr.s6_addr + 8, idev->dev) && + ipv6_addr_equal(&ifp->addr, &addr)) { + /* DAD failed for link-local based on MAC address */ + idev->cnf.disable_ipv6 = 1; + } + } + if (net_ratelimit()) printk(KERN_INFO "%s: duplicate address detected!\n", ifp->idev->dev->name); addrconf_dad_stop(ifp); @@ -2753,6 +2773,7 @@ static void addrconf_dad_start(struct inet6_ifaddr *ifp, u32 flags) spin_lock_bh(&ifp->lock); if (dev->flags&(IFF_NOARP|IFF_LOOPBACK) || + idev->cnf.accept_dad < 1 || !(ifp->flags&IFA_F_TENTATIVE) || ifp->flags & IFA_F_NODAD) { ifp->flags &= ~(IFA_F_TENTATIVE|IFA_F_OPTIMISTIC); @@ -2800,6 +2821,11 @@ static void addrconf_dad_timer(unsigned long data) read_unlock_bh(&idev->lock); goto out; } + if (idev->cnf.accept_dad > 1 && idev->cnf.disable_ipv6) { + read_unlock_bh(&idev->lock); + addrconf_dad_failure(ifp); + return; + } spin_lock_bh(&ifp->lock); if (ifp->probes == 0) { /* @@ -3660,6 +3686,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf, array[DEVCONF_MC_FORWARDING] = cnf->mc_forwarding; #endif array[DEVCONF_DISABLE_IPV6] = cnf->disable_ipv6; + array[DEVCONF_ACCEPT_DAD] = cnf->accept_dad; } static inline size_t inet6_if_nlmsg_size(void) @@ -4226,6 +4253,14 @@ static struct addrconf_sysctl_table .mode = 0644, .proc_handler = &proc_dointvec, }, + { + .ctl_name = CTL_UNNUMBERED, + .procname = "accept_dad", + .data = &ipv6_devconf.accept_dad, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, { .ctl_name = 0, /* sentinel */ } -- cgit v1.2.3-58-ga151 From 6a3d8aa48c1c9d3afc761b862267b9945cc6f281 Mon Sep 17 00:00:00 2001 From: Francois Romieu Date: Sun, 6 Jul 2008 20:50:16 -0700 Subject: netdev: remove unused S2IO_NAPI Signed-off-by: Francois Romieu Acked-by: Jeff Garzik Signed-off-by: David S. Miller --- Documentation/networking/s2io.txt | 7 ++----- drivers/net/Kconfig | 14 -------------- 2 files changed, 2 insertions(+), 19 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/s2io.txt b/Documentation/networking/s2io.txt index 1e28e2ddb90a..c3d6b4d5d014 100644 --- a/Documentation/networking/s2io.txt +++ b/Documentation/networking/s2io.txt @@ -52,13 +52,10 @@ d. MSI/MSI-X. Can be enabled on platforms which support this feature (IA64, Xeon) resulting in noticeable performance improvement(upto 7% on certain platforms). -e. NAPI. Compile-time option(CONFIG_S2IO_NAPI) for better Rx interrupt -moderation. - -f. Statistics. Comprehensive MAC-level and software statistics displayed +e. Statistics. Comprehensive MAC-level and software statistics displayed using "ethtool -S" option. -g. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings, +f. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings, with multiple steering options. 4. Command line parameters diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 84925915dce4..1fe47a3bb6f5 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -2515,20 +2515,6 @@ config S2IO More specific information on configuring the driver is in . -config S2IO_NAPI - bool "Use Rx Polling (NAPI) (EXPERIMENTAL)" - depends on S2IO && EXPERIMENTAL - help - NAPI is a new driver API designed to reduce CPU and interrupt load - when the driver is receiving lots of packets from the card. It is - still somewhat experimental and thus not yet enabled by default. - - If your estimated Rx load is 10kpps or more, or if the card will be - deployed on potentially unfriendly networks (e.g. in a firewall), - then say Y here. - - If in doubt, say N. - config MYRI10GE tristate "Myricom Myri-10G Ethernet support" depends on PCI && INET -- cgit v1.2.3-58-ga151 From b19fa1fa91845234961c64dbd564671aa7c0fd27 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Tue, 8 Jul 2008 23:14:24 -0700 Subject: net: Delete NETDEVICES_MULTIQUEUE kconfig option. Multiple TX queue support is a core networking feature. Signed-off-by: David S. Miller --- Documentation/networking/multiqueue.txt | 79 +-------------------------------- drivers/net/Kconfig | 8 ---- drivers/net/cpmac.c | 14 ------ drivers/net/ixgbe/ixgbe_ethtool.c | 6 --- drivers/net/ixgbe/ixgbe_main.c | 40 ----------------- drivers/net/s2io.c | 47 ++++---------------- include/linux/netdevice.h | 14 ------ include/linux/skbuff.h | 10 ----- net/mac80211/Kconfig | 3 -- 9 files changed, 9 insertions(+), 212 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt index ea5a42e8f79f..e6dc1ee9e8f1 100644 --- a/Documentation/networking/multiqueue.txt +++ b/Documentation/networking/multiqueue.txt @@ -3,19 +3,11 @@ =========================================== Section 1: Base driver requirements for implementing multiqueue support -Section 2: Qdisc support for multiqueue devices -Section 3: Brief howto using PRIO or RR for multiqueue devices - Intro: Kernel support for multiqueue devices --------------------------------------------------------- -Kernel support for multiqueue devices is only an API that is presented to the -netdevice layer for base drivers to implement. This feature is part of the -core networking stack, and all network devices will be running on the -multiqueue-aware stack. If a base driver only has one queue, then these -changes are transparent to that driver. - +Kernel support for multiqueue devices is always present. Section 1: Base driver requirements for implementing multiqueue support ----------------------------------------------------------------------- @@ -43,73 +35,4 @@ bitmap on device initialization. Below is an example from e1000: netdev->features |= NETIF_F_MULTI_QUEUE; #endif - -Section 2: Qdisc support for multiqueue devices ------------------------------------------------ - -Currently two qdiscs support multiqueue devices. A new round-robin qdisc, -sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to -bands and queues, and will store the queue mapping into skb->queue_mapping. -Use this field in the base driver to determine which queue to send the skb -to. - -sch_rr has been added for hardware that doesn't want scheduling policies from -software, so it's a straight round-robin qdisc. It uses the same syntax and -classification priomap that sch_prio uses, so it should be intuitive to -configure for people who've used sch_prio. - -In order to utilitize the multiqueue features of the qdiscs, the network -device layer needs to enable multiple queue support. This can be done by -selecting NETDEVICES_MULTIQUEUE under Drivers. - -The PRIO qdisc naturally plugs into a multiqueue device. If -NETDEVICES_MULTIQUEUE is selected, then on qdisc load, the number of -bands requested is compared to the number of queues on the hardware. If they -are equal, it sets a one-to-one mapping up between the queues and bands. If -they're not equal, it will not load the qdisc. This is the same behavior -for RR. Once the association is made, any skb that is classified will have -skb->queue_mapping set, which will allow the driver to properly queue skb's -to multiple queues. - - -Section 3: Brief howto using PRIO and RR for multiqueue devices ---------------------------------------------------------------- - -The userspace command 'tc,' part of the iproute2 package, is used to configure -qdiscs. To add the PRIO qdisc to your network device, assuming the device is -called eth0, run the following command: - -# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue - -This will create 4 bands, 0 being highest priority, and associate those bands -to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping -would look like: - -band 0 => queue 0 -band 1 => queue 1 -band 2 => queue 2 -band 3 => queue 3 - -Traffic will begin flowing through each queue if your TOS values are assigning -traffic across the various bands. For example, ssh traffic will always try to -go out band 0 based on TOS -> Linux priority conversion (realtime traffic), -so it will be sent out queue 0. ICMP traffic (pings) fall into the "normal" -traffic classification, which is band 1. Therefore pings will be send out -queue 1 on the NIC. - -Note the use of the multiqueue keyword. This is only in versions of iproute2 -that support multiqueue networking devices; if this is omitted when loading -a qdisc onto a multiqueue device, the qdisc will load and operate the same -if it were loaded onto a single-queue device (i.e. - sends all traffic to -queue 0). - -Another alternative to multiqueue band allocation can be done by using the -multiqueue option and specify 0 bands. If this is the case, the qdisc will -allocate the number of bands to equal the number of queues that the device -reports, and bring the qdisc online. - -The behavior of tc filters remains the same, where it will override TOS priority -classification. - - Author: Peter P. Waskiewicz Jr. diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index ef733abc857d..4675c1bd6fb9 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -26,14 +26,6 @@ menuconfig NETDEVICES # that for each of the symbols. if NETDEVICES -config NETDEVICES_MULTIQUEUE - bool "Netdevice multiple hardware queue support" - ---help--- - Say Y here if you want to allow the network stack to use multiple - hardware TX queues on an ethernet device. - - Most people will say N here. - config IFB tristate "Intermediate Functional Block support" depends on NET_CLS_ACT diff --git a/drivers/net/cpmac.c b/drivers/net/cpmac.c index 7f3f62e1b113..d630e2a72f42 100644 --- a/drivers/net/cpmac.c +++ b/drivers/net/cpmac.c @@ -569,11 +569,7 @@ static int cpmac_start_xmit(struct sk_buff *skb, struct net_device *dev) len = max(skb->len, ETH_ZLEN); queue = skb_get_queue_mapping(skb); -#ifdef CONFIG_NETDEVICES_MULTIQUEUE netif_stop_subqueue(dev, queue); -#else - netif_stop_queue(dev); -#endif desc = &priv->desc_ring[queue]; if (unlikely(desc->dataflags & CPMAC_OWN)) { @@ -626,24 +622,14 @@ static void cpmac_end_xmit(struct net_device *dev, int queue) dev_kfree_skb_irq(desc->skb); desc->skb = NULL; -#ifdef CONFIG_NETDEVICES_MULTIQUEUE if (netif_subqueue_stopped(dev, queue)) netif_wake_subqueue(dev, queue); -#else - if (netif_queue_stopped(dev)) - netif_wake_queue(dev); -#endif } else { if (netif_msg_tx_err(priv) && net_ratelimit()) printk(KERN_WARNING "%s: end_xmit: spurious interrupt\n", dev->name); -#ifdef CONFIG_NETDEVICES_MULTIQUEUE if (netif_subqueue_stopped(dev, queue)) netif_wake_subqueue(dev, queue); -#else - if (netif_queue_stopped(dev)) - netif_wake_queue(dev); -#endif } } diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c b/drivers/net/ixgbe/ixgbe_ethtool.c index 12990b1fe7e4..81b769093d22 100644 --- a/drivers/net/ixgbe/ixgbe_ethtool.c +++ b/drivers/net/ixgbe/ixgbe_ethtool.c @@ -252,21 +252,15 @@ static int ixgbe_set_tso(struct net_device *netdev, u32 data) netdev->features |= NETIF_F_TSO; netdev->features |= NETIF_F_TSO6; } else { -#ifdef CONFIG_NETDEVICES_MULTIQUEUE struct ixgbe_adapter *adapter = netdev_priv(netdev); int i; -#endif netif_stop_queue(netdev); -#ifdef CONFIG_NETDEVICES_MULTIQUEUE for (i = 0; i < adapter->num_tx_queues; i++) netif_stop_subqueue(netdev, i); -#endif netdev->features &= ~NETIF_F_TSO; netdev->features &= ~NETIF_F_TSO6; -#ifdef CONFIG_NETDEVICES_MULTIQUEUE for (i = 0; i < adapter->num_tx_queues; i++) netif_start_subqueue(netdev, i); -#endif netif_start_queue(netdev); } return 0; diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c index b37d618d8e2a..10a1c8c5cda1 100644 --- a/drivers/net/ixgbe/ixgbe_main.c +++ b/drivers/net/ixgbe/ixgbe_main.c @@ -266,28 +266,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_adapter *adapter, * sees the new next_to_clean. */ smp_mb(); -#ifdef CONFIG_NETDEVICES_MULTIQUEUE if (__netif_subqueue_stopped(netdev, tx_ring->queue_index) && !test_bit(__IXGBE_DOWN, &adapter->state)) { netif_wake_subqueue(netdev, tx_ring->queue_index); adapter->restart_queue++; } -#else - if (netif_queue_stopped(netdev) && - !test_bit(__IXGBE_DOWN, &adapter->state)) { - netif_wake_queue(netdev); - adapter->restart_queue++; - } -#endif } if (adapter->detect_tx_hung) if (ixgbe_check_tx_hang(adapter, tx_ring, eop, eop_desc)) -#ifdef CONFIG_NETDEVICES_MULTIQUEUE netif_stop_subqueue(netdev, tx_ring->queue_index); -#else - netif_stop_queue(netdev); -#endif if (total_tx_packets >= tx_ring->work_limit) IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS, tx_ring->eims_value); @@ -2192,11 +2180,7 @@ static void __devinit ixgbe_set_num_queues(struct ixgbe_adapter *adapter) case (IXGBE_FLAG_RSS_ENABLED): rss_m = 0xF; nrq = rss_i; -#ifdef CONFIG_NETDEVICES_MULTIQUEUE ntq = rss_i; -#else - ntq = 1; -#endif break; case 0: default: @@ -2370,10 +2354,8 @@ try_msi: } out: -#ifdef CONFIG_NETDEVICES_MULTIQUEUE /* Notify the stack of the (possibly) reduced Tx Queue count. */ adapter->netdev->egress_subqueue_count = adapter->num_tx_queues; -#endif return err; } @@ -2910,9 +2892,7 @@ static void ixgbe_watchdog(unsigned long data) struct net_device *netdev = adapter->netdev; bool link_up; u32 link_speed = 0; -#ifdef CONFIG_NETDEVICES_MULTIQUEUE int i; -#endif adapter->hw.mac.ops.check_link(&adapter->hw, &(link_speed), &link_up); @@ -2934,10 +2914,8 @@ static void ixgbe_watchdog(unsigned long data) netif_carrier_on(netdev); netif_wake_queue(netdev); -#ifdef CONFIG_NETDEVICES_MULTIQUEUE for (i = 0; i < adapter->num_tx_queues; i++) netif_wake_subqueue(netdev, i); -#endif } else { /* Force detection of hung controller */ adapter->detect_tx_hung = true; @@ -3264,11 +3242,7 @@ static int __ixgbe_maybe_stop_tx(struct net_device *netdev, { struct ixgbe_adapter *adapter = netdev_priv(netdev); -#ifdef CONFIG_NETDEVICES_MULTIQUEUE netif_stop_subqueue(netdev, tx_ring->queue_index); -#else - netif_stop_queue(netdev); -#endif /* Herbert's original patch had: * smp_mb__after_netif_stop_queue(); * but since that doesn't exist yet, just open code it. */ @@ -3280,11 +3254,7 @@ static int __ixgbe_maybe_stop_tx(struct net_device *netdev, return -EBUSY; /* A reprieve! - use start_queue because it doesn't call schedule */ -#ifdef CONFIG_NETDEVICES_MULTIQUEUE netif_wake_subqueue(netdev, tx_ring->queue_index); -#else - netif_wake_queue(netdev); -#endif ++adapter->restart_queue; return 0; } @@ -3312,9 +3282,7 @@ static int ixgbe_xmit_frame(struct sk_buff *skb, struct net_device *netdev) unsigned int f; unsigned int nr_frags = skb_shinfo(skb)->nr_frags; len -= skb->data_len; -#ifdef CONFIG_NETDEVICES_MULTIQUEUE r_idx = (adapter->num_tx_queues - 1) & skb->queue_mapping; -#endif tx_ring = &adapter->tx_ring[r_idx]; @@ -3502,11 +3470,7 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, pci_set_master(pdev); pci_save_state(pdev); -#ifdef CONFIG_NETDEVICES_MULTIQUEUE netdev = alloc_etherdev_mq(sizeof(struct ixgbe_adapter), MAX_TX_QUEUES); -#else - netdev = alloc_etherdev(sizeof(struct ixgbe_adapter)); -#endif if (!netdev) { err = -ENOMEM; goto err_alloc_etherdev; @@ -3598,9 +3562,7 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, if (pci_using_dac) netdev->features |= NETIF_F_HIGHDMA; -#ifdef CONFIG_NETDEVICES_MULTIQUEUE netdev->features |= NETIF_F_MULTI_QUEUE; -#endif /* make sure the EEPROM is good */ if (ixgbe_validate_eeprom_checksum(hw, NULL) < 0) { @@ -3668,10 +3630,8 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev, netif_carrier_off(netdev); netif_stop_queue(netdev); -#ifdef CONFIG_NETDEVICES_MULTIQUEUE for (i = 0; i < adapter->num_tx_queues; i++) netif_stop_subqueue(netdev, i); -#endif ixgbe_napi_add_all(adapter); diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c index e7a3dbec674c..51a91154125d 100644 --- a/drivers/net/s2io.c +++ b/drivers/net/s2io.c @@ -546,13 +546,10 @@ static struct pci_driver s2io_driver = { static inline void s2io_stop_all_tx_queue(struct s2io_nic *sp) { int i; -#ifdef CONFIG_NETDEVICES_MULTIQUEUE if (sp->config.multiq) { for (i = 0; i < sp->config.tx_fifo_num; i++) netif_stop_subqueue(sp->dev, i); - } else -#endif - { + } else { for (i = 0; i < sp->config.tx_fifo_num; i++) sp->mac_control.fifos[i].queue_state = FIFO_QUEUE_STOP; netif_stop_queue(sp->dev); @@ -561,12 +558,9 @@ static inline void s2io_stop_all_tx_queue(struct s2io_nic *sp) static inline void s2io_stop_tx_queue(struct s2io_nic *sp, int fifo_no) { -#ifdef CONFIG_NETDEVICES_MULTIQUEUE if (sp->config.multiq) netif_stop_subqueue(sp->dev, fifo_no); - else -#endif - { + else { sp->mac_control.fifos[fifo_no].queue_state = FIFO_QUEUE_STOP; netif_stop_queue(sp->dev); @@ -576,13 +570,10 @@ static inline void s2io_stop_tx_queue(struct s2io_nic *sp, int fifo_no) static inline void s2io_start_all_tx_queue(struct s2io_nic *sp) { int i; -#ifdef CONFIG_NETDEVICES_MULTIQUEUE if (sp->config.multiq) { for (i = 0; i < sp->config.tx_fifo_num; i++) netif_start_subqueue(sp->dev, i); - } else -#endif - { + } else { for (i = 0; i < sp->config.tx_fifo_num; i++) sp->mac_control.fifos[i].queue_state = FIFO_QUEUE_START; netif_start_queue(sp->dev); @@ -591,12 +582,9 @@ static inline void s2io_start_all_tx_queue(struct s2io_nic *sp) static inline void s2io_start_tx_queue(struct s2io_nic *sp, int fifo_no) { -#ifdef CONFIG_NETDEVICES_MULTIQUEUE if (sp->config.multiq) netif_start_subqueue(sp->dev, fifo_no); - else -#endif - { + else { sp->mac_control.fifos[fifo_no].queue_state = FIFO_QUEUE_START; netif_start_queue(sp->dev); @@ -606,13 +594,10 @@ static inline void s2io_start_tx_queue(struct s2io_nic *sp, int fifo_no) static inline void s2io_wake_all_tx_queue(struct s2io_nic *sp) { int i; -#ifdef CONFIG_NETDEVICES_MULTIQUEUE if (sp->config.multiq) { for (i = 0; i < sp->config.tx_fifo_num; i++) netif_wake_subqueue(sp->dev, i); - } else -#endif - { + } else { for (i = 0; i < sp->config.tx_fifo_num; i++) sp->mac_control.fifos[i].queue_state = FIFO_QUEUE_START; netif_wake_queue(sp->dev); @@ -623,13 +608,10 @@ static inline void s2io_wake_tx_queue( struct fifo_info *fifo, int cnt, u8 multiq) { -#ifdef CONFIG_NETDEVICES_MULTIQUEUE if (multiq) { if (cnt && __netif_subqueue_stopped(fifo->dev, fifo->fifo_no)) netif_wake_subqueue(fifo->dev, fifo->fifo_no); - } else -#endif - if (cnt && (fifo->queue_state == FIFO_QUEUE_STOP)) { + } else if (cnt && (fifo->queue_state == FIFO_QUEUE_STOP)) { if (netif_queue_stopped(fifo->dev)) { fifo->queue_state = FIFO_QUEUE_START; netif_wake_queue(fifo->dev); @@ -4189,15 +4171,12 @@ static int s2io_xmit(struct sk_buff *skb, struct net_device *dev) return NETDEV_TX_LOCKED; } -#ifdef CONFIG_NETDEVICES_MULTIQUEUE if (sp->config.multiq) { if (__netif_subqueue_stopped(dev, fifo->fifo_no)) { spin_unlock_irqrestore(&fifo->tx_lock, flags); return NETDEV_TX_BUSY; } - } else -#endif - if (unlikely(fifo->queue_state == FIFO_QUEUE_STOP)) { + } else if (unlikely(fifo->queue_state == FIFO_QUEUE_STOP)) { if (netif_queue_stopped(dev)) { spin_unlock_irqrestore(&fifo->tx_lock, flags); return NETDEV_TX_BUSY; @@ -7633,12 +7612,6 @@ static int s2io_verify_parm(struct pci_dev *pdev, u8 *dev_intr_type, DBG_PRINT(ERR_DBG, "tx fifos\n"); } -#ifndef CONFIG_NETDEVICES_MULTIQUEUE - if (multiq) { - DBG_PRINT(ERR_DBG, "s2io: Multiqueue support not enabled\n"); - multiq = 0; - } -#endif if (multiq) *dev_multiq = multiq; @@ -7783,12 +7756,10 @@ s2io_init_nic(struct pci_dev *pdev, const struct pci_device_id *pre) pci_disable_device(pdev); return -ENODEV; } -#ifdef CONFIG_NETDEVICES_MULTIQUEUE if (dev_multiq) dev = alloc_etherdev_mq(sizeof(struct s2io_nic), tx_fifo_num); else -#endif - dev = alloc_etherdev(sizeof(struct s2io_nic)); + dev = alloc_etherdev(sizeof(struct s2io_nic)); if (dev == NULL) { DBG_PRINT(ERR_DBG, "Device allocation failed\n"); pci_disable_device(pdev); @@ -7979,10 +7950,8 @@ s2io_init_nic(struct pci_dev *pdev, const struct pci_device_id *pre) dev->features |= NETIF_F_UFO; dev->features |= NETIF_F_HW_CSUM; } -#ifdef CONFIG_NETDEVICES_MULTIQUEUE if (config->multiq) dev->features |= NETIF_F_MULTI_QUEUE; -#endif dev->tx_timeout = &s2io_tx_watchdog; dev->watchdog_timeo = WATCH_DOG_TIMEOUT; INIT_WORK(&sp->rst_timer_task, s2io_restart_nic); diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c8d5f128858d..e2d931f9b700 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1043,9 +1043,7 @@ static inline int netif_running(const struct net_device *dev) */ static inline void netif_start_subqueue(struct net_device *dev, u16 queue_index) { -#ifdef CONFIG_NETDEVICES_MULTIQUEUE clear_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state); -#endif } /** @@ -1057,13 +1055,11 @@ static inline void netif_start_subqueue(struct net_device *dev, u16 queue_index) */ static inline void netif_stop_subqueue(struct net_device *dev, u16 queue_index) { -#ifdef CONFIG_NETDEVICES_MULTIQUEUE #ifdef CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; #endif set_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state); -#endif } /** @@ -1076,12 +1072,8 @@ static inline void netif_stop_subqueue(struct net_device *dev, u16 queue_index) static inline int __netif_subqueue_stopped(const struct net_device *dev, u16 queue_index) { -#ifdef CONFIG_NETDEVICES_MULTIQUEUE return test_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state); -#else - return 0; -#endif } static inline int netif_subqueue_stopped(const struct net_device *dev, @@ -1099,7 +1091,6 @@ static inline int netif_subqueue_stopped(const struct net_device *dev, */ static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index) { -#ifdef CONFIG_NETDEVICES_MULTIQUEUE #ifdef CONFIG_NETPOLL_TRAP if (netpoll_trap()) return; @@ -1107,7 +1098,6 @@ static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index) if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state)) __netif_schedule(&dev->tx_queue); -#endif } /** @@ -1119,11 +1109,7 @@ static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index) */ static inline int netif_is_multiqueue(const struct net_device *dev) { -#ifdef CONFIG_NETDEVICES_MULTIQUEUE return (!!(NETIF_F_MULTI_QUEUE & dev->features)); -#else - return 0; -#endif } /* Use this variant when it is known for sure that it diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2220b9e2dab0..8f10e3d08fd9 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -305,9 +305,7 @@ struct sk_buff { #endif int iif; -#ifdef CONFIG_NETDEVICES_MULTIQUEUE __u16 queue_mapping; -#endif #ifdef CONFIG_NET_SCHED __u16 tc_index; /* traffic control index */ #ifdef CONFIG_NET_CLS_ACT @@ -1671,25 +1669,17 @@ static inline void skb_init_secmark(struct sk_buff *skb) static inline void skb_set_queue_mapping(struct sk_buff *skb, u16 queue_mapping) { -#ifdef CONFIG_NETDEVICES_MULTIQUEUE skb->queue_mapping = queue_mapping; -#endif } static inline u16 skb_get_queue_mapping(struct sk_buff *skb) { -#ifdef CONFIG_NETDEVICES_MULTIQUEUE return skb->queue_mapping; -#else - return 0; -#endif } static inline void skb_copy_queue_mapping(struct sk_buff *to, const struct sk_buff *from) { -#ifdef CONFIG_NETDEVICES_MULTIQUEUE to->queue_mapping = from->queue_mapping; -#endif } static inline int skb_is_gso(const struct sk_buff *skb) diff --git a/net/mac80211/Kconfig b/net/mac80211/Kconfig index 40f1add17753..d2038418e2bd 100644 --- a/net/mac80211/Kconfig +++ b/net/mac80211/Kconfig @@ -15,14 +15,11 @@ config MAC80211_QOS def_bool y depends on MAC80211 depends on NET_SCHED - depends on NETDEVICES_MULTIQUEUE comment "QoS/HT support disabled" depends on MAC80211 && !MAC80211_QOS comment "QoS/HT support needs CONFIG_NET_SCHED" depends on MAC80211 && !NET_SCHED -comment "QoS/HT support needs CONFIG_NETDEVICES_MULTIQUEUE" - depends on MAC80211 && !NETDEVICES_MULTIQUEUE menu "Rate control algorithm selection" depends on MAC80211 != n -- cgit v1.2.3-58-ga151 From 2115a6432911b669bec037686066c7bbc70cc68e Mon Sep 17 00:00:00 2001 From: Jesse Brandeburg Date: Tue, 8 Jul 2008 15:51:57 -0700 Subject: ixgb: update readme text Signed-off-by: Jesse Brandeburg Signed-off-by: Jeff Kirsher Signed-off-by: Jeff Garzik --- Documentation/networking/ixgb.txt | 419 +++++++++++++++++++++++++++++--------- 1 file changed, 320 insertions(+), 99 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/ixgb.txt b/Documentation/networking/ixgb.txt index 7c98277777eb..a0d0ffb5e584 100644 --- a/Documentation/networking/ixgb.txt +++ b/Documentation/networking/ixgb.txt @@ -1,7 +1,7 @@ -Linux* Base Driver for the Intel(R) PRO/10GbE Family of Adapters -================================================================ +Linux Base Driver for 10 Gigabit Intel(R) Network Connection +============================================================= -November 17, 2004 +October 9, 2007 Contents @@ -9,94 +9,151 @@ Contents - In This Release - Identifying Your Adapter +- Building and Installation - Command Line Parameters - Improving Performance +- Additional Configurations +- Known Issues/Troubleshooting - Support + In This Release =============== -This file describes the Linux* Base Driver for the Intel(R) PRO/10GbE Family -of Adapters, version 1.0.x. +This file describes the ixgb Linux Base Driver for the 10 Gigabit Intel(R) +Network Connection. This driver includes support for Itanium(R)2-based +systems. + +For questions related to hardware requirements, refer to the documentation +supplied with your 10 Gigabit adapter. All hardware requirements listed apply +to use with Linux. + +The following features are available in this kernel: + - Native VLANs + - Channel Bonding (teaming) + - SNMP + +Channel Bonding documentation can be found in the Linux kernel source: +/Documentation/networking/bonding.txt + +The driver information previously displayed in the /proc filesystem is not +supported in this release. Alternatively, you can use ethtool (version 1.6 +or later), lspci, and ifconfig to obtain the same information. + +Instructions on updating ethtool can be found in the section "Additional +Configurations" later in this document. -For questions related to hardware requirements, refer to the documentation -supplied with your Intel PRO/10GbE adapter. All hardware requirements listed -apply to use with Linux. Identifying Your Adapter ======================== -To verify your Intel adapter is supported, find the board ID number on the -adapter. Look for a label that has a barcode and a number in the format -A12345-001. +The following Intel network adapters are compatible with the drivers in this +release: + +Controller Adapter Name Physical Layer +---------- ------------ -------------- +82597EX Intel(R) PRO/10GbE LR/SR/CX4 10G Base-LR (1310 nm optical fiber) + Server Adapters 10G Base-SR (850 nm optical fiber) + 10G Base-CX4(twin-axial copper cabling) + +For more information on how to identify your adapter, go to the Adapter & +Driver ID Guide at: + + http://support.intel.com/support/network/sb/CS-012904.htm + + +Building and Installation +========================= + +select m for "Intel(R) PRO/10GbE support" located at: + Location: + -> Device Drivers + -> Network device support (NETDEVICES [=y]) + -> Ethernet (10000 Mbit) (NETDEV_10000 [=y]) +1. make modules && make modules_install + +2. Load the module: + +    modprobe ixgb = + + The insmod command can be used if the full + path to the driver module is specified. For example: + + insmod /lib/modules//kernel/drivers/net/ixgb/ixgb.ko + + With 2.6 based kernels also make sure that older ixgb drivers are + removed from the kernel, before loading the new module: -Use the above information and the Adapter & Driver ID Guide at: + rmmod ixgb; modprobe ixgb - http://support.intel.com/support/network/adapter/pro100/21397.htm +3. Assign an IP address to the interface by entering the following, where + x is the interface number: -For the latest Intel network drivers for Linux, go to: + ifconfig ethx + +4. Verify that the interface works. Enter the following, where + is the IP address for another machine on the same subnet as the interface + that is being tested: + + ping - http://downloadfinder.intel.com/scripts-df/support_intel.asp Command Line Parameters ======================= -If the driver is built as a module, the following optional parameters are -used by entering them on the command line with the modprobe or insmod command -using this syntax: +If the driver is built as a module, the following optional parameters are +used by entering them on the command line with the modprobe command using +this syntax: modprobe ixgb [