summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-12-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Platform regulatory domain support for ath10k, from Bartosz Markowski. 2) Centralize min/max MTU checking, thus removing tons of duplicated code all of the the various drivers. From Jarod Wilson. 3) Support ingress actions in act_mirred, from Shmulik Ladkani. 4) Improve device adjacency tracking, from David Ahern. 5) Add support for LED triggers on PHY link state changes, from Zach Brown. 6) Improve UDP socket memory accounting, from Paolo Abeni. 7) Set SK_MEM_QUANTUM to a fixed size of 4096, instead of PAGE_SIZE. From Eric Dumazet. 8) Collapse TCP SKBs at retransmit time even if the right side SKB has frags. Also from Eric Dumazet. 9) Add IP_RECVFRAGSIZE and IPV6_RECVFRAGSIZE cmsgs, from Willem de Bruijn. 10) Support routing by UID, from Lorenzo Colitti. 11) Handle L3 domain binding (ie. VRF) for RAW sockets, from David Ahern. 12) tcp_get_info() can run lockless, from Eric Dumazet. 13) 4-tuple UDP hashing in SFC driver, from Edward Cree. 14) Avoid reorders in GRO code, from Eric Dumazet. 15) IPV6 Segment Routing support, from David Lebrun. 16) Support MPLS push and pop for L3 packets in openvswitch, from Jiri Benc. 17) Add LRU datastructure support for BPF, Martin KaFai Lau. 18) VF support in liquidio driver, from Raghu Vatsavayi. 19) Multiqueue support in alx driver, from Tobias Regnery. 20) Networking cgroup BPF support, from Daniel Mack. 21) TCP chronograph measurements, from Francis Yan. 22) XDP support for qed driver, from Yuval Mintz. 23) BPF based lwtunnels, from Thomas Graf. 24) Consistent FIB dumping to offloading drivers, from Ido Schimmel. 25) Many optimizations for UDP under high load, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) netfilter: nft_counter: rework atomic dump and reset e1000: use disable_hardirq() for e1000_netpoll() i40e: don't truncate match_method assignment net: ethernet: ti: netcp: add support of cpts net: phy: phy drivers should not set SUPPORTED_[Asym_]Pause net: l2tp: ppp: change PPPOL2TP_MSG_* => L2TP_MSG_* net: l2tp: deprecate PPPOL2TP_MSG_* in favour of L2TP_MSG_* net: l2tp: export debug flags to UAPI net: ethernet: stmmac: remove private tx queue lock net: ethernet: sxgbe: remove private tx queue lock net: bridge: shorten ageing time on topology change net: bridge: add helper to set topology change net: bridge: add helper to offload ageing time net: nicvf: use new api ethtool_{get|set}_link_ksettings net: ethernet: ti: cpsw: sync rates for channels in dual emac mode net: ethernet: ti: cpsw: re-split res only when speed is changed net: ethernet: ti: cpsw: combine budget and weight split and check net: ethernet: ti: cpsw: don't start queue twice net: ethernet: ti: cpsw: use same macros to get active slave net: mvneta: select GENERIC_ALLOCATOR ...
2016-12-11Linux 4.9v4.9Linus Torvalds
2016-12-11Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds
Pull MIPS fixes from Ralf Baechle: "Two more MIPS fixes for 4.9: - RTC: Return -ENODEV so an external RTC will be tried - Fix mask of GPE frequency These two have been tested on Imagination's automated test system and also both received positive reviews on the linux-mips mailing list" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: Lantiq: Fix mask of GPE frequency MIPS: Return -ENODEV from weak implementation of rtc_mips_set_time
2016-12-11netfilter: nft_counter: rework atomic dump and resetPablo Neira
Dump and reset doesn't work unless cmpxchg64() is used both from packet and control plane paths. This approach is going to be slow though. Instead, use a percpu seqcount to fetch counters consistently, then subtract bytes and packets in case a reset was requested. The cpu that running over the reset code is guaranteed to own this stats exclusively, we have to turn counters into signed 64bit though so stats update on reset don't get wrong on underflow. This patch is based on original sketch from Eric Dumazet. Fixes: 43da04a593d8 ("netfilter: nf_tables: atomic dump and reset for stateful objects") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-11MIPS: Lantiq: Fix mask of GPE frequencyHauke Mehrtens
The hardware documentation says bit 11:10 are used for the GPE frequency selection. Fix the mask in the define to match these bits. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Thomas Langer <thomas.langer@intel.com> Cc: linux-mips@linux-mips.org Cc: john@phrozen.org Patchwork: https://patchwork.linux-mips.org/patch/14648/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-12-11MIPS: Return -ENODEV from weak implementation of rtc_mips_set_timeLuuk Paulussen
The sync_cmos_clock function in kernel/time/ntp.c first tries to update the internal clock of the cpu by calling the "update_persistent_clock64" architecture specific function. If this returns -ENODEV, it then tries to update an external RTC using "rtc_set_ntp_time". On the mips architecture, the weak implementation of the underlying function would return 0 if it wasn't overridden. This meant that the sync_cmos_clock function would never try to update an external RTC (if both CONFIG_GENERIC_CMOS_UPDATE and CONFIG_RTC_SYSTOHC are configured) Returning -ENODEV instead, means that an external RTC will be tried. Signed-off-by: Luuk Paulussen <luuk.paulussen@alliedtelesis.co.nz> Reviewed-by: Richard Laing <richard.laing@alliedtelesis.co.nz> Reviewed-by: Scott Parlane <scott.parlane@alliedtelesis.co.nz> Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/14649/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-12-10e1000: use disable_hardirq() for e1000_netpoll()WANG Cong
In commit 02cea3958664 ("genirq: Provide disable_hardirq()") Peter introduced disable_hardirq() for netpoll, but it is forgotten to use it for e1000. This patch changes disable_irq() to disable_hardirq() for e1000. Reported-by: Dave Jones <davej@codemonkey.org.uk> Suggested-by: Sabrina Dubroca <sd@queasysnail.net> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10i40e: don't truncate match_method assignmentKeller, Jacob E
The .match_method field is a u8, so we shouldn't be casting to a u16, and because it is only one byte, we do not need to byte swap anything. Just assign the value directly. This avoids issues on Big Endian architectures which would have byte swapped and then incorrectly truncated the value. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Bimmy Pujari <bimmy.pujari@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: ethernet: ti: netcp: add support of cptsWingMan Kwok
This patch adds support of the cpts device found in the gbe and 10gbe ethernet switches on the keystone 2 SoCs (66AK2E/L/Hx, 66AK2Gx). Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: phy: phy drivers should not set SUPPORTED_[Asym_]PauseTimur Tabi
Instead of having individual PHY drivers set the SUPPORTED_Pause and SUPPORTED_Asym_Pause flags, phylib itself should set those flags, unless there is a hardware erratum or other special case. During autonegotiation, the PHYs will determine whether to enable pause frame support. Pause frames are a feature that is supported by the MAC. It is the MAC that generates the frames and that processes them. The PHY can only be configured to allow them to pass through. This commit also effectively reverts the recently applied c7a61319 ("net: phy: dp83848: Support ethernet pause frames"). So the new process is: 1) Unless the PHY driver overrides it, phylib sets the SUPPORTED_Pause and SUPPORTED_AsymPause bits in phydev->supported. This indicates that the PHY supports pause frames. 2) The MAC driver checks phydev->supported before it calls phy_start(). If (SUPPORTED_Pause | SUPPORTED_AsymPause) is set, then the MAC driver sets those bits in phydev->advertising, if it wants to enable pause frame support. 3) When the link state changes, the MAC driver checks phydev->pause and phydev->asym_pause, If the bits are set, then it enables the corresponding features in the MAC. The algorithm is: if (phydev->pause) The MAC should be programmed to receive and honor pause frames it receives, i.e. enable receive flow control. if (phydev->pause != phydev->asym_pause) The MAC should be programmed to transmit pause frames when needed, i.e. enable transmit flow control. Signed-off-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: l2tp: ppp: change PPPOL2TP_MSG_* => L2TP_MSG_*Asbjørn Sloth Tønnesen
Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: l2tp: deprecate PPPOL2TP_MSG_* in favour of L2TP_MSG_*Asbjørn Sloth Tønnesen
PPPOL2TP_MSG_* and L2TP_MSG_* are duplicates, and are being used interchangeably in the kernel, so let's standardize on L2TP_MSG_* internally, and keep PPPOL2TP_MSG_* defined in UAPI for compatibility. Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: l2tp: export debug flags to UAPIAsbjørn Sloth Tønnesen
Move the L2TP_MSG_* definitions to UAPI, as it is part of the netlink API. Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10Merge branch 'sxgbe-stmmac-remove-private-tx-lock'David S. Miller
Lino Sanfilippo says: ==================== Remove private tx queue locks this patch series removes unnecessary private locks in the sxgbe and the stmmac driver. v2: - adjust commit message ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: ethernet: stmmac: remove private tx queue lockLino Sanfilippo
The driver uses a private lock for synchronization of the xmit function and the xmit completion handler, but since the NETIF_F_LLTX flag is not set, the xmit function is also called with the xmit_lock held. On the other hand the completion handler uses the reverse locking order by first taking the private lock and (in case that the tx queue had been stopped) then the xmit_lock. Improve the locking by removing the private lock and using only the xmit_lock for synchronization instead. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: ethernet: sxgbe: remove private tx queue lockLino Sanfilippo
The driver uses a private lock for synchronization of the xmit function and the xmit completion handler, but since the NETIF_F_LLTX flag is not set, the xmit function is also called with the xmit_lock held. On the other hand the completion handler uses the reverse locking order by first taking the private lock and (in case that the tx queue had been stopped) then the xmit_lock. Improve the locking by removing the private lock and using only the xmit_lock for synchronization instead. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10Merge branch 'bridge-fast-ageing-on-topology-change'David S. Miller
Vivien Didelot says: ==================== net: bridge: fast ageing on topology change 802.1D [1] specifies that the bridges in a network must use a short value to age out dynamic entries in the Filtering Database for a period, once a topology change has been communicated by the root bridge. This patchset fixes this for the in-kernel STP implementation. Once the topology change flag is set in a net_bridge instance, the ageing time value is shorten to twice the forward delay used by the topology. When the topology change flag is cleared, the ageing time configured for the bridge is restored. To accomplish that, a new bridge_ageing_time member is added to the net_bridge structure, to store the user configured bridge ageing time. Two helpers are added to offload the ageing time and set the topology change flag in the net_bridge instance. Then the required logic is added in the topology change helper if in-kernel STP is used. This has been tested on the following topology: +--------------+ | root bridge | | 1 2 3 4 | +--+--+--+--+--+ | | | | +--------+ | | | +------| laptop | | | | +--------+ +--+--+--+-----+ | 1 2 3 | | slave bridge | +--------------+ When unplugging/replugging the laptop, the slave bridge (under test) gets the topology change flag sent by the root bridge, and fast ageing is triggered on the bridges. Once the topology change timer of the root bridge expires, the topology change flag is cleared and the configured ageing time is restored on the bridges. A similar test has been done between two bridges under test. When changing the forward delay of the root bridge with: # echo 3000 > /sys/class/net/br0/bridge/forward_delay the ageing time correctly changes on both bridges from 300s to 60s while the TOPOLOGY_CHANGE flag is present. [1] "8.3.5 Notifying topology changes", http://profesores.elo.utfsm.cl/~agv/elo309/doc/802.1D-1998.pdf No change since RFC: https://lkml.org/lkml/2016/10/19/828 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: bridge: shorten ageing time on topology changeVivien Didelot
802.1D [1] specifies that the bridges must use a short value to age out dynamic entries in the Filtering Database for a period, once a topology change has been communicated by the root bridge. Add a bridge_ageing_time member in the net_bridge structure to store the bridge ageing time value configured by the user (ioctl/netlink/sysfs). If we are using in-kernel STP, shorten the ageing time value to twice the forward delay used by the topology when the topology change flag is set. When the flag is cleared, restore the configured ageing time. [1] "8.3.5 Notifying topology changes ", http://profesores.elo.utfsm.cl/~agv/elo309/doc/802.1D-1998.pdf Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: bridge: add helper to set topology changeVivien Didelot
Add a __br_set_topology_change helper to set the topology change value. This can be later extended to add actions when the topology change flag is set or cleared. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: bridge: add helper to offload ageing timeVivien Didelot
The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME switchdev attr is actually set when initializing a bridge port, and when configuring the bridge ageing time from ioctl/netlink/sysfs. Add a __set_ageing_time helper to offload the ageing time to physical switches, and add the SWITCHDEV_F_DEFER flag since it can be called under bridge lock. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: nicvf: use new api ethtool_{get|set}_link_ksettingsPhilippe Reynes
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: ethernet: ti: cpsw: sync rates for channels in dual emac modeIvan Khoronzhuk
The channels are common for both ndevs in dual emac mode. Hence, keep in sync their rates. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: ethernet: ti: cpsw: re-split res only when speed is changedIvan Khoronzhuk
Don't re-split res in the following cases: - speed of phys is not changed - speed of phys is changed and no rate limited channels - speed of phys is changed and all channels are rate limited - phy is unlinked while dev is open - phy is linked back but speed is not changed The maximum speed is sum of "linked" phys, thus res are split taken in account two interfaces, both for dual emac mode and for switch mode. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: ethernet: ti: cpsw: combine budget and weight split and checkIvan Khoronzhuk
Re-split weight along with budget. It simplify code a little and update state after every rate change. Also it's necessarily to move arguments checks to this combined function. Replace maximum rate check for an interface on maximum possible rate. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: ethernet: ti: cpsw: don't start queue twiceIvan Khoronzhuk
No need to start queues after cpsw is started as it will be done while cpsw_adjust_link(), after phy connection. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: ethernet: ti: cpsw: use same macros to get active slaveIvan Khoronzhuk
Use the same, more convenient macros, to get active slave. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: mvneta: select GENERIC_ALLOCATORArnd Bergmann
We previously relied on GENERIC_ALLOCATOR to be selected by CONFIG_ARM, but now we can compile-test the driver on other architectures that don't select it: drivers/net/built-in.o: In function `mvneta_bm_remove': mvneta_bm.c:(.text+0x4ee35): undefined reference to `gen_pool_free' This adds an explicit select for the part of the driver that has the dependency. Fixes: a0627f776a45 ("net: marvell: Allow drivers to be built with COMPILE_TEST") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10net: socket: removed an unnecessary newlineAmit Kushwaha
This patch removes a newline which was added in socket.c file in net-next Signed-off-by: Amit Kushwaha <kushwaha.a@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10netlink: use blocking notifierWANG Cong
netlink_chain is called in ->release(), which is apparently a process context, so we don't have to use an atomic notifier here. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2016-12-10Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes the following issues: - Fix pointer size when caam is used with AArch64 boot loader on AArch32 kernel. - Fix ahash state corruption in marvell driver. - Fix buggy algif_aed tag handling. - Prevent mcryptd from being used with incompatible algorithms which can cause crashes" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: algif_aead - fix uninitialized variable warning crypto: mcryptd - Check mcryptd algorithm compatibility crypto: algif_aead - fix AEAD tag memory handling crypto: caam - fix pointer size for AArch64 boot loader, AArch32 kernel crypto: marvell - Don't corrupt state of an STD req for re-stepped ahash crypto: marvell - Don't copy hash operation twice into the SRAM
2016-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Limit the number of can filters to avoid > MAX_ORDER allocations. Fix from Marc Kleine-Budde. 2) Limit GSO max size in netvsc driver to avoid problems with NVGRE configurations. From Stephen Hemminger. 3) Return proper error when memory allocation fails in ser_gigaset_init(), from Dan Carpenter. 4) Missing linkage undo in error paths of ipvlan_link_new(), from Gao Feng. 5) Missing necessayr SET_NETDEV_DEV in lantiq and cpmac drivers, from Florian Fainelli. 6) Handle probe deferral properly in smsc911x driver. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: mlx5: Fix Kconfig help text net: smsc911x: back out silently on probe deferrals ibmveth: set correct gso_size and gso_type net: ethernet: cpmac: Call SET_NETDEV_DEV() net: ethernet: lantiq_etop: Call SET_NETDEV_DEV() vhost-vsock: fix orphan connection reset cxgb4/cxgb4vf: Assign netdev->dev_port with port ID driver: ipvlan: Unlink the upper dev when ipvlan_link_new failed ser_gigaset: return -ENOMEM on error instead of success NET: usb: cdc_mbim: add quirk for supporting Telit LE922A can: peak: fix bad memory access and free sequence phy: Don't increment MDIO bus refcount unless it's a different owner netvsc: reduce maximum GSO size drivers: net: cpsw-phy-sel: Clear RGMII_IDMODE on "rgmii" links can: raw: raw_setsockopt: limit number of can_filter that can be set
2016-12-09net: mlx5: Fix Kconfig help textChristopher Covington
Since the following commit, Infiniband and Ethernet have not been mutually exclusive. Fixes: 4aa17b28 mlx5: Enable mutual support for IB and Ethernet Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09net: skb_condense() can also deal with empty skbsEric Dumazet
It seems attackers can also send UDP packets with no payload at all. skb_condense() can still be a win in this case. It will be possible to replace the custom code in tcp_add_backlog() to get full benefit from skb_condense() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09net: smsc911x: back out silently on probe deferralsLinus Walleij
When trying to get a regulator we may get deferred and we see this noise: smsc911x 1b800000.ethernet-ebi2 (unnamed net_device) (uninitialized): couldn't get regulators -517 Then the driver continues anyway. Which means that the regulator may not be properly retrieved and reference counted, and may be switched off in case noone else is using it. Fix this by returning silently on deferred probe and let the system work it out. Cc: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09Merge tag 'mac80211-next-for-davem-2016-12-09' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Three fixes: * fix a logic bug introduced by a previous cleanup * fix nl80211 attribute confusing (trying to use a single attribute for two purposes) * fix a long-standing BSS leak that happens when an association attempt is abandoned ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09ibmveth: set correct gso_size and gso_typeThomas Falcon
This patch is based on an earlier one submitted by Jon Maxwell with the following commit message: "We recently encountered a bug where a few customers using ibmveth on the same LPAR hit an issue where a TCP session hung when large receive was enabled. Closer analysis revealed that the session was stuck because the one side was advertising a zero window repeatedly. We narrowed this down to the fact the ibmveth driver did not set gso_size which is translated by TCP into the MSS later up the stack. The MSS is used to calculate the TCP window size and as that was abnormally large, it was calculating a zero window, even although the sockets receive buffer was completely empty." We rely on the Virtual I/O Server partition in a pseries environment to provide the MSS through the TCP header checksum field. The stipulation is that users should not disable checksum offloading if rx packet aggregation is enabled through VIOS. Some firmware offerings provide the MSS in the RX buffer. This is signalled by a bit in the RX queue descriptor. Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Jonathan Maxwell <jmaxwell37@gmail.com> Reviewed-by: David Dai <zdai@us.ibm.com> Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09Merge branch 'udp-receive-path-optimizations'David S. Miller
Eric Dumazet says: ==================== udp: receive path optimizations This patch series provides about 100 % performance increase under flood. v2: added Paolo feedback on udp_rmem_release() for tiny sk_rcvbuf added the last patch touching sk_rmem_alloc later ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09udp: udp_rmem_release() should touch sk_rmem_alloc laterEric Dumazet
In flood situations, keeping sk_rmem_alloc at a high value prevents producers from touching the socket. It makes sense to lower sk_rmem_alloc only at the end of udp_rmem_release() after the thread draining receive queue in udp_recvmsg() finished the writes to sk_forward_alloc. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09udp: add batching to udp_rmem_release()Eric Dumazet
If udp_recvmsg() constantly releases sk_rmem_alloc for every read packet, it gives opportunity for producers to immediately grab spinlocks and desperatly try adding another packet, causing false sharing. We can add a simple heuristic to give the signal by batches of ~25 % of the queue capacity. This patch considerably increases performance under flood by about 50 %, since the thread draining the queue is no longer slowed by false sharing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09udp: copy skb->truesize in the first cache lineEric Dumazet
In UDP RX handler, we currently clear skb->dev before skb is added to receive queue, because device pointer is no longer available once we exit from RCU section. Since this first cache line is always hot, lets reuse this space to store skb->truesize and thus avoid a cache line miss at udp_recvmsg()/udp_skb_destructor time while receive queue spinlock is held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09udp: add busylocks in RX pathEric Dumazet
Idea of busylocks is to let producers grab an extra spinlock to relieve pressure on the receive_queue spinlock shared by consumer. This behavior is requested only once socket receive queue is above half occupancy. Under flood, this means that only one producer can be in line trying to acquire the receive_queue spinlock. These busylock can be allocated on a per cpu manner, instead of a per socket one (that would consume a cache line per socket) This patch considerably improves UDP behavior under stress, depending on number of NIC RX queues and/or RPS spread. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09Merge branch 'qcom-emac'David S. Miller
Timur Tabi says: ==================== net: qcom/emac: simplify support for different SOCs On SOCs that have the Qualcomm EMAC network controller, the internal PHY block is always different. Sometimes the differences are small, sometimes it might be a completely different IP. Either way, using version numbers to differentiate them and putting all of the init code in one file does not scale. This patchset does two things: The first breaks up the current code into different files, and the second patch adds support for a third SOC, the Qualcomm Technologies QDF2400 ARM Server SOC. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09net: qcom/emac: add support for the Qualcomm Technologies QDF2400Timur Tabi
The QDF2432 and the QDF2400 have slightly different internal PHYs, so there are some programming differences. Some of the registers in the QDF2400 have moved, and some registers require different values during initialization. Because of the differences, and because HIDs are a scare resource, the ACPI tables specify the hardware version in an _HRV property. Version 1 is the QDF2432, and version 2 is the QDF2400. Any future SOC that has the same internal PHY but different programming requirements will be assigned the next available version number. Signed-off-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09net: qcom/emac: move phy init code to separate filesTimur Tabi
The internal PHY of the EMAC differs on each SOC, and the list will only continue to grow. By separating the code into individual files, we can add support for more SOCs more cleanly. Note: The internal PHY is also sometimes called the SGMII device. We also stop referring to the various PHY variations by version number, so no more "v2", "v3", etc. Instead, the devices are named after the SOC they are, which is in sync with the device tree property names. Future patches will probably rearrange more code among the files. Signed-off-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-09Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "Several fixes to the DSM (ACPI device specific method) marshaling implementation. I consider these urgent enough to send for 4.9 consideration since they fix the kernel's handling of ARS (Address Range Scrub) commands. Especially for platforms without machine-check-recovery capabilities, successful execution of ARS commands enables the platform to potentially break out of an infinite reboot problem if a media error is present in the boot path. There is also a one line fix for a device-dax read-only mapping regression. Commits 9a901f5495e2 ("acpi, nfit: fix extended status translations for ACPI DSMs") and 325896ffdf90 ("device-dax: fix private mapping restriction, permit read-only") are true regression fixes for changes introduced this cycle. Commit efda1b5d87cb ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling") fixes the kernel's handling of zero-length results, this never would have worked in the past, but we only just recently discovered a BIOS implementation that emits this arguably spec non-compliant result. The remaining two commits are additional fall out from thinking through the implications of a zero / truncated length result of the ARS Status command. In order to mitigate the risk that these changes introduce yet more regressions they are backstopped by a new unit test in commit a7de92dac9f0 ("tools/testing/nvdimm: unit test acpi_nfit_ctl()") that mocks up inputs to acpi_nfit_ctl()" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: device-dax: fix private mapping restriction, permit read-only tools/testing/nvdimm: unit test acpi_nfit_ctl() acpi, nfit: fix bus vs dimm confusion in xlat_status acpi, nfit: validate ars_status output buffer size acpi, nfit, libnvdimm: fix / harden ars_status output length handling acpi, nfit: fix extended status translations for ACPI DSMs
2016-12-09Merge branch 'for-4.9-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "This is quite late but SCT Write Same support added during this cycle is broken subtly but seriously and it'd be best to disable it before v4.9 gets released. This contains two commits - one low impact sata_mv fix and the mentioned disabling of SCT Write Same" * 'for-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: libata-scsi: disable SCT Write Same for the moment ata: sata_mv: check for errors when parsing nr-ports from dt
2016-12-09Merge tag 'ceph-for-4.9-rc9' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fix from Ilya Dryomov: "A fix for an issue with ->d_revalidate() in ceph, causing frequent kernel crashes. Marked for stable - it goes back to 4.6, but started popping up only in 4.8" * tag 'ceph-for-4.9-rc9' of git://github.com/ceph/ceph-client: ceph: don't set req->r_locked_dir in ceph_d_revalidate
2016-12-09Merge tag 'armsoc-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "Final batch of SoC fixes A few fixes that have trickled in over the last week, all fixing minor errors in devicetrees -- UART pin assignment on Allwinner H3, correcting number of SATA ports on a Marvell-based Linkstation platform and a display clock fix for Freescale/NXP i.MX7D that fixes a freeze when starting up X" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: orion5x: fix number of sata port for linkstation ls-gl ARM: dts: imx7d: fix LCDIF clock assignment dts: sun8i-h3: correct UART3 pin definitions
2016-12-09Merge tag 'm68k-for-v4.9-tag2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k fixes from Geert Uytterhoeven: - build fix for drivers calling ndelay() in a conditional block without curly braces - defconfig updates * tag 'm68k-for-v4.9-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: Fix ndelay() macro m68k/defconfig: Update defconfigs for v4.9-rc1