summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-11-30tools/bpf: adjust rlimit RLIMIT_MEMLOCK for test_verifier_logYonghong Song
The default rlimit RLIMIT_MEMLOCK is 64KB. In certain cases, e.g. in a test machine mimicking our production system, this test may fail due to unable to charge the required memory for prog load: # ./test_verifier_log Test log_level 0... ERROR: Program load returned: ret:-1/errno:1, expected ret:-1/errno:22 Changing the default rlimit RLIMIT_MEMLOCK to unlimited makes the test always pass. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-30trace/xdp: fix compile warning: 'struct bpf_map' declared inside parameter listXie XiuQi
We meet this compile warning, which caused by missing bpf.h in xdp.h. In file included from ./include/trace/events/xdp.h:10:0, from ./include/linux/bpf_trace.h:6, from drivers/net/ethernet/intel/i40e/i40e_txrx.c:29: ./include/trace/events/xdp.h:93:17: warning: ‘struct bpf_map’ declared inside parameter list will not be visible outside of this definition or declaration const struct bpf_map *map, u32 map_index), ^ ./include/linux/tracepoint.h:187:34: note: in definition of macro ‘__DECLARE_TRACE’ static inline void trace_##name(proto) \ ^~~~~ ./include/linux/tracepoint.h:352:24: note: in expansion of macro ‘PARAMS’ __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), \ ^~~~~~ ./include/linux/tracepoint.h:477:2: note: in expansion of macro ‘DECLARE_TRACE’ DECLARE_TRACE(name, PARAMS(proto), PARAMS(args)) ^~~~~~~~~~~~~ ./include/linux/tracepoint.h:477:22: note: in expansion of macro ‘PARAMS’ DECLARE_TRACE(name, PARAMS(proto), PARAMS(args)) ^~~~~~ ./include/trace/events/xdp.h:89:1: note: in expansion of macro ‘DEFINE_EVENT’ DEFINE_EVENT(xdp_redirect_template, xdp_redirect, ^~~~~~~~~~~~ ./include/trace/events/xdp.h:90:2: note: in expansion of macro ‘TP_PROTO’ TP_PROTO(const struct net_device *dev, ^~~~~~~~ ./include/trace/events/xdp.h:93:17: warning: ‘struct bpf_map’ declared inside parameter list will not be visible outside of this definition or declaration const struct bpf_map *map, u32 map_index), ^ ./include/linux/tracepoint.h:203:38: note: in definition of macro ‘__DECLARE_TRACE’ register_trace_##name(void (*probe)(data_proto), void *data) \ ^~~~~~~~~~ ./include/linux/tracepoint.h:354:4: note: in expansion of macro ‘PARAMS’ PARAMS(void *__data, proto), \ ^~~~~~ Reported-by: Huang Daode <huangdaode@hisilicon.com> Cc: Hanjun Guo <guohanjun@huawei.com> Fixes: 8d3b778ff544 ("xdp: tracepoint xdp_redirect also need a map argument") Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-30Merge branch 'bpftool-misc-fixes'Daniel Borkmann
Quentin Monnet says: ==================== First commit in this series fixes a crash that occurs when incorrect arguments are passed to bpftool after the `--json` option. It comes from the usage() function trying to use the JSON writer, although the latter has not been created yet at that point. Other patches add destruction of the writer in case the program exits in usage(), fix error messages handling when an unrecognized option is encountered, remove a spurious new-line character in an error message. Last patches are related to the Makefiles. They fix the installation directory prefix and .PHONY targets. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-30tools: bpftool: declare phony targets as suchQuentin Monnet
In the Makefile, targets install, doc and doc-install should be added to .PHONY. Let's fix this. Fixes: 71bb428fe2c1 ("tools: bpf: add bpftool") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-30tools: bpftool: unify installation directoriesQuentin Monnet
Programs and documentation not managed by package manager are generally installed under /usr/local/, instead of the user's home directory. In particular, `man` is generally able to find manual pages under `/usr/local/share/man`. bpftool generally follows perf's example, and perf installs to home directory. However bpftool requires root credentials, so it seems sensible to follow the more common convention of installing files under /usr/local instead. So, make /usr/local the default prefix for installing the binary with `make install`, and the documentation with `make doc-install`. Also, create /usr/local/sbin if it does not exist. Note that the bash-completion file, however, is still installed under /usr/share/bash-completion/completions, as the default setup for bash does not attempt to load completion files under /usr/local/. Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-30tools: bpftool: remove spurious line break from error messageQuentin Monnet
The end-of-line character inside the string would break JSON compliance. Remove it, `p_err()` already adds a '\n' character for plain output anyway. Fixes: 9a5ab8bf1d6d ("tools: bpftool: turn err() and info() macros into functions") Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-30tools: bpftool: make error message from getopt_long() JSON-friendlyQuentin Monnet
If `getopt_long()` meets an unknown option, it prints its own error message to standard error output. While this does not strictly break JSON output, it is the only case bpftool prints something to standard error output if JSON output is required. All other errors are printed on standard output as JSON objects, so that an external program does not have to parse stderr. This is changed by setting the global variable `opterr` to 0. Furthermore, p_err() is used to reproduce the error message in a more JSON-friendly way, so that users still get to know what the erroneous option is. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-30tools: bpftool: clean up the JSON writer before exiting in usage()Quentin Monnet
The writer is cleaned at the end of the main function, but not if the program exits sooner in usage(). Let's keep it clean and destroy the writer before exiting. Destruction and actual call to exit() are moved to another function so that clean exit can also be performed without printing usage() hints. Fixes: d35efba99d92 ("tools: bpftool: introduce --json and --pretty options") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-30tools: bpftool: fix crash on bad parameters with JSONQuentin Monnet
If bad or unrecognised parameters are specified after JSON output is requested, `usage()` will try to output null JSON object before the writer is created. To prevent this, create the writer as soon as the `--json` option is parsed. Fixes: 004b45c0e51a ("tools: bpftool: provide JSON output for all possible commands") Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-27bpf: offload: add a license headerJakub Kicinski
I forgot to add a license on kernel/bpf/offload.c. Luckily I'm still the only author so make it explicitly GPLv2. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-27tipc: eliminate access after delete in group_filter_msg()Jon Maloy
KASAN revealed another access after delete in group.c. This time it found that we read the header of a received message after the buffer has been released. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-27xen-netfront: remove warning when unloading moduleEduardo Otubo
v2: * Replace busy wait with wait_event()/wake_up_all() * Cannot garantee that at the time xennet_remove is called, the xen_netback state will not be XenbusStateClosed, so added a condition for that * There's a small chance for the xen_netback state is XenbusStateUnknown by the time the xen_netfront switches to Closed, so added a condition for that. When unloading module xen_netfront from guest, dmesg would output warning messages like below: [ 105.236836] xen:grant_table: WARNING: g.e. 0x903 still in use! [ 105.236839] deferring g.e. 0x903 (pfn 0x35805) This problem relies on netfront and netback being out of sync. By the time netfront revokes the g.e.'s netback didn't have enough time to free all of them, hence displaying the warnings on dmesg. The trick here is to make netfront to wait until netback frees all the g.e.'s and only then continue to cleanup for the module removal, and this is done by manipulating both device states. Signed-off-by: Eduardo Otubo <otubo@redhat.com> Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28Merge tag 'mac80211-for-davem-2017-11-27' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Four fixes: * CRYPTO_SHA256 is needed for regdb validation * mac80211: mesh path metric was wrong in some frames * mac80211: use QoS null-data packets on QoS connections * mac80211: tear down RX aggregation sessions first to drop fewer packets in HW restart scenarios ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28Merge branch 'sctp-stream-reconfig-fixes'David S. Miller
Xin Long says: ==================== sctp: a bunch of fixes for stream reconfig This patchset is to make stream reset and asoc reset work more correctly for stream reconfig. Thank to Marcelo making them very clear. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28sctp: set sender next_tsn for the old result with ctsn_ack_point plus 1Xin Long
When doing asoc reset, if the sender of the response has already sent some chunk and increased asoc->next_tsn before the duplicate request comes, the response will use the old result with an incorrect sender next_tsn. Better than asoc->next_tsn, asoc->ctsn_ack_point can't be changed after the sender of the response has performed the asoc reset and before the peer has confirmed it, and it's value is still asoc->next_tsn original value minus 1. This patch sets sender next_tsn for the old result with ctsn_ack_point plus 1 when processing the duplicate request, to make sure the sender next_tsn value peer gets will be always right. Fixes: 692787cef651 ("sctp: implement receiver-side procedures for the SSN/TSN Reset Request Parameter") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28sctp: avoid flushing unsent queue when doing asoc resetXin Long
Now when doing asoc reset, it cleans up sacked and abandoned queues by calling sctp_outq_free where it also cleans up unsent, retransmit and transmitted queues. It's safe for the sender of response, as these 3 queues are empty at that time. But when the receiver of response is doing the reset, the users may already enqueue some chunks into unsent during the time waiting the response, and these chunks should not be flushed. To void the chunks in it would be removed, it moves the queue into a temp list, then gets it back after sctp_outq_free is done. The patch also fixes some incorrect comments in sctp_process_strreset_tsnreq. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28sctp: only allow the asoc reset when the asoc outq is emptyXin Long
As it says in rfc6525#section5.1.4, before sending the request, C2: The sender has either no outstanding TSNs or considers all outstanding TSNs abandoned. Prior to this patch, it tried to consider all outstanding TSNs abandoned by dropping all chunks in all outqs with sctp_outq_free (even including sacked, retransmit and transmitted queues) when doing this reset, which is too aggressive. To make it work gently, this patch will only allow the asoc reset when the sender has no outstanding TSNs by checking if unsent, transmitted and retransmit are all empty with sctp_outq_is_empty before sending and processing the request. Fixes: 692787cef651 ("sctp: implement receiver-side procedures for the SSN/TSN Reset Request Parameter") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28sctp: only allow the out stream reset when the stream outq is emptyXin Long
Now the out stream reset in sctp stream reconf could be done even if the stream outq is not empty. It means that users can not be sure since which msg the new ssn will be used. To make this more synchronous, it shouldn't allow to do out stream reset until these chunks in unsent outq all are sent out. This patch checks the corresponding stream outqs when sending and processing the request . If any of them has unsent chunks in outq, it will return -EAGAIN instead or send SCTP_STRRESET_IN_PROGRESS back to the sender. Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter") Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28sctp: use sizeof(__u16) for each stream number length instead of magic numberXin Long
Now in stream reconf part there are still some places using magic number 2 for each stream number length. To make it more readable, this patch is to replace them with sizeof(__u16). Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-27mac80211: tear down RX aggregations firstSara Sharon
When doing HW restart we tear down aggregations. Since at this point we are not TX'ing any aggregation, while the peer is still sending RX aggregation over the air, it will make sense to tear down the RX aggregations first. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-11-27mac80211: fix the update of path metric for RANN frameChun-Yeow Yeoh
The previous path metric update from RANN frame has not considered the own link metric toward the transmitting mesh STA. Fix this. Reported-by: Michael65535 Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-11-27mac80211: use QoS NDP for AP probingJohannes Berg
When connected to a QoS/WMM AP, mac80211 should use a QoS NDP for probing it, instead of a regular non-QoS one, fix this. Change all the drivers to *not* allow QoS NDP for now, even though it looks like most of them should be OK with that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-11-26openvswitch: fix the incorrect flow action alloc sizezhangliping
If we want to add a datapath flow, which has more than 500 vxlan outputs' action, we will get the following error reports: openvswitch: netlink: Flow action size 32832 bytes exceeds max openvswitch: netlink: Flow action size 32832 bytes exceeds max openvswitch: netlink: Actions may not be safe on all matching packets ... ... It seems that we can simply enlarge the MAX_ACTIONS_BUFSIZE to fix it, but this is not the root cause. For example, for a vxlan output action, we need about 60 bytes for the nlattr, but after it is converted to the flow action, it only occupies 24 bytes. This means that we can still support more than 1000 vxlan output actions for a single datapath flow under the the current 32k max limitation. So even if the nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we shouldn't report EINVAL and keep it move on, as the judgement can be done by the reserve_sfa_size. Signed-off-by: zhangliping <zhangliping02@baidu.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-27net: openvswitch: datapath: fix data type in queue_gso_packetsGustavo A. R. Silva
gso_type is being used in binary AND operations together with SKB_GSO_UDP. The issue is that variable gso_type is of type unsigned short and SKB_GSO_UDP expands to more than 16 bits: SKB_GSO_UDP = 1 << 16 this makes any binary AND operation between gso_type and SKB_GSO_UDP to be always zero, hence making some code unreachable and likely causing undesired behavior. Fix this by changing the data type of variable gso_type to unsigned int. Addresses-Coverity-ID: 1462223 Fixes: 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-26uapi: add SPDX identifier to vm_sockets_diag.hStephen Hemminger
New file seems to have missed the SPDX license scan and update. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-26net: dsa: fix 'increment on 0' warningVivien Didelot
Setting the refcount to 0 when allocating a tree to match the number of switch devices it holds may cause an 'increment on 0; use-after-free', if CONFIG_REFCOUNT_FULL is enabled. To fix this, do not decrement the refcount of a newly allocated tree, increment it when an already allocated tree is found, and decrement it after the probing of a switch, as done with the previous behavior. At the same time, make dsa_tree_get and dsa_tree_put accept a NULL argument to simplify callers, and return the tree after incrementation, as most kref users like of_node_get and of_node_put do. Fixes: 8e5bf9759a06 ("net: dsa: simplify tree reference counting") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-26VSOCK: Don't call vsock_stream_has_data in atomic contextJorgen Hansen
When using the host personality, VMCI will grab a mutex for any queue pair access. In the detach callback for the vmci vsock transport, we call vsock_stream_has_data while holding a spinlock, and vsock_stream_has_data will access a queue pair. To avoid this, we can simply omit calling vsock_stream_has_data for host side queue pairs, since the QPs are empty per default when the guest has detached. This bug affects users of VMware Workstation using kernel version 4.4 and later. Testing: Ran vsock tests between guest and host, and verified that with this change, the host isn't calling vsock_stream_has_data during detach. Ran mixedTest between guest and host using both guest and host as server. v2: Rebased on top of recent change to sk_state values Reviewed-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Aditya Sarwade <asarwade@vmware.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-25net: sched: crash on blocks with goto chain actionRoman Kapl
tcf_block_put_ext has assumed that all filters (and thus their goto actions) are destroyed in RCU callback and thus can not race with our list iteration. However, that is not true during netns cleanup (see tcf_exts_get_net comment). Prevent the user after free by holding all chains (except 0, that one is already held). foreach_safe is not enough in this case. To reproduce, run the following in a netns and then delete the ns: ip link add dtest type dummy tc qdisc add dev dtest ingress tc filter add dev dtest chain 1 parent ffff: handle 1 prio 1 flower action goto chain 2 Fixes: 822e86d997 ("net_sched: remove tcf_block_put_deferred()") Signed-off-by: Roman Kapl <code@rkapl.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-25net: thunderbolt: Stop using zero to mean no valid DMA mappingMika Westerberg
Commit 86dabda426ac ("net: thunderbolt: Clear finished Tx frame bus address in tbnet_tx_callback()") fixed a DMA-API violation where the driver called dma_unmap_page() in tbnet_free_buffers() for a bus address that might already be unmapped. The fix was to zero out the bus address of a frame in tbnet_tx_callback(). However, as pointed out by David Miller, zero might well be valid mapping (at least in theory) so it is not good idea to use it here. It turns out that we don't need the whole map/unmap dance for Tx buffers at all. Instead we can map the buffers when they are initially allocated and unmap them when the interface is brought down. In between we just DMA sync the buffers for the CPU or device as needed. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-25net: thunderx: Fix TCP/UDP checksum offload for IPv6 pktsSunil Goutham
Don't offload IP header checksum to NIC. This fixes a previous patch which enabled checksum offloading for both IPv4 and IPv6 packets. So L3 checksum offload was getting enabled for IPv6 pkts. And HW is dropping these pkts as it assumes the pkt is IPv4 when IP csum offload is set in the SQ descriptor. Fixes: 3a9024f52c2e ("net: thunderx: Enable TSO and checksum offloads for ipv6") Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@auriga.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24cfg80211: select CRYPTO_SHA256 if neededJohannes Berg
When regulatory database certificates are built-in, they're currently using the SHA256 digest algorithm, so add that to the build in that case. Also add a note that for custom certificates, one may need to add the right algorithms. Reported-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-11-25forcedeth: replace pci_unmap_page with dma_unmap_pageZhu Yanjun
The function pci_unmap_page is obsolete. So it is replaced with the function dma_unmap_page. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Joe Jin <joe.jin@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-25Merge tag 'rxrpc-fixes-20171124' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fixes and improvements Here's a set of patches that fix and improve some stuff in the AF_RXRPC protocol: The patches are: (1) Unlock mutex returned by rxrpc_accept_call(). (2) Don't set connection upgrade by default. (3) Differentiate the call->user_mutex used by the kernel from that used by userspace calling sendmsg() to avoid lockdep warnings. (4) Delay terminal ACK transmission to a work queue so that it can be replaced by the next call if there is one. (5) Split the call parameters from the connection parameters so that more call-specific parameters can be passed through. (6) Fix the call timeouts to work the same as for other RxRPC/AFS implementations. (7) Don't transmit DELAY ACKs immediately, but instead delay them slightly so that can be discarded or can represent more packets. (8) Use RTT to calculate certain protocol timeouts. (9) Add a timeout to detect lost ACK/DATA packets. (10) Add a keepalive function so that we ping the peer if we haven't transmitted for a short while, thereby keeping intervening firewall routes open. (11) Make service endpoints expire like they're supposed to so that the UDP port can be reused. (12) Fix connection expiry timers to make cleanup happen in a more timely fashion. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-24rxrpc: Fix conn expiry timersDavid Howells
Fix the rxrpc connection expiry timers so that connections for closed AF_RXRPC sockets get deleted in a more timely fashion, freeing up the transport UDP port much more quickly. (1) Replace the delayed work items with work items plus timers so that timer_reduce() can be used to shorten them and so that the timer doesn't requeue the work item if the net namespace is dead. (2) Don't use queue_delayed_work() as that won't alter the timeout if the timer is already running. (3) Don't rearm the timers if the network namespace is dead. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24rxrpc: Fix service endpoint expiryDavid Howells
RxRPC service endpoints expire like they're supposed to by the following means: (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the global service conn timeout, otherwise the first rxrpc_net struct to die will cause connections on all others to expire immediately from then on. (2) Mark local service endpoints for which the socket has been closed (->service_closed) so that the expiration timeout can be much shortened for service and client connections going through that endpoint. (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage count reaches 1, not 0, as idle conns have a 1 count. (4) The accumulator for the earliest time we might want to schedule for should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as the comparison functions use signed arithmetic. (5) Simplify the expiration handling, adding the expiration value to the idle timestamp each time rather than keeping track of the time in the past before which the idle timestamp must go to be expired. This is much easier to read. (6) Ignore the timeouts if the net namespace is dead. (7) Restart the service reaper work item rather the client reaper. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24rxrpc: Add keepalive for a callDavid Howells
We need to transmit a packet every so often to act as a keepalive for the peer (which has a timeout from the last time it received a packet) and also to prevent any intervening firewalls from closing the route. Do this by resetting a timer every time we transmit a packet. If the timer ever expires, we transmit a PING ACK packet and thereby also elicit a PING RESPONSE ACK from the other side - which prevents our last-rx timeout from expiring. The timer is set to 1/6 of the last-rx timeout so that we can detect the other side going away if it misses 6 replies in a row. This is particularly necessary for servers where the processing of the service function may take a significant amount of time. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24rxrpc: Add a timeout for detecting lost ACKs/lost DATADavid Howells
Add an extra timeout that is set/updated when we send a DATA packet that has the request-ack flag set. This allows us to detect if we don't get an ACK in response to the latest flagged packet. The ACK packet is adjudged to have been lost if it doesn't turn up within 2*RTT of the transmission. If the timeout occurs, we schedule the sending of a PING ACK to find out the state of the other side. If a new DATA packet is ready to go sooner, we cancel the sending of the ping and set the request-ack flag on that instead. If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what we had at the time of the ping transmission, we adjudge all the DATA packets sent between the response tx_top and the ping-time tx_top to have been lost and retransmit immediately. Rather than sending a PING ACK, we could just pick a DATA packet and speculatively retransmit that with request-ack set. It should result in either a REQUESTED ACK or a DUPLICATE ACK which we can then use in lieu the a PING-RESPONSE ACK mentioned above. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24rxrpc: Express protocol timeouts in terms of RTTDavid Howells
Express protocol timeouts for data retransmission and deferred ack generation in terms on RTT rather than specified timeouts once we have sufficient RTT samples. For the moment, this requires just one RTT sample to be able to use this for ack deferral and two for data retransmission. The data retransmission timeout is set at RTT*1.5 and the ACK deferral timeout is set at RTT. Note that the calculated timeout is limited to a minimum of 4ns to make sure it doesn't happen too quickly. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24rxrpc: Don't transmit DELAY ACKs immediately on proposalDavid Howells
Don't transmit a DELAY ACK immediately on proposal when the Rx window is rotated, but rather defer it to the work function. This means that we have a chance to queue/consume more received packets before we actually send the DELAY ACK, or even cancel it entirely, thereby reducing the number of packets transmitted. We do, however, want to continue sending other types of packet immediately, particularly REQUESTED ACKs, as they may be used for RTT calculation by the other side. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24rxrpc: Fix call timeoutsDavid Howells
Fix the rxrpc call expiration timeouts and make them settable from userspace. By analogy with other rx implementations, there should be three timeouts: (1) "Normal timeout" This is set for all calls and is triggered if we haven't received any packets from the peer in a while. It is measured from the last time we received any packet on that call. This is not reset by any connection packets (such as CHALLENGE/RESPONSE packets). If a service operation takes a long time, the server should generate PING ACKs at a duration that's substantially less than the normal timeout so is to keep both sides alive. This is set at 1/6 of normal timeout. (2) "Idle timeout" This is set only for a service call and is triggered if we stop receiving the DATA packets that comprise the request data. It is measured from the last time we received a DATA packet. (3) "Hard timeout" This can be set for a call and specified the maximum lifetime of that call. It should not be specified by default. Some operations (such as volume transfer) take a long time. Allow userspace to set/change the timeouts on a call with sendmsg, using a control message: RXRPC_SET_CALL_TIMEOUTS The data to the message is a number of 32-bit words, not all of which need be given: u32 hard_timeout; /* sec from first packet */ u32 idle_timeout; /* msec from packet Rx */ u32 normal_timeout; /* msec from data Rx */ This can be set in combination with any other sendmsg() that affects a call. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24rxrpc: Split the call params from the operation paramsDavid Howells
When rxrpc_sendmsg() parses the control message buffer, it places the parameters extracted into a structure, but lumps together call parameters (such as user call ID) with operation parameters (such as whether to send data, send an abort or accept a call). Split the call parameters out into their own structure, a copy of which is then embedded in the operation parameters struct. The call parameters struct is then passed down into the places that need it instead of passing the individual parameters. This allows for extra call parameters to be added. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24rxrpc: Delay terminal ACK transmission on a client callDavid Howells
Delay terminal ACK transmission on a client call by deferring it to the connection processor. This allows it to be skipped if we can send the next call instead, the first DATA packet of which will implicitly ack this call. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24rxrpc: Provide a different lockdep key for call->user_mutex for kernel callsDavid Howells
Provide a different lockdep key for rxrpc_call::user_mutex when the call is made on a kernel socket, such as by the AFS filesystem. The problem is that lockdep registers a false positive between userspace calling the sendmsg syscall on a user socket where call->user_mutex is held whilst userspace memory is accessed whereas the AFS filesystem may perform operations with mmap_sem held by the caller. In such a case, the following warning is produced. ====================================================== WARNING: possible circular locking dependency detected 4.14.0-fscache+ #243 Tainted: G E ------------------------------------------------------ modpost/16701 is trying to acquire lock: (&vnode->io_lock){+.+.}, at: [<ffffffffa000fc40>] afs_begin_vnode_operation+0x33/0x77 [kafs] but task is already holding lock: (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&mm->mmap_sem){++++}: __might_fault+0x61/0x89 _copy_from_iter_full+0x40/0x1fa rxrpc_send_data+0x8dc/0xff3 rxrpc_do_sendmsg+0x62f/0x6a1 rxrpc_sendmsg+0x166/0x1b7 sock_sendmsg+0x2d/0x39 ___sys_sendmsg+0x1ad/0x22b __sys_sendmsg+0x41/0x62 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 -> #2 (&call->user_mutex){+.+.}: __mutex_lock+0x86/0x7d2 rxrpc_new_client_call+0x378/0x80e rxrpc_kernel_begin_call+0xf3/0x154 afs_make_call+0x195/0x454 [kafs] afs_vl_get_capabilities+0x193/0x198 [kafs] afs_vl_lookup_vldb+0x5f/0x151 [kafs] afs_create_volume+0x2e/0x2f4 [kafs] afs_mount+0x56a/0x8d7 [kafs] mount_fs+0x6a/0x109 vfs_kern_mount+0x67/0x135 do_mount+0x90b/0xb57 SyS_mount+0x72/0x98 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 -> #1 (k-sk_lock-AF_RXRPC){+.+.}: lock_sock_nested+0x74/0x8a rxrpc_kernel_begin_call+0x8a/0x154 afs_make_call+0x195/0x454 [kafs] afs_fs_get_capabilities+0x17a/0x17f [kafs] afs_probe_fileserver+0xf7/0x2f0 [kafs] afs_select_fileserver+0x83f/0x903 [kafs] afs_fetch_status+0x89/0x11d [kafs] afs_iget+0x16f/0x4f8 [kafs] afs_mount+0x6c6/0x8d7 [kafs] mount_fs+0x6a/0x109 vfs_kern_mount+0x67/0x135 do_mount+0x90b/0xb57 SyS_mount+0x72/0x98 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 -> #0 (&vnode->io_lock){+.+.}: lock_acquire+0x174/0x19f __mutex_lock+0x86/0x7d2 afs_begin_vnode_operation+0x33/0x77 [kafs] afs_fetch_data+0x80/0x12a [kafs] afs_readpages+0x314/0x405 [kafs] __do_page_cache_readahead+0x203/0x2ba filemap_fault+0x179/0x54d __do_fault+0x17/0x60 __handle_mm_fault+0x6d7/0x95c handle_mm_fault+0x24e/0x2a3 __do_page_fault+0x301/0x486 do_page_fault+0x236/0x259 page_fault+0x22/0x30 __clear_user+0x3d/0x60 padzero+0x1c/0x2b load_elf_binary+0x785/0xdc7 search_binary_handler+0x81/0x1ff do_execveat_common.isra.14+0x600/0x888 do_execve+0x1f/0x21 SyS_execve+0x28/0x2f do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 other info that might help us debug this: Chain exists of: &vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(&call->user_mutex); lock(&mm->mmap_sem); lock(&vnode->io_lock); *** DEADLOCK *** 1 lock held by modpost/16701: #0: (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486 stack backtrace: CPU: 0 PID: 16701 Comm: modpost Tainted: G E 4.14.0-fscache+ #243 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: dump_stack+0x67/0x8e print_circular_bug+0x341/0x34f check_prev_add+0x11f/0x5d4 ? add_lock_to_list.isra.12+0x8b/0x8b ? add_lock_to_list.isra.12+0x8b/0x8b ? __lock_acquire+0xf77/0x10b4 __lock_acquire+0xf77/0x10b4 lock_acquire+0x174/0x19f ? afs_begin_vnode_operation+0x33/0x77 [kafs] __mutex_lock+0x86/0x7d2 ? afs_begin_vnode_operation+0x33/0x77 [kafs] ? afs_begin_vnode_operation+0x33/0x77 [kafs] ? afs_begin_vnode_operation+0x33/0x77 [kafs] afs_begin_vnode_operation+0x33/0x77 [kafs] afs_fetch_data+0x80/0x12a [kafs] afs_readpages+0x314/0x405 [kafs] __do_page_cache_readahead+0x203/0x2ba ? filemap_fault+0x179/0x54d filemap_fault+0x179/0x54d __do_fault+0x17/0x60 __handle_mm_fault+0x6d7/0x95c handle_mm_fault+0x24e/0x2a3 __do_page_fault+0x301/0x486 do_page_fault+0x236/0x259 page_fault+0x22/0x30 RIP: 0010:__clear_user+0x3d/0x60 RSP: 0018:ffff880071e93da0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720 RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00 R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000 padzero+0x1c/0x2b load_elf_binary+0x785/0xdc7 search_binary_handler+0x81/0x1ff do_execveat_common.isra.14+0x600/0x888 do_execve+0x1f/0x21 SyS_execve+0x28/0x2f do_syscall_64+0x89/0x1be entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fdb6009ee07 RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07 RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900 RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000 R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000 R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000 Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24rxrpc: Don't set upgrade by default in sendmsg()David Howells
Don't set upgrade by default when creating a call from sendmsg(). This is a holdover from when I was testing the code. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-24rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasingDavid Howells
The caller of rxrpc_accept_call() must release the lock on call->user_mutex returned by that function. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix PCI IDs of 9000 series iwlwifi devices, from Luca Coelho. 2) bpf offload bug fixes from Jakub Kicinski. 3) Fix bpf verifier to NOP out code which is dead at run time because due to branch pruning the verifier will not explore such instructions. From Alexei Starovoitov. 4) Fix crash when deleting secondary chains in packet scheduler classifier. From Roman Kapl. 5) Fix buffer management bugs in smc, from Ursula Braun. 6) Fix regression in anycast route handling, from David Ahern. 7) Fix link settings regression in r8169, from Tobias Jakobi. 8) Add back enough UFO support so that live migration still works, from Willem de Bruijn. 9) Linearize enough packet data for the full extent to which the ipvlan code will inspect the packet headers, from Gao Feng. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) ipvlan: Fix insufficient skb linear check for ipv6 icmp ipvlan: Fix insufficient skb linear check for arp geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6 net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY net: accept UFO datagrams from tuntap and packet net: realtek: r8169: implement set_link_ksettings() net: ipv6: Fixup device for anycast routes during copy net/smc: Fix preinitialization of buf_desc in __smc_buf_create() net/smc: use sk_rcvbuf as start for rmb creation ipv6: Do not consider linkdown nexthops during multipath net: sched: fix crash when deleting secondary chains net: phy: cortina: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE bpf: fix branch pruning logic bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO bpf: remove explicit handling of 0 for arg2 in bpf_probe_read bpf: introduce ARG_PTR_TO_MEM_OR_NULL i40evf: Use smp_rmb rather than read_barrier_depends fm10k: Use smp_rmb rather than read_barrier_depends igb: Use smp_rmb rather than read_barrier_depends ...
2017-11-23Merge tag 'platform-drivers-x86-v4.15-2' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver fixes from Darren Hart: "Fix two issues resulting from the dell-smbios refactoring and introduction of the dell-smbios-wmi dispatcher. The first ensures a proper error code is returned when kzalloc fails. The second avoids an issue in older Dell BIOS implementations which would fail if the more complex calls were made by limiting those platforms to the simple calls such as those used by the existing dell-laptop and dell-wmi drivers, preserving their functionality prior to the addition of the dell-smbios-wmi dispatcher" * tag 'platform-drivers-x86-v4.15-2' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: dell-laptop: fix error return code in dell_init() platform/x86: dell-smbios-wmi: Disable userspace interface if missing hotfix
2017-11-23Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two basic fixes: one for the sparse problem with the blacklist flags and another for a hang forever in bnx2i" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: Use 'blist_flags_t' for scsi_devinfo flags scsi: bnx2fc: Fix hung task messages when a cleanup response is not received during abort
2017-11-23Merge tag 'sound-fix-4.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "All commits found here are small fixes for regression or stable: - PCM timestamp behavior fix that could be seen as a regression - Remove spurious WARN_ON() from ALSA timer 32bit compat ioctl - HD-audio HDMI/DP channel mapping fix for 32bit archs - Fix the previous fix for HD-audio initialization code - More hardening USB-audio against malicious USB descriptors - HD-audio quirks/fixes (Realtek codec, AMD controller) - Missing help text for the recent Intel SST kconfig change" * tag 'sound-fix-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda: Add Raven PCI ID ALSA: hda/realtek - Fix ALC700 family no sound issue ALSA: hda - Fix yet remaining issue with vmaster 0dB initialization ALSA: usb-audio: Add sanity checks in v2 clock parsers ALSA: usb-audio: Fix potential zero-division at parsing FU ALSA: usb-audio: Fix potential out-of-bound access at parsing SU ALSA: usb-audio: Add sanity checks to FE parser ALSA: timer: Remove kernel warning at compat ioctl error paths ALSA: pcm: update tstamp only if audio_tstamp changed ALSA: hda/realtek: Add headset mic support for Intel NUC Skull Canyon ALSA: hda: Fix too short HDMI/DP chmap reporting ALSA: usb-audio: uac1: Invalidate ctl on interrupt ALSA: hda/realtek - Fix ALC275 no sound issue ASoC: Intel: Add help text for SND_SOC_INTEL_SST_TOPLEVEL
2017-11-23Merge tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull more drm updates from Dave Airlie: "Fixes/cleanups for rc1, non-desktop flags for VR - remove the MSM dt-bindings file Rob managed to push in the previous pull. - add a property/edid quirk to denote HMD devices, I had these hanging around for a few weeks and Keith had done some work on them, they are fairly self contained and small, and only affect people using HTC Vive VR headsets so far. - amdgpu, tegra, tilcdc, fsl fixes - some imx-drm cleanups I missed, these seemed pretty small, and no reason to hold off. I have one TTM regression fix (fixes bochs-vga in qemu) sitting locally awaiting review I'll probably send that in a separate pull request tomorrow" * tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linux: (33 commits) dt-bindings: remove file that was added accidentally drm/edid: quirk HTC vive headset as non-desktop. [v2] drm/fb: add support for not enabling fbcon on non-desktop displays [v2] drm: add connector info/property for non-desktop displays [v2] drm/amdgpu: fix rmmod KCQ disable failed error drm/amdgpu: fix kernel hang when starting VNC server drm/amdgpu: don't skip attributes when powerplay is enabled drm/amd/pp: fix typecast error in powerplay. drm/tilcdc: Remove obsolete "ti,tilcdc,slave" dts binding support drm/tegra: sor: Reimplement pad clock Revert "drm/radeon: dont switch vt on suspend" drm/amd/amdgpu: fix over-bound accessing in amdgpu_cs_wait_any_fence drm/amd/powerplay: fix unfreeze level smc message for smu7 drm/amdgpu:fix memleak drm/amdgpu:fix memleak in takedown drm/amd/pp: fix dpm randomly failed on Vega10 drm/amdgpu: set f_mapping on exported DMA-bufs drm/amdgpu: Properly allocate VM invalidate eng v2 drm/fsl-dcu: enable IRQ before drm_atomic_helper_resume() drm/fsl-dcu: avoid disabling pixel clock twice on suspend ...