path: root/net/ipv6
Age | Commit message | Author
2013-02-04 | mcast: do not check 'rv' twice in a row | Jean Sacren
With the loop, don't check 'rv' twice in a row. Without the loop, 'rv' doesn't even need to be checked. Make the comment more grammar-friendly. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-04 | tcp: ipv6: Update MIB counters for drops | Vijay Subramanian
This patch updates LINUX_MIB_LISTENDROPS and LINUX_MIB_LISTENOVERFLOWS in tcp_v6_conn_request() and tcp_v6_err(). tcp_v6_conn_request() in particular can drop SYNs for various reasons which are not currently tracked. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 | ipv6: export ip6_datagram_recv_ctl | Tom Parkin
ip6_datagram_recv_ctl and ip6_datagram_send_ctl are used for handling IPv6 ancillary data. Since ip6_datagram_send_ctl is already publicly exported for use in modules, ip6_datagram_recv_ctl should also be available to support ancillary data in the receive path. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-31 | ipv6: rename datagram_send_ctl and datagram_recv_ctl | Tom Parkin
The datagram_*_ctl functions in net/ipv6/datagram.c are IPv6-specific. Since datagram_send_ctl is publicly exported it should be appropriately named to reflect the fact that it's for IPv6 only. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 | ipv6 anycast: Convert ipv6_sk_ac_lock to spinlock. | YOSHIFUJI Hideaki / 吉藤英明
Since all users take the write lock, it does not make sense to use an rwlock here. Use a simple spinlock. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 | ipv6 flowlabel: Convert np->ipv6_fl_list to RCU. | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 | ipv6 flowlabel: Convert hash list to RCU. | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 | ipv6 flowlabel: Ensure to take lock when modifying np->ip6_sk_fl_list. | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-30 | ipv6: do not create neighbor entries for local delivery | Marcelo Ricardo Leitner
They will be created at output, if ever needed. This avoids creating empty neighbor entries when TPROXYing/Forwarding packets for addresses that are not even directly reachable. Note that IPv4 already handles it this way. No neighbor entries are created for local input. Tested by myself and customer. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 | netfilter ip6table_mangle: Use ipv6_addr_equal() where appropriate. | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 | xfrm: Use ipv6_addr_equal() where appropriate. | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 | ipv6 mcast: Use ipv6_addr_equal() in ip6_mc_source(). | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 | ipv6 addrconf: Fix interface identifiers of 802.15.4 devices. | YOSHIFUJI Hideaki / 吉藤英明
The "Universal/Local" (U/L) bit must be complemented according to RFC4944 and RFC2464. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
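The U/L bit is the 0x02 bit of the first octet of the EUI-64. A minimal userspace sketch of the complement step follows; the function name `eui64_to_ifid` is hypothetical and is not taken from the kernel's addrconf code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EUI64_LEN 8

/* Derive an IPv6 interface identifier from an EUI-64 by copying it and
 * complementing the Universal/Local bit (0x02 in the first octet).
 * Hypothetical helper for illustration only. */
static void eui64_to_ifid(const uint8_t eui64[EUI64_LEN], uint8_t ifid[EUI64_LEN])
{
    assert(eui64 != NULL && ifid != NULL);
    memcpy(ifid, eui64, EUI64_LEN);
    ifid[0] ^= 0x02; /* flip U/L, leave every other bit untouched */
}
```

Applying the function twice returns the original EUI-64, since the operation is a single-bit XOR.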
2013-01-29 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller
Bring in the 'net' tree so that we can get some ipv4/ipv6 bug fixes that some net-next work will build upon. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 | ipv6: add anti-spoofing checks for 6to4 and 6rd | Hannes Frederic Sowa
This patch adds anti-spoofing checks in sit.c as specified in RFC3964 section 5.2 for 6to4 and RFC5969 section 12 for 6rd. I left out the checks which could easily be implemented with netfilter.

Specifically this patch adds the following logic (based loosely on the pseudocode in RFC3964 section 5.2):

    if prefix (inner_src_v6) == rd6_prefix (2002::/16 is the default)
            and outer_src_v4 != embedded_ipv4 (inner_src_v6)
        drop
    if prefix (inner_dst_v6) == rd6_prefix (or 2002::/16 is the default)
            and outer_dst_v4 != embedded_ipv4 (inner_dst_v6)
        drop
    accept

To accomplish the specified security checks proposed by the above RFCs, it is still necessary to employ uRPF filters with netfilter. These new checks only kick in if the employed addresses are within 2002::/16 or another range specified by the 6rd-prefix (which defaults to 2002::/16).

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
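The pseudocode in this entry can be sketched in userspace C roughly as follows. This is an illustration under simplifying assumptions, not the code in sit.c: it handles only the default 2002::/16 prefix (no configurable 6rd prefix), and the names `in_6to4_prefix`, `embedded_ipv4` and `sit_spoof_check` are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* True if addr lies in the default 6to4 prefix 2002::/16. */
static bool in_6to4_prefix(const struct in6_addr *addr)
{
    return addr->s6_addr[0] == 0x20 && addr->s6_addr[1] == 0x02;
}

/* In a 2002::/16 address the IPv4 address is embedded in bytes 2..5. */
static struct in_addr embedded_ipv4(const struct in6_addr *addr)
{
    struct in_addr v4;

    memcpy(&v4.s_addr, &addr->s6_addr[2], sizeof(v4.s_addr));
    return v4;
}

/* Returns true to accept the packet, false to drop it.
 * Assumption: only the default prefix is checked. */
static bool sit_spoof_check(const struct in6_addr *inner_src,
                            const struct in6_addr *inner_dst,
                            struct in_addr outer_src,
                            struct in_addr outer_dst)
{
    assert(inner_src != NULL && inner_dst != NULL);

    if (in_6to4_prefix(inner_src) &&
        outer_src.s_addr != embedded_ipv4(inner_src).s_addr)
        return false;
    if (in_6to4_prefix(inner_dst) &&
        outer_dst.s_addr != embedded_ipv4(inner_dst).s_addr)
        return false;
    return true;
}
```

Addresses outside 2002::/16 pass through unchecked here, which matches the note that the new checks only apply inside the 6to4/6rd range.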
2013-01-29 | ipv6: Fix inet6_csk_bind_conflict so it builds with user namespaces enabled | Eric W. Biederman
When attempting to build linux-next with user namespaces enabled I ran into this fun build error.

    CC net/ipv6/inet6_connection_sock.o
    .../net/ipv6/inet6_connection_sock.c: In function ‘inet6_csk_bind_conflict’:
    .../net/ipv6/inet6_connection_sock.c:37:12: error: incompatible types when initializing type ‘int’ using type ‘kuid_t’
    .../net/ipv6/inet6_connection_sock.c:54:30: error: incompatible type for argument 1 of ‘uid_eq’
    .../include/linux/uidgid.h:48:20: note: expected ‘kuid_t’ but argument is of type ‘int’
    make[3]: *** [net/ipv6/inet6_connection_sock.o] Error 1
    make[2]: *** [net/ipv6] Error 2
    make[2]: *** Waiting for unfinished jobs....

Using kuid_t instead of int to hold the uid fixes this.

Cc: Tom Herbert <therbert@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
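The error arises because a struct-wrapped uid is a distinct type that the compiler will not convert to or from int. A minimal userspace sketch of that idea follows; `kuid_sketch_t`, `make_kuid_sketch` and `uid_eq_sketch` are hypothetical stand-ins, not the definitions in include/linux/uidgid.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Wrapping the uid in a one-member struct makes it a distinct type, so
 * assigning it to an int or passing an int in its place fails to compile. */
typedef struct {
    uid_t val;
} kuid_sketch_t;

/* The wrapper is expected to add no storage on common ABIs. */
static_assert(sizeof(kuid_sketch_t) == sizeof(uid_t),
              "wrapper should not add padding");

static inline kuid_sketch_t make_kuid_sketch(uid_t v)
{
    return (kuid_sketch_t){ .val = v };
}

/* Comparison has to go through a helper, since == is not defined on structs. */
static inline bool uid_eq_sketch(kuid_sketch_t a, kuid_sketch_t b)
{
    return a.val == b.val;
}
```

With this layout, a line such as `int uid = make_kuid_sketch(0);` is rejected at compile time, which is the class of error shown in the build log.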
2013-01-29 | ipv4: introduce address lifetime | Jiri Pirko
There are some use cases where a lifetime for ipv4 addresses might be helpful. For example:

1) initramfs networkmanager uses a DHCP daemon to learn network configuration parameters
2) initramfs networkmanager configures addresses, routes and DNS
3) initramfs networkmanager is requested to stop
4) initramfs networkmanager stops all daemons including dhclient
5) there are addresses and routes configured but no daemon running. If the system doesn't start networkmanager for some reason, addresses and routes will be used forever, which violates RFC 2131.

This patch is essentially a backport of the ipv6 address lifetime mechanism to ipv4 addresses. The current "ip" tool supports this without any patch (since it does not distinguish between ipv4 and ipv6 addresses in this respect). Also, this should be backward compatible with all current netlink users.

Reported-by: Pavel Šimerda <psimerda@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 | net: frag, move LRU list maintenance outside of rwlock | Jesper Dangaard Brouer
Updating the fragmentation queues' LRU (Least-Recently-Used) list required taking the hash writer lock. However, the LRU list isn't tied to the hash at all, so we can use a separate lock for it. Original-idea-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 | net: frag helper functions for mem limit tracking | Jesper Dangaard Brouer
This change is primarily a preparation to ease the extension of memory limit tracking. The change also reduces the number of atomic operations during freeing of a frag queue. This yields some performance improvement, as these atomic operations are at the core of the performance problems seen on NUMA systems. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-28 | net: fix possible wrong checksum generation | Eric Dumazet
Pravin Shelar mentioned that GSO could potentially generate wrong TX checksum if skb has fragments that are overwritten by the user between the checksum computation and transmit. He suggested linearizing skbs, but this extra copy can be avoided for normal tcp skbs cooked by tcp_sendmsg().

This patch introduces a new SKB_GSO_SHARED_FRAG flag, set in skb_shinfo(skb)->gso_type if at least one frag can be modified by the user. Typical sources of such possible overwrites are {vm}splice(), sendfile(), and macvtap/tun/virtio_net drivers.

Tested:

    $ netperf -H 7.7.8.84
    MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec
    87380  16384   16384    10.00    3959.52

    $ netperf -H 7.7.8.84 -t TCP_SENDFILE
    TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec
    87380  16384   16384    10.00    3216.80

Performance of the SENDFILE is impacted by the extra allocation and copy, and because we use order-0 pages, while the TCP_STREAM uses bigger pages.

Reported-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-27 | ip6mr: limit IPv6 MRT_TABLE identifiers | Dan Carpenter
We did this for IPv4 in b49d3c1e1c "net: ipmr: limit MRT_TABLE identifiers" but we need to do it for IPv6 as well. On IPv6 the name is "pim6reg" instead of "pimreg" so there is one less digit allowed. The strcpy() is in ip6mr_reg_vif(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
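The digit budget can be checked with a small userspace sketch. It assumes the interface name is formatted as "pim6reg" followed by the table id in decimal and must fit a 16-byte buffer including the terminating NUL; `IFNAMSIZ_SKETCH` and `pim6reg_name_fits` are hypothetical names, and the kernel's actual bounds check may be written differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define IFNAMSIZ_SKETCH 16 /* assumed buffer size, including the NUL */

/* True if "pim6reg<table>" fits in IFNAMSIZ_SKETCH bytes without truncation.
 * snprintf() returns the length the full string would have had, so a
 * return value >= the buffer size means the name does not fit. */
static bool pim6reg_name_fits(unsigned int table)
{
    char name[IFNAMSIZ_SKETCH];
    int n = snprintf(name, sizeof(name), "pim6reg%u", table);

    assert(n > 0);
    return (size_t)n < sizeof(name);
}
```

"pim6reg" takes 7 characters, leaving room for 8 digits, whereas the 6-character "pimreg" leaves room for 9.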
2013-01-27 | Merge branch 'master' of git://1984.lsi.us.es/nf-next | David S. Miller
Pablo Neira Ayuso says:

====================
This batch contains netfilter updates for your net-next tree, they are:

* The new connlabel extension for x_tables, that allows us to attach labels to each conntrack flow. The kernel implementation uses a bitmask and there's a file in user-space that maps the bits with the corresponding string for each existing label. Currently, you can attach up to 128 overlapping labels. From Florian Westphal.
* A new round of improvements for the netns support for conntrack. Gao feng has moved much of the initialization code of each module out of the netns init path. He also did several code refactorings; that code looks cleaner to me now.
* Added documentation for all possible tweaks for nf_conntrack via sysctl, from Jiri Pirko.
* Cisco 7941/7945 IP phone support for our SIP conntrack helper, from Kevin Cernekee.
* Missing header file in the snmp helper, from Stephen Hemminger.
* Finally, a couple of fixes to resolve minor issues with these changes, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23 | soreuseport: UDP/IPv6 implementation | Tom Herbert
Motivation for soreuseport would be something like a DNS server. An alternative would be to recv on the same socket from multiple threads. As in the case of TCP, the load across these threads tends to be disproportionate and we also see a lot of contention on the socket lock. Note that SO_REUSEADDR already allows multiple UDP sockets to bind to the same port, however there is no provision to prevent hijacking and nothing to distribute packets across all the sockets sharing the same bound port. This patch does not change the semantics of SO_REUSEADDR, but provides usable functionality of it for unicast. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23 | soreuseport: TCP/IPv6 implementation | Tom Herbert
Motivation for soreuseport would be something like a web server binding to port 80 running with multiple threads, where each thread might have its own listener socket. This could be done as an alternative to other models:

1) have one listener thread which dispatches completed connections to workers.
2) accept on a single listener socket from multiple threads.

In case #1 the listener thread can easily become the bottleneck with a high connection turn-over rate. In case #2, the proportion of connections accepted per thread tends to be uneven under high connection load (assuming a simple event loop: while (1) { accept(); process() }); wakeup does not promote fairness among the sockets. We have seen the disproportion to be as high as a 3:1 ratio between the thread accepting the most connections and the one accepting the fewest. With SO_REUSEPORT the distribution is uniform.

Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23 | netfilter: nf_conntrack: refactor l4proto support for netns | Gao feng
Move the code that registers/unregisters l4proto to the module_init/exit context. Given that we have to modify some interfaces to accommodate these changes, it is a good time to use shorter function names for this using the nf_ct_* prefix instead of nf_conntrack_*, that is:

    nf_ct_l4proto_register
    nf_ct_l4proto_pernet_register
    nf_ct_l4proto_unregister
    nf_ct_l4proto_pernet_unregister

We save many line breaks with it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 | netfilter: nf_conntrack: refactor l3proto support for netns | Gao feng
Move the code that registers/unregisters l3proto to the module_init/exit context. Given that we have to modify some interfaces to accommodate these changes, it is a good time to use shorter function names for this using the nf_ct_* prefix instead of nf_conntrack_*, that is:

    nf_ct_l3proto_register
    nf_ct_l3proto_pernet_register
    nf_ct_l3proto_unregister
    nf_ct_l3proto_pernet_unregister

We save many line breaks with it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-22 | ipv6: remove duplicated declaration of ip6_fragment() | Cong Wang
It is declared in:

    include/net/ip6_route.h:187:int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *));

and net/ip6_route.h is already included.

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-22 | netfilter: Use IS_ERR_OR_NULL(). | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-22 | ipv6: Use IS_ERR_OR_NULL(). | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
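For readers unfamiliar with the idiom behind IS_ERR_OR_NULL(): kernel functions often return a small negative errno encoded in a pointer, and the macro folds the NULL test and the error-pointer test into one check. A userspace sketch of the idea, assuming error values occupy the top 4095 addresses; the lowercase helper names and `MAX_ERRNO_SKETCH` are hypothetical, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095 /* assumed size of the error-pointer range */

/* Encode a negative errno value as a pointer in the top of the address space. */
static inline void *err_ptr(long err)
{
    assert(err < 0 && err >= -MAX_ERRNO_SKETCH);
    return (void *)(intptr_t)err;
}

/* True if p falls in the range reserved for encoded errors. */
static inline bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Single check replacing the two-part test "!p || is_err(p)". */
static inline bool is_err_or_null(const void *p)
{
    return p == NULL || is_err(p);
}
```

A caller can then write `if (is_err_or_null(ptr)) return;` in place of two separate conditions.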
2013-01-22 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec | David S. Miller
Steffen Klassert says:

====================
1) The transport header did not point to the right place after esp/ah processing on tunnel mode in the receive path. As a result, the ECN field of the inner header was not set correctly, fixes from Li RongQing.
2) We did a null check too late in one of the xfrm_replay advance functions. This can lead to a division by zero, fix from Nickolai Zeldovich.
3) The size calculation of the hash table missed the multiplication with the actual struct size when the hash table is freed. We might call the wrong free function, fix from Michal Kubecek.
4) On IPsec pmtu events we can't access the transport headers of the original packet, so force a relookup for all routes to notify about the pmtu event.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Do not try to update "updated" time if neighbour has already gone. | YOSHIFUJI Hideaki / 吉藤英明
Commit 2152caea ("ipv6: Do not depend on rt->n in rt6_probe().") introduced a bug that tries to update the "updated" time in the neighbour structure. Update the "updated" time only if the neighbour is available. Bug was found by Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | mcast: add multicast proxy support (IPv4 and IPv6) | Nicolas Dichtel
This patch adds support for proxy multicast, i.e. being able to build a static multicast tree. It adds support for (*,*) and (*,G) entries. The user should define a (*,*) entry which is not used for real forwarding. This entry defines the upstream in iif and contains all interfaces from the static tree in its oifs. It will be used to forward packets upstream when they come from an interface belonging to the static tree. Hence, the user should define (*,G) entries to build the static tree. Note that the upstream interface must be part of oifs: packets are sent to all oifs interfaces except the input interface. This ensures that the whole static tree is always joined, even if the packet is not coming from the upstream interface. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
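The rule "all oifs except the input interface" can be illustrated with a bitmask. This is a sketch under simplifying assumptions, not the ipmr/ip6mr data structures: the entry layout, the 32-interface limit and the names `mproxy_entry_sketch` and `mproxy_out_mask` are hypothetical, and the sketch does not verify that the input interface belongs to the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXVIFS_SKETCH 32 /* assumed interface limit for the bitmask */

/* Hypothetical (*,*) or (*,G) entry: upstream interface plus the set of
 * interfaces that make up the static tree. */
struct mproxy_entry_sketch {
    int iif;       /* upstream interface index */
    uint32_t oifs; /* bit i set => interface i is in the static tree */
};

/* Bitmask of interfaces a packet arriving on in_if is forwarded to:
 * every interface in oifs except the one it came in on. */
static uint32_t mproxy_out_mask(const struct mproxy_entry_sketch *e, int in_if)
{
    assert(e != NULL && in_if >= 0 && in_if < MAXVIFS_SKETCH);
    return e->oifs & ~(UINT32_C(1) << in_if);
}
```

Because the upstream interface is itself in oifs, a packet arriving on a downstream interface is also sent upstream, while a packet arriving from upstream is sent to every downstream interface.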
2013-01-21 | ndisc: Use compound literals to build redirect message. | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Break down ndisc_build_skb() and build message directly. | YOSHIFUJI Hideaki / 吉藤英明
Construct NS/NA/RS message directly using C99 compound literals. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
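As an illustration of the C99 compound-literal technique, the sketch below fills a simplified Neighbor Solicitation header in one assignment. The struct layout and the names `ns_msg_sketch`, `fill_ns` and `ND_NEIGHBOR_SOLICIT_TYPE` are hypothetical and do not reproduce the kernel's struct nd_msg; only the ICMPv6 type value 135 comes from RFC 4861.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

#define ND_NEIGHBOR_SOLICIT_TYPE 135 /* ICMPv6 type for NS (RFC 4861) */

/* Hypothetical, simplified message layout for illustration. */
struct ns_msg_sketch {
    uint8_t type;
    uint8_t code;
    uint16_t cksum;
    uint32_t reserved;
    struct in6_addr target;
};

/* Assigning a compound literal sets the named fields and zero-initializes
 * every field that is not mentioned, so no separate memset() is needed. */
static void fill_ns(struct ns_msg_sketch *msg, const struct in6_addr *target)
{
    assert(msg != NULL && target != NULL);
    *msg = (struct ns_msg_sketch){
        .type = ND_NEIGHBOR_SOLICIT_TYPE,
        .target = *target,
    };
}
```

The checksum field is left at zero here; in the series above it is filled in later, once the whole message has been built.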
2013-01-21 | ndisc: Break down __ndisc_send(). | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Fill in ICMPv6 checksum and IPv6 header in ndisc_send_skb(). | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Use ndisc_send_skb() for redirect. | YOSHIFUJI Hideaki / 吉藤英明
Reuse dst if one is attached with skb. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Remove icmp6h argument from ndisc_send_skb(). | YOSHIFUJI Hideaki / 吉藤英明
skb_transport_header() (thus icmp6_hdr()) is available here, use it. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Make ndisc_fill_xxx_option() for sk_buff. | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Calculate message body length and option length separately. | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Reset skb->transport_header inside ndisc_alloc_send_skb(). | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Defer building IPv6 header. | YOSHIFUJI Hideaki / 吉藤英明
Build ICMPv6 message first and make buffer management easier; we can use skb->len when filling checksum in ICMPv6 header, and then build IP header with length field. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Remove dev argument for ndisc_send_skb(). | YOSHIFUJI Hideaki / 吉藤英明
Since we have skb->dev, use it. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Set skb->dev and skb->protocol inside ndisc_alloc_skb(). | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Simplify arguments for ip6_nd_hdr(). | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ipv6: Unshare ip6_nd_hdr() and change return type to void. | YOSHIFUJI Hideaki / 吉藤英明
- move ip6_nd_hdr() to its users' source files. In net/ipv6/mcast.c, it will be called ip6_mc_hdr().
- change the return type to void since this function never fails.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Introduce ndisc_alloc_skb() helper. | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Introduce ndisc_fill_redirect_hdr_option(). | YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Use skb_linearize() instead of pskb_may_pull(skb, skb->len). | YOSHIFUJI Hideaki / 吉藤英明
Suggested by Eric Dumazet <edumazet@google.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-21 | ndisc: Move ndisc_opt_addr_space() to include/net/ndisc.h. | YOSHIFUJI Hideaki / 吉藤英明
This also makes ndisc_opt_addr_data() and ndisc_fill_addr_option() use ndisc_opt_addr_space(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>