summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/google/gve/gve.h
AgeCommit message (Collapse)Author
2024-06-25gve: Add flow steering ethtool supportJeroen de Borst
Implement the ethtool commands that can be used to configure and query flow-steering rules. A large part of this change consists of translating the ethtool representation of 'ntuples' to our internal gve_flow_rule and vice-versa in the new created gve_flow_rule.c Considering the possible large amount of flow rules, the driver doesn't store all the rules locally. When the user runs 'ethtool -n <nic>' to check the registered rules, the driver will send adminq command to query a limited amount of rules/rule ids(that filled in a 4096 bytes dma memory) at a time as a cache for the ethtool queries. The adminq query commands will be repeated for several times until the ethtool has queried all the needed rules. Signed-off-by: Jeroen de Borst <jeroendb@google.com> Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-6-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25gve: Add flow steering adminq commandsJeroen de Borst
Add new adminq commands for the driver to configure and query flow rules that are stored in the device. Flow steering rules are assigned with a location that determines the relative order of the rules. Flow rules can run up to an order of millions. In such cases, storing a full copy of the rules in the driver to prepare for the ethtool query is infeasible while querying them from the device is better. That needs to be optimized too so that we don't send a lot of adminq commands. The solution here is to store a limited number of rules/rule ids in the driver in a cache. Use dma_pool to allocate 4k bytes which lets device write at most 46 flow rules(4096/88) or 1024 rule ids(4096/4) at a time. For configuring flow rules, there are 3 sub-commands: - ADD which adds a rule at the location supplied - DEL which deletes the rule at the location supplied - RESET which clears all currently active rules in the device For querying flow rules, there are also 3 sub-commands: - QUERY_RULES corresponds to ETHTOOL_GRXCLSRULE. It fills the rules in the allocated cache after querying the device - QUERY_RULES_IDS corresponds to ETHTOOL_GRXCLSRLALL. It fills the rule_ids in the allocated cache after querying the device - QUERY_RULES_STATS corresponds to ETHTOOL_GRXCLSRLCNT. It queries the device's current flow rule number and the supported max flow rule limit Signed-off-by: Jeroen de Borst <jeroendb@google.com> Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-5-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25gve: Add flow steering device optionJeroen de Borst
Add a new device option to signal to the driver that the device supports flow steering. This device option also carries the maximum number of flow steering rules that the device can store. Signed-off-by: Jeroen de Borst <jeroendb@google.com> Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-4-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25gve: Add adminq mutex lockZiwei Xiao
We were depending on the rtnl_lock to make sure there is only one adminq command running at a time. But some commands may take too long to hold the rtnl_lock, such as the upcoming flow steering operations. For such situations, it can temporarily drop the rtnl_lock, and replace it for these operations with a new adminq lock, which can ensure the adminq command execution to be thread-safe. Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-2-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-06gve: Implement queue apiShailend Chand
The new netdev queue api is implemented for gve. Tested-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Shailend Chand <shailend@google.com> Link: https://lore.kernel.org/all/20240501232549.1327174-11-shailend@google.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-05gve: Alloc and free QPLs with the ringsShailend Chand
Every tx and rx ring has its own queue-page-list (QPL) that serves as the bounce buffer. Previously we were allocating QPLs for all queues before the queues themselves were allocated and later associating a QPL with a queue. This is avoidable complexity: it is much more natural for each queue to allocate and free its own QPL. Moreover, the advent of new queue-manipulating ndo hooks make it hard to keep things as is: we would need to transfer a QPL from an old queue to a new queue, and that is unpleasant. Tested-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Shailend Chand <shailend@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-05gve: Avoid rescheduling napi if on wrong cpuShailend Chand
In order to make possible the implementation of per-queue ndo hooks, gve_turnup was changed in a previous patch to account for queues already having some unprocessed descriptors: it does a one-off napi_schdule to handle them. If conditions of consistent high traffic persist in the immediate aftermath of this, the poll routine for a queue can be "stuck" on the cpu on which the ndo hooks ran, instead of the cpu its irq has affinity with. This situation is exacerbated by the fact that the ndo hooks for all the queues are invoked on the same cpu, potentially causing all the napi poll routines to be residing on the same cpu. A self correcting mechanism in the poll method itself solves this problem. Tested-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Shailend Chand <shailend@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-18gve: Remove qpl_cfg struct since qpl_ids map with queues respectivelyZiwei Xiao
The qpl_cfg struct was used to make sure that no two different queues are using QPL with the same qpl_id. We can remove that qpl_cfg struct since now the qpl_ids map with the queues respectively as follows: For tx queues: qpl_id = tx_qid For rx queues: qpl_id = max_tx_queues + rx_qid And when XDP is used, it will need the user to reduce the tx queues to be at most half of the max_tx_queues. Then it will use the same number of tx queues starting from the end of existing tx queues for XDP. So the XDP queues will not exceed the max_tx_queues range and will not overlap with the rx queues, where the qpl_ids will not have overlapping too. Considering of that, we remove the qpl_cfg struct to get the qpl_id directly based on the queue id. Unless we are erroneously allocating a rx/tx queue that has already been allocated, we would never allocate the qpl with the same qpl_id twice. In that case, it should fail much earlier than the QPL assignment. Suggested-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Shailend Chand <shailend@google.com> Link: https://lore.kernel.org/r/20240417205757.778551-1-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03gve: add support to change ring size via ethtoolHarshitha Ramamurthy
Allow the user to change ring size via ethtool if supported by the device. The driver relies on the ring size ranges queried from device to validate ring sizes requested by the user. Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-03gve: add support to read ring size ranges from the deviceHarshitha Ramamurthy
Add support to read ring size change capability and the min and max descriptor counts from the device and store it in the driver. Also accommodate a special case where the device does not provide minimum ring size depending on the version of the device. In that case, rely on default values for the minimums. Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-03gve: set page count for RX QPL for GQI and DQO queue formatsHarshitha Ramamurthy
Fulfill the requirement that for GQI, the number of pages per RX QPL is equal to the ring size. Set this value to be equal to ring size. Because of this change, the rx_data_slot_cnt and rx_pages_per_qpl fields stored in the priv structure are not needed, so remove their usage. And for DQO, the number of pages per RX QPL is more than ring size to account for out-of-order completions. So set it to two times of rx ring size. Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-03gve: make the completion and buffer ring size equal for DQOHarshitha Ramamurthy
For the DQO queue format, the gve driver stores two ring sizes for both TX and RX - one for completion queue ring and one for data buffer ring. This is supposed to enable asymmetric sizes for these two rings but that is not supported. Make both fields reference the same single variable. This change renders reading supported TX completion ring size and RX buffer ring size for DQO from the device useless, so change those fields to reserved and remove related code. Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-04gve: Add header split ethtool statsJeroen de Borst
To record the stats of header split packets, three stats are added in the driver's ethtool stats. - rx_hsplit_pkt is the split packets count with header split - rx_hsplit_bytes is the received header bytes count with header split - rx_hsplit_unsplit_pkt is the unsplit packet count due to header buffer overflow or zero header length when header split is enabled Currently, it's entering the stats_update critical section more than once per packet. We have plans to avoid that in the future change to let all the stats_update happen in one place at the end of `gve_rx_poll_dqo`. Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-04gve: Add header split data pathJeroen de Borst
Add header buffers and ethtool support to enable header split via the tcp-data-split flag in ethtool's ringparam config. A coherent dma memory is allocated for the header buffers. There is one header buffer per ring entry by calculating the offset to the header-buffers starting address. The header buffer is always copied directly into the skb and payload is always added as frags. When there is a header buffer overflow or the header length is 0, the driver places the whole unsplit packet in frags. When toggling header split, the driver will call gve_adjust_config to set its queues appropriately. If header split is enabled by the user and the max packet buffer size is no less than 4KB, driver will set the packet buffer size as 4KB to support TCP_ZEROCOPY_RECEIVE. Otherwise the driver will use the default 2KB as the packet buffer size. `ethtool -G <dev> tcp-data-split on/off` is the command to toggle header split. `ethtool -g <dev>` will show the status of header split with the field of `tcp-data-split`. Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-04gve: Add header split device optionJeroen de Borst
To enable header split via ethtool, we first need to query the device to get the max rx buffer size and header buffer size. Add a device option to get these values and store them in the driver. If the header buffer size received from the device is non-zero, it means header split is supported in the device. Currently the max rx buffer size will only be used when header split is enabled which will set the data_buffer_size_dqo to be the max rx buffer size. Also change the data_buffer_size_dqo from int to u16 since we are modifying it and making it to be consistent with max_rx_buffer_size. Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-23gve: Switch to config-aware queue allocationShailend Chand
The new config-aware functions will help achieve the goal of being able to allocate resources for new queues while there already are active queues serving traffic. These new functions work off of arbitrary queue allocation configs rather than just the currently active config in priv, and they return the newly allocated resources instead of writing them into priv. Signed-off-by: Shailend Chand <shailend@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240122182632.1102721-4-shailend@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-23gve: Refactor napi add and remove functionsShailend Chand
This change makes the napi poll functions non-static and moves the gve_(add|remove)_napi functions to gve_utils.c, to make possible future "start queue" hooks in the datapath files. Signed-off-by: Shailend Chand <shailend@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240122182632.1102721-3-shailend@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-23gve: Define config structs for queue allocationShailend Chand
Queue allocation functions currently can only allocate into priv and free memory in priv. These new structs would be passed into the queue functions in a subsequent change to make them capable of returning newly allocated resources and not just writing them into priv. They also make it possible to allocate resources for queues with a different config than that of the currently active queues. Signed-off-by: Shailend Chand <shailend@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240122182632.1102721-2-shailend@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29gve: Remove dependency on 4k page size.John Fraker
Prior to this change, gve crashes when attempting to run in kernels with page sizes other than 4k. This change removes unnecessary references to PAGE_SIZE and replaces them with more meaningful constants. Signed-off-by: Jordan Kimbrough <jrkim@google.com> Signed-off-by: John Fraker <jfraker@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20231128002648.320892-6-jfraker@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29gve: Perform adminq allocations through a dma_pool.John Fraker
This allows the adminq to be smaller than a page, paving the way for non 4k page support. This is to support platforms where PAGE_SIZE is not 4k, such as some ARM platforms. Signed-off-by: Jordan Kimbrough <jrkim@google.com> Signed-off-by: John Fraker <jfraker@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20231128002648.320892-2-jfraker@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-06gve: RX path for DQO-QPLRushil Gupta
The RX path allocates the QPL page pool at queue creation, and tries to reuse these pages through page recycling. This patch ensures that on refill no non-QPL pages are posted to the device. When the driver is running low on free buffers, an ondemand allocation step kicks in that allocates a non-qpl page for SKB business to free up the QPL page in use. gve_try_recycle_buf was moved to gve_rx_append_frags so that driver does not attempt to mark buffer as used if a non-qpl page was allocated ondemand. Signed-off-by: Rushil Gupta <rushilg@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Bailey Forrest <bcf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06gve: Tx path for DQO-QPLRushil Gupta
Each QPL page is divided into GVE_TX_BUFS_PER_PAGE_DQO buffers. When a packet needs to be transmitted, we break the packet into max GVE_TX_BUF_SIZE_DQO sized chunks and transmit each chunk using a TX descriptor. We allocate the TX buffers from the free list in dqo_tx. We store these TX buffer indices in an array in the pending_packet structure. The TX buffers are returned to the free list in dqo_compl after receiving packet completion or when removing packets from miss completions list. Signed-off-by: Rushil Gupta <rushilg@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Bailey Forrest <bcf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-06gve: Control path for DQO-QPLRushil Gupta
GVE supports QPL ("queue-page-list") mode where all data is communicated through a set of pre-registered pages. Adding this mode to DQO descriptor format. Add checks, abi-changes and device options to support QPL mode for DQO in addition to GQI. Also, use pages-per-qpl supplied by device-option to control the size of the "queue-page-list". Signed-off-by: Rushil Gupta <rushilg@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Bailey Forrest <bcf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-03eth: add missing xdp.h includes in driversJakub Kicinski
Handful of drivers currently expect to get xdp.h by virtue of including netdevice.h. This will soon no longer be the case so add explicit includes. Reviewed-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230803010230.1755386-2-kuba@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-07-10gve: unify driver name usageJunfeng Guo
Current codebase contained the usage of two different names for this driver (i.e., `gvnic` and `gve`), which is quite unfriendly for users to use, especially when trying to bind or unbind the driver manually. The corresponding kernel module is registered with the name of `gve`. It's more reasonable to align the name of the driver with the module. Fixes: 893ce44df565 ("gve: Add basic driver framework for Compute Engine Virtual NIC") Cc: csully@google.com Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-11gve: Unify duplicate GQ min pkt desc size constantsShailend Chand
The two constants accomplish the same thing. Signed-off-by: Shailend Chand <shailend@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230407184830.309398-1-shailend@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: drivers/net/ethernet/google/gve/gve.h 3ce934558097 ("gve: Secure enough bytes in the first TX desc for all TCP pkts") 75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format") https://lore.kernel.org/all/20230406104927.45d176f5@canb.auug.org.au/ https://lore.kernel.org/all/c5872985-1a95-0bc8-9dcc-b6f23b439e9d@tessares.net/ Adjacent changes: net/can/isotp.c 051737439eae ("can: isotp: fix race between isotp_sendsmg() and isotp_release()") 96d1c81e6a04 ("can: isotp: add module parameter for maximum pdu size") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-04gve: Secure enough bytes in the first TX desc for all TCP pktsShailend Chand
Non-GSO TCP packets whose SKBs' linear portion did not include the entire TCP header were not populating the first Tx descriptor with as many bytes as the vNIC expected. This change ensures that all TCP packets populate the first descriptor with the correct number of bytes. Fixes: 893ce44df565 ("gve: Add basic driver framework for Compute Engine Virtual NIC") Signed-off-by: Shailend Chand <shailend@google.com> Link: https://lore.kernel.org/r/20230403172809.2939306-1-shailend@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-17gve: Add AF_XDP zero-copy support for GQI-QPL formatPraveen Kaligineedi
Adding AF_XDP zero-copy support. Note: Although these changes support AF_XDP socket in zero-copy mode, there is still a copy happening within the driver between XSK buffer pool and QPL bounce buffers in GQI-QPL format. In GQI-QPL queue format, the driver needs to allocate a fixed size memory, the size specified by vNIC device, for RX/TX and register this memory as a bounce buffer with the vNIC device when a queue is created. The number of pages in the bounce buffer is limited and the pages need to be made available to the vNIC by copying the RX data out to prevent head-of-line blocking. Therefore, we cannot pass the XSK buffer pool to the vNIC. The number of copies on RX path from the bounce buffer to XSK buffer is 2 for AF_XDP copy mode (bounce buffer -> allocated page frag -> XSK buffer) and 1 for AF_XDP zero-copy mode (bounce buffer -> XSK buffer). This patch contains the following changes: 1) Enable and disable XSK buffer pool 2) Copy XDP packets from QPL bounce buffers to XSK buffer on rx 3) Copy XDP packets from XSK buffer to QPL bounce buffers and ring the doorbell as part of XDP TX napi poll 4) ndo_xsk_wakeup callback support Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17gve: Add XDP REDIRECT support for GQI-QPL formatPraveen Kaligineedi
This patch contains the following changes: 1) Support for XDP REDIRECT action on rx 2) ndo_xdp_xmit callback support In GQI-QPL queue format, the driver needs to allocate a fixed size memory, the size specified by vNIC device, for RX/TX and register this memory as a bounce buffer with the vNIC device when a queue is created. The number of pages in the bounce buffer is limited and the pages need to be made available to the vNIC by copying the RX data out to prevent head-of-line blocking. The XDP_REDIRECT packets are therefore immediately copied to a newly allocated page. Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17gve: Add XDP DROP and TX support for GQI-QPL formatPraveen Kaligineedi
Add support for XDP PASS, DROP and TX actions. This patch contains the following changes: 1) Support installing/uninstalling XDP program 2) Add dedicated XDP TX queues 3) Add support for XDP DROP action 4) Add support for XDP TX action Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17gve: Changes to add new TX queuesPraveen Kaligineedi
Changes to enable adding and removing TX queues without calling gve_close() and gve_open(). Made the following changes: 1) priv->tx, priv->rx and priv->qpls arrays are allocated based on max tx queues and max rx queues 2) Changed gve_adminq_create_tx_queues(), gve_adminq_destroy_tx_queues(), gve_tx_alloc_rings() and gve_tx_free_rings() functions to add/remove a subset of TX queues rather than all the TX queues. Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-17gve: XDP support GQI-QPL: helper function changesPraveen Kaligineedi
This patch adds/modifies helper functions needed to add XDP support. Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21gve: Adding a new AdminQ command to verify driverJeroen de Borst
Check whether the driver is compatible with the device presented. Signed-off-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-02gve: Reduce alloc and copy costs in the GQ rx pathShailend Chand
Previously, even if just one of the many fragments of a 9k packet required a copy, we'd copy the whole packet into a freshly-allocated 9k-sized linear SKB, and this led to performance issues. By having a pool of pages to copy into, each fragment can be independently handled, leading to a reduced incidence of allocation and copy. Signed-off-by: Shailend Chand <shailend@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-26gve: Fix GFP flags when allocing pagesCatherine Sullivan
Use GFP_ATOMIC when allocating pages out of the hotpath, continue to use GFP_KERNEL when allocating pages during setup. GFP_KERNEL will allow blocking which allows it to succeed more often in a low memory enviornment but in the hotpath we do not want to allow the allocation to block. Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support") Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: David Awogbemila <awogbemila@google.com> Link: https://lore.kernel.org/r/20220126003843.3584521-1-awogbemila@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16gve: Add tx|rx-coalesce-usec for DQOTao Liu
Adding ethtool support for changing rx-coalesce-usec and tx-coalesce-usec when using the DQO queue format. Signed-off-by: Tao Liu <xliutaox@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16gve: Implement suspend/resume/shutdownCatherine Sullivan
Add support for suspend, resume and shutdown. Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: David Awogbemila <awogbemila@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16gve: Add optional metadata descriptor type GVE_TXD_MTDWillem de Bruijn
Allow drivers to pass metadata along with packet data to the device. Introduce a new metadata descriptor type * GVE_TXD_MTD This descriptor is optional. If present it immediate follows the packet descriptor and precedes the segment descriptor. This descriptor may be repeated. Multiple metadata descriptors may follow. There are no immediate uses for this, this is for future proofing. At present devices allow only 1 MTD descriptor. The lower four bits of the type_flags field encode GVE_TXD_MTD. The upper four bits of the type_flags field encodes a *sub*type. Introduce one such metadata descriptor subtype * GVE_MTD_SUBTYPE_PATH This shares path information with the device for network failure discovery and robust response: Linux derives ipv6 flowlabel and ECMP multipath from sk->sk_txhash, and updates this field on error with sk_rethink_txhash. Allow the host stack to do the same. Pass the tx_hash value if set. Also communicate whether the path hash is set, or more exactly, what its type is. Define two common types GVE_MTD_PATH_HASH_NONE GVE_MTD_PATH_HASH_L4 Concrete examples of error conditions that are resolved are mentioned in the commits that add sk_rethink_txhash calls. Such as commit 7788174e8726 ("tcp: change IPv6 flow-label upon receiving spurious retransmission"). Experimental results mirror what the theory suggests: where IPv6 FlowLabel is included in path selection (e.g., LAG/ECMP), flowlabel rotation on TCP timeout avoids the vast majority of TCP disconnects that would otherwise have occurred during link failures in long-haul backbones, when an alternative path is available. Rotation can be applied to various bad connection signals, such as timeouts and spurious retransmissions. In aggregate, such flow level signals can help locate network issues. Define initial common states: GVE_MTD_PATH_STATE_DEFAULT GVE_MTD_PATH_STATE_TIMEOUT GVE_MTD_PATH_STATE_CONGESTION GVE_MTD_PATH_STATE_RETRANSMIT Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David Awogbemila <awogbemila@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16gve: Move the irq db indexes out of the ntfy block structCatherine Sullivan
Giving the device access to other kernel structs is not ideal. Move the indexes into their own array and just keep pointers to them in the ntfy block struct. Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: David Awogbemila <awogbemila@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25gve: Implement packet continuation for RX.David Awogbemila
This enables the driver to receive RX packets spread across multiple buffers: For a given multi-fragment packet the "packet continuation" bit is set on all descriptors except the last one. These descriptors' payloads are combined into a single SKB before the SKB is handed to the networking stack. This change adds a "packet buffer size" notion for RX queues. The CreateRxQueue AdminQueue command sent to the device now includes the packet_buffer_size. We opt for a packet_buffer_size of PAGE_SIZE / 2 to give the driver the opportunity to flip pages where we can instead of copying. Signed-off-by: David Awogbemila <awogbemila@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25gve: Add RX context.David Awogbemila
This refactor moves the skb_head and skb_tail fields into a new gve_rx_ctx struct. This new struct will contain information about the current packet being processed. This is in preparation for multi-descriptor RX packets. Signed-off-by: David Awogbemila <awogbemila@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-11gve: Recover from queue stall due to missed IRQJohn Fraker
Don't always reset the driver on a TX timeout. Attempt to recover by kicking the queue in case an IRQ was missed. Fixes: 9e5f7d26a4c08 ("gve: Add workqueue and reset support") Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: David Awogbemila <awogbemila@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-11gve: Do lazy cleanup in TX pathTao Liu
When TX queue is full, attemt to process enough TX completions to avoid stalling the queue. Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support") Signed-off-by: Tao Liu <xliutaox@google.com> Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-11gve: Switch to use napi_complete_doneYangchun Fu
Use napi_complete_done to allow for the use of gro_flush_timeout. Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support") Signed-off-by: Yangchun Fu <yangchun@google.com> Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: David Awogbemila <awogbemila@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-06gve: Correct available tx qpl checkCatherine Sullivan
The qpl_map_size is rounded up to a multiple of sizeof(long), but the number of qpls doesn't have to be. Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support") Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-28gve: DQO: avoid unused variable warningsArnd Bergmann
The use of dma_unmap_addr()/dma_unmap_len() in the driver causes multiple warnings when these macros are defined as empty, e.g. in an ARCH=i386 allmodconfig build: drivers/net/ethernet/google/gve/gve_tx_dqo.c: In function 'gve_tx_add_skb_no_copy_dqo': drivers/net/ethernet/google/gve/gve_tx_dqo.c:494:40: error: unused variable 'buf' [-Werror=unused-variable] 494 | struct gve_tx_dma_buf *buf = This is not how the NEED_DMA_MAP_STATE macros are meant to work, as they rely on never using local variables or a temporary structure like gve_tx_dma_buf. Remote the gve_tx_dma_buf definition and open-code the contents in all places to avoid the warning. This causes some rather long lines but otherwise ends up making the driver slightly smaller. Fixes: a57e5de476be ("gve: DQO: Add TX path") Link: https://lore.kernel.org/netdev/20210723231957.1113800-1-bcf@google.com/ Link: https://lore.kernel.org/netdev/20210721151100.2042139-1-arnd@kernel.org/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24gve: DQO: Add RX pathBailey Forrest
The RX queue has an array of `gve_rx_buf_state_dqo` objects. All allocated pages have an associated buf_state object. When a buffer is posted on the RX buffer queue, the buffer ID will be the buf_state's index into the RX queue's array. On packet reception, the RX queue will have one descriptor for each buffer associated with a received packet. Each RX descriptor will have a buffer_id that was posted on the buffer queue. Notable mentions: - We use a default buffer size of 2048 bytes. Based on page size, we may post separate sections of a single page as separate buffers. - The driver holds an extra reference on pages passed up the receive path with an skb and keeps these pages on a list. When posting new buffers to the NIC, we check if any of these pages has only our reference, or another buffer sized segment of the page has no references. If so, it is free to reuse. This page recycling approach is a common netdev optimization that reduces page alloc/free calls. - Pages in the free list have a page_count bias in order to avoid an atomic increment of pagecount every time we attempt to reuse a page. # references = page_count() - bias - In order to track when a page is safe to reuse, we keep track of the last offset which had a single SKB reference. When this occurs, it implies that every single other offset is reusable. Otherwise, we don't know if offsets can be safely reused. - We maintain two free lists of pages. List #1 (recycled_buf_states) contains pages we know can be reused right away. List #2 (used_buf_states) contains pages which cannot be used right away. We only attempt to get pages from list #2 when list #1 is empty. We only attempt to use a small fixed number pages from list #2 before giving up and allocating a new page. Both lists are FIFOs in hope that by the time we attempt to reuse a page, the references were dropped. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24gve: DQO: Add ring allocation and initializationBailey Forrest
Allocate the buffer and completion ring structures. Do not populate the rings yet. That will happen in the respective rx and tx datapath follow-on patches Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>