Age | Commit message (Collapse) | Author |
|
Implement the ethtool commands that can be used to configure and query
flow-steering rules.
A large part of this change consists of translating the ethtool
representation of 'ntuples' to our internal gve_flow_rule and vice-versa
in the new created gve_flow_rule.c
Considering the possible large amount of flow rules, the driver doesn't
store all the rules locally. When the user runs 'ethtool -n <nic>' to
check the registered rules, the driver will send adminq command to
query a limited amount of rules/rule ids(that filled in a 4096 bytes dma
memory) at a time as a cache for the ethtool queries. The adminq query
commands will be repeated for several times until the ethtool has
queried all the needed rules.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-6-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add new adminq commands for the driver to configure and query flow rules
that are stored in the device. Flow steering rules are assigned with a
location that determines the relative order of the rules.
Flow rules can run up to an order of millions. In such cases, storing
a full copy of the rules in the driver to prepare for the ethtool query
is infeasible while querying them from the device is better. That needs
to be optimized too so that we don't send a lot of adminq commands. The
solution here is to store a limited number of rules/rule ids in the
driver in a cache. Use dma_pool to allocate 4k bytes which lets device
write at most 46 flow rules(4096/88) or 1024 rule ids(4096/4) at a time.
For configuring flow rules, there are 3 sub-commands:
- ADD which adds a rule at the location supplied
- DEL which deletes the rule at the location supplied
- RESET which clears all currently active rules in the device
For querying flow rules, there are also 3 sub-commands:
- QUERY_RULES corresponds to ETHTOOL_GRXCLSRULE. It fills the rules in
the allocated cache after querying the device
- QUERY_RULES_IDS corresponds to ETHTOOL_GRXCLSRLALL. It fills the
rule_ids in the allocated cache after querying the device
- QUERY_RULES_STATS corresponds to ETHTOOL_GRXCLSRLCNT. It queries the
device's current flow rule number and the supported max flow rule
limit
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-5-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new device option to signal to the driver that the device supports
flow steering. This device option also carries the maximum number of
flow steering rules that the device can store.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-4-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We were depending on the rtnl_lock to make sure there is only one adminq
command running at a time. But some commands may take too long to hold
the rtnl_lock, such as the upcoming flow steering operations. For such
situations, it can temporarily drop the rtnl_lock, and replace it for
these operations with a new adminq lock, which can ensure the adminq
command execution to be thread-safe.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240625001232.1476315-2-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The new netdev queue api is implemented for gve.
Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Shailend Chand <shailend@google.com>
Link: https://lore.kernel.org/all/20240501232549.1327174-11-shailend@google.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Every tx and rx ring has its own queue-page-list (QPL) that serves as
the bounce buffer. Previously we were allocating QPLs for all queues
before the queues themselves were allocated and later associating a QPL
with a queue. This is avoidable complexity: it is much more natural for
each queue to allocate and free its own QPL.
Moreover, the advent of new queue-manipulating ndo hooks make it hard to
keep things as is: we would need to transfer a QPL from an old queue to
a new queue, and that is unpleasant.
Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to make possible the implementation of per-queue ndo hooks,
gve_turnup was changed in a previous patch to account for queues already
having some unprocessed descriptors: it does a one-off napi_schdule to
handle them. If conditions of consistent high traffic persist in the
immediate aftermath of this, the poll routine for a queue can be "stuck"
on the cpu on which the ndo hooks ran, instead of the cpu its irq has
affinity with.
This situation is exacerbated by the fact that the ndo hooks for all the
queues are invoked on the same cpu, potentially causing all the napi
poll routines to be residing on the same cpu.
A self correcting mechanism in the poll method itself solves this
problem.
Tested-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The qpl_cfg struct was used to make sure that no two different queues
are using QPL with the same qpl_id. We can remove that qpl_cfg struct
since now the qpl_ids map with the queues respectively as follows:
For tx queues: qpl_id = tx_qid
For rx queues: qpl_id = max_tx_queues + rx_qid
And when XDP is used, it will need the user to reduce the tx queues to
be at most half of the max_tx_queues. Then it will use the same number
of tx queues starting from the end of existing tx queues for XDP. So the
XDP queues will not exceed the max_tx_queues range and will not overlap
with the rx queues, where the qpl_ids will not have overlapping too.
Considering of that, we remove the qpl_cfg struct to get the qpl_id
directly based on the queue id. Unless we are erroneously allocating a
rx/tx queue that has already been allocated, we would never allocate
the qpl with the same qpl_id twice. In that case, it should fail much
earlier than the QPL assignment.
Suggested-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Link: https://lore.kernel.org/r/20240417205757.778551-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allow the user to change ring size via ethtool if
supported by the device. The driver relies on the
ring size ranges queried from device to validate
ring sizes requested by the user.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support to read ring size change capability and the
min and max descriptor counts from the device and store it
in the driver. Also accommodate a special case where the
device does not provide minimum ring size depending on the
version of the device. In that case, rely on default values
for the minimums.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fulfill the requirement that for GQI, the number of pages per
RX QPL is equal to the ring size. Set this value to be equal to
ring size. Because of this change, the rx_data_slot_cnt and
rx_pages_per_qpl fields stored in the priv structure are not
needed, so remove their usage. And for DQO, the number of pages
per RX QPL is more than ring size to account for out-of-order
completions. So set it to two times of rx ring size.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For the DQO queue format, the gve driver stores two ring sizes
for both TX and RX - one for completion queue ring and one for
data buffer ring. This is supposed to enable asymmetric sizes
for these two rings but that is not supported. Make both fields
reference the same single variable.
This change renders reading supported TX completion ring size
and RX buffer ring size for DQO from the device useless, so change
those fields to reserved and remove related code.
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To record the stats of header split packets, three stats are added in
the driver's ethtool stats.
- rx_hsplit_pkt is the split packets count with header split
- rx_hsplit_bytes is the received header bytes count with header split
- rx_hsplit_unsplit_pkt is the unsplit packet count due to header buffer
overflow or zero header length when header split is enabled
Currently, it's entering the stats_update critical section more than
once per packet. We have plans to avoid that in the future change to let
all the stats_update happen in one place at the end of
`gve_rx_poll_dqo`.
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add header buffers and ethtool support to enable header split via the
tcp-data-split flag in ethtool's ringparam config. A coherent dma memory
is allocated for the header buffers. There is one header buffer per ring
entry by calculating the offset to the header-buffers starting address.
The header buffer is always copied directly into the skb and payload is
always added as frags. When there is a header buffer overflow or the
header length is 0, the driver places the whole unsplit packet in frags.
When toggling header split, the driver will call gve_adjust_config to
set its queues appropriately. If header split is enabled by the user and
the max packet buffer size is no less than 4KB, driver will set the
packet buffer size as 4KB to support TCP_ZEROCOPY_RECEIVE. Otherwise the
driver will use the default 2KB as the packet buffer size.
`ethtool -G <dev> tcp-data-split on/off` is the command to toggle header
split.
`ethtool -g <dev>` will show the status of header split with the field
of `tcp-data-split`.
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To enable header split via ethtool, we first need to query the device to
get the max rx buffer size and header buffer size. Add a device option
to get these values and store them in the driver. If the header buffer
size received from the device is non-zero, it means header split is
supported in the device.
Currently the max rx buffer size will only be used when header split is
enabled which will set the data_buffer_size_dqo to be the max rx buffer
size. Also change the data_buffer_size_dqo from int to u16 since we are
modifying it and making it to be consistent with max_rx_buffer_size.
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The new config-aware functions will help achieve the goal of being able
to allocate resources for new queues while there already are active
queues serving traffic.
These new functions work off of arbitrary queue allocation configs
rather than just the currently active config in priv, and they return
the newly allocated resources instead of writing them into priv.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240122182632.1102721-4-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This change makes the napi poll functions non-static and moves the
gve_(add|remove)_napi functions to gve_utils.c, to make possible future
"start queue" hooks in the datapath files.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240122182632.1102721-3-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Queue allocation functions currently can only allocate into priv and
free memory in priv. These new structs would be passed into the queue
functions in a subsequent change to make them capable of returning newly
allocated resources and not just writing them into priv. They also make
it possible to allocate resources for queues with a different config
than that of the currently active queues.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240122182632.1102721-2-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prior to this change, gve crashes when attempting to run in kernels with
page sizes other than 4k. This change removes unnecessary references to
PAGE_SIZE and replaces them with more meaningful constants.
Signed-off-by: Jordan Kimbrough <jrkim@google.com>
Signed-off-by: John Fraker <jfraker@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20231128002648.320892-6-jfraker@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This allows the adminq to be smaller than a page, paving the way for
non 4k page support. This is to support platforms where PAGE_SIZE
is not 4k, such as some ARM platforms.
Signed-off-by: Jordan Kimbrough <jrkim@google.com>
Signed-off-by: John Fraker <jfraker@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20231128002648.320892-2-jfraker@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The RX path allocates the QPL page pool at queue creation, and
tries to reuse these pages through page recycling. This patch
ensures that on refill no non-QPL pages are posted to the device.
When the driver is running low on free buffers, an ondemand
allocation step kicks in that allocates a non-qpl page for
SKB business to free up the QPL page in use.
gve_try_recycle_buf was moved to gve_rx_append_frags so that driver does
not attempt to mark buffer as used if a non-qpl page was allocated
ondemand.
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Each QPL page is divided into GVE_TX_BUFS_PER_PAGE_DQO buffers.
When a packet needs to be transmitted, we break the packet into max
GVE_TX_BUF_SIZE_DQO sized chunks and transmit each chunk using a TX
descriptor.
We allocate the TX buffers from the free list in dqo_tx.
We store these TX buffer indices in an array in the pending_packet
structure.
The TX buffers are returned to the free list in dqo_compl after
receiving packet completion or when removing packets from miss
completions list.
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
GVE supports QPL ("queue-page-list") mode where
all data is communicated through a set of pre-registered
pages. Adding this mode to DQO descriptor format.
Add checks, abi-changes and device options to support
QPL mode for DQO in addition to GQI. Also, use
pages-per-qpl supplied by device-option to control the
size of the "queue-page-list".
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Handful of drivers currently expect to get xdp.h by virtue
of including netdevice.h. This will soon no longer be the case
so add explicit includes.
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-2-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Current codebase contained the usage of two different names for this
driver (i.e., `gvnic` and `gve`), which is quite unfriendly for users
to use, especially when trying to bind or unbind the driver manually.
The corresponding kernel module is registered with the name of `gve`.
It's more reasonable to align the name of the driver with the module.
Fixes: 893ce44df565 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Cc: csully@google.com
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The two constants accomplish the same thing.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230407184830.309398-1-shailend@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Conflicts:
drivers/net/ethernet/google/gve/gve.h
3ce934558097 ("gve: Secure enough bytes in the first TX desc for all TCP pkts")
75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format")
https://lore.kernel.org/all/20230406104927.45d176f5@canb.auug.org.au/
https://lore.kernel.org/all/c5872985-1a95-0bc8-9dcc-b6f23b439e9d@tessares.net/
Adjacent changes:
net/can/isotp.c
051737439eae ("can: isotp: fix race between isotp_sendsmg() and isotp_release()")
96d1c81e6a04 ("can: isotp: add module parameter for maximum pdu size")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Non-GSO TCP packets whose SKBs' linear portion did not include the
entire TCP header were not populating the first Tx descriptor with
as many bytes as the vNIC expected. This change ensures that all
TCP packets populate the first descriptor with the correct number of
bytes.
Fixes: 893ce44df565 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Signed-off-by: Shailend Chand <shailend@google.com>
Link: https://lore.kernel.org/r/20230403172809.2939306-1-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adding AF_XDP zero-copy support.
Note: Although these changes support AF_XDP socket in zero-copy
mode, there is still a copy happening within the driver between
XSK buffer pool and QPL bounce buffers in GQI-QPL format.
In GQI-QPL queue format, the driver needs to allocate a fixed size
memory, the size specified by vNIC device, for RX/TX and register this
memory as a bounce buffer with the vNIC device when a queue is
created. The number of pages in the bounce buffer is limited and the
pages need to be made available to the vNIC by copying the RX data out
to prevent head-of-line blocking. Therefore, we cannot pass the XSK
buffer pool to the vNIC.
The number of copies on RX path from the bounce buffer to XSK buffer is 2
for AF_XDP copy mode (bounce buffer -> allocated page frag -> XSK buffer)
and 1 for AF_XDP zero-copy mode (bounce buffer -> XSK buffer).
This patch contains the following changes:
1) Enable and disable XSK buffer pool
2) Copy XDP packets from QPL bounce buffers to XSK buffer on rx
3) Copy XDP packets from XSK buffer to QPL bounce buffers and
ring the doorbell as part of XDP TX napi poll
4) ndo_xsk_wakeup callback support
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch contains the following changes:
1) Support for XDP REDIRECT action on rx
2) ndo_xdp_xmit callback support
In GQI-QPL queue format, the driver needs to allocate a fixed size
memory, the size specified by vNIC device, for RX/TX and register this
memory as a bounce buffer with the vNIC device when a queue is created.
The number of pages in the bounce buffer is limited and the pages need to
be made available to the vNIC by copying the RX data out to prevent
head-of-line blocking. The XDP_REDIRECT packets are therefore immediately
copied to a newly allocated page.
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for XDP PASS, DROP and TX actions.
This patch contains the following changes:
1) Support installing/uninstalling XDP program
2) Add dedicated XDP TX queues
3) Add support for XDP DROP action
4) Add support for XDP TX action
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Changes to enable adding and removing TX queues without calling
gve_close() and gve_open().
Made the following changes:
1) priv->tx, priv->rx and priv->qpls arrays are allocated based on
max tx queues and max rx queues
2) Changed gve_adminq_create_tx_queues(), gve_adminq_destroy_tx_queues(),
gve_tx_alloc_rings() and gve_tx_free_rings() functions to add/remove a
subset of TX queues rather than all the TX queues.
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds/modifies helper functions needed to add XDP
support.
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Check whether the driver is compatible with the device
presented.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously, even if just one of the many fragments of a 9k packet
required a copy, we'd copy the whole packet into a freshly-allocated
9k-sized linear SKB, and this led to performance issues.
By having a pool of pages to copy into, each fragment can be
independently handled, leading to a reduced incidence of
allocation and copy.
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use GFP_ATOMIC when allocating pages out of the hotpath,
continue to use GFP_KERNEL when allocating pages during setup.
GFP_KERNEL will allow blocking which allows it to succeed
more often in a low memory enviornment but in the hotpath we do
not want to allow the allocation to block.
Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Link: https://lore.kernel.org/r/20220126003843.3584521-1-awogbemila@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adding ethtool support for changing rx-coalesce-usec and tx-coalesce-usec
when using the DQO queue format.
Signed-off-by: Tao Liu <xliutaox@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for suspend, resume and shutdown.
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow drivers to pass metadata along with packet data to the device.
Introduce a new metadata descriptor type
* GVE_TXD_MTD
This descriptor is optional. If present it immediate follows the
packet descriptor and precedes the segment descriptor.
This descriptor may be repeated. Multiple metadata descriptors may
follow. There are no immediate uses for this, this is for future
proofing. At present devices allow only 1 MTD descriptor.
The lower four bits of the type_flags field encode GVE_TXD_MTD.
The upper four bits of the type_flags field encodes a *sub*type.
Introduce one such metadata descriptor subtype
* GVE_MTD_SUBTYPE_PATH
This shares path information with the device for network failure
discovery and robust response:
Linux derives ipv6 flowlabel and ECMP multipath from sk->sk_txhash,
and updates this field on error with sk_rethink_txhash. Allow the host
stack to do the same. Pass the tx_hash value if set. Also communicate
whether the path hash is set, or more exactly, what its type is. Define
two common types
GVE_MTD_PATH_HASH_NONE
GVE_MTD_PATH_HASH_L4
Concrete examples of error conditions that are resolved are
mentioned in the commits that add sk_rethink_txhash calls. Such as
commit 7788174e8726 ("tcp: change IPv6 flow-label upon receiving
spurious retransmission").
Experimental results mirror what the theory suggests: where IPv6
FlowLabel is included in path selection (e.g., LAG/ECMP), flowlabel
rotation on TCP timeout avoids the vast majority of TCP disconnects
that would otherwise have occurred during link failures in long-haul
backbones, when an alternative path is available.
Rotation can be applied to various bad connection signals, such as
timeouts and spurious retransmissions. In aggregate, such flow level
signals can help locate network issues. Define initial common states:
GVE_MTD_PATH_STATE_DEFAULT
GVE_MTD_PATH_STATE_TIMEOUT
GVE_MTD_PATH_STATE_CONGESTION
GVE_MTD_PATH_STATE_RETRANSMIT
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Giving the device access to other kernel structs is not ideal.
Move the indexes into their own array and just keep pointers to
them in the ntfy block struct.
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This enables the driver to receive RX packets spread across multiple
buffers:
For a given multi-fragment packet the "packet continuation" bit is set
on all descriptors except the last one. These descriptors' payloads are
combined into a single SKB before the SKB is handed to the
networking stack.
This change adds a "packet buffer size" notion for RX queues. The
CreateRxQueue AdminQueue command sent to the device now includes the
packet_buffer_size.
We opt for a packet_buffer_size of PAGE_SIZE / 2 to give the
driver the opportunity to flip pages where we can instead of copying.
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This refactor moves the skb_head and skb_tail fields into a new
gve_rx_ctx struct. This new struct will contain information about the
current packet being processed. This is in preparation for
multi-descriptor RX packets.
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Don't always reset the driver on a TX timeout. Attempt to
recover by kicking the queue in case an IRQ was missed.
Fixes: 9e5f7d26a4c08 ("gve: Add workqueue and reset support")
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When TX queue is full, attemt to process enough TX completions
to avoid stalling the queue.
Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support")
Signed-off-by: Tao Liu <xliutaox@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use napi_complete_done to allow for the use of gro_flush_timeout.
Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support")
Signed-off-by: Yangchun Fu <yangchun@google.com>
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The qpl_map_size is rounded up to a multiple of sizeof(long), but the
number of qpls doesn't have to be.
Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The use of dma_unmap_addr()/dma_unmap_len() in the driver causes
multiple warnings when these macros are defined as empty, e.g.
in an ARCH=i386 allmodconfig build:
drivers/net/ethernet/google/gve/gve_tx_dqo.c: In function 'gve_tx_add_skb_no_copy_dqo':
drivers/net/ethernet/google/gve/gve_tx_dqo.c:494:40: error: unused variable 'buf' [-Werror=unused-variable]
494 | struct gve_tx_dma_buf *buf =
This is not how the NEED_DMA_MAP_STATE macros are meant to work,
as they rely on never using local variables or a temporary structure
like gve_tx_dma_buf.
Remote the gve_tx_dma_buf definition and open-code the contents
in all places to avoid the warning. This causes some rather long
lines but otherwise ends up making the driver slightly smaller.
Fixes: a57e5de476be ("gve: DQO: Add TX path")
Link: https://lore.kernel.org/netdev/20210723231957.1113800-1-bcf@google.com/
Link: https://lore.kernel.org/netdev/20210721151100.2042139-1-arnd@kernel.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The RX queue has an array of `gve_rx_buf_state_dqo` objects. All
allocated pages have an associated buf_state object. When a buffer is
posted on the RX buffer queue, the buffer ID will be the buf_state's
index into the RX queue's array.
On packet reception, the RX queue will have one descriptor for each
buffer associated with a received packet. Each RX descriptor will have
a buffer_id that was posted on the buffer queue.
Notable mentions:
- We use a default buffer size of 2048 bytes. Based on page size, we
may post separate sections of a single page as separate buffers.
- The driver holds an extra reference on pages passed up the receive
path with an skb and keeps these pages on a list. When posting new
buffers to the NIC, we check if any of these pages has only our
reference, or another buffer sized segment of the page has no
references. If so, it is free to reuse. This page recycling approach
is a common netdev optimization that reduces page alloc/free calls.
- Pages in the free list have a page_count bias in order to avoid an
atomic increment of pagecount every time we attempt to reuse a page.
# references = page_count() - bias
- In order to track when a page is safe to reuse, we keep track of the
last offset which had a single SKB reference. When this occurs, it
implies that every single other offset is reusable. Otherwise, we
don't know if offsets can be safely reused.
- We maintain two free lists of pages. List #1 (recycled_buf_states)
contains pages we know can be reused right away. List #2
(used_buf_states) contains pages which cannot be used right away. We
only attempt to get pages from list #2 when list #1 is empty. We only
attempt to use a small fixed number pages from list #2 before giving
up and allocating a new page. Both lists are FIFOs in hope that by the
time we attempt to reuse a page, the references were dropped.
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allocate the buffer and completion ring structures. Do not populate the
rings yet. That will happen in the respective rx and tx datapath
follow-on patches
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|