summaryrefslogtreecommitdiff
path: root/Documentation/bpf/map_lpm_trie.rst
blob: f9cd579496c9ce9cdb5160e9d2bf3dcbf55e62a4 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.

=====================
BPF_MAP_TYPE_LPM_TRIE
=====================

.. note::
   - ``BPF_MAP_TYPE_LPM_TRIE`` was introduced in kernel version 4.11

``BPF_MAP_TYPE_LPM_TRIE`` provides a longest prefix match algorithm that
can be used to match IP addresses to a stored set of prefixes.
Internally, data is stored in an unbalanced trie of nodes that uses
``prefixlen,data`` pairs as its keys. The ``data`` is interpreted in
network byte order, i.e. big endian, so ``data[0]`` stores the most
significant byte.

LPM tries may be created with a maximum prefix length that is a multiple
of 8, in the range from 8 to 2048. The key used for lookup and update
operations is a ``struct bpf_lpm_trie_key_u8``, extended by
``max_prefixlen/8`` bytes.

- For IPv4 addresses the data length is 4 bytes
- For IPv6 addresses the data length is 16 bytes

The value type stored in the LPM trie can be any user defined type.

.. note::
   When creating a map of type ``BPF_MAP_TYPE_LPM_TRIE`` you must set the
   ``BPF_F_NO_PREALLOC`` flag.

Usage
=====

Kernel BPF
----------

bpf_map_lookup_elem()
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: c

   void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

The longest prefix entry for a given data value can be found using the
``bpf_map_lookup_elem()`` helper. This helper returns a pointer to the
value associated with the longest matching ``key``, or ``NULL`` if no
entry was found.

The ``key`` should have ``prefixlen`` set to ``max_prefixlen`` when
performing longest prefix lookups. For example, when searching for the
longest prefix match for an IPv4 address, ``prefixlen`` should be set to
``32``.

bpf_map_update_elem()
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: c

   long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)

Prefix entries can be added or updated using the ``bpf_map_update_elem()``
helper. This helper replaces existing elements atomically.

``bpf_map_update_elem()`` returns ``0`` on success, or negative error in
case of failure.

 .. note::
    The flags parameter must be one of BPF_ANY, BPF_NOEXIST or BPF_EXIST,
    but the value is ignored, giving BPF_ANY semantics.

bpf_map_delete_elem()
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: c

   long bpf_map_delete_elem(struct bpf_map *map, const void *key)

Prefix entries can be deleted using the ``bpf_map_delete_elem()``
helper. This helper will return 0 on success, or negative error in case
of failure.

Userspace
---------

Access from userspace uses libbpf APIs with the same names as above, with
the map identified by ``fd``.

bpf_map_get_next_key()
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: c

   int bpf_map_get_next_key (int fd, const void *cur_key, void *next_key)

A userspace program can iterate through the entries in an LPM trie using
libbpf's ``bpf_map_get_next_key()`` function. The first key can be
fetched by calling ``bpf_map_get_next_key()`` with ``cur_key`` set to
``NULL``. Subsequent calls will fetch the next key that follows the
current key. ``bpf_map_get_next_key()`` returns ``0`` on success,
``-ENOENT`` if ``cur_key`` is the last key in the trie, or negative
error in case of failure.

``bpf_map_get_next_key()`` will iterate through the LPM trie elements
from leftmost leaf first. This means that iteration will return more
specific keys before less specific ones.

Examples
========

Please see ``tools/testing/selftests/bpf/test_lpm_map.c`` for examples
of LPM trie usage from userspace. The code snippets below demonstrate
API usage.

Kernel BPF
----------

The following BPF code snippet shows how to declare a new LPM trie for IPv4
address prefixes:

.. code-block:: c

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct ipv4_lpm_key {
            __u32 prefixlen;
            __u32 data;
    };

    struct {
            __uint(type, BPF_MAP_TYPE_LPM_TRIE);
            __type(key, struct ipv4_lpm_key);
            __type(value, __u32);
            __uint(map_flags, BPF_F_NO_PREALLOC);
            __uint(max_entries, 255);
    } ipv4_lpm_map SEC(".maps");

The following BPF code snippet shows how to lookup by IPv4 address:

.. code-block:: c

    void *lookup(__u32 ipaddr)
    {
            struct ipv4_lpm_key key = {
                    .prefixlen = 32,
                    .data = ipaddr
            };

            return bpf_map_lookup_elem(&ipv4_lpm_map, &key);
    }

Userspace
---------

The following snippet shows how to insert an IPv4 prefix entry into an
LPM trie:

.. code-block:: c

    int add_prefix_entry(int lpm_fd, __u32 addr, __u32 prefixlen, struct value *value)
    {
            struct ipv4_lpm_key ipv4_key = {
                    .prefixlen = prefixlen,
                    .data = addr
            };
            return bpf_map_update_elem(lpm_fd, &ipv4_key, value, BPF_ANY);
    }

The following snippet shows a userspace program walking through the entries
of an LPM trie:


.. code-block:: c

    #include <bpf/libbpf.h>
    #include <bpf/bpf.h>

    void iterate_lpm_trie(int map_fd)
    {
            struct ipv4_lpm_key *cur_key = NULL;
            struct ipv4_lpm_key next_key;
            struct value value;
            int err;

            for (;;) {
                    err = bpf_map_get_next_key(map_fd, cur_key, &next_key);
                    if (err)
                            break;

                    bpf_map_lookup_elem(map_fd, &next_key, &value);

                    /* Use key and value here */

                    cur_key = &next_key;
            }
    }