diff options
21 files changed, 4322 insertions, 1814 deletions
diff --git a/Documentation/trace/histogram-design.rst b/Documentation/trace/histogram-design.rst new file mode 100644 index 000000000000..eef840043da9 --- /dev/null +++ b/Documentation/trace/histogram-design.rst @@ -0,0 +1,2115 @@ +.. SPDX-License-Identifier: GPL-2.0 + +====================== +Histogram Design Notes +====================== + +:Author: Tom Zanussi <zanussi@kernel.org> + +This document attempts to provide a description of how the ftrace +histograms work and how the individual pieces map to the data +structures used to implement them in trace_events_hist.c and +tracing_map.c. + +Note: All the ftrace histogram command examples assume the working + directory is the ftrace /tracing directory. For example:: + + # cd /sys/kernel/debug/tracing + +Also, the histogram output displayed for those commands will be +generally be truncated - only enough to make the point is displayed. + +'hist_debug' trace event files +============================== + +If the kernel is compiled with CONFIG_HIST_TRIGGERS_DEBUG set, an +event file named 'hist_debug' will appear in each event's +subdirectory. This file can be read at any time and will display some +of the hist trigger internals described in this document. Specific +examples and output will be described in test cases below. + +Basic histograms +================ + +First, basic histograms. Below is pretty much the simplest thing you +can do with histograms - create one with a single key on a single +event and cat the output:: + + # echo 'hist:keys=pid' >> events/sched/sched_waking/trigger + + # cat events/sched/sched_waking/hist + + { pid: 18249 } hitcount: 1 + { pid: 13399 } hitcount: 1 + { pid: 17973 } hitcount: 1 + { pid: 12572 } hitcount: 1 + ... + { pid: 10 } hitcount: 921 + { pid: 18255 } hitcount: 1444 + { pid: 25526 } hitcount: 2055 + { pid: 5257 } hitcount: 2055 + { pid: 27367 } hitcount: 2055 + { pid: 1728 } hitcount: 2161 + + Totals: + Hits: 21305 + Entries: 183 + Dropped: 0 + +What this does is create a histogram on the sched_waking event using +pid as a key and with a single value, hitcount, which even if not +explicitly specified, exists for every histogram regardless. + +The hitcount value is a per-bucket value that's automatically +incremented on every hit for the given key, which in this case is the +pid. + +So in this histogram, there's a separate bucket for each pid, and each +bucket contains a value for that bucket, counting the number of times +sched_waking was called for that pid. + +Each histogram is represented by a hist_data struct. + +To keep track of each key and value field in the histogram, hist_data +keeps an array of these fields named fields[]. The fields[] array is +an array containing struct hist_field representations of each +histogram val and key in the histogram (variables are also included +here, but are discussed later). So for the above histogram we have one +key and one value; in this case the one value is the hitcount value, +which all histograms have, regardless of whether they define that +value or not, which the above histogram does not. + +Each struct hist_field contains a pointer to the ftrace_event_field +from the event's trace_event_file along with various bits related to +that such as the size, offset, type, and a hist_field_fn_t function, +which is used to grab the field's data from the ftrace event buffer +(in most cases - some hist_fields such as hitcount don't directly map +to an event field in the trace buffer - in these cases the function +implementation gets its value from somewhere else). The flags field +indicates which type of field it is - key, value, variable, variable +reference, etc., with value being the default. + +The other important hist_data data structure in addition to the +fields[] array is the tracing_map instance created for the histogram, +which is held in the .map member. The tracing_map implements the +lock-free hash table used to implement histograms (see +kernel/trace/tracing_map.h for much more discussion about the +low-level data structures implementing the tracing_map). For the +purposes of this discussion, the tracing_map contains a number of +buckets, each bucket corresponding to a particular tracing_map_elt +object hashed by a given histogram key. + +Below is a diagram the first part of which describes the hist_data and +associated key and value fields for the histogram described above. As +you can see, there are two fields in the fields array, one val field +for the hitcount and one key field for the pid key. + +Below that is a diagram of a run-time snapshot of what the tracing_map +might look like for a given run. It attempts to show the +relationships between the hist_data fields and the tracing_map +elements for a couple hypothetical keys and values.:: + + +------------------+ + | hist_data | + +------------------+ +----------------+ + | .fields[] |---->| val = hitcount |----------------------------+ + +----------------+ +----------------+ | + | .map | | .size | | + +----------------+ +--------------+ | + | .offset | | + +--------------+ | + | .fn() | | + +--------------+ | + . | + . | + . | + +----------------+ <--- n_vals | + | key = pid |----------------------------|--+ + +----------------+ | | + | .size | | | + +--------------+ | | + | .offset | | | + +--------------+ | | + | .fn() | | | + +----------------+ <--- n_fields | | + | unused | | | + +----------------+ | | + | | | | + +--------------+ | | + | | | | + +--------------+ | | + | | | | + +--------------+ | | + n_keys = n_fields - n_vals | | + +The hist_data n_vals and n_fields delineate the extent of the fields[] | | +array and separate keys from values for the rest of the code. | | + +Below is a run-time representation of the tracing_map part of the | | +histogram, with pointers from various parts of the fields[] array | | +to corresponding parts of the tracing_map. | | + +The tracing_map consists of an array of tracing_map_entrys and a set | | +of preallocated tracing_map_elts (abbreviated below as map_entry and | | +map_elt). The total number of map_entrys in the hist_data.map array = | | +map->max_elts (actually map->map_size but only max_elts of those are | | +used. This is a property required by the map_insert() algorithm). | | + +If a map_entry is unused, meaning no key has yet hashed into it, its | | +.key value is 0 and its .val pointer is NULL. Once a map_entry has | | +been claimed, the .key value contains the key's hash value and the | | +.val member points to a map_elt containing the full key and an entry | | +for each key or value in the map_elt.fields[] array. There is an | | +entry in the map_elt.fields[] array corresponding to each hist_field | | +in the histogram, and this is where the continually aggregated sums | | +corresponding to each histogram value are kept. | | + +The diagram attempts to show the relationship between the | | +hist_data.fields[] and the map_elt.fields[] with the links drawn | | +between diagrams:: + + +-----------+ | | + | hist_data | | | + +-----------+ | | + | .fields | | | + +---------+ +-----------+ | | + | .map |---->| map_entry | | | + +---------+ +-----------+ | | + | .key |---> 0 | | + +---------+ | | + | .val |---> NULL | | + +-----------+ | | + | map_entry | | | + +-----------+ | | + | .key |---> pid = 999 | | + +---------+ +-----------+ | | + | .val |--->| map_elt | | | + +---------+ +-----------+ | | + . | .key |---> full key * | | + . +---------+ +---------------+ | | + . | .fields |--->| .sum (val) |<-+ | + +-----------+ +---------+ | 2345 | | | + | map_entry | +---------------+ | | + +-----------+ | .offset (key) |<----+ + | .key |---> 0 | 0 | | | + +---------+ +---------------+ | | + | .val |---> NULL . | | + +-----------+ . | | + | map_entry | . | | + +-----------+ +---------------+ | | + | .key | | .sum (val) or | | | + +---------+ +---------+ | .offset (key) | | | + | .val |--->| map_elt | +---------------+ | | + +-----------+ +---------+ | .sum (val) or | | | + | map_entry | | .offset (key) | | | + +-----------+ +---------------+ | | + | .key |---> pid = 4444 | | + +---------+ +-----------+ | | + | .val | | map_elt | | | + +---------+ +-----------+ | | + | .key |---> full key * | | + +---------+ +---------------+ | | + | .fields |--->| .sum (val) |<-+ | + +---------+ | 65523 | | + +---------------+ | + | .offset (key) |<----+ + | 0 | + +---------------+ + . + . + . + +---------------+ + | .sum (val) or | + | .offset (key) | + +---------------+ + | .sum (val) or | + | .offset (key) | + +---------------+ + +Abbreviations used in the diagrams:: + + hist_data = struct hist_trigger_data + hist_data.fields = struct hist_field + fn = hist_field_fn_t + map_entry = struct tracing_map_entry + map_elt = struct tracing_map_elt + map_elt.fields = struct tracing_map_field + +Whenever a new event occurs and it has a hist trigger associated with +it, event_hist_trigger() is called. event_hist_trigger() first deals +with the key: for each subkey in the key (in the above example, there +is just one subkey corresponding to pid), the hist_field that +represents that subkey is retrieved from hist_data.fields[] and the +hist_field_fn_t fn() associated with that field, along with the +field's size and offset, is used to grab that subkey's data from the +current trace record. + +Once the complete key has been retrieved, it's used to look that key +up in the tracing_map. If there's no tracing_map_elt associated with +that key, an empty one is claimed and inserted in the map for the new +key. In either case, the tracing_map_elt associated with that key is +returned. + +Once a tracing_map_elt available, hist_trigger_elt_update() is called. +As the name implies, this updates the element, which basically means +updating the element's fields. There's a tracing_map_field associated +with each key and value in the histogram, and each of these correspond +to the key and value hist_fields created when the histogram was +created. hist_trigger_elt_update() goes through each value hist_field +and, as for the keys, uses the hist_field's fn() and size and offset +to grab the field's value from the current trace record. Once it has +that value, it simply adds that value to that field's +continually-updated tracing_map_field.sum member. Some hist_field +fn()s, such as for the hitcount, don't actually grab anything from the +trace record (the hitcount fn() just increments the counter sum by 1), +but the idea is the same. + +Once all the values have been updated, hist_trigger_elt_update() is +done and returns. Note that there are also tracing_map_fields for +each subkey in the key, but hist_trigger_elt_update() doesn't look at +them or update anything - those exist only for sorting, which can +happen later. + +Basic histogram test +-------------------- + +This is a good example to try. It produces 3 value fields and 2 key +fields in the output:: + + # echo 'hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger + +To see the debug data, cat the kmem/kmalloc's 'hist_debug' file. It +will show the trigger info of the histogram it corresponds to, along +with the address of the hist_data associated with the histogram, which +will become useful in later examples. It then displays the number of +total hist_fields associated with the histogram along with a count of +how many of those correspond to keys and how many correspond to values. + +It then goes on to display details for each field, including the +field's flags and the position of each field in the hist_data's +fields[] array, which is useful information for verifying that things +internally appear correct or not, and which again will become even +more useful in further examples:: + + # cat events/kmem/kmalloc/hist_debug + + # event histogram + # + # trigger info: hist:keys=common_pid,call_site.sym:vals=hitcount,bytes_req,bytes_alloc:sort=hitcount:size=2048 [active] + # + + hist_data: 000000005e48c9a5 + + n_vals: 3 + n_keys: 2 + n_fields: 5 + + val fields: + + hist_data->fields[0]: + flags: + VAL: HIST_FIELD_FL_HITCOUNT + type: u64 + size: 8 + is_signed: 0 + + hist_data->fields[1]: + flags: + VAL: normal u64 value + ftrace_event_field name: bytes_req + type: size_t + size: 8 + is_signed: 0 + + hist_data->fields[2]: + flags: + VAL: normal u64 value + ftrace_event_field name: bytes_alloc + type: size_t + size: 8 + is_signed: 0 + + key fields: + + hist_data->fields[3]: + flags: + HIST_FIELD_FL_KEY + ftrace_event_field name: common_pid + type: int + size: 8 + is_signed: 1 + + hist_data->fields[4]: + flags: + HIST_FIELD_FL_KEY + ftrace_event_field name: call_site + type: unsigned long + size: 8 + is_signed: 0 + +The commands below can be used to clean things up for the next test:: + + # echo '!hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger + +Variables +========= + +Variables allow data from one hist trigger to be saved by one hist +trigger and retrieved by another hist trigger. For example, a trigger +on the sched_waking event can capture a timestamp for a particular +pid, and later a sched_switch event that switches to that pid event +can grab the timestamp and use it to calculate a time delta between +the two events:: + + # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> + events/sched/sched_waking/trigger + + # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> + events/sched/sched_switch/trigger + +In terms of the histogram data structures, variables are implemented +as another type of hist_field and for a given hist trigger are added +to the hist_data.fields[] array just after all the val fields. To +distinguish them from the existing key and val fields, they're given a +new flag type, HIST_FIELD_FL_VAR (abbreviated FL_VAR) and they also +make use of a new .var.idx field member in struct hist_field, which +maps them to an index in a new map_elt.vars[] array added to the +map_elt specifically designed to store and retrieve variable values. +The diagram below shows those new elements and adds a new variable +entry, ts0, corresponding to the ts0 variable in the sched_waking +trigger above. + +sched_waking histogram +----------------------:: + + +------------------+ + | hist_data |<-------------------------------------------------------+ + +------------------+ +-------------------+ | + | .fields[] |-->| val = hitcount | | + +----------------+ +-------------------+ | + | .map | | .size | | + +----------------+ +-----------------+ | + | .offset | | + +-----------------+ | + | .fn() | | + +-----------------+ | + | .flags | | + +-----------------+ | + | .var.idx | | + +-------------------+ | + | var = ts0 | | + +-------------------+ | + | .size | | + +-----------------+ | + | .offset | | + +-----------------+ | + | .fn() | | + +-----------------+ | + | .flags & FL_VAR | | + +-----------------+ | + | .var.idx |----------------------------+-+ | + +-----------------+ | | | + . | | | + . | | | + . | | | + +-------------------+ <--- n_vals | | | + | key = pid | | | | + +-------------------+ | | | + | .size | | | | + +-----------------+ | | | + | .offset | | | | + +-----------------+ | | | + | .fn() | | | | + +-----------------+ | | | + | .flags & FL_KEY | | | | + +-----------------+ | | | + | .var.idx | | | | + +-------------------+ <--- n_fields | | | + | unused | | | | + +-------------------+ | | | + | | | | | + +-----------------+ | | | + | | | | | + +-----------------+ | | | + | | | | | + +-----------------+ | | | + | | | | | + +-----------------+ | | | + | | | | | + +-----------------+ | | | + n_keys = n_fields - n_vals | | | + | | | + +This is very similar to the basic case. In the above diagram, we can | | | +see a new .flags member has been added to the struct hist_field | | | +struct, and a new entry added to hist_data.fields representing the ts0 | | | +variable. For a normal val hist_field, .flags is just 0 (modulo | | | +modifier flags), but if the value is defined as a variable, the .flags | | | +contains a set FL_VAR bit. | | | + +As you can see, the ts0 entry's .var.idx member contains the index | | | +into the tracing_map_elts' .vars[] array containing variable values. | | | +This idx is used whenever the value of the variable is set or read. | | | +The map_elt.vars idx assigned to the given variable is assigned and | | | +saved in .var.idx by create_tracing_map_fields() after it calls | | | +tracing_map_add_var(). | | | + +Below is a representation of the histogram at run-time, which | | | +populates the map, along with correspondence to the above hist_data and | | | +hist_field data structures. | | | + +The diagram attempts to show the relationship between the | | | +hist_data.fields[] and the map_elt.fields[] and map_elt.vars[] with | | | +the links drawn between diagrams. For each of the map_elts, you can | | | +see that the .fields[] members point to the .sum or .offset of a key | | | +or val and the .vars[] members point to the value of a variable. The | | | +arrows between the two diagrams show the linkages between those | | | +tracing_map members and the field definitions in the corresponding | | | +hist_data fields[] members.:: + + +-----------+ | | | + | hist_data | | | | + +-----------+ | | | + | .fields | | | | + +---------+ +-----------+ | | | + | .map |---->| map_entry | | | | + +---------+ +-----------+ | | | + | .key |---> 0 | | | + +---------+ | | | + | .val |---> NULL | | | + +-----------+ | | | + | map_entry | | | | + +-----------+ | | | + | .key |---> pid = 999 | | | + +---------+ +-----------+ | | | + | .val |--->| map_elt | | | | + +---------+ +-----------+ | | | + . | .key |---> full key * | | | + . +---------+ +---------------+ | | | + . | .fields |--->| .sum (val) | | | | + . +---------+ | 2345 | | | | + . +--| .vars | +---------------+ | | | + . | +---------+ | .offset (key) | | | | + . | | 0 | | | | + . | +---------------+ | | | + . | . | | | + . | . | | | + . | . | | | + . | +---------------+ | | | + . | | .sum (val) or | | | | + . | | .offset (key) | | | | + . | +---------------+ | | | + . | | .sum (val) or | | | | + . | | .offset (key) | | | | + . | +---------------+ | | | + . | | | | + . +---------------->+---------------+ | | | + . | ts0 |<--+ | | + . | 113345679876 | | | | + . +---------------+ | | | + . | unused | | | | + . | | | | | + . +---------------+ | | | + . . | | | + . . | | | + . . | | | + . +---------------+ | | | + . | unused | | | | + . | | | | | + . +---------------+ | | | + . | unused | | | | + . | | | | | + . +---------------+ | | | + . | | | + +-----------+ | | | + | map_entry | | | | + +-----------+ | | | + | .key |---> pid = 4444 | | | + +---------+ +-----------+ | | | + | .val |--->| map_elt | | | | + +---------+ +-----------+ | | | + . | .key |---> full key * | | | + . +---------+ +---------------+ | | | + . | .fields |--->| .sum (val) | | | | + +---------+ | 2345 | | | | + +--| .vars | +---------------+ | | | + | +---------+ | .offset (key) | | | | + | | 0 | | | | + | +---------------+ | | | + | . | | | + | . | | | + | . | | | + | +---------------+ | | | + | | .sum (val) or | | | | + | | .offset (key) | | | | + | +---------------+ | | | + | | .sum (val) or | | | | + | | .offset (key) | | | | + | +---------------+ | | | + | | | | + | +---------------+ | | | + +---------------->| ts0 |<--+ | | + | 213499240729 | | | + +---------------+ | | + | unused | | | + | | | | + +---------------+ | | + . | | + . | | + . | | + +---------------+ | | + | unused | | | + | | | | + +---------------+ | | + | unused | | | + | | | | + +---------------+ | | + +For each used map entry, there's a map_elt pointing to an array of | | +.vars containing the current value of the variables associated with | | +that histogram entry. So in the above, the timestamp associated with | | +pid 999 is 113345679876, and the timestamp variable in the same | | +.var.idx for pid 4444 is 213499240729. | | + +sched_switch histogram | | +---------------------- | | + +The sched_switch histogram paired with the above sched_waking | | +histogram is shown below. The most important aspect of the | | +sched_switch histogram is that it references a variable on the | | +sched_waking histogram above. | | + +The histogram diagram is very similar to the others so far displayed, | | +but it adds variable references. You can see the normal hitcount and | | +key fields along with a new wakeup_lat variable implemented in the | | +same way as the sched_waking ts0 variable, but in addition there's an | | +entry with the new FL_VAR_REF (short for HIST_FIELD_FL_VAR_REF) flag. | | + +Associated with the new var ref field are a couple of new hist_field | | +members, var.hist_data and var_ref_idx. For a variable reference, the | | +var.hist_data goes with the var.idx, which together uniquely identify | | +a particular variable on a particular histogram. The var_ref_idx is | | +just the index into the var_ref_vals[] array that caches the values of | | +each variable whenever a hist trigger is updated. Those resulting | | +values are then finally accessed by other code such as trace action | | +code that uses the var_ref_idx values to assign param values. | | + +The diagram below describes the situation for the sched_switch | | +histogram referred to before:: + + # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> | | + events/sched/sched_switch/trigger | | + | | + +------------------+ | | + | hist_data | | | + +------------------+ +-----------------------+ | | + | .fields[] |-->| val = hitcount | | | + +----------------+ +-----------------------+ | | + | .map | | .size | | | + +----------------+ +---------------------+ | | + +--| .var_refs[] | | .offset | | | + | +----------------+ +---------------------+ | | + | | .fn() | | | + | var_ref_vals[] +---------------------+ | | + | +-------------+ | .flags | | | + | | $ts0 |<---+ +---------------------+ | | + | +-------------+ | | .var.idx | | | + | | | | +---------------------+ | | + | +-------------+ | | .var.hist_data | | | + | | | | +---------------------+ | | + | +-------------+ | | .var_ref_idx | | | + | | | | +-----------------------+ | | + | +-------------+ | | var = wakeup_lat | | | + | . | +-----------------------+ | | + | . | | .size | | | + | . | +---------------------+ | | + | +-------------+ | | .offset | | | + | | | | +---------------------+ | | + | +-------------+ | | .fn() | | | + | | | | +---------------------+ | | + | +-------------+ | | .flags & FL_VAR | | | + | | +---------------------+ | | + | | | .var.idx | | | + | | +---------------------+ | | + | | | .var.hist_data | | | + | | +---------------------+ | | + | | | .var_ref_idx | | | + | | +---------------------+ | | + | | . | | + | | . | | + | | . | | + | | +-----------------------+ <--- n_vals | | + | | | key = pid | | | + | | +-----------------------+ | | + | | | .size | | | + | | +---------------------+ | | + | | | .offset | | | + | | +---------------------+ | | + | | | .fn() | | | + | | +---------------------+ | | + | | | .flags | | | + | | +---------------------+ | | + | | | .var.idx | | | + | | +-----------------------+ <--- n_fields | | + | | | unused | | | + | | +-----------------------+ | | + | | | | | | + | | +---------------------+ | | + | | | | | | + | | +---------------------+ | | + | | | | | | + | | +---------------------+ | | + | | | | | | + | | +---------------------+ | | + | | | | | | + | | +---------------------+ | | + | | n_keys = n_fields - n_vals | | + | | | | + | | | | + | | +-----------------------+ | | + +---------------------->| var_ref = $ts0 | | | + | +-----------------------+ | | + | | .size | | | + | +---------------------+ | | + | | .offset | | | + | +---------------------+ | | + | | .fn() | | | + | +---------------------+ | | + | | .flags & FL_VAR_REF | | | + | +---------------------+ | | + | | .var.idx |--------------------------+ | + | +---------------------+ | + | | .var.hist_data |----------------------------+ + | +---------------------+ + +---| .var_ref_idx | + +---------------------+ + +Abbreviations used in the diagrams:: + + hist_data = struct hist_trigger_data + hist_data.fields = struct hist_field + fn = hist_field_fn_t + FL_KEY = HIST_FIELD_FL_KEY + FL_VAR = HIST_FIELD_FL_VAR + FL_VAR_REF = HIST_FIELD_FL_VAR_REF + +When a hist trigger makes use of a variable, a new hist_field is +created with flag HIST_FIELD_FL_VAR_REF. For a VAR_REF field, the +var.idx and var.hist_data take the same values as the referenced +variable, as well as the referenced variable's size, type, and +is_signed values. The VAR_REF field's .name is set to the name of the +variable it references. If a variable reference was created using the +explicit system.event.$var_ref notation, the hist_field's system and +event_name variables are also set. + +So, in order to handle an event for the sched_switch histogram, +because we have a reference to a variable on another histogram, we +need to resolve all variable references first. This is done via the +resolve_var_refs() calls made from event_hist_trigger(). What this +does is grabs the var_refs[] array from the hist_data representing the +sched_switch histogram. For each one of those, the referenced +variable's var.hist_data along with the current key is used to look up +the corresponding tracing_map_elt in that histogram. Once found, the +referenced variable's var.idx is used to look up the variable's value +using tracing_map_read_var(elt, var.idx), which yields the value of +the variable for that element, ts0 in the case above. Note that both +the hist_fields representing both the variable and the variable +reference have the same var.idx, so this is straightforward. + +Variable and variable reference test +------------------------------------ + +This example creates a variable on the sched_waking event, ts0, and +uses it in the sched_switch trigger. The sched_switch trigger also +creates its own variable, wakeup_lat, but nothing yet uses it:: + + # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger + + # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger + +Looking at the sched_waking 'hist_debug' output, in addition to the +normal key and value hist_fields, in the val fields section we see a +field with the HIST_FIELD_FL_VAR flag, which indicates that that field +represents a variable. Note that in addition to the variable name, +contained in the var.name field, it includes the var.idx, which is the +index into the tracing_map_elt.vars[] array of the actual variable +location. Note also that the output shows that variables live in the +same part of the hist_data->fields[] array as normal values:: + + # cat events/sched/sched_waking/hist_debug + + # event histogram + # + # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active] + # + + hist_data: 000000009536f554 + + n_vals: 2 + n_keys: 1 + n_fields: 3 + + val fields: + + hist_data->fields[0]: + flags: + VAL: HIST_FIELD_FL_HITCOUNT + type: u64 + size: 8 + is_signed: 0 + + hist_data->fields[1]: + flags: + HIST_FIELD_FL_VAR + var.name: ts0 + var.idx (into tracing_map_elt.vars[]): 0 + type: u64 + size: 8 + is_signed: 0 + + key fields: + + hist_data->fields[2]: + flags: + HIST_FIELD_FL_KEY + ftrace_event_field name: pid + type: pid_t + size: 8 + is_signed: 1 + +Moving on to the sched_switch trigger hist_debug output, in addition +to the unused wakeup_lat variable, we see a new section displaying +variable references. Variable references are displayed in a separate +section because in addition to to being logically separate from +variables and values, they actually live in a separate hist_data +array, var_refs[]. + +In this example, the sched_switch trigger has a reference to a +variable on the sched_waking trigger, $ts0. Looking at the details, +we can see that the var.hist_data value of the referenced variable +matches the previously displayed sched_waking trigger, and the var.idx +value matches the previously displayed var.idx value for that +variable. Also displayed is the var_ref_idx value for that variable +reference, which is where the value for that variable is cached for +use when the trigger is invoked:: + + # cat events/sched/sched_switch/hist_debug + + # event histogram + # + # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global [active] + # + + hist_data: 00000000f4ee8006 + + n_vals: 2 + n_keys: 1 + n_fields: 3 + + val fields: + + hist_data->fields[0]: + flags: + VAL: HIST_FIELD_FL_HITCOUNT + type: u64 + size: 8 + is_signed: 0 + + hist_data->fields[1]: + flags: + HIST_FIELD_FL_VAR + var.name: wakeup_lat + var.idx (into tracing_map_elt.vars[]): 0 + type: u64 + size: 0 + is_signed: 0 + + key fields: + + hist_data->fields[2]: + flags: + HIST_FIELD_FL_KEY + ftrace_event_field name: next_pid + type: pid_t + size: 8 + is_signed: 1 + + variable reference fields: + + hist_data->var_refs[0]: + flags: + HIST_FIELD_FL_VAR_REF + name: ts0 + var.idx (into tracing_map_elt.vars[]): 0 + var.hist_data: 000000009536f554 + var_ref_idx (into hist_data->var_refs[]): 0 + type: u64 + size: 8 + is_signed: 0 + +The commands below can be used to clean things up for the next test:: + + # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger + + # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger + +Actions and Handlers +==================== + +Adding onto the previous example, we will now do something with that +wakeup_lat variable, namely send it and another field as a synthetic +event. + +The onmatch() action below basically says that whenever we have a +sched_switch event, if we have a matching sched_waking event, in this +case if we have a pid in the sched_waking histogram that matches the +the next_pid field on this sched_switch event, we retrieve the +variables specified in the wakeup_latency() trace action, and use +them to generate a new wakeup_latency event into the trace stream. + +Note that the way the trace handlers such as wakeup_latency() (which +could equivalently be written trace(wakeup_latency,$wakeup_lat,next_pid) +are implemented, the parameters specified to the trace handler must be +variables. In this case, $wakeup_lat is obviously a variable, but +next_pid isn't, since it's just naming a field in the sched_switch +trace event. Since this is something that almost every trace() and +save() action does, a special shortcut is implemented to allow field +names to be used directly in those cases. How it works is that under +the covers, a temporary variable is created for the named field, and +this variable is what is actually passed to the trace handler. In the +code and documentation, this type of variable is called a 'field +variable'. + +Fields on other trace event's histograms can be used as well. In that +case we have to generate a new histogram and an unfortunately named +'synthetic_field' (the use of synthetic here has nothing to do with +synthetic events) and use that special histogram field as a variable. + +The diagram below illustrates the new elements described above in the +context of the sched_switch histogram using the onmatch() handler and +the trace() action. + +First, we define the wakeup_latency synthetic event:: + + # echo 'wakeup_latency u64 lat; pid_t pid' >> synthetic_events + +Next, the sched_waking hist trigger as before:: + + # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> + events/sched/sched_waking/trigger + +Finally, we create a hist trigger on the sched_switch event that +generates a wakeup_latency() trace event. In this case we pass +next_pid into the wakeup_latency synthetic event invocation, which +means it will be automatically converted into a field variable:: + + # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \ + onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid)' >> + /sys/kernel/debug/tracing/events/sched/sched_switch/trigger + +The diagram for the sched_switch event is similar to previous examples +but shows the additional field_vars[] array for hist_data and shows +the linkages between the field_vars and the variables and references +created to implement the field variables. The details are discussed +below:: + + +------------------+ + | hist_data | + +------------------+ +-----------------------+ + | .fields[] |-->| val = hitcount | + +----------------+ +-----------------------+ + | .map | | .size | + +----------------+ +---------------------+ + +---| .field_vars[] | | .offset | + | +----------------+ +---------------------+ + |+--| .var_refs[] | | .offset | + || +----------------+ +---------------------+ + || | .fn() | + || var_ref_vals[] +---------------------+ + || +-------------+ | .flags | + || | $ts0 |<---+ +---------------------+ + || +-------------+ | | .var.idx | + || | $next_pid |<-+ | +---------------------+ + || +-------------+ | | | .var.hist_data | + ||+>| $wakeup_lat | | | +---------------------+ + ||| +-------------+ | | | .var_ref_idx | + ||| | | | | +-----------------------+ + ||| +-------------+ | | | var = wakeup_lat | + ||| . | | +-----------------------+ + ||| . | | | .size | + ||| . | | +---------------------+ + ||| +-------------+ | | | .offset | + ||| | | | | +---------------------+ + ||| +-------------+ | | | .fn() | + ||| | | | | +---------------------+ + ||| +-------------+ | | | .flags & FL_VAR | + ||| | | +---------------------+ + ||| | | | .var.idx | + ||| | | +---------------------+ + ||| | | | .var.hist_data | + ||| | | +---------------------+ + ||| | | | .var_ref_idx | + ||| | | +---------------------+ + ||| | | . + ||| | | . + ||| | | . + ||| | | . + ||| +--------------+ | | . + +-->| field_var | | | . + || +--------------+ | | . + || | var | | | . + || +------------+ | | . + || | val | | | . + || +--------------+ | | . + || | field_var | | | . + || +--------------+ | | . + || | var | | | . + || +------------+ | | . + || | val | | | . + || +------------+ | | . + || . | | . + || . | | . + || . | | +-----------------------+ <--- n_vals + || +--------------+ | | | key = pid | + || | field_var | | | +-----------------------+ + || +--------------+ | | | .size | + || | var |--+| +---------------------+ + || +------------+ ||| | .offset | + || | val |-+|| +---------------------+ + || +------------+ ||| | .fn() | + || ||| +---------------------+ + || ||| | .flags | + || ||| +---------------------+ + || ||| | .var.idx | + || ||| +---------------------+ <--- n_fields + || ||| + || ||| n_keys = n_fields - n_vals + || ||| +-----------------------+ + || |+->| var = next_pid | + || | | +-----------------------+ + || | | | .size | + || | | +---------------------+ + || | | | .offset | + || | | +---------------------+ + || | | | .flags & FL_VAR | + || | | +---------------------+ + || | | | .var.idx | + || | | +---------------------+ + || | | | .var.hist_data | + || | | +-----------------------+ + || +-->| val for next_pid | + || | | +-----------------------+ + || | | | .size | + || | | +---------------------+ + || | | | .offset | + || | | +---------------------+ + || | | | .fn() | + || | | +---------------------+ + || | | | .flags | + || | | +---------------------+ + || | | | | + || | | +---------------------+ + || | | + || | | + || | | +-----------------------+ + +|------------------|-|>| var_ref = $ts0 | + | | | +-----------------------+ + | | | | .size | + | | | +---------------------+ + | | | | .offset | + | | | +---------------------+ + | | | | .fn() | + | | | +---------------------+ + | | | | .flags & FL_VAR_REF | + | | | +---------------------+ + | | +---| .var_ref_idx | + | | +-----------------------+ + | | | var_ref = $next_pid | + | | +-----------------------+ + | | | .size | + | | +---------------------+ + | | | .offset | + | | +---------------------+ + | | | .fn() | + | | +---------------------+ + | | | .flags & FL_VAR_REF | + | | +---------------------+ + | +-----| .var_ref_idx | + | +-----------------------+ + | | var_ref = $wakeup_lat | + | +-----------------------+ + | | .size | + | +---------------------+ + | | .offset | + | +---------------------+ + | | .fn() | + | +---------------------+ + | | .flags & FL_VAR_REF | + | +---------------------+ + +------------------------| .var_ref_idx | + +---------------------+ + +As you can see, for a field variable, two hist_fields are created: one +representing the variable, in this case next_pid, and one to actually +get the value of the field from the trace stream, like a normal val +field does. These are created separately from normal variable +creation and are saved in the hist_data->field_vars[] array. See +below for how these are used. In addition, a reference hist_field is +also created, which is needed to reference the field variables such as +$next_pid variable in the trace() action. + +Note that $wakeup_lat is also a variable reference, referencing the +value of the expression common_timestamp-$ts0, and so also needs to +have a hist field entry representing that reference created. + +When hist_trigger_elt_update() is called to get the normal key and +value fields, it also calls update_field_vars(), which goes through +each field_var created for the histogram, and available from +hist_data->field_vars and calls val->fn() to get the data from the +current trace record, and then uses the var's var.idx to set the +variable at the var.idx offset in the appropriate tracing_map_elt's +variable at elt->vars[var.idx]. + +Once all the variables have been updated, resolve_var_refs() can be +called from event_hist_trigger(), and not only can our $ts0 and +$next_pid references be resolved but the $wakeup_lat reference as +well. At this point, the trace() action can simply access the values +assembled in the var_ref_vals[] array and generate the trace event. + +The same process occurs for the field variables associated with the +save() action. + +Abbreviations used in the diagram:: + + hist_data = struct hist_trigger_data + hist_data.fields = struct hist_field + field_var = struct field_var + fn = hist_field_fn_t + FL_KEY = HIST_FIELD_FL_KEY + FL_VAR = HIST_FIELD_FL_VAR + FL_VAR_REF = HIST_FIELD_FL_VAR_REF + +trace() action field variable test +---------------------------------- + +This example adds to the previous test example by finally making use +of the wakeup_lat variable, but in addition also creates a couple of +field variables that then are all passed to the wakeup_latency() trace +action via the onmatch() handler. + +First, we create the wakeup_latency synthetic event:: + + # echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events + +Next, the sched_waking trigger from previous examples:: + + # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger + +Finally, as in the previous test example, we calculate and assign the +wakeup latency using the $ts0 reference from the sched_waking trigger +to the wakeup_lat variable, and finally use it along with a couple +sched_switch event fields, next_pid and next_comm, to generate a +wakeup_latency trace event. The next_pid and next_comm event fields +are automatically converted into field variables for this purpose:: + + # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger + +The sched_waking hist_debug output shows the same data as in the +previous test example:: + + # cat events/sched/sched_waking/hist_debug + + # event histogram + # + # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active] + # + + hist_data: 00000000d60ff61f + + n_vals: 2 + n_keys: 1 + n_fields: 3 + + val fields: + + hist_data->fields[0]: + flags: + VAL: HIST_FIELD_FL_HITCOUNT + type: u64 + size: 8 + is_signed: 0 + + hist_data->fields[1]: + flags: + HIST_FIELD_FL_VAR + var.name: ts0 + var.idx (into tracing_map_elt.vars[]): 0 + type: u64 + size: 8 + is_signed: 0 + + key fields: + + hist_data->fields[2]: + flags: + HIST_FIELD_FL_KEY + ftrace_event_field name: pid + type: pid_t + size: 8 + is_signed: 1 + +The sched_switch hist_debug output shows the same key and value fields +as in the previous test example - note that wakeup_lat is still in the +val fields section, but that the new field variables are not there - +although the field variables are variables, they're held separately in +the hist_data's field_vars[] array. Although the field variables and +the normal variables are located in separate places, you can see that +the actual variable locations for those variables in the +tracing_map_elt.vars[] do have increasing indices as expected: +wakeup_lat takes the var.idx = 0 slot, while the field variables for +next_pid and next_comm have values var.idx = 1, and var.idx = 2. Note +also that those are the same values displayed for the variable +references corresponding to those variables in the variable reference +fields section. Since there are two triggers and thus two hist_data +addresses, those addresses also need to be accounted for when doing +the matching - you can see that the first variable refers to the 0 +var.idx on the previous hist trigger (see the hist_data address +associated with that trigger), while the second variable refers to the +0 var.idx on the sched_switch hist trigger, as do all the remaining +variable references. + +Finally, the action tracking variables section just shows the system +and event name for the onmatch() handler:: + + # cat events/sched/sched_switch/hist_debug + + # event histogram + # + # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm) [active] + # + + hist_data: 0000000008f551b7 + + n_vals: 2 + n_keys: 1 + n_fields: 3 + + val fields: + + hist_data->fields[0]: + flags: + VAL: HIST_FIELD_FL_HITCOUNT + type: u64 + size: 8 + is_signed: 0 + + hist_data->fields[1]: + flags: + HIST_FIELD_FL_VAR + var.name: wakeup_lat + var.idx (into tracing_map_elt.vars[]): 0 + type: u64 + size: 0 + is_signed: 0 + + key fields: + + hist_data->fields[2]: + flags: + HIST_FIELD_FL_KEY + ftrace_event_field name: next_pid + type: pid_t + size: 8 + is_signed: 1 + + variable reference fields: + + hist_data->var_refs[0]: + flags: + HIST_FIELD_FL_VAR_REF + name: ts0 + var.idx (into tracing_map_elt.vars[]): 0 + var.hist_data: 00000000d60ff61f + var_ref_idx (into hist_data->var_refs[]): 0 + type: u64 + size: 8 + is_signed: 0 + + hist_data->var_refs[1]: + flags: + HIST_FIELD_FL_VAR_REF + name: wakeup_lat + var.idx (into tracing_map_elt.vars[]): 0 + var.hist_data: 0000000008f551b7 + var_ref_idx (into hist_data->var_refs[]): 1 + type: u64 + size: 0 + is_signed: 0 + + hist_data->var_refs[2]: + flags: + HIST_FIELD_FL_VAR_REF + name: next_pid + var.idx (into tracing_map_elt.vars[]): 1 + var.hist_data: 0000000008f551b7 + var_ref_idx (into hist_data->var_refs[]): 2 + type: pid_t + size: 4 + is_signed: 0 + + hist_data->var_refs[3]: + flags: + HIST_FIELD_FL_VAR_REF + name: next_comm + var.idx (into tracing_map_elt.vars[]): 2 + var.hist_data: 0000000008f551b7 + var_ref_idx (into hist_data->var_refs[]): 3 + type: char[16] + size: 256 + is_signed: 0 + + field variables: + + hist_data->field_vars[0]: + + field_vars[0].var: + flags: + HIST_FIELD_FL_VAR + var.name: next_pid + var.idx (into tracing_map_elt.vars[]): 1 + + field_vars[0].val: + ftrace_event_field name: next_pid + type: pid_t + size: 4 + is_signed: 1 + + hist_data->field_vars[1]: + + field_vars[1].var: + flags: + HIST_FIELD_FL_VAR + var.name: next_comm + var.idx (into tracing_map_elt.vars[]): 2 + + field_vars[1].val: + ftrace_event_field name: next_comm + type: char[16] + size: 256 + is_signed: 0 + + action tracking variables (for onmax()/onchange()/onmatch()): + + hist_data->actions[0].match_data.event_system: sched + hist_data->actions[0].match_data.event: sched_waking + +The commands below can be used to clean things up for the next test:: + + # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger + + # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger + + # echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events + +action_data and the trace() action +---------------------------------- + +As mentioned above, when the trace() action generates a synthetic +event, all the parameters to the synthetic event either already are +variables or are converted into variables (via field variables), and +finally all those variable values are collected via references to them +into a var_ref_vals[] array. + +The values in the var_ref_vals[] array, however, don't necessarily +follow the same ordering as the synthetic event params. To address +that, struct action_data contains another array, var_ref_idx[] that +maps the trace action params to the var_ref_vals[] values. Below is a +diagram illustrating that for the wakeup_latency() synthetic event:: + + +------------------+ wakeup_latency() + | action_data | event params var_ref_vals[] + +------------------+ +-----------------+ +-----------------+ + | .var_ref_idx[] |--->| $wakeup_lat idx |---+ | | + +----------------+ +-----------------+ | +-----------------+ + | .synth_event | | $next_pid idx |---|-+ | $wakeup_lat val | + +----------------+ +-----------------+ | | +-----------------+ + . | +->| $next_pid val | + . | +-----------------+ + . | . + +-----------------+ | . + | | | . + +-----------------+ | +-----------------+ + +--->| $wakeup_lat val | + +-----------------+ + +Basically, how this ends up getting used in the synthetic event probe +function, trace_event_raw_event_synth(), is as follows:: + + for each field i in .synth_event + val_idx = .var_ref_idx[i] + val = var_ref_vals[val_idx] + +action_data and the onXXX() handlers +------------------------------------ + +The hist trigger onXXX() actions other than onmatch(), such as onmax() +and onchange(), also make use of and internally create hidden +variables. This information is contained in the +action_data.track_data struct, and is also visible in the hist_debug +output as will be described in the example below. + +Typically, the onmax() or onchange() handlers are used in conjunction +with the save() and snapshot() actions. For example:: + + # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \ + onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> + /sys/kernel/debug/tracing/events/sched/sched_switch/trigger + +or:: + + # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \ + onmax($wakeup_lat).snapshot()' >> + /sys/kernel/debug/tracing/events/sched/sched_switch/trigger + +save() action field variable test +--------------------------------- + +For this example, instead of generating a synthetic event, the save() +action is used to save field values whenever an onmax() handler +detects that a new max latency has been hit. As in the previous +example, the values being saved are also field values, but in this +case, are kept in a separate hist_data array named save_vars[]. + +As in previous test examples, we set up the sched_waking trigger:: + + # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger + +In this case, however, we set up the sched_switch trigger to save some +sched_switch field values whenever we hit a new maximum latency. For +both the onmax() handler and save() action, variables will be created, +which we can use the hist_debug files to examine:: + + # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger + +The sched_waking hist_debug output shows the same data as in the +previous test examples:: + + # cat events/sched/sched_waking/hist_debug + + # + # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active] + # + + hist_data: 00000000e6290f48 + + n_vals: 2 + n_keys: 1 + n_fields: 3 + + val fields: + + hist_data->fields[0]: + flags: + VAL: HIST_FIELD_FL_HITCOUNT + type: u64 + size: 8 + is_signed: 0 + + hist_data->fields[1]: + flags: + HIST_FIELD_FL_VAR + var.name: ts0 + var.idx (into tracing_map_elt.vars[]): 0 + type: u64 + size: 8 + is_signed: 0 + + key fields: + + hist_data->fields[2]: + flags: + HIST_FIELD_FL_KEY + ftrace_event_field name: pid + type: pid_t + size: 8 + is_signed: 1 + +The output of the sched_switch trigger shows the same val and key +values as before, but also shows a couple new sections. + +First, the action tracking variables section now shows the +actions[].track_data information describing the special tracking +variables and references used to track, in this case, the running +maximum value. The actions[].track_data.var_ref member contains the +reference to the variable being tracked, in this case the $wakeup_lat +variable. In order to perform the onmax() handler function, there +also needs to be a variable that tracks the current maximum by getting +updated whenever a new maximum is hit. In this case, we can see that +an auto-generated variable named ' __max' has been created and is +visible in the actions[].track_data.track_var variable. + +Finally, in the new 'save action variables' section, we can see that +the 4 params to the save() function have resulted in 4 field variables +being created for the purposes of saving the values of the named +fields when the max is hit. These variables are kept in a separate +save_vars[] array off of hist_data, so are displayed in a separate +section:: + + # cat events/sched/sched_switch/hist_debug + + # event histogram + # + # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) [active] + # + + hist_data: 0000000057bcd28d + + n_vals: 2 + n_keys: 1 + n_fields: 3 + + val fields: + + hist_data->fields[0]: + flags: + VAL: HIST_FIELD_FL_HITCOUNT + type: u64 + size: 8 + is_signed: 0 + + hist_data->fields[1]: + flags: + HIST_FIELD_FL_VAR + var.name: wakeup_lat + var.idx (into tracing_map_elt.vars[]): 0 + type: u64 + size: 0 + is_signed: 0 + + key fields: + + hist_data->fields[2]: + flags: + HIST_FIELD_FL_KEY + ftrace_event_field name: next_pid + type: pid_t + size: 8 + is_signed: 1 + + variable reference fields: + + hist_data->var_refs[0]: + flags: + HIST_FIELD_FL_VAR_REF + name: ts0 + var.idx (into tracing_map_elt.vars[]): 0 + var.hist_data: 00000000e6290f48 + var_ref_idx (into hist_data->var_refs[]): 0 + type: u64 + size: 8 + is_signed: 0 + + hist_data->var_refs[1]: + flags: + HIST_FIELD_FL_VAR_REF + name: wakeup_lat + var.idx (into tracing_map_elt.vars[]): 0 + var.hist_data: 0000000057bcd28d + var_ref_idx (into hist_data->var_refs[]): 1 + type: u64 + size: 0 + is_signed: 0 + + action tracking variables (for onmax()/onchange()/onmatch()): + + hist_data->actions[0].track_data.var_ref: + flags: + HIST_FIELD_FL_VAR_REF + name: wakeup_lat + var.idx (into tracing_map_elt.vars[]): 0 + var.hist_data: 0000000057bcd28d + var_ref_idx (into hist_data->var_refs[]): 1 + type: u64 + size: 0 + is_signed: 0 + + hist_data->actions[0].track_data.track_var: + flags: + HIST_FIELD_FL_VAR + var.name: __max + var.idx (into tracing_map_elt.vars[]): 1 + type: u64 + size: 8 + is_signed: 0 + + save action variables (save() params): + + hist_data->save_vars[0]: + + save_vars[0].var: + flags: + HIST_FIELD_FL_VAR + var.name: next_comm + var.idx (into tracing_map_elt.vars[]): 2 + + save_vars[0].val: + ftrace_event_field name: next_comm + type: char[16] + size: 256 + is_signed: 0 + + hist_data->save_vars[1]: + + save_vars[1].var: + flags: + HIST_FIELD_FL_VAR + var.name: prev_pid + var.idx (into tracing_map_elt.vars[]): 3 + + save_vars[1].val: + ftrace_event_field name: prev_pid + type: pid_t + size: 4 + is_signed: 1 + + hist_data->save_vars[2]: + + save_vars[2].var: + flags: + HIST_FIELD_FL_VAR + var.name: prev_prio + var.idx (into tracing_map_elt.vars[]): 4 + + save_vars[2].val: + ftrace_event_field name: prev_prio + type: int + size: 4 + is_signed: 1 + + hist_data->save_vars[3]: + + save_vars[3].var: + flags: + HIST_FIELD_FL_VAR + var.name: prev_comm + var.idx (into tracing_map_elt.vars[]): 5 + + save_vars[3].val: + ftrace_event_field name: prev_comm + type: char[16] + size: 256 + is_signed: 0 + +The commands below can be used to clean things up for the next test:: + + # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger + + # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger + +A couple special cases +====================== + +While the above covers the basics of the histogram internals, there +are a couple of special cases that should be discussed, since they +tend to create even more confusion. Those are field variables on other +histograms, and aliases, both described below through example tests +using the hist_debug files. + +Test of field variables on other histograms +------------------------------------------- + +This example is similar to the previous examples, but in this case, +the sched_switch trigger references a hist trigger field on another +event, namely the sched_waking event. In order to accomplish this, a +field variable is created for the other event, but since an existing +histogram can't be used, as existing histograms are immutable, a new +histogram with a matching variable is created and used, and we'll see +that reflected in the hist_debug output shown below. + +First, we create the wakeup_latency synthetic event. Note the +addition of the prio field:: + + # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events + +As in previous test examples, we set up the sched_waking trigger:: + + # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger + +Here we set up a hist trigger on sched_switch to send a wakeup_latency +event using an onmatch handler naming the sched_waking event. Note +that the third param being passed to the wakeup_latency() is prio, +which is a field name that needs to have a field variable created for +it. There isn't however any prio field on the sched_switch event so +it would seem that it wouldn't be possible to create a field variable +for it. The matching sched_waking event does have a prio field, so it +should be possible to make use of it for this purpose. The problem +with that is that it's not currently possible to define a new variable +on an existing histogram, so it's not possible to add a new prio field +variable to the existing sched_waking histogram. It is however +possible to create an additional new 'matching' sched_waking histogram +for the same event, meaning that it uses the same key and filters, and +define the new prio field variable on that. + +Here's the sched_switch trigger:: + + # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger + +And here's the output of the hist_debug information for the +sched_waking hist trigger. Note that there are two histograms +displayed in the output: the first is the normal sched_waking +histogram we've seen in the previous examples, and the second is the +special histogram we created to provide the prio field variable. + +Looking at the second histogram below, we see a variable with the name +synthetic_prio. This is the field variable created for the prio field +on that sched_waking histogram:: + + # cat events/sched/sched_waking/hist_debug + + # event histogram + # + # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active] + # + + hist_data: 00000000349570e4 + + n_vals: 2 + n_keys: 1 + n_fields: 3 + + val fields: + + hist_data->fields[0]: + flags: + VAL: HIST_FIELD_FL_HITCOUNT + type: u64 + size: 8 + is_signed: 0 + + hist_data->fields[1]: + flags: + HIST_FIELD_FL_VAR + var.name: ts0 + var.idx (into tracing_map_elt.vars[]): 0 + type: u64 + size: 8 + is_signed: 0 + + key fields: + + hist_data->fields[2]: + flags: + HIST_FIELD_FL_KEY + ftrace_event_field name: pid + type: pid_t + size: 8 + is_signed: 1 + + + # event histogram + # + # trigger info: hist:keys=pid:vals=hitcount:synthetic_prio=prio:sort=hitcount:size=2048 [active] + # + + hist_data: 000000006920cf38 + + n_vals: 2 + n_keys: 1 + n_fields: 3 + + val fields: + + hist_data->fields[0]: + flags: + VAL: HIST_FIELD_FL_HITCOUNT + type: u64 + size: 8 + is_signed: 0 + + hist_data->fields[1]: + flags: + HIST_FIELD_FL_VAR + ftrace_event_field name: prio + var.name: synthetic_prio + var.idx (into tracing_map_elt.vars[]): 0 + type: int + size: 4 + is_signed: 1 + + key fields: + + hist_data->fields[2]: + flags: + HIST_FIELD_FL_KEY + ftrace_event_field name: pid + type: pid_t + size: 8 + is_signed: 1 + +Looking at the sched_switch histogram below, we can see a reference to +the synthetic_prio variable on sched_waking, and looking at the +associated hist_data address we see that it is indeed associated with +the new histogram. Note also that the other references are to a +normal variable, wakeup_lat, and to a normal field variable, next_pid, +the details of which are in the field variables section:: + + # cat events/sched/sched_switch/hist_debug + + # event histogram + # + # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio) [active] + # + + hist_data: 00000000a73b67df + + n_vals: 2 + n_keys: 1 + n_fields: 3 + + val fields: + + hist_data->fields[0]: + flags: + VAL: HIST_FIELD_FL_HITCOUNT + type: u64 + size: 8 + is_signed: 0 + + hist_data->fields[1]: + flags: + HIST_FIELD_FL_VAR + var.name: wakeup_lat + var.idx (into tracing_map_elt.vars[]): 0 + type: u64 + size: 0 + is_signed: 0 + + key fields: + + hist_data->fields[2]: + flags: + HIST_FIELD_FL_KEY + ftrace_event_field name: next_pid + type: pid_t + size: 8 + is_signed: 1 + + variable reference fields: + + hist_data->var_refs[0]: + flags: + HIST_FIELD_FL_VAR_REF + name: ts0 + var.idx (into tracing_map_elt.vars[]): 0 + var.hist_data: 00000000349570e4 + var_ref_idx (into hist_data->var_refs[]): 0 + type: u64 + size: 8 + is_signed: 0 + + hist_data->var_refs[1]: + flags: + HIST_FIELD_FL_VAR_REF + name: wakeup_lat + var.idx (into tracing_map_elt.vars[]): 0 + var.hist_data: 00000000a73b67df + var_ref_idx (into hist_data->var_refs[]): 1 + type: u64 + size: 0 + is_signed: 0 + + hist_data->var_refs[2]: + flags: + HIST_FIELD_FL_VAR_REF + name: next_pid + var.idx (into tracing_map_elt.vars[]): 1 + var.hist_data: 00000000a73b67df + var_ref_idx (into hist_data->var_refs[]): 2 + type: pid_t + size: 4 + is_signed: 0 + + hist_data->var_refs[3]: + flags: + HIST_FIELD_FL_VAR_REF + name: synthetic_prio + var.idx (into tracing_map_elt.vars[]): 0 + var.hist_data: 000000006920cf38 + var_ref_idx (into hist_data->var_refs[]): 3 + type: int + size: 4 + is_signed: 1 + + field variables: + + hist_data->field_vars[0]: + + field_vars[0].var: + flags: + HIST_FIELD_FL_VAR + var.name: next_pid + var.idx (into tracing_map_elt.vars[]): 1 + + field_vars[0].val: + ftrace_event_field name: next_pid + type: pid_t + size: 4 + is_signed: 1 + + action tracking variables (for onmax()/onchange()/onmatch()): + + hist_data->actions[0].match_data.event_system: sched + hist_data->actions[0].match_data.event: sched_waking + +The commands below can be used to clean things up for the next test:: + + # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger + + # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger + + # echo '!wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events + +Alias test +---------- + +This example is very similar to previous examples, but demonstrates +the alias flag. + +First, we create the wakeup_latency synthetic event:: + + # echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events + +Next, we create a sched_waking trigger similar to previous examples, +but in this case we save the pid in the waking_pid variable:: + + # echo 'hist:keys=pid:waking_pid=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger + +For the sched_switch trigger, instead of using $waking_pid directly in +the wakeup_latency synthetic event invocation, we create an alias of +$waking_pid named $woken_pid, and use that in the synthetic event +invocation instead:: + + # echo 'hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger + +Looking at the sched_waking hist_debug output, in addition to the +normal fields, we can see the waking_pid variable:: + + # cat events/sched/sched_waking/hist_debug + + # event histogram + # + # trigger info: hist:keys=pid:vals=hitcount:waking_pid=pid,ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active] + # + + hist_data: 00000000a250528c + + n_vals: 3 + n_keys: 1 + n_fields: 4 + + val fields: + + hist_data->fields[0]: + flags: + VAL: HIST_FIELD_FL_HITCOUNT + type: u64 + size: 8 + is_signed: 0 + + hist_data->fields[1]: + flags: + HIST_FIELD_FL_VAR + ftrace_event_field name: pid + var.name: waking_pid + var.idx (into tracing_map_elt.vars[]): 0 + type: pid_t + size: 4 + is_signed: 1 + + hist_data->fields[2]: + flags: + HIST_FIELD_FL_VAR + var.name: ts0 + var.idx (into tracing_map_elt.vars[]): 1 + type: u64 + size: 8 + is_signed: 0 + + key fields: + + hist_data->fields[3]: + flags: + HIST_FIELD_FL_KEY + ftrace_event_field name: pid + type: pid_t + size: 8 + is_signed: 1 + +The sched_switch hist_debug output shows that a variable named +woken_pid has been created but that it also has the +HIST_FIELD_FL_ALIAS flag set. It also has the HIST_FIELD_FL_VAR flag +set, which is why it appears in the val field section. + +Despite that implementation detail, an alias variable is actually more +like a variable reference; in fact it can be thought of as a reference +to a reference. The implementation copies the var_ref->fn() from the +variable reference being referenced, in this case, the waking_pid +fn(), which is hist_field_var_ref() and makes that the fn() of the +alias. The hist_field_var_ref() fn() requires the var_ref_idx of the +variable reference it's using, so waking_pid's var_ref_idx is also +copied to the alias. The end result is that when the value of alias +is retrieved, in the end it just does the same thing the original +reference would have done and retrieves the same value from the +var_ref_vals[] array. You can verify this in the output by noting +that the var_ref_idx of the alias, in this case woken_pid, is the same +as the var_ref_idx of the reference, waking_pid, in the variable +reference fields section. + +Additionally, once it gets that value, since it is also a variable, it +then saves that value into its var.idx. So the var.idx of the +woken_pid alias is 0, which it fills with the value from var_ref_idx 0 +when its fn() is called to update itself. You'll also notice that +there's a woken_pid var_ref in the variable refs section. That is the +reference to the woken_pid alias variable, and you can see that it +retrieves the value from the same var.idx as the woken_pid alias, 0, +and then in turn saves that value in its own var_ref_idx slot, 3, and +the value at this position is finally what gets assigned to the +$woken_pid slot in the trace event invocation:: + + # cat events/sched/sched_switch/hist_debug + + # event histogram + # + # trigger info: hist:keys=next_pid:vals=hitcount:woken_pid=$waking_pid,wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm) [active] + # + + hist_data: 0000000055d65ed0 + + n_vals: 3 + n_keys: 1 + n_fields: 4 + + val fields: + + hist_data->fields[0]: + flags: + VAL: HIST_FIELD_FL_HITCOUNT + type: u64 + size: 8 + is_signed: 0 + + hist_data->fields[1]: + flags: + HIST_FIELD_FL_VAR + HIST_FIELD_FL_ALIAS + var.name: woken_pid + var.idx (into tracing_map_elt.vars[]): 0 + var_ref_idx (into hist_data->var_refs[]): 0 + type: pid_t + size: 4 + is_signed: 1 + + hist_data->fields[2]: + flags: + HIST_FIELD_FL_VAR + var.name: wakeup_lat + var.idx (into tracing_map_elt.vars[]): 1 + type: u64 + size: 0 + is_signed: 0 + + key fields: + + hist_data->fields[3]: + flags: + HIST_FIELD_FL_KEY + ftrace_event_field name: next_pid + type: pid_t + size: 8 + is_signed: 1 + + variable reference fields: + + hist_data->var_refs[0]: + flags: + HIST_FIELD_FL_VAR_REF + name: waking_pid + var.idx (into tracing_map_elt.vars[]): 0 + var.hist_data: 00000000a250528c + var_ref_idx (into hist_data->var_refs[]): 0 + type: pid_t + size: 4 + is_signed: 1 + + hist_data->var_refs[1]: + flags: + HIST_FIELD_FL_VAR_REF + name: ts0 + var.idx (into tracing_map_elt.vars[]): 1 + var.hist_data: 00000000a250528c + var_ref_idx (into hist_data->var_refs[]): 1 + type: u64 + size: 8 + is_signed: 0 + + hist_data->var_refs[2]: + flags: + HIST_FIELD_FL_VAR_REF + name: wakeup_lat + var.idx (into tracing_map_elt.vars[]): 1 + var.hist_data: 0000000055d65ed0 + var_ref_idx (into hist_data->var_refs[]): 2 + type: u64 + size: 0 + is_signed: 0 + + hist_data->var_refs[3]: + flags: + HIST_FIELD_FL_VAR_REF + name: woken_pid + var.idx (into tracing_map_elt.vars[]): 0 + var.hist_data: 0000000055d65ed0 + var_ref_idx (into hist_data->var_refs[]): 3 + type: pid_t + size: 4 + is_signed: 1 + + hist_data->var_refs[4]: + flags: + HIST_FIELD_FL_VAR_REF + name: next_comm + var.idx (into tracing_map_elt.vars[]): 2 + var.hist_data: 0000000055d65ed0 + var_ref_idx (into hist_data->var_refs[]): 4 + type: char[16] + size: 256 + is_signed: 0 + + field variables: + + hist_data->field_vars[0]: + + field_vars[0].var: + flags: + HIST_FIELD_FL_VAR + var.name: next_comm + var.idx (into tracing_map_elt.vars[]): 2 + + field_vars[0].val: + ftrace_event_field name: next_comm + type: char[16] + size: 256 + is_signed: 0 + + action tracking variables (for onmax()/onchange()/onmatch()): + + hist_data->actions[0].match_data.event_system: sched + hist_data->actions[0].match_data.event: sched_waking + +The commands below can be used to clean things up for the next test:: + + # echo '!hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger + + # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger + + # echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 24876faac753..a4020c0b4508 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -249,15 +249,6 @@ config TRACE_PREEMPT_TOGGLE Enables hooks which will be called when preemption is first disabled, and last enabled. -config PREEMPTIRQ_EVENTS - bool "Enable trace events for preempt and irq disable/enable" - select TRACE_IRQFLAGS - select TRACE_PREEMPT_TOGGLE if PREEMPTION - select GENERIC_TRACER - default n - help - Enable tracing of disable and enable events for preemption and irqs. - config IRQSOFF_TRACER bool "Interrupts-off Latency Tracer" default n @@ -614,12 +605,30 @@ config TRACING_MAP generally used outside of that context, and is normally selected by tracers that use it. +config SYNTH_EVENTS + bool "Synthetic trace events" + select TRACING + select DYNAMIC_EVENTS + default n + help + Synthetic events are user-defined trace events that can be + used to combine data from other trace events or in fact any + data source. Synthetic events can be generated indirectly + via the trace() action of histogram triggers or directly + by way of an in-kernel API. + + See Documentation/trace/events.rst or + Documentation/trace/histogram.rst for details and examples. + + If in doubt, say N. + config HIST_TRIGGERS bool "Histogram triggers" depends on ARCH_HAVE_NMI_SAFE_CMPXCHG select TRACING_MAP select TRACING select DYNAMIC_EVENTS + select SYNTH_EVENTS default n help Hist triggers allow one or more arbitrary trace event fields @@ -815,7 +824,7 @@ config PREEMPTIRQ_DELAY_TEST config SYNTH_EVENT_GEN_TEST tristate "Test module for in-kernel synthetic event generation" - depends on HIST_TRIGGERS + depends on SYNTH_EVENTS help This option creates a test module to check the base functionality of in-kernel synthetic event definition and @@ -838,6 +847,29 @@ config KPROBE_EVENT_GEN_TEST If unsure, say N. +config HIST_TRIGGERS_DEBUG + bool "Hist trigger debug support" + depends on HIST_TRIGGERS + help + Add "hist_debug" file for each event, which when read will + dump out a bunch of internal details about the hist triggers + defined on that event. + + The hist_debug file serves a couple of purposes: + + - Helps developers verify that nothing is broken. + + - Provides educational information to support the details + of the hist trigger internals as described by + Documentation/trace/histogram-design.rst. + + The hist_debug output only covers the data structures + related to the histogram definitions themselves and doesn't + display the internals of map buckets or variable values of + running histograms. + + If unsure, say N. + endif # FTRACE endif # TRACING_SUPPORT diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile index f9dcd19165fa..1d8aaa546412 100644 --- a/kernel/trace/Makefile +++ b/kernel/trace/Makefile @@ -72,6 +72,7 @@ endif obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o obj-$(CONFIG_TRACE_EVENT_INJECT) += trace_events_inject.o +obj-$(CONFIG_SYNTH_EVENTS) += trace_events_synth.o obj-$(CONFIG_HIST_TRIGGERS) += trace_events_hist.o obj-$(CONFIG_BPF_EVENTS) += bpf_trace.o obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe.o diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 7d0ebd104706..c163c3531faf 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -2016,14 +2016,14 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec) { unsigned long ip = rec ? rec->ip : 0; + pr_info("------------[ ftrace bug ]------------\n"); + switch (failed) { case -EFAULT: - FTRACE_WARN_ON_ONCE(1); pr_info("ftrace faulted on modifying "); print_ip_sym(KERN_INFO, ip); break; case -EINVAL: - FTRACE_WARN_ON_ONCE(1); pr_info("ftrace failed to modify "); print_ip_sym(KERN_INFO, ip); print_ip_ins(" actual: ", (unsigned char *)ip); @@ -2034,12 +2034,10 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec) } break; case -EPERM: - FTRACE_WARN_ON_ONCE(1); pr_info("ftrace faulted on writing "); print_ip_sym(KERN_INFO, ip); break; default: - FTRACE_WARN_ON_ONCE(1); pr_info("ftrace faulted on unknown error "); print_ip_sym(KERN_INFO, ip); } @@ -2066,6 +2064,8 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec) ip = ftrace_get_addr_curr(rec); pr_cont("\n expected tramp: %lx\n", ip); } + + FTRACE_WARN_ON_ONCE(1); } static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 3ab27022c20f..ec44b0e2a19c 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1299,8 +1299,11 @@ EXPORT_SYMBOL_GPL(tracing_off); void disable_trace_on_warning(void) { - if (__disable_trace_on_warning) + if (__disable_trace_on_warning) { + trace_array_printk_buf(global_trace.array_buffer.buffer, _THIS_IP_, + "Disabling tracing due to warning\n"); tracing_off(); + } } /** diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 4eb1d004d5f2..def769df5bf1 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1661,6 +1661,7 @@ extern struct list_head ftrace_events; extern const struct file_operations event_trigger_fops; extern const struct file_operations event_hist_fops; +extern const struct file_operations event_hist_debug_fops; extern const struct file_operations event_inject_fops; #ifdef CONFIG_HIST_TRIGGERS diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 242f59e7f17d..f6f55682d3e2 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -2209,6 +2209,10 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) trace_create_file("hist", 0444, file->dir, file, &event_hist_fops); #endif +#ifdef CONFIG_HIST_TRIGGERS_DEBUG + trace_create_file("hist_debug", 0444, file->dir, file, + &event_hist_debug_fops); +#endif trace_create_file("format", 0444, file->dir, call, &ftrace_event_format_fops); diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index fcab11cc6833..0b933546142e 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -19,13 +19,7 @@ #include <trace/events/mmflags.h> #include "tracing_map.h" -#include "trace.h" -#include "trace_dynevent.h" - -#define SYNTH_SYSTEM "synthetic" -#define SYNTH_FIELDS_MAX 32 - -#define STR_VAR_LEN_MAX 32 /* must be multiple of sizeof(u64) */ +#include "trace_synth.h" #define ERRORS \ C(NONE, "No error"), \ @@ -380,69 +374,6 @@ struct hist_trigger_data { unsigned int n_save_var_str; }; -static int create_synth_event(int argc, const char **argv); -static int synth_event_show(struct seq_file *m, struct dyn_event *ev); -static int synth_event_release(struct dyn_event *ev); -static bool synth_event_is_busy(struct dyn_event *ev); -static bool synth_event_match(const char *system, const char *event, - int argc, const char **argv, struct dyn_event *ev); - -static struct dyn_event_operations synth_event_ops = { - .create = create_synth_event, - .show = synth_event_show, - .is_busy = synth_event_is_busy, - .free = synth_event_release, - .match = synth_event_match, -}; - -struct synth_field { - char *type; - char *name; - size_t size; - unsigned int offset; - bool is_signed; - bool is_string; -}; - -struct synth_event { - struct dyn_event devent; - int ref; - char *name; - struct synth_field **fields; - unsigned int n_fields; - unsigned int n_u64; - struct trace_event_class class; - struct trace_event_call call; - struct tracepoint *tp; - struct module *mod; -}; - -static bool is_synth_event(struct dyn_event *ev) -{ - return ev->ops == &synth_event_ops; -} - -static struct synth_event *to_synth_event(struct dyn_event *ev) -{ - return container_of(ev, struct synth_event, devent); -} - -static bool synth_event_is_busy(struct dyn_event *ev) -{ - struct synth_event *event = to_synth_event(ev); - - return event->ref != 0; -} - -static bool synth_event_match(const char *system, const char *event, - int argc, const char **argv, struct dyn_event *ev) -{ - struct synth_event *sev = to_synth_event(ev); - - return strcmp(sev->name, event) == 0 && - (!system || strcmp(system, SYNTH_SYSTEM) == 0); -} - struct action_data; typedef void (*action_fn_t) (struct hist_trigger_data *hist_data, @@ -589,6 +520,7 @@ static struct track_data *track_data_alloc(unsigned int key_len, track_data_free(data); return ERR_PTR(-ENOMEM); } + data->elt.private_data = elt_data; elt_data->comm = kzalloc(TASK_COMM_LEN, GFP_KERNEL); @@ -621,7 +553,6 @@ static void last_cmd_set(struct trace_event_file *file, char *str) if (file) { call = file->event_call; - system = call->class->system; if (system) { name = trace_event_name(call); @@ -646,510 +577,6 @@ static void hist_err_clear(void) last_cmd_loc[0] = '\0'; } -struct synth_trace_event { - struct trace_entry ent; - u64 fields[]; -}; - -static int synth_event_define_fields(struct trace_event_call *call) -{ - struct synth_trace_event trace; - int offset = offsetof(typeof(trace), fields); - struct synth_event *event = call->data; - unsigned int i, size, n_u64; - char *name, *type; - bool is_signed; - int ret = 0; - - for (i = 0, n_u64 = 0; i < event->n_fields; i++) { - size = event->fields[i]->size; - is_signed = event->fields[i]->is_signed; - type = event->fields[i]->type; - name = event->fields[i]->name; - ret = trace_define_field(call, type, name, offset, size, - is_signed, FILTER_OTHER); - if (ret) - break; - - event->fields[i]->offset = n_u64; - - if (event->fields[i]->is_string) { - offset += STR_VAR_LEN_MAX; - n_u64 += STR_VAR_LEN_MAX / sizeof(u64); - } else { - offset += sizeof(u64); - n_u64++; - } - } - - event->n_u64 = n_u64; - - return ret; -} - -static bool synth_field_signed(char *type) -{ - if (str_has_prefix(type, "u")) - return false; - if (strcmp(type, "gfp_t") == 0) - return false; - - return true; -} - -static int synth_field_is_string(char *type) -{ - if (strstr(type, "char[") != NULL) - return true; - - return false; -} - -static int synth_field_string_size(char *type) -{ - char buf[4], *end, *start; - unsigned int len; - int size, err; - - start = strstr(type, "char["); - if (start == NULL) - return -EINVAL; - start += sizeof("char[") - 1; - - end = strchr(type, ']'); - if (!end || end < start) - return -EINVAL; - - len = end - start; - if (len > 3) - return -EINVAL; - - strncpy(buf, start, len); - buf[len] = '\0'; - - err = kstrtouint(buf, 0, &size); - if (err) - return err; - - if (size > STR_VAR_LEN_MAX) - return -EINVAL; - - return size; -} - -static int synth_field_size(char *type) -{ - int size = 0; - - if (strcmp(type, "s64") == 0) - size = sizeof(s64); - else if (strcmp(type, "u64") == 0) - size = sizeof(u64); - else if (strcmp(type, "s32") == 0) - size = sizeof(s32); - else if (strcmp(type, "u32") == 0) - size = sizeof(u32); - else if (strcmp(type, "s16") == 0) - size = sizeof(s16); - else if (strcmp(type, "u16") == 0) - size = sizeof(u16); - else if (strcmp(type, "s8") == 0) - size = sizeof(s8); - else if (strcmp(type, "u8") == 0) - size = sizeof(u8); - else if (strcmp(type, "char") == 0) - size = sizeof(char); - else if (strcmp(type, "unsigned char") == 0) - size = sizeof(unsigned char); - else if (strcmp(type, "int") == 0) - size = sizeof(int); - else if (strcmp(type, "unsigned int") == 0) - size = sizeof(unsigned int); - else if (strcmp(type, "long") == 0) - size = sizeof(long); - else if (strcmp(type, "unsigned long") == 0) - size = sizeof(unsigned long); - else if (strcmp(type, "pid_t") == 0) - size = sizeof(pid_t); - else if (strcmp(type, "gfp_t") == 0) - size = sizeof(gfp_t); - else if (synth_field_is_string(type)) - size = synth_field_string_size(type); - - return size; -} - -static const char *synth_field_fmt(char *type) -{ - const char *fmt = "%llu"; - - if (strcmp(type, "s64") == 0) - fmt = "%lld"; - else if (strcmp(type, "u64") == 0) - fmt = "%llu"; - else if (strcmp(type, "s32") == 0) - fmt = "%d"; - else if (strcmp(type, "u32") == 0) - fmt = "%u"; - else if (strcmp(type, "s16") == 0) - fmt = "%d"; - else if (strcmp(type, "u16") == 0) - fmt = "%u"; - else if (strcmp(type, "s8") == 0) - fmt = "%d"; - else if (strcmp(type, "u8") == 0) - fmt = "%u"; - else if (strcmp(type, "char") == 0) - fmt = "%d"; - else if (strcmp(type, "unsigned char") == 0) - fmt = "%u"; - else if (strcmp(type, "int") == 0) - fmt = "%d"; - else if (strcmp(type, "unsigned int") == 0) - fmt = "%u"; - else if (strcmp(type, "long") == 0) - fmt = "%ld"; - else if (strcmp(type, "unsigned long") == 0) - fmt = "%lu"; - else if (strcmp(type, "pid_t") == 0) - fmt = "%d"; - else if (strcmp(type, "gfp_t") == 0) - fmt = "%x"; - else if (synth_field_is_string(type)) - fmt = "%s"; - - return fmt; -} - -static void print_synth_event_num_val(struct trace_seq *s, - char *print_fmt, char *name, - int size, u64 val, char *space) -{ - switch (size) { - case 1: - trace_seq_printf(s, print_fmt, name, (u8)val, space); - break; - - case 2: - trace_seq_printf(s, print_fmt, name, (u16)val, space); - break; - - case 4: - trace_seq_printf(s, print_fmt, name, (u32)val, space); - break; - - default: - trace_seq_printf(s, print_fmt, name, val, space); - break; - } -} - -static enum print_line_t print_synth_event(struct trace_iterator *iter, - int flags, - struct trace_event *event) -{ - struct trace_array *tr = iter->tr; - struct trace_seq *s = &iter->seq; - struct synth_trace_event *entry; - struct synth_event *se; - unsigned int i, n_u64; - char print_fmt[32]; - const char *fmt; - - entry = (struct synth_trace_event *)iter->ent; - se = container_of(event, struct synth_event, call.event); - - trace_seq_printf(s, "%s: ", se->name); - - for (i = 0, n_u64 = 0; i < se->n_fields; i++) { - if (trace_seq_has_overflowed(s)) - goto end; - - fmt = synth_field_fmt(se->fields[i]->type); - - /* parameter types */ - if (tr && tr->trace_flags & TRACE_ITER_VERBOSE) - trace_seq_printf(s, "%s ", fmt); - - snprintf(print_fmt, sizeof(print_fmt), "%%s=%s%%s", fmt); - - /* parameter values */ - if (se->fields[i]->is_string) { - trace_seq_printf(s, print_fmt, se->fields[i]->name, - (char *)&entry->fields[n_u64], - i == se->n_fields - 1 ? "" : " "); - n_u64 += STR_VAR_LEN_MAX / sizeof(u64); - } else { - struct trace_print_flags __flags[] = { - __def_gfpflag_names, {-1, NULL} }; - char *space = (i == se->n_fields - 1 ? "" : " "); - - print_synth_event_num_val(s, print_fmt, - se->fields[i]->name, - se->fields[i]->size, - entry->fields[n_u64], - space); - - if (strcmp(se->fields[i]->type, "gfp_t") == 0) { - trace_seq_puts(s, " ("); - trace_print_flags_seq(s, "|", - entry->fields[n_u64], - __flags); - trace_seq_putc(s, ')'); - } - n_u64++; - } - } -end: - trace_seq_putc(s, '\n'); - - return trace_handle_return(s); -} - -static struct trace_event_functions synth_event_funcs = { - .trace = print_synth_event -}; - -static notrace void trace_event_raw_event_synth(void *__data, - u64 *var_ref_vals, - unsigned int *var_ref_idx) -{ - struct trace_event_file *trace_file = __data; - struct synth_trace_event *entry; - struct trace_event_buffer fbuffer; - struct trace_buffer *buffer; - struct synth_event *event; - unsigned int i, n_u64, val_idx; - int fields_size = 0; - - event = trace_file->event_call->data; - - if (trace_trigger_soft_disabled(trace_file)) - return; - - fields_size = event->n_u64 * sizeof(u64); - - /* - * Avoid ring buffer recursion detection, as this event - * is being performed within another event. - */ - buffer = trace_file->tr->array_buffer.buffer; - ring_buffer_nest_start(buffer); - - entry = trace_event_buffer_reserve(&fbuffer, trace_file, - sizeof(*entry) + fields_size); - if (!entry) - goto out; - - for (i = 0, n_u64 = 0; i < event->n_fields; i++) { - val_idx = var_ref_idx[i]; - if (event->fields[i]->is_string) { - char *str_val = (char *)(long)var_ref_vals[val_idx]; - char *str_field = (char *)&entry->fields[n_u64]; - - strscpy(str_field, str_val, STR_VAR_LEN_MAX); - n_u64 += STR_VAR_LEN_MAX / sizeof(u64); - } else { - struct synth_field *field = event->fields[i]; - u64 val = var_ref_vals[val_idx]; - - switch (field->size) { - case 1: - *(u8 *)&entry->fields[n_u64] = (u8)val; - break; - - case 2: - *(u16 *)&entry->fields[n_u64] = (u16)val; - break; - - case 4: - *(u32 *)&entry->fields[n_u64] = (u32)val; - break; - - default: - entry->fields[n_u64] = val; - break; - } - n_u64++; - } - } - - trace_event_buffer_commit(&fbuffer); -out: - ring_buffer_nest_end(buffer); -} - -static void free_synth_event_print_fmt(struct trace_event_call *call) -{ - if (call) { - kfree(call->print_fmt); - call->print_fmt = NULL; - } -} - -static int __set_synth_event_print_fmt(struct synth_event *event, - char *buf, int len) -{ - const char *fmt; - int pos = 0; - int i; - - /* When len=0, we just calculate the needed length */ -#define LEN_OR_ZERO (len ? len - pos : 0) - - pos += snprintf(buf + pos, LEN_OR_ZERO, "\""); - for (i = 0; i < event->n_fields; i++) { - fmt = synth_field_fmt(event->fields[i]->type); - pos += snprintf(buf + pos, LEN_OR_ZERO, "%s=%s%s", - event->fields[i]->name, fmt, - i == event->n_fields - 1 ? "" : ", "); - } - pos += snprintf(buf + pos, LEN_OR_ZERO, "\""); - - for (i = 0; i < event->n_fields; i++) { - pos += snprintf(buf + pos, LEN_OR_ZERO, - ", REC->%s", event->fields[i]->name); - } - -#undef LEN_OR_ZERO - - /* return the length of print_fmt */ - return pos; -} - -static int set_synth_event_print_fmt(struct trace_event_call *call) -{ - struct synth_event *event = call->data; - char *print_fmt; - int len; - - /* First: called with 0 length to calculate the needed length */ - len = __set_synth_event_print_fmt(event, NULL, 0); - - print_fmt = kmalloc(len + 1, GFP_KERNEL); - if (!print_fmt) - return -ENOMEM; - - /* Second: actually write the @print_fmt */ - __set_synth_event_print_fmt(event, print_fmt, len + 1); - call->print_fmt = print_fmt; - - return 0; -} - -static void free_synth_field(struct synth_field *field) -{ - kfree(field->type); - kfree(field->name); - kfree(field); -} - -static struct synth_field *parse_synth_field(int argc, const char **argv, - int *consumed) -{ - struct synth_field *field; - const char *prefix = NULL, *field_type = argv[0], *field_name, *array; - int len, ret = 0; - - if (field_type[0] == ';') - field_type++; - - if (!strcmp(field_type, "unsigned")) { - if (argc < 3) - return ERR_PTR(-EINVAL); - prefix = "unsigned "; - field_type = argv[1]; - field_name = argv[2]; - *consumed = 3; - } else { - field_name = argv[1]; - *consumed = 2; - } - - field = kzalloc(sizeof(*field), GFP_KERNEL); - if (!field) - return ERR_PTR(-ENOMEM); - - len = strlen(field_name); - array = strchr(field_name, '['); - if (array) - len -= strlen(array); - else if (field_name[len - 1] == ';') - len--; - - field->name = kmemdup_nul(field_name, len, GFP_KERNEL); - if (!field->name) { - ret = -ENOMEM; - goto free; - } - - if (field_type[0] == ';') - field_type++; - len = strlen(field_type) + 1; - if (array) - len += strlen(array); - if (prefix) - len += strlen(prefix); - - field->type = kzalloc(len, GFP_KERNEL); - if (!field->type) { - ret = -ENOMEM; - goto free; - } - if (prefix) - strcat(field->type, prefix); - strcat(field->type, field_type); - if (array) { - strcat(field->type, array); - if (field->type[len - 1] == ';') - field->type[len - 1] = '\0'; - } - - field->size = synth_field_size(field->type); - if (!field->size) { - ret = -EINVAL; - goto free; - } - - if (synth_field_is_string(field->type)) - field->is_string = true; - - field->is_signed = synth_field_signed(field->type); - - out: - return field; - free: - free_synth_field(field); - field = ERR_PTR(ret); - goto out; -} - -static void free_synth_tracepoint(struct tracepoint *tp) -{ - if (!tp) - return; - - kfree(tp->name); - kfree(tp); -} - -static struct tracepoint *alloc_synth_tracepoint(char *name) -{ - struct tracepoint *tp; - - tp = kzalloc(sizeof(*tp), GFP_KERNEL); - if (!tp) - return ERR_PTR(-ENOMEM); - - tp->name = kstrdup(name, GFP_KERNEL); - if (!tp->name) { - kfree(tp); - return ERR_PTR(-ENOMEM); - } - - return tp; -} - typedef void (*synth_probe_func_t) (void *__data, u64 *var_ref_vals, unsigned int *var_ref_idx); @@ -1177,145 +604,6 @@ static inline void trace_synth(struct synth_event *event, u64 *var_ref_vals, } } -static struct synth_event *find_synth_event(const char *name) -{ - struct dyn_event *pos; - struct synth_event *event; - - for_each_dyn_event(pos) { - if (!is_synth_event(pos)) - continue; - event = to_synth_event(pos); - if (strcmp(event->name, name) == 0) - return event; - } - - return NULL; -} - -static struct trace_event_fields synth_event_fields_array[] = { - { .type = TRACE_FUNCTION_TYPE, - .define_fields = synth_event_define_fields }, - {} -}; - -static int register_synth_event(struct synth_event *event) -{ - struct trace_event_call *call = &event->call; - int ret = 0; - - event->call.class = &event->class; - event->class.system = kstrdup(SYNTH_SYSTEM, GFP_KERNEL); - if (!event->class.system) { - ret = -ENOMEM; - goto out; - } - - event->tp = alloc_synth_tracepoint(event->name); - if (IS_ERR(event->tp)) { - ret = PTR_ERR(event->tp); - event->tp = NULL; - goto out; - } - - INIT_LIST_HEAD(&call->class->fields); - call->event.funcs = &synth_event_funcs; - call->class->fields_array = synth_event_fields_array; - - ret = register_trace_event(&call->event); - if (!ret) { - ret = -ENODEV; - goto out; - } - call->flags = TRACE_EVENT_FL_TRACEPOINT; - call->class->reg = trace_event_reg; - call->class->probe = trace_event_raw_event_synth; - call->data = event; - call->tp = event->tp; - - ret = trace_add_event_call(call); - if (ret) { - pr_warn("Failed to register synthetic event: %s\n", - trace_event_name(call)); - goto err; - } - - ret = set_synth_event_print_fmt(call); - if (ret < 0) { - trace_remove_event_call(call); - goto err; - } - out: - return ret; - err: - unregister_trace_event(&call->event); - goto out; -} - -static int unregister_synth_event(struct synth_event *event) -{ - struct trace_event_call *call = &event->call; - int ret; - - ret = trace_remove_event_call(call); - - return ret; -} - -static void free_synth_event(struct synth_event *event) -{ - unsigned int i; - - if (!event) - return; - - for (i = 0; i < event->n_fields; i++) - free_synth_field(event->fields[i]); - - kfree(event->fields); - kfree(event->name); - kfree(event->class.system); - free_synth_tracepoint(event->tp); - free_synth_event_print_fmt(&event->call); - kfree(event); -} - -static struct synth_event *alloc_synth_event(const char *name, int n_fields, - struct synth_field **fields) -{ - struct synth_event *event; - unsigned int i; - - event = kzalloc(sizeof(*event), GFP_KERNEL); - if (!event) { - event = ERR_PTR(-ENOMEM); - goto out; - } - - event->name = kstrdup(name, GFP_KERNEL); - if (!event->name) { - kfree(event); - event = ERR_PTR(-ENOMEM); - goto out; - } - - event->fields = kcalloc(n_fields, sizeof(*event->fields), GFP_KERNEL); - if (!event->fields) { - free_synth_event(event); - event = ERR_PTR(-ENOMEM); - goto out; - } - - dyn_event_init(&event->devent, &synth_event_ops); - - for (i = 0; i < n_fields; i++) - event->fields[i] = fields[i]; - - event->n_fields = n_fields; - out: - return event; -} - static void action_trace(struct hist_trigger_data *hist_data, struct tracing_map_elt *elt, void *rec, struct ring_buffer_event *rbe, void *key, @@ -1331,1056 +619,6 @@ struct hist_var_data { struct hist_trigger_data *hist_data; }; -static int synth_event_check_arg_fn(void *data) -{ - struct dynevent_arg_pair *arg_pair = data; - int size; - - size = synth_field_size((char *)arg_pair->lhs); - - return size ? 0 : -EINVAL; -} - -/** - * synth_event_add_field - Add a new field to a synthetic event cmd - * @cmd: A pointer to the dynevent_cmd struct representing the new event - * @type: The type of the new field to add - * @name: The name of the new field to add - * - * Add a new field to a synthetic event cmd object. Field ordering is in - * the same order the fields are added. - * - * See synth_field_size() for available types. If field_name contains - * [n] the field is considered to be an array. - * - * Return: 0 if successful, error otherwise. - */ -int synth_event_add_field(struct dynevent_cmd *cmd, const char *type, - const char *name) -{ - struct dynevent_arg_pair arg_pair; - int ret; - - if (cmd->type != DYNEVENT_TYPE_SYNTH) - return -EINVAL; - - if (!type || !name) - return -EINVAL; - - dynevent_arg_pair_init(&arg_pair, 0, ';'); - - arg_pair.lhs = type; - arg_pair.rhs = name; - - ret = dynevent_arg_pair_add(cmd, &arg_pair, synth_event_check_arg_fn); - if (ret) - return ret; - - if (++cmd->n_fields > SYNTH_FIELDS_MAX) - ret = -EINVAL; - - return ret; -} -EXPORT_SYMBOL_GPL(synth_event_add_field); - -/** - * synth_event_add_field_str - Add a new field to a synthetic event cmd - * @cmd: A pointer to the dynevent_cmd struct representing the new event - * @type_name: The type and name of the new field to add, as a single string - * - * Add a new field to a synthetic event cmd object, as a single - * string. The @type_name string is expected to be of the form 'type - * name', which will be appended by ';'. No sanity checking is done - - * what's passed in is assumed to already be well-formed. Field - * ordering is in the same order the fields are added. - * - * See synth_field_size() for available types. If field_name contains - * [n] the field is considered to be an array. - * - * Return: 0 if successful, error otherwise. - */ -int synth_event_add_field_str(struct dynevent_cmd *cmd, const char *type_name) -{ - struct dynevent_arg arg; - int ret; - - if (cmd->type != DYNEVENT_TYPE_SYNTH) - return -EINVAL; - - if (!type_name) - return -EINVAL; - - dynevent_arg_init(&arg, ';'); - - arg.str = type_name; - - ret = dynevent_arg_add(cmd, &arg, NULL); - if (ret) - return ret; - - if (++cmd->n_fields > SYNTH_FIELDS_MAX) - ret = -EINVAL; - - return ret; -} -EXPORT_SYMBOL_GPL(synth_event_add_field_str); - -/** - * synth_event_add_fields - Add multiple fields to a synthetic event cmd - * @cmd: A pointer to the dynevent_cmd struct representing the new event - * @fields: An array of type/name field descriptions - * @n_fields: The number of field descriptions contained in the fields array - * - * Add a new set of fields to a synthetic event cmd object. The event - * fields that will be defined for the event should be passed in as an - * array of struct synth_field_desc, and the number of elements in the - * array passed in as n_fields. Field ordering will retain the - * ordering given in the fields array. - * - * See synth_field_size() for available types. If field_name contains - * [n] the field is considered to be an array. - * - * Return: 0 if successful, error otherwise. - */ -int synth_event_add_fields(struct dynevent_cmd *cmd, - struct synth_field_desc *fields, - unsigned int n_fields) -{ - unsigned int i; - int ret = 0; - - for (i = 0; i < n_fields; i++) { - if (fields[i].type == NULL || fields[i].name == NULL) { - ret = -EINVAL; - break; - } - - ret = synth_event_add_field(cmd, fields[i].type, fields[i].name); - if (ret) - break; - } - - return ret; -} -EXPORT_SYMBOL_GPL(synth_event_add_fields); - -/** - * __synth_event_gen_cmd_start - Start a synthetic event command from arg list - * @cmd: A pointer to the dynevent_cmd struct representing the new event - * @name: The name of the synthetic event - * @mod: The module creating the event, NULL if not created from a module - * @args: Variable number of arg (pairs), one pair for each field - * - * NOTE: Users normally won't want to call this function directly, but - * rather use the synth_event_gen_cmd_start() wrapper, which - * automatically adds a NULL to the end of the arg list. If this - * function is used directly, make sure the last arg in the variable - * arg list is NULL. - * - * Generate a synthetic event command to be executed by - * synth_event_gen_cmd_end(). This function can be used to generate - * the complete command or only the first part of it; in the latter - * case, synth_event_add_field(), synth_event_add_field_str(), or - * synth_event_add_fields() can be used to add more fields following - * this. - * - * There should be an even number variable args, each pair consisting - * of a type followed by a field name. - * - * See synth_field_size() for available types. If field_name contains - * [n] the field is considered to be an array. - * - * Return: 0 if successful, error otherwise. - */ -int __synth_event_gen_cmd_start(struct dynevent_cmd *cmd, const char *name, - struct module *mod, ...) -{ - struct dynevent_arg arg; - va_list args; - int ret; - - cmd->event_name = name; - cmd->private_data = mod; - - if (cmd->type != DYNEVENT_TYPE_SYNTH) - return -EINVAL; - - dynevent_arg_init(&arg, 0); - arg.str = name; - ret = dynevent_arg_add(cmd, &arg, NULL); - if (ret) - return ret; - - va_start(args, mod); - for (;;) { - const char *type, *name; - - type = va_arg(args, const char *); - if (!type) - break; - name = va_arg(args, const char *); - if (!name) - break; - - if (++cmd->n_fields > SYNTH_FIELDS_MAX) { - ret = -EINVAL; - break; - } - - ret = synth_event_add_field(cmd, type, name); - if (ret) - break; - } - va_end(args); - - return ret; -} -EXPORT_SYMBOL_GPL(__synth_event_gen_cmd_start); - -/** - * synth_event_gen_cmd_array_start - Start synthetic event command from an array - * @cmd: A pointer to the dynevent_cmd struct representing the new event - * @name: The name of the synthetic event - * @fields: An array of type/name field descriptions - * @n_fields: The number of field descriptions contained in the fields array - * - * Generate a synthetic event command to be executed by - * synth_event_gen_cmd_end(). This function can be used to generate - * the complete command or only the first part of it; in the latter - * case, synth_event_add_field(), synth_event_add_field_str(), or - * synth_event_add_fields() can be used to add more fields following - * this. - * - * The event fields that will be defined for the event should be - * passed in as an array of struct synth_field_desc, and the number of - * elements in the array passed in as n_fields. Field ordering will - * retain the ordering given in the fields array. - * - * See synth_field_size() for available types. If field_name contains - * [n] the field is considered to be an array. - * - * Return: 0 if successful, error otherwise. - */ -int synth_event_gen_cmd_array_start(struct dynevent_cmd *cmd, const char *name, - struct module *mod, - struct synth_field_desc *fields, - unsigned int n_fields) -{ - struct dynevent_arg arg; - unsigned int i; - int ret = 0; - - cmd->event_name = name; - cmd->private_data = mod; - - if (cmd->type != DYNEVENT_TYPE_SYNTH) - return -EINVAL; - - if (n_fields > SYNTH_FIELDS_MAX) - return -EINVAL; - - dynevent_arg_init(&arg, 0); - arg.str = name; - ret = dynevent_arg_add(cmd, &arg, NULL); - if (ret) - return ret; - - for (i = 0; i < n_fields; i++) { - if (fields[i].type == NULL || fields[i].name == NULL) - return -EINVAL; - - ret = synth_event_add_field(cmd, fields[i].type, fields[i].name); - if (ret) - break; - } - - return ret; -} -EXPORT_SYMBOL_GPL(synth_event_gen_cmd_array_start); - -static int __create_synth_event(int argc, const char *name, const char **argv) -{ - struct synth_field *field, *fields[SYNTH_FIELDS_MAX]; - struct synth_event *event = NULL; - int i, consumed = 0, n_fields = 0, ret = 0; - - /* - * Argument syntax: - * - Add synthetic event: <event_name> field[;field] ... - * - Remove synthetic event: !<event_name> field[;field] ... - * where 'field' = type field_name - */ - - if (name[0] == '\0' || argc < 1) - return -EINVAL; - - mutex_lock(&event_mutex); - - event = find_synth_event(name); - if (event) { - ret = -EEXIST; - goto out; - } - - for (i = 0; i < argc - 1; i++) { - if (strcmp(argv[i], ";") == 0) - continue; - if (n_fields == SYNTH_FIELDS_MAX) { - ret = -EINVAL; - goto err; - } - - field = parse_synth_field(argc - i, &argv[i], &consumed); - if (IS_ERR(field)) { - ret = PTR_ERR(field); - goto err; - } - fields[n_fields++] = field; - i += consumed - 1; - } - - if (i < argc && strcmp(argv[i], ";") != 0) { - ret = -EINVAL; - goto err; - } - - event = alloc_synth_event(name, n_fields, fields); - if (IS_ERR(event)) { - ret = PTR_ERR(event); - event = NULL; - goto err; - } - ret = register_synth_event(event); - if (!ret) - dyn_event_add(&event->devent); - else - free_synth_event(event); - out: - mutex_unlock(&event_mutex); - - return ret; - err: - for (i = 0; i < n_fields; i++) - free_synth_field(fields[i]); - - goto out; -} - -/** - * synth_event_create - Create a new synthetic event - * @name: The name of the new sythetic event - * @fields: An array of type/name field descriptions - * @n_fields: The number of field descriptions contained in the fields array - * @mod: The module creating the event, NULL if not created from a module - * - * Create a new synthetic event with the given name under the - * trace/events/synthetic/ directory. The event fields that will be - * defined for the event should be passed in as an array of struct - * synth_field_desc, and the number elements in the array passed in as - * n_fields. Field ordering will retain the ordering given in the - * fields array. - * - * If the new synthetic event is being created from a module, the mod - * param must be non-NULL. This will ensure that the trace buffer - * won't contain unreadable events. - * - * The new synth event should be deleted using synth_event_delete() - * function. The new synthetic event can be generated from modules or - * other kernel code using trace_synth_event() and related functions. - * - * Return: 0 if successful, error otherwise. - */ -int synth_event_create(const char *name, struct synth_field_desc *fields, - unsigned int n_fields, struct module *mod) -{ - struct dynevent_cmd cmd; - char *buf; - int ret; - - buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL); - if (!buf) - return -ENOMEM; - - synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN); - - ret = synth_event_gen_cmd_array_start(&cmd, name, mod, - fields, n_fields); - if (ret) - goto out; - - ret = synth_event_gen_cmd_end(&cmd); - out: - kfree(buf); - - return ret; -} -EXPORT_SYMBOL_GPL(synth_event_create); - -static int destroy_synth_event(struct synth_event *se) -{ - int ret; - - if (se->ref) - ret = -EBUSY; - else { - ret = unregister_synth_event(se); - if (!ret) { - dyn_event_remove(&se->devent); - free_synth_event(se); - } - } - - return ret; -} - -/** - * synth_event_delete - Delete a synthetic event - * @event_name: The name of the new sythetic event - * - * Delete a synthetic event that was created with synth_event_create(). - * - * Return: 0 if successful, error otherwise. - */ -int synth_event_delete(const char *event_name) -{ - struct synth_event *se = NULL; - struct module *mod = NULL; - int ret = -ENOENT; - - mutex_lock(&event_mutex); - se = find_synth_event(event_name); - if (se) { - mod = se->mod; - ret = destroy_synth_event(se); - } - mutex_unlock(&event_mutex); - - if (mod) { - mutex_lock(&trace_types_lock); - /* - * It is safest to reset the ring buffer if the module - * being unloaded registered any events that were - * used. The only worry is if a new module gets - * loaded, and takes on the same id as the events of - * this module. When printing out the buffer, traced - * events left over from this module may be passed to - * the new module events and unexpected results may - * occur. - */ - tracing_reset_all_online_cpus(); - mutex_unlock(&trace_types_lock); - } - - return ret; -} -EXPORT_SYMBOL_GPL(synth_event_delete); - -static int create_or_delete_synth_event(int argc, char **argv) -{ - const char *name = argv[0]; - int ret; - - /* trace_run_command() ensures argc != 0 */ - if (name[0] == '!') { - ret = synth_event_delete(name + 1); - return ret; - } - - ret = __create_synth_event(argc - 1, name, (const char **)argv + 1); - return ret == -ECANCELED ? -EINVAL : ret; -} - -static int synth_event_run_command(struct dynevent_cmd *cmd) -{ - struct synth_event *se; - int ret; - - ret = trace_run_command(cmd->seq.buffer, create_or_delete_synth_event); - if (ret) - return ret; - - se = find_synth_event(cmd->event_name); - if (WARN_ON(!se)) - return -ENOENT; - - se->mod = cmd->private_data; - - return ret; -} - -/** - * synth_event_cmd_init - Initialize a synthetic event command object - * @cmd: A pointer to the dynevent_cmd struct representing the new event - * @buf: A pointer to the buffer used to build the command - * @maxlen: The length of the buffer passed in @buf - * - * Initialize a synthetic event command object. Use this before - * calling any of the other dyenvent_cmd functions. - */ -void synth_event_cmd_init(struct dynevent_cmd *cmd, char *buf, int maxlen) -{ - dynevent_cmd_init(cmd, buf, maxlen, DYNEVENT_TYPE_SYNTH, - synth_event_run_command); -} -EXPORT_SYMBOL_GPL(synth_event_cmd_init); - -static inline int -__synth_event_trace_start(struct trace_event_file *file, - struct synth_event_trace_state *trace_state) -{ - int entry_size, fields_size = 0; - int ret = 0; - - memset(trace_state, '\0', sizeof(*trace_state)); - - /* - * Normal event tracing doesn't get called at all unless the - * ENABLED bit is set (which attaches the probe thus allowing - * this code to be called, etc). Because this is called - * directly by the user, we don't have that but we still need - * to honor not logging when disabled. For the the iterated - * trace case, we save the enabed state upon start and just - * ignore the following data calls. - */ - if (!(file->flags & EVENT_FILE_FL_ENABLED) || - trace_trigger_soft_disabled(file)) { - trace_state->disabled = true; - ret = -ENOENT; - goto out; - } - - trace_state->event = file->event_call->data; - - fields_size = trace_state->event->n_u64 * sizeof(u64); - - /* - * Avoid ring buffer recursion detection, as this event - * is being performed within another event. - */ - trace_state->buffer = file->tr->array_buffer.buffer; - ring_buffer_nest_start(trace_state->buffer); - - entry_size = sizeof(*trace_state->entry) + fields_size; - trace_state->entry = trace_event_buffer_reserve(&trace_state->fbuffer, - file, - entry_size); - if (!trace_state->entry) { - ring_buffer_nest_end(trace_state->buffer); - ret = -EINVAL; - } -out: - return ret; -} - -static inline void -__synth_event_trace_end(struct synth_event_trace_state *trace_state) -{ - trace_event_buffer_commit(&trace_state->fbuffer); - - ring_buffer_nest_end(trace_state->buffer); -} - -/** - * synth_event_trace - Trace a synthetic event - * @file: The trace_event_file representing the synthetic event - * @n_vals: The number of values in vals - * @args: Variable number of args containing the event values - * - * Trace a synthetic event using the values passed in the variable - * argument list. - * - * The argument list should be a list 'n_vals' u64 values. The number - * of vals must match the number of field in the synthetic event, and - * must be in the same order as the synthetic event fields. - * - * All vals should be cast to u64, and string vals are just pointers - * to strings, cast to u64. Strings will be copied into space - * reserved in the event for the string, using these pointers. - * - * Return: 0 on success, err otherwise. - */ -int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...) -{ - struct synth_event_trace_state state; - unsigned int i, n_u64; - va_list args; - int ret; - - ret = __synth_event_trace_start(file, &state); - if (ret) { - if (ret == -ENOENT) - ret = 0; /* just disabled, not really an error */ - return ret; - } - - if (n_vals != state.event->n_fields) { - ret = -EINVAL; - goto out; - } - - va_start(args, n_vals); - for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) { - u64 val; - - val = va_arg(args, u64); - - if (state.event->fields[i]->is_string) { - char *str_val = (char *)(long)val; - char *str_field = (char *)&state.entry->fields[n_u64]; - - strscpy(str_field, str_val, STR_VAR_LEN_MAX); - n_u64 += STR_VAR_LEN_MAX / sizeof(u64); - } else { - struct synth_field *field = state.event->fields[i]; - - switch (field->size) { - case 1: - *(u8 *)&state.entry->fields[n_u64] = (u8)val; - break; - - case 2: - *(u16 *)&state.entry->fields[n_u64] = (u16)val; - break; - - case 4: - *(u32 *)&state.entry->fields[n_u64] = (u32)val; - break; - - default: - state.entry->fields[n_u64] = val; - break; - } - n_u64++; - } - } - va_end(args); -out: - __synth_event_trace_end(&state); - - return ret; -} -EXPORT_SYMBOL_GPL(synth_event_trace); - -/** - * synth_event_trace_array - Trace a synthetic event from an array - * @file: The trace_event_file representing the synthetic event - * @vals: Array of values - * @n_vals: The number of values in vals - * - * Trace a synthetic event using the values passed in as 'vals'. - * - * The 'vals' array is just an array of 'n_vals' u64. The number of - * vals must match the number of field in the synthetic event, and - * must be in the same order as the synthetic event fields. - * - * All vals should be cast to u64, and string vals are just pointers - * to strings, cast to u64. Strings will be copied into space - * reserved in the event for the string, using these pointers. - * - * Return: 0 on success, err otherwise. - */ -int synth_event_trace_array(struct trace_event_file *file, u64 *vals, - unsigned int n_vals) -{ - struct synth_event_trace_state state; - unsigned int i, n_u64; - int ret; - - ret = __synth_event_trace_start(file, &state); - if (ret) { - if (ret == -ENOENT) - ret = 0; /* just disabled, not really an error */ - return ret; - } - - if (n_vals != state.event->n_fields) { - ret = -EINVAL; - goto out; - } - - for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) { - if (state.event->fields[i]->is_string) { - char *str_val = (char *)(long)vals[i]; - char *str_field = (char *)&state.entry->fields[n_u64]; - - strscpy(str_field, str_val, STR_VAR_LEN_MAX); - n_u64 += STR_VAR_LEN_MAX / sizeof(u64); - } else { - struct synth_field *field = state.event->fields[i]; - u64 val = vals[i]; - - switch (field->size) { - case 1: - *(u8 *)&state.entry->fields[n_u64] = (u8)val; - break; - - case 2: - *(u16 *)&state.entry->fields[n_u64] = (u16)val; - break; - - case 4: - *(u32 *)&state.entry->fields[n_u64] = (u32)val; - break; - - default: - state.entry->fields[n_u64] = val; - break; - } - n_u64++; - } - } -out: - __synth_event_trace_end(&state); - - return ret; -} -EXPORT_SYMBOL_GPL(synth_event_trace_array); - -/** - * synth_event_trace_start - Start piecewise synthetic event trace - * @file: The trace_event_file representing the synthetic event - * @trace_state: A pointer to object tracking the piecewise trace state - * - * Start the trace of a synthetic event field-by-field rather than all - * at once. - * - * This function 'opens' an event trace, which means space is reserved - * for the event in the trace buffer, after which the event's - * individual field values can be set through either - * synth_event_add_next_val() or synth_event_add_val(). - * - * A pointer to a trace_state object is passed in, which will keep - * track of the current event trace state until the event trace is - * closed (and the event finally traced) using - * synth_event_trace_end(). - * - * Note that synth_event_trace_end() must be called after all values - * have been added for each event trace, regardless of whether adding - * all field values succeeded or not. - * - * Note also that for a given event trace, all fields must be added - * using either synth_event_add_next_val() or synth_event_add_val() - * but not both together or interleaved. - * - * Return: 0 on success, err otherwise. - */ -int synth_event_trace_start(struct trace_event_file *file, - struct synth_event_trace_state *trace_state) -{ - int ret; - - if (!trace_state) - return -EINVAL; - - ret = __synth_event_trace_start(file, trace_state); - if (ret == -ENOENT) - ret = 0; /* just disabled, not really an error */ - - return ret; -} -EXPORT_SYMBOL_GPL(synth_event_trace_start); - -static int __synth_event_add_val(const char *field_name, u64 val, - struct synth_event_trace_state *trace_state) -{ - struct synth_field *field = NULL; - struct synth_trace_event *entry; - struct synth_event *event; - int i, ret = 0; - - if (!trace_state) { - ret = -EINVAL; - goto out; - } - - /* can't mix add_next_synth_val() with add_synth_val() */ - if (field_name) { - if (trace_state->add_next) { - ret = -EINVAL; - goto out; - } - trace_state->add_name = true; - } else { - if (trace_state->add_name) { - ret = -EINVAL; - goto out; - } - trace_state->add_next = true; - } - - if (trace_state->disabled) - goto out; - - event = trace_state->event; - if (trace_state->add_name) { - for (i = 0; i < event->n_fields; i++) { - field = event->fields[i]; - if (strcmp(field->name, field_name) == 0) - break; - } - if (!field) { - ret = -EINVAL; - goto out; - } - } else { - if (trace_state->cur_field >= event->n_fields) { - ret = -EINVAL; - goto out; - } - field = event->fields[trace_state->cur_field++]; - } - - entry = trace_state->entry; - if (field->is_string) { - char *str_val = (char *)(long)val; - char *str_field; - - if (!str_val) { - ret = -EINVAL; - goto out; - } - - str_field = (char *)&entry->fields[field->offset]; - strscpy(str_field, str_val, STR_VAR_LEN_MAX); - } else { - switch (field->size) { - case 1: - *(u8 *)&trace_state->entry->fields[field->offset] = (u8)val; - break; - - case 2: - *(u16 *)&trace_state->entry->fields[field->offset] = (u16)val; - break; - - case 4: - *(u32 *)&trace_state->entry->fields[field->offset] = (u32)val; - break; - - default: - trace_state->entry->fields[field->offset] = val; - break; - } - } - out: - return ret; -} - -/** - * synth_event_add_next_val - Add the next field's value to an open synth trace - * @val: The value to set the next field to - * @trace_state: A pointer to object tracking the piecewise trace state - * - * Set the value of the next field in an event that's been opened by - * synth_event_trace_start(). - * - * The val param should be the value cast to u64. If the value points - * to a string, the val param should be a char * cast to u64. - * - * This function assumes all the fields in an event are to be set one - * after another - successive calls to this function are made, one for - * each field, in the order of the fields in the event, until all - * fields have been set. If you'd rather set each field individually - * without regard to ordering, synth_event_add_val() can be used - * instead. - * - * Note however that synth_event_add_next_val() and - * synth_event_add_val() can't be intermixed for a given event trace - - * one or the other but not both can be used at the same time. - * - * Note also that synth_event_trace_end() must be called after all - * values have been added for each event trace, regardless of whether - * adding all field values succeeded or not. - * - * Return: 0 on success, err otherwise. - */ -int synth_event_add_next_val(u64 val, - struct synth_event_trace_state *trace_state) -{ - return __synth_event_add_val(NULL, val, trace_state); -} -EXPORT_SYMBOL_GPL(synth_event_add_next_val); - -/** - * synth_event_add_val - Add a named field's value to an open synth trace - * @field_name: The name of the synthetic event field value to set - * @val: The value to set the next field to - * @trace_state: A pointer to object tracking the piecewise trace state - * - * Set the value of the named field in an event that's been opened by - * synth_event_trace_start(). - * - * The val param should be the value cast to u64. If the value points - * to a string, the val param should be a char * cast to u64. - * - * This function looks up the field name, and if found, sets the field - * to the specified value. This lookup makes this function more - * expensive than synth_event_add_next_val(), so use that or the - * none-piecewise synth_event_trace() instead if efficiency is more - * important. - * - * Note however that synth_event_add_next_val() and - * synth_event_add_val() can't be intermixed for a given event trace - - * one or the other but not both can be used at the same time. - * - * Note also that synth_event_trace_end() must be called after all - * values have been added for each event trace, regardless of whether - * adding all field values succeeded or not. - * - * Return: 0 on success, err otherwise. - */ -int synth_event_add_val(const char *field_name, u64 val, - struct synth_event_trace_state *trace_state) -{ - return __synth_event_add_val(field_name, val, trace_state); -} -EXPORT_SYMBOL_GPL(synth_event_add_val); - -/** - * synth_event_trace_end - End piecewise synthetic event trace - * @trace_state: A pointer to object tracking the piecewise trace state - * - * End the trace of a synthetic event opened by - * synth_event_trace__start(). - * - * This function 'closes' an event trace, which basically means that - * it commits the reserved event and cleans up other loose ends. - * - * A pointer to a trace_state object is passed in, which will keep - * track of the current event trace state opened with - * synth_event_trace_start(). - * - * Note that this function must be called after all values have been - * added for each event trace, regardless of whether adding all field - * values succeeded or not. - * - * Return: 0 on success, err otherwise. - */ -int synth_event_trace_end(struct synth_event_trace_state *trace_state) -{ - if (!trace_state) - return -EINVAL; - - __synth_event_trace_end(trace_state); - - return 0; -} -EXPORT_SYMBOL_GPL(synth_event_trace_end); - -static int create_synth_event(int argc, const char **argv) -{ - const char *name = argv[0]; - int len; - - if (name[0] != 's' || name[1] != ':') - return -ECANCELED; - name += 2; - - /* This interface accepts group name prefix */ - if (strchr(name, '/')) { - len = str_has_prefix(name, SYNTH_SYSTEM "/"); - if (len == 0) - return -EINVAL; - name += len; - } - return __create_synth_event(argc - 1, name, argv + 1); -} - -static int synth_event_release(struct dyn_event *ev) -{ - struct synth_event *event = to_synth_event(ev); - int ret; - - if (event->ref) - return -EBUSY; - - ret = unregister_synth_event(event); - if (ret) - return ret; - - dyn_event_remove(ev); - free_synth_event(event); - return 0; -} - -static int __synth_event_show(struct seq_file *m, struct synth_event *event) -{ - struct synth_field *field; - unsigned int i; - - seq_printf(m, "%s\t", event->name); - - for (i = 0; i < event->n_fields; i++) { - field = event->fields[i]; - - /* parameter values */ - seq_printf(m, "%s %s%s", field->type, field->name, - i == event->n_fields - 1 ? "" : "; "); - } - - seq_putc(m, '\n'); - - return 0; -} - -static int synth_event_show(struct seq_file *m, struct dyn_event *ev) -{ - struct synth_event *event = to_synth_event(ev); - - seq_printf(m, "s:%s/", event->class.system); - - return __synth_event_show(m, event); -} - -static int synth_events_seq_show(struct seq_file *m, void *v) -{ - struct dyn_event *ev = v; - - if (!is_synth_event(ev)) - return 0; - - return __synth_event_show(m, to_synth_event(ev)); -} - -static const struct seq_operations synth_events_seq_op = { - .start = dyn_event_seq_start, - .next = dyn_event_seq_next, - .stop = dyn_event_seq_stop, - .show = synth_events_seq_show, -}; - -static int synth_events_open(struct inode *inode, struct file *file) -{ - int ret; - - ret = security_locked_down(LOCKDOWN_TRACEFS); - if (ret) - return ret; - - if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) { - ret = dyn_events_release_all(&synth_event_ops); - if (ret < 0) - return ret; - } - - return seq_open(file, &synth_events_seq_op); -} - -static ssize_t synth_events_write(struct file *file, - const char __user *buffer, - size_t count, loff_t *ppos) -{ - return trace_parse_run_command(file, buffer, count, ppos, - create_or_delete_synth_event); -} - -static const struct file_operations synth_events_fops = { - .open = synth_events_open, - .write = synth_events_write, - .read = seq_read, - .llseek = seq_lseek, - .release = seq_release, -}; - static u64 hist_field_timestamp(struct hist_field *hist_field, struct tracing_map_elt *elt, struct ring_buffer_event *rbe, @@ -6491,6 +4729,279 @@ const struct file_operations event_hist_fops = { .release = single_release, }; +#ifdef CONFIG_HIST_TRIGGERS_DEBUG +static void hist_field_debug_show_flags(struct seq_file *m, + unsigned long flags) +{ + seq_puts(m, " flags:\n"); + + if (flags & HIST_FIELD_FL_KEY) + seq_puts(m, " HIST_FIELD_FL_KEY\n"); + else if (flags & HIST_FIELD_FL_HITCOUNT) + seq_puts(m, " VAL: HIST_FIELD_FL_HITCOUNT\n"); + else if (flags & HIST_FIELD_FL_VAR) + seq_puts(m, " HIST_FIELD_FL_VAR\n"); + else if (flags & HIST_FIELD_FL_VAR_REF) + seq_puts(m, " HIST_FIELD_FL_VAR_REF\n"); + else + seq_puts(m, " VAL: normal u64 value\n"); + + if (flags & HIST_FIELD_FL_ALIAS) + seq_puts(m, " HIST_FIELD_FL_ALIAS\n"); +} + +static int hist_field_debug_show(struct seq_file *m, + struct hist_field *field, unsigned long flags) +{ + if ((field->flags & flags) != flags) { + seq_printf(m, "ERROR: bad flags - %lx\n", flags); + return -EINVAL; + } + + hist_field_debug_show_flags(m, field->flags); + if (field->field) + seq_printf(m, " ftrace_event_field name: %s\n", + field->field->name); + + if (field->flags & HIST_FIELD_FL_VAR) { + seq_printf(m, " var.name: %s\n", field->var.name); + seq_printf(m, " var.idx (into tracing_map_elt.vars[]): %u\n", + field->var.idx); + } + + if (field->flags & HIST_FIELD_FL_ALIAS) + seq_printf(m, " var_ref_idx (into hist_data->var_refs[]): %u\n", + field->var_ref_idx); + + if (field->flags & HIST_FIELD_FL_VAR_REF) { + seq_printf(m, " name: %s\n", field->name); + seq_printf(m, " var.idx (into tracing_map_elt.vars[]): %u\n", + field->var.idx); + seq_printf(m, " var.hist_data: %p\n", field->var.hist_data); + seq_printf(m, " var_ref_idx (into hist_data->var_refs[]): %u\n", + field->var_ref_idx); + if (field->system) + seq_printf(m, " system: %s\n", field->system); + if (field->event_name) + seq_printf(m, " event_name: %s\n", field->event_name); + } + + seq_printf(m, " type: %s\n", field->type); + seq_printf(m, " size: %u\n", field->size); + seq_printf(m, " is_signed: %u\n", field->is_signed); + + return 0; +} + +static int field_var_debug_show(struct seq_file *m, + struct field_var *field_var, unsigned int i, + bool save_vars) +{ + const char *vars_name = save_vars ? "save_vars" : "field_vars"; + struct hist_field *field; + int ret = 0; + + seq_printf(m, "\n hist_data->%s[%d]:\n", vars_name, i); + + field = field_var->var; + + seq_printf(m, "\n %s[%d].var:\n", vars_name, i); + + hist_field_debug_show_flags(m, field->flags); + seq_printf(m, " var.name: %s\n", field->var.name); + seq_printf(m, " var.idx (into tracing_map_elt.vars[]): %u\n", + field->var.idx); + + field = field_var->val; + + seq_printf(m, "\n %s[%d].val:\n", vars_name, i); + if (field->field) + seq_printf(m, " ftrace_event_field name: %s\n", + field->field->name); + else { + ret = -EINVAL; + goto out; + } + + seq_printf(m, " type: %s\n", field->type); + seq_printf(m, " size: %u\n", field->size); + seq_printf(m, " is_signed: %u\n", field->is_signed); +out: + return ret; +} + +static int hist_action_debug_show(struct seq_file *m, + struct action_data *data, int i) +{ + int ret = 0; + + if (data->handler == HANDLER_ONMAX || + data->handler == HANDLER_ONCHANGE) { + seq_printf(m, "\n hist_data->actions[%d].track_data.var_ref:\n", i); + ret = hist_field_debug_show(m, data->track_data.var_ref, + HIST_FIELD_FL_VAR_REF); + if (ret) + goto out; + + seq_printf(m, "\n hist_data->actions[%d].track_data.track_var:\n", i); + ret = hist_field_debug_show(m, data->track_data.track_var, + HIST_FIELD_FL_VAR); + if (ret) + goto out; + } + + if (data->handler == HANDLER_ONMATCH) { + seq_printf(m, "\n hist_data->actions[%d].match_data.event_system: %s\n", + i, data->match_data.event_system); + seq_printf(m, " hist_data->actions[%d].match_data.event: %s\n", + i, data->match_data.event); + } +out: + return ret; +} + +static int hist_actions_debug_show(struct seq_file *m, + struct hist_trigger_data *hist_data) +{ + int i, ret = 0; + + if (hist_data->n_actions) + seq_puts(m, "\n action tracking variables (for onmax()/onchange()/onmatch()):\n"); + + for (i = 0; i < hist_data->n_actions; i++) { + struct action_data *action = hist_data->actions[i]; + + ret = hist_action_debug_show(m, action, i); + if (ret) + goto out; + } + + if (hist_data->n_save_vars) + seq_puts(m, "\n save action variables (save() params):\n"); + + for (i = 0; i < hist_data->n_save_vars; i++) { + ret = field_var_debug_show(m, hist_data->save_vars[i], i, true); + if (ret) + goto out; + } +out: + return ret; +} + +static void hist_trigger_debug_show(struct seq_file *m, + struct event_trigger_data *data, int n) +{ + struct hist_trigger_data *hist_data; + int i, ret; + + if (n > 0) + seq_puts(m, "\n\n"); + + seq_puts(m, "# event histogram\n#\n# trigger info: "); + data->ops->print(m, data->ops, data); + seq_puts(m, "#\n\n"); + + hist_data = data->private_data; + + seq_printf(m, "hist_data: %p\n\n", hist_data); + seq_printf(m, " n_vals: %u\n", hist_data->n_vals); + seq_printf(m, " n_keys: %u\n", hist_data->n_keys); + seq_printf(m, " n_fields: %u\n", hist_data->n_fields); + + seq_puts(m, "\n val fields:\n\n"); + + seq_puts(m, " hist_data->fields[0]:\n"); + ret = hist_field_debug_show(m, hist_data->fields[0], + HIST_FIELD_FL_HITCOUNT); + if (ret) + return; + + for (i = 1; i < hist_data->n_vals; i++) { + seq_printf(m, "\n hist_data->fields[%d]:\n", i); + ret = hist_field_debug_show(m, hist_data->fields[i], 0); + if (ret) + return; + } + + seq_puts(m, "\n key fields:\n"); + + for (i = hist_data->n_vals; i < hist_data->n_fields; i++) { + seq_printf(m, "\n hist_data->fields[%d]:\n", i); + ret = hist_field_debug_show(m, hist_data->fields[i], + HIST_FIELD_FL_KEY); + if (ret) + return; + } + + if (hist_data->n_var_refs) + seq_puts(m, "\n variable reference fields:\n"); + + for (i = 0; i < hist_data->n_var_refs; i++) { + seq_printf(m, "\n hist_data->var_refs[%d]:\n", i); + ret = hist_field_debug_show(m, hist_data->var_refs[i], + HIST_FIELD_FL_VAR_REF); + if (ret) + return; + } + + if (hist_data->n_field_vars) + seq_puts(m, "\n field variables:\n"); + + for (i = 0; i < hist_data->n_field_vars; i++) { + ret = field_var_debug_show(m, hist_data->field_vars[i], i, false); + if (ret) + return; + } + + ret = hist_actions_debug_show(m, hist_data); + if (ret) + return; +} + +static int hist_debug_show(struct seq_file *m, void *v) +{ + struct event_trigger_data *data; + struct trace_event_file *event_file; + int n = 0, ret = 0; + + mutex_lock(&event_mutex); + + event_file = event_file_data(m->private); + if (unlikely(!event_file)) { + ret = -ENODEV; + goto out_unlock; + } + + list_for_each_entry(data, &event_file->triggers, list) { + if (data->cmd_ops->trigger_type == ETT_EVENT_HIST) + hist_trigger_debug_show(m, data, n++); + } + + out_unlock: + mutex_unlock(&event_mutex); + + return ret; +} + +static int event_hist_debug_open(struct inode *inode, struct file *file) +{ + int ret; + + ret = security_locked_down(LOCKDOWN_TRACEFS); + if (ret) + return ret; + + return single_open(file, hist_debug_show, file); +} + +const struct file_operations event_hist_debug_fops = { + .open = event_hist_debug_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; +#endif + static void hist_field_print(struct seq_file *m, struct hist_field *hist_field) { const char *field_name = hist_field_name(hist_field, 0); @@ -7393,37 +5904,3 @@ __init int register_trigger_hist_enable_disable_cmds(void) return ret; } - -static __init int trace_events_hist_init(void) -{ - struct dentry *entry = NULL; - struct dentry *d_tracer; - int err = 0; - - err = dyn_event_register(&synth_event_ops); - if (err) { - pr_warn("Could not register synth_event_ops\n"); - return err; - } - - d_tracer = tracing_init_dentry(); - if (IS_ERR(d_tracer)) { - err = PTR_ERR(d_tracer); - goto err; - } - - entry = tracefs_create_file("synthetic_events", 0644, d_tracer, - NULL, &synth_events_fops); - if (!entry) { - err = -ENODEV; - goto err; - } - - return err; - err: - pr_warn("Could not create tracefs 'synthetic_events' entry\n"); - - return err; -} - -fs_initcall(trace_events_hist_init); diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c new file mode 100644 index 000000000000..c6cca0d1d584 --- /dev/null +++ b/kernel/trace/trace_events_synth.c @@ -0,0 +1,1789 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * trace_events_synth - synthetic trace events + * + * Copyright (C) 2015, 2020 Tom Zanussi <tom.zanussi@linux.intel.com> + */ + +#include <linux/module.h> +#include <linux/kallsyms.h> +#include <linux/security.h> +#include <linux/mutex.h> +#include <linux/slab.h> +#include <linux/stacktrace.h> +#include <linux/rculist.h> +#include <linux/tracefs.h> + +/* for gfp flag names */ +#include <linux/trace_events.h> +#include <trace/events/mmflags.h> + +#include "trace_synth.h" + +static int create_synth_event(int argc, const char **argv); +static int synth_event_show(struct seq_file *m, struct dyn_event *ev); +static int synth_event_release(struct dyn_event *ev); +static bool synth_event_is_busy(struct dyn_event *ev); +static bool synth_event_match(const char *system, const char *event, + int argc, const char **argv, struct dyn_event *ev); + +static struct dyn_event_operations synth_event_ops = { + .create = create_synth_event, + .show = synth_event_show, + .is_busy = synth_event_is_busy, + .free = synth_event_release, + .match = synth_event_match, +}; + +static bool is_synth_event(struct dyn_event *ev) +{ + return ev->ops == &synth_event_ops; +} + +static struct synth_event *to_synth_event(struct dyn_event *ev) +{ + return container_of(ev, struct synth_event, devent); +} + +static bool synth_event_is_busy(struct dyn_event *ev) +{ + struct synth_event *event = to_synth_event(ev); + + return event->ref != 0; +} + +static bool synth_event_match(const char *system, const char *event, + int argc, const char **argv, struct dyn_event *ev) +{ + struct synth_event *sev = to_synth_event(ev); + + return strcmp(sev->name, event) == 0 && + (!system || strcmp(system, SYNTH_SYSTEM) == 0); +} + +struct synth_trace_event { + struct trace_entry ent; + u64 fields[]; +}; + +static int synth_event_define_fields(struct trace_event_call *call) +{ + struct synth_trace_event trace; + int offset = offsetof(typeof(trace), fields); + struct synth_event *event = call->data; + unsigned int i, size, n_u64; + char *name, *type; + bool is_signed; + int ret = 0; + + for (i = 0, n_u64 = 0; i < event->n_fields; i++) { + size = event->fields[i]->size; + is_signed = event->fields[i]->is_signed; + type = event->fields[i]->type; + name = event->fields[i]->name; + ret = trace_define_field(call, type, name, offset, size, + is_signed, FILTER_OTHER); + if (ret) + break; + + event->fields[i]->offset = n_u64; + + if (event->fields[i]->is_string) { + offset += STR_VAR_LEN_MAX; + n_u64 += STR_VAR_LEN_MAX / sizeof(u64); + } else { + offset += sizeof(u64); + n_u64++; + } + } + + event->n_u64 = n_u64; + + return ret; +} + +static bool synth_field_signed(char *type) +{ + if (str_has_prefix(type, "u")) + return false; + if (strcmp(type, "gfp_t") == 0) + return false; + + return true; +} + +static int synth_field_is_string(char *type) +{ + if (strstr(type, "char[") != NULL) + return true; + + return false; +} + +static int synth_field_string_size(char *type) +{ + char buf[4], *end, *start; + unsigned int len; + int size, err; + + start = strstr(type, "char["); + if (start == NULL) + return -EINVAL; + start += sizeof("char[") - 1; + + end = strchr(type, ']'); + if (!end || end < start) + return -EINVAL; + + len = end - start; + if (len > 3) + return -EINVAL; + + strncpy(buf, start, len); + buf[len] = '\0'; + + err = kstrtouint(buf, 0, &size); + if (err) + return err; + + if (size > STR_VAR_LEN_MAX) + return -EINVAL; + + return size; +} + +static int synth_field_size(char *type) +{ + int size = 0; + + if (strcmp(type, "s64") == 0) + size = sizeof(s64); + else if (strcmp(type, "u64") == 0) + size = sizeof(u64); + else if (strcmp(type, "s32") == 0) + size = sizeof(s32); + else if (strcmp(type, "u32") == 0) + size = sizeof(u32); + else if (strcmp(type, "s16") == 0) + size = sizeof(s16); + else if (strcmp(type, "u16") == 0) + size = sizeof(u16); + else if (strcmp(type, "s8") == 0) + size = sizeof(s8); + else if (strcmp(type, "u8") == 0) + size = sizeof(u8); + else if (strcmp(type, "char") == 0) + size = sizeof(char); + else if (strcmp(type, "unsigned char") == 0) + size = sizeof(unsigned char); + else if (strcmp(type, "int") == 0) + size = sizeof(int); + else if (strcmp(type, "unsigned int") == 0) + size = sizeof(unsigned int); + else if (strcmp(type, "long") == 0) + size = sizeof(long); + else if (strcmp(type, "unsigned long") == 0) + size = sizeof(unsigned long); + else if (strcmp(type, "pid_t") == 0) + size = sizeof(pid_t); + else if (strcmp(type, "gfp_t") == 0) + size = sizeof(gfp_t); + else if (synth_field_is_string(type)) + size = synth_field_string_size(type); + + return size; +} + +static const char *synth_field_fmt(char *type) +{ + const char *fmt = "%llu"; + + if (strcmp(type, "s64") == 0) + fmt = "%lld"; + else if (strcmp(type, "u64") == 0) + fmt = "%llu"; + else if (strcmp(type, "s32") == 0) + fmt = "%d"; + else if (strcmp(type, "u32") == 0) + fmt = "%u"; + else if (strcmp(type, "s16") == 0) + fmt = "%d"; + else if (strcmp(type, "u16") == 0) + fmt = "%u"; + else if (strcmp(type, "s8") == 0) + fmt = "%d"; + else if (strcmp(type, "u8") == 0) + fmt = "%u"; + else if (strcmp(type, "char") == 0) + fmt = "%d"; + else if (strcmp(type, "unsigned char") == 0) + fmt = "%u"; + else if (strcmp(type, "int") == 0) + fmt = "%d"; + else if (strcmp(type, "unsigned int") == 0) + fmt = "%u"; + else if (strcmp(type, "long") == 0) + fmt = "%ld"; + else if (strcmp(type, "unsigned long") == 0) + fmt = "%lu"; + else if (strcmp(type, "pid_t") == 0) + fmt = "%d"; + else if (strcmp(type, "gfp_t") == 0) + fmt = "%x"; + else if (synth_field_is_string(type)) + fmt = "%s"; + + return fmt; +} + +static void print_synth_event_num_val(struct trace_seq *s, + char *print_fmt, char *name, + int size, u64 val, char *space) +{ + switch (size) { + case 1: + trace_seq_printf(s, print_fmt, name, (u8)val, space); + break; + + case 2: + trace_seq_printf(s, print_fmt, name, (u16)val, space); + break; + + case 4: + trace_seq_printf(s, print_fmt, name, (u32)val, space); + break; + + default: + trace_seq_printf(s, print_fmt, name, val, space); + break; + } +} + +static enum print_line_t print_synth_event(struct trace_iterator *iter, + int flags, + struct trace_event *event) +{ + struct trace_array *tr = iter->tr; + struct trace_seq *s = &iter->seq; + struct synth_trace_event *entry; + struct synth_event *se; + unsigned int i, n_u64; + char print_fmt[32]; + const char *fmt; + + entry = (struct synth_trace_event *)iter->ent; + se = container_of(event, struct synth_event, call.event); + + trace_seq_printf(s, "%s: ", se->name); + + for (i = 0, n_u64 = 0; i < se->n_fields; i++) { + if (trace_seq_has_overflowed(s)) + goto end; + + fmt = synth_field_fmt(se->fields[i]->type); + + /* parameter types */ + if (tr && tr->trace_flags & TRACE_ITER_VERBOSE) + trace_seq_printf(s, "%s ", fmt); + + snprintf(print_fmt, sizeof(print_fmt), "%%s=%s%%s", fmt); + + /* parameter values */ + if (se->fields[i]->is_string) { + trace_seq_printf(s, print_fmt, se->fields[i]->name, + (char *)&entry->fields[n_u64], + i == se->n_fields - 1 ? "" : " "); + n_u64 += STR_VAR_LEN_MAX / sizeof(u64); + } else { + struct trace_print_flags __flags[] = { + __def_gfpflag_names, {-1, NULL} }; + char *space = (i == se->n_fields - 1 ? "" : " "); + + print_synth_event_num_val(s, print_fmt, + se->fields[i]->name, + se->fields[i]->size, + entry->fields[n_u64], + space); + + if (strcmp(se->fields[i]->type, "gfp_t") == 0) { + trace_seq_puts(s, " ("); + trace_print_flags_seq(s, "|", + entry->fields[n_u64], + __flags); + trace_seq_putc(s, ')'); + } + n_u64++; + } + } +end: + trace_seq_putc(s, '\n'); + + return trace_handle_return(s); +} + +static struct trace_event_functions synth_event_funcs = { + .trace = print_synth_event +}; + +static notrace void trace_event_raw_event_synth(void *__data, + u64 *var_ref_vals, + unsigned int *var_ref_idx) +{ + struct trace_event_file *trace_file = __data; + struct synth_trace_event *entry; + struct trace_event_buffer fbuffer; + struct trace_buffer *buffer; + struct synth_event *event; + unsigned int i, n_u64, val_idx; + int fields_size = 0; + + event = trace_file->event_call->data; + + if (trace_trigger_soft_disabled(trace_file)) + return; + + fields_size = event->n_u64 * sizeof(u64); + + /* + * Avoid ring buffer recursion detection, as this event + * is being performed within another event. + */ + buffer = trace_file->tr->array_buffer.buffer; + ring_buffer_nest_start(buffer); + + entry = trace_event_buffer_reserve(&fbuffer, trace_file, + sizeof(*entry) + fields_size); + if (!entry) + goto out; + + for (i = 0, n_u64 = 0; i < event->n_fields; i++) { + val_idx = var_ref_idx[i]; + if (event->fields[i]->is_string) { + char *str_val = (char *)(long)var_ref_vals[val_idx]; + char *str_field = (char *)&entry->fields[n_u64]; + + strscpy(str_field, str_val, STR_VAR_LEN_MAX); + n_u64 += STR_VAR_LEN_MAX / sizeof(u64); + } else { + struct synth_field *field = event->fields[i]; + u64 val = var_ref_vals[val_idx]; + + switch (field->size) { + case 1: + *(u8 *)&entry->fields[n_u64] = (u8)val; + break; + + case 2: + *(u16 *)&entry->fields[n_u64] = (u16)val; + break; + + case 4: + *(u32 *)&entry->fields[n_u64] = (u32)val; + break; + + default: + entry->fields[n_u64] = val; + break; + } + n_u64++; + } + } + + trace_event_buffer_commit(&fbuffer); +out: + ring_buffer_nest_end(buffer); +} + +static void free_synth_event_print_fmt(struct trace_event_call *call) +{ + if (call) { + kfree(call->print_fmt); + call->print_fmt = NULL; + } +} + +static int __set_synth_event_print_fmt(struct synth_event *event, + char *buf, int len) +{ + const char *fmt; + int pos = 0; + int i; + + /* When len=0, we just calculate the needed length */ +#define LEN_OR_ZERO (len ? len - pos : 0) + + pos += snprintf(buf + pos, LEN_OR_ZERO, "\""); + for (i = 0; i < event->n_fields; i++) { + fmt = synth_field_fmt(event->fields[i]->type); + pos += snprintf(buf + pos, LEN_OR_ZERO, "%s=%s%s", + event->fields[i]->name, fmt, + i == event->n_fields - 1 ? "" : ", "); + } + pos += snprintf(buf + pos, LEN_OR_ZERO, "\""); + + for (i = 0; i < event->n_fields; i++) { + pos += snprintf(buf + pos, LEN_OR_ZERO, + ", REC->%s", event->fields[i]->name); + } + +#undef LEN_OR_ZERO + + /* return the length of print_fmt */ + return pos; +} + +static int set_synth_event_print_fmt(struct trace_event_call *call) +{ + struct synth_event *event = call->data; + char *print_fmt; + int len; + + /* First: called with 0 length to calculate the needed length */ + len = __set_synth_event_print_fmt(event, NULL, 0); + + print_fmt = kmalloc(len + 1, GFP_KERNEL); + if (!print_fmt) + return -ENOMEM; + + /* Second: actually write the @print_fmt */ + __set_synth_event_print_fmt(event, print_fmt, len + 1); + call->print_fmt = print_fmt; + + return 0; +} + +static void free_synth_field(struct synth_field *field) +{ + kfree(field->type); + kfree(field->name); + kfree(field); +} + +static struct synth_field *parse_synth_field(int argc, const char **argv, + int *consumed) +{ + struct synth_field *field; + const char *prefix = NULL, *field_type = argv[0], *field_name, *array; + int len, ret = 0; + + if (field_type[0] == ';') + field_type++; + + if (!strcmp(field_type, "unsigned")) { + if (argc < 3) + return ERR_PTR(-EINVAL); + prefix = "unsigned "; + field_type = argv[1]; + field_name = argv[2]; + *consumed = 3; + } else { + field_name = argv[1]; + *consumed = 2; + } + + field = kzalloc(sizeof(*field), GFP_KERNEL); + if (!field) + return ERR_PTR(-ENOMEM); + + len = strlen(field_name); + array = strchr(field_name, '['); + if (array) + len -= strlen(array); + else if (field_name[len - 1] == ';') + len--; + + field->name = kmemdup_nul(field_name, len, GFP_KERNEL); + if (!field->name) { + ret = -ENOMEM; + goto free; + } + + if (field_type[0] == ';') + field_type++; + len = strlen(field_type) + 1; + if (array) + len += strlen(array); + if (prefix) + len += strlen(prefix); + + field->type = kzalloc(len, GFP_KERNEL); + if (!field->type) { + ret = -ENOMEM; + goto free; + } + if (prefix) + strcat(field->type, prefix); + strcat(field->type, field_type); + if (array) { + strcat(field->type, array); + if (field->type[len - 1] == ';') + field->type[len - 1] = '\0'; + } + + field->size = synth_field_size(field->type); + if (!field->size) { + ret = -EINVAL; + goto free; + } + + if (synth_field_is_string(field->type)) + field->is_string = true; + + field->is_signed = synth_field_signed(field->type); + + out: + return field; + free: + free_synth_field(field); + field = ERR_PTR(ret); + goto out; +} + +static void free_synth_tracepoint(struct tracepoint *tp) +{ + if (!tp) + return; + + kfree(tp->name); + kfree(tp); +} + +static struct tracepoint *alloc_synth_tracepoint(char *name) +{ + struct tracepoint *tp; + + tp = kzalloc(sizeof(*tp), GFP_KERNEL); + if (!tp) + return ERR_PTR(-ENOMEM); + + tp->name = kstrdup(name, GFP_KERNEL); + if (!tp->name) { + kfree(tp); + return ERR_PTR(-ENOMEM); + } + + return tp; +} + +struct synth_event *find_synth_event(const char *name) +{ + struct dyn_event *pos; + struct synth_event *event; + + for_each_dyn_event(pos) { + if (!is_synth_event(pos)) + continue; + event = to_synth_event(pos); + if (strcmp(event->name, name) == 0) + return event; + } + + return NULL; +} + +static struct trace_event_fields synth_event_fields_array[] = { + { .type = TRACE_FUNCTION_TYPE, + .define_fields = synth_event_define_fields }, + {} +}; + +static int register_synth_event(struct synth_event *event) +{ + struct trace_event_call *call = &event->call; + int ret = 0; + + event->call.class = &event->class; + event->class.system = kstrdup(SYNTH_SYSTEM, GFP_KERNEL); + if (!event->class.system) { + ret = -ENOMEM; + goto out; + } + + event->tp = alloc_synth_tracepoint(event->name); + if (IS_ERR(event->tp)) { + ret = PTR_ERR(event->tp); + event->tp = NULL; + goto out; + } + + INIT_LIST_HEAD(&call->class->fields); + call->event.funcs = &synth_event_funcs; + call->class->fields_array = synth_event_fields_array; + + ret = register_trace_event(&call->event); + if (!ret) { + ret = -ENODEV; + goto out; + } + call->flags = TRACE_EVENT_FL_TRACEPOINT; + call->class->reg = trace_event_reg; + call->class->probe = trace_event_raw_event_synth; + call->data = event; + call->tp = event->tp; + + ret = trace_add_event_call(call); + if (ret) { + pr_warn("Failed to register synthetic event: %s\n", + trace_event_name(call)); + goto err; + } + + ret = set_synth_event_print_fmt(call); + if (ret < 0) { + trace_remove_event_call(call); + goto err; + } + out: + return ret; + err: + unregister_trace_event(&call->event); + goto out; +} + +static int unregister_synth_event(struct synth_event *event) +{ + struct trace_event_call *call = &event->call; + int ret; + + ret = trace_remove_event_call(call); + + return ret; +} + +static void free_synth_event(struct synth_event *event) +{ + unsigned int i; + + if (!event) + return; + + for (i = 0; i < event->n_fields; i++) + free_synth_field(event->fields[i]); + + kfree(event->fields); + kfree(event->name); + kfree(event->class.system); + free_synth_tracepoint(event->tp); + free_synth_event_print_fmt(&event->call); + kfree(event); +} + +static struct synth_event *alloc_synth_event(const char *name, int n_fields, + struct synth_field **fields) +{ + struct synth_event *event; + unsigned int i; + + event = kzalloc(sizeof(*event), GFP_KERNEL); + if (!event) { + event = ERR_PTR(-ENOMEM); + goto out; + } + + event->name = kstrdup(name, GFP_KERNEL); + if (!event->name) { + kfree(event); + event = ERR_PTR(-ENOMEM); + goto out; + } + + event->fields = kcalloc(n_fields, sizeof(*event->fields), GFP_KERNEL); + if (!event->fields) { + free_synth_event(event); + event = ERR_PTR(-ENOMEM); + goto out; + } + + dyn_event_init(&event->devent, &synth_event_ops); + + for (i = 0; i < n_fields; i++) + event->fields[i] = fields[i]; + + event->n_fields = n_fields; + out: + return event; +} + +static int synth_event_check_arg_fn(void *data) +{ + struct dynevent_arg_pair *arg_pair = data; + int size; + + size = synth_field_size((char *)arg_pair->lhs); + + return size ? 0 : -EINVAL; +} + +/** + * synth_event_add_field - Add a new field to a synthetic event cmd + * @cmd: A pointer to the dynevent_cmd struct representing the new event + * @type: The type of the new field to add + * @name: The name of the new field to add + * + * Add a new field to a synthetic event cmd object. Field ordering is in + * the same order the fields are added. + * + * See synth_field_size() for available types. If field_name contains + * [n] the field is considered to be an array. + * + * Return: 0 if successful, error otherwise. + */ +int synth_event_add_field(struct dynevent_cmd *cmd, const char *type, + const char *name) +{ + struct dynevent_arg_pair arg_pair; + int ret; + + if (cmd->type != DYNEVENT_TYPE_SYNTH) + return -EINVAL; + + if (!type || !name) + return -EINVAL; + + dynevent_arg_pair_init(&arg_pair, 0, ';'); + + arg_pair.lhs = type; + arg_pair.rhs = name; + + ret = dynevent_arg_pair_add(cmd, &arg_pair, synth_event_check_arg_fn); + if (ret) + return ret; + + if (++cmd->n_fields > SYNTH_FIELDS_MAX) + ret = -EINVAL; + + return ret; +} +EXPORT_SYMBOL_GPL(synth_event_add_field); + +/** + * synth_event_add_field_str - Add a new field to a synthetic event cmd + * @cmd: A pointer to the dynevent_cmd struct representing the new event + * @type_name: The type and name of the new field to add, as a single string + * + * Add a new field to a synthetic event cmd object, as a single + * string. The @type_name string is expected to be of the form 'type + * name', which will be appended by ';'. No sanity checking is done - + * what's passed in is assumed to already be well-formed. Field + * ordering is in the same order the fields are added. + * + * See synth_field_size() for available types. If field_name contains + * [n] the field is considered to be an array. + * + * Return: 0 if successful, error otherwise. + */ +int synth_event_add_field_str(struct dynevent_cmd *cmd, const char *type_name) +{ + struct dynevent_arg arg; + int ret; + + if (cmd->type != DYNEVENT_TYPE_SYNTH) + return -EINVAL; + + if (!type_name) + return -EINVAL; + + dynevent_arg_init(&arg, ';'); + + arg.str = type_name; + + ret = dynevent_arg_add(cmd, &arg, NULL); + if (ret) + return ret; + + if (++cmd->n_fields > SYNTH_FIELDS_MAX) + ret = -EINVAL; + + return ret; +} +EXPORT_SYMBOL_GPL(synth_event_add_field_str); + +/** + * synth_event_add_fields - Add multiple fields to a synthetic event cmd + * @cmd: A pointer to the dynevent_cmd struct representing the new event + * @fields: An array of type/name field descriptions + * @n_fields: The number of field descriptions contained in the fields array + * + * Add a new set of fields to a synthetic event cmd object. The event + * fields that will be defined for the event should be passed in as an + * array of struct synth_field_desc, and the number of elements in the + * array passed in as n_fields. Field ordering will retain the + * ordering given in the fields array. + * + * See synth_field_size() for available types. If field_name contains + * [n] the field is considered to be an array. + * + * Return: 0 if successful, error otherwise. + */ +int synth_event_add_fields(struct dynevent_cmd *cmd, + struct synth_field_desc *fields, + unsigned int n_fields) +{ + unsigned int i; + int ret = 0; + + for (i = 0; i < n_fields; i++) { + if (fields[i].type == NULL || fields[i].name == NULL) { + ret = -EINVAL; + break; + } + + ret = synth_event_add_field(cmd, fields[i].type, fields[i].name); + if (ret) + break; + } + + return ret; +} +EXPORT_SYMBOL_GPL(synth_event_add_fields); + +/** + * __synth_event_gen_cmd_start - Start a synthetic event command from arg list + * @cmd: A pointer to the dynevent_cmd struct representing the new event + * @name: The name of the synthetic event + * @mod: The module creating the event, NULL if not created from a module + * @args: Variable number of arg (pairs), one pair for each field + * + * NOTE: Users normally won't want to call this function directly, but + * rather use the synth_event_gen_cmd_start() wrapper, which + * automatically adds a NULL to the end of the arg list. If this + * function is used directly, make sure the last arg in the variable + * arg list is NULL. + * + * Generate a synthetic event command to be executed by + * synth_event_gen_cmd_end(). This function can be used to generate + * the complete command or only the first part of it; in the latter + * case, synth_event_add_field(), synth_event_add_field_str(), or + * synth_event_add_fields() can be used to add more fields following + * this. + * + * There should be an even number variable args, each pair consisting + * of a type followed by a field name. + * + * See synth_field_size() for available types. If field_name contains + * [n] the field is considered to be an array. + * + * Return: 0 if successful, error otherwise. + */ +int __synth_event_gen_cmd_start(struct dynevent_cmd *cmd, const char *name, + struct module *mod, ...) +{ + struct dynevent_arg arg; + va_list args; + int ret; + + cmd->event_name = name; + cmd->private_data = mod; + + if (cmd->type != DYNEVENT_TYPE_SYNTH) + return -EINVAL; + + dynevent_arg_init(&arg, 0); + arg.str = name; + ret = dynevent_arg_add(cmd, &arg, NULL); + if (ret) + return ret; + + va_start(args, mod); + for (;;) { + const char *type, *name; + + type = va_arg(args, const char *); + if (!type) + break; + name = va_arg(args, const char *); + if (!name) + break; + + if (++cmd->n_fields > SYNTH_FIELDS_MAX) { + ret = -EINVAL; + break; + } + + ret = synth_event_add_field(cmd, type, name); + if (ret) + break; + } + va_end(args); + + return ret; +} +EXPORT_SYMBOL_GPL(__synth_event_gen_cmd_start); + +/** + * synth_event_gen_cmd_array_start - Start synthetic event command from an array + * @cmd: A pointer to the dynevent_cmd struct representing the new event + * @name: The name of the synthetic event + * @fields: An array of type/name field descriptions + * @n_fields: The number of field descriptions contained in the fields array + * + * Generate a synthetic event command to be executed by + * synth_event_gen_cmd_end(). This function can be used to generate + * the complete command or only the first part of it; in the latter + * case, synth_event_add_field(), synth_event_add_field_str(), or + * synth_event_add_fields() can be used to add more fields following + * this. + * + * The event fields that will be defined for the event should be + * passed in as an array of struct synth_field_desc, and the number of + * elements in the array passed in as n_fields. Field ordering will + * retain the ordering given in the fields array. + * + * See synth_field_size() for available types. If field_name contains + * [n] the field is considered to be an array. + * + * Return: 0 if successful, error otherwise. + */ +int synth_event_gen_cmd_array_start(struct dynevent_cmd *cmd, const char *name, + struct module *mod, + struct synth_field_desc *fields, + unsigned int n_fields) +{ + struct dynevent_arg arg; + unsigned int i; + int ret = 0; + + cmd->event_name = name; + cmd->private_data = mod; + + if (cmd->type != DYNEVENT_TYPE_SYNTH) + return -EINVAL; + + if (n_fields > SYNTH_FIELDS_MAX) + return -EINVAL; + + dynevent_arg_init(&arg, 0); + arg.str = name; + ret = dynevent_arg_add(cmd, &arg, NULL); + if (ret) + return ret; + + for (i = 0; i < n_fields; i++) { + if (fields[i].type == NULL || fields[i].name == NULL) + return -EINVAL; + + ret = synth_event_add_field(cmd, fields[i].type, fields[i].name); + if (ret) + break; + } + + return ret; +} +EXPORT_SYMBOL_GPL(synth_event_gen_cmd_array_start); + +static int __create_synth_event(int argc, const char *name, const char **argv) +{ + struct synth_field *field, *fields[SYNTH_FIELDS_MAX]; + struct synth_event *event = NULL; + int i, consumed = 0, n_fields = 0, ret = 0; + + /* + * Argument syntax: + * - Add synthetic event: <event_name> field[;field] ... + * - Remove synthetic event: !<event_name> field[;field] ... + * where 'field' = type field_name + */ + + if (name[0] == '\0' || argc < 1) + return -EINVAL; + + mutex_lock(&event_mutex); + + event = find_synth_event(name); + if (event) { + ret = -EEXIST; + goto out; + } + + for (i = 0; i < argc - 1; i++) { + if (strcmp(argv[i], ";") == 0) + continue; + if (n_fields == SYNTH_FIELDS_MAX) { + ret = -EINVAL; + goto err; + } + + field = parse_synth_field(argc - i, &argv[i], &consumed); + if (IS_ERR(field)) { + ret = PTR_ERR(field); + goto err; + } + fields[n_fields++] = field; + i += consumed - 1; + } + + if (i < argc && strcmp(argv[i], ";") != 0) { + ret = -EINVAL; + goto err; + } + + event = alloc_synth_event(name, n_fields, fields); + if (IS_ERR(event)) { + ret = PTR_ERR(event); + event = NULL; + goto err; + } + ret = register_synth_event(event); + if (!ret) + dyn_event_add(&event->devent); + else + free_synth_event(event); + out: + mutex_unlock(&event_mutex); + + return ret; + err: + for (i = 0; i < n_fields; i++) + free_synth_field(fields[i]); + + goto out; +} + +/** + * synth_event_create - Create a new synthetic event + * @name: The name of the new sythetic event + * @fields: An array of type/name field descriptions + * @n_fields: The number of field descriptions contained in the fields array + * @mod: The module creating the event, NULL if not created from a module + * + * Create a new synthetic event with the given name under the + * trace/events/synthetic/ directory. The event fields that will be + * defined for the event should be passed in as an array of struct + * synth_field_desc, and the number elements in the array passed in as + * n_fields. Field ordering will retain the ordering given in the + * fields array. + * + * If the new synthetic event is being created from a module, the mod + * param must be non-NULL. This will ensure that the trace buffer + * won't contain unreadable events. + * + * The new synth event should be deleted using synth_event_delete() + * function. The new synthetic event can be generated from modules or + * other kernel code using trace_synth_event() and related functions. + * + * Return: 0 if successful, error otherwise. + */ +int synth_event_create(const char *name, struct synth_field_desc *fields, + unsigned int n_fields, struct module *mod) +{ + struct dynevent_cmd cmd; + char *buf; + int ret; + + buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL); + if (!buf) + return -ENOMEM; + + synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN); + + ret = synth_event_gen_cmd_array_start(&cmd, name, mod, + fields, n_fields); + if (ret) + goto out; + + ret = synth_event_gen_cmd_end(&cmd); + out: + kfree(buf); + + return ret; +} +EXPORT_SYMBOL_GPL(synth_event_create); + +static int destroy_synth_event(struct synth_event *se) +{ + int ret; + + if (se->ref) + ret = -EBUSY; + else { + ret = unregister_synth_event(se); + if (!ret) { + dyn_event_remove(&se->devent); + free_synth_event(se); + } + } + + return ret; +} + +/** + * synth_event_delete - Delete a synthetic event + * @event_name: The name of the new sythetic event + * + * Delete a synthetic event that was created with synth_event_create(). + * + * Return: 0 if successful, error otherwise. + */ +int synth_event_delete(const char *event_name) +{ + struct synth_event *se = NULL; + struct module *mod = NULL; + int ret = -ENOENT; + + mutex_lock(&event_mutex); + se = find_synth_event(event_name); + if (se) { + mod = se->mod; + ret = destroy_synth_event(se); + } + mutex_unlock(&event_mutex); + + if (mod) { + mutex_lock(&trace_types_lock); + /* + * It is safest to reset the ring buffer if the module + * being unloaded registered any events that were + * used. The only worry is if a new module gets + * loaded, and takes on the same id as the events of + * this module. When printing out the buffer, traced + * events left over from this module may be passed to + * the new module events and unexpected results may + * occur. + */ + tracing_reset_all_online_cpus(); + mutex_unlock(&trace_types_lock); + } + + return ret; +} +EXPORT_SYMBOL_GPL(synth_event_delete); + +static int create_or_delete_synth_event(int argc, char **argv) +{ + const char *name = argv[0]; + int ret; + + /* trace_run_command() ensures argc != 0 */ + if (name[0] == '!') { + ret = synth_event_delete(name + 1); + return ret; + } + + ret = __create_synth_event(argc - 1, name, (const char **)argv + 1); + return ret == -ECANCELED ? -EINVAL : ret; +} + +static int synth_event_run_command(struct dynevent_cmd *cmd) +{ + struct synth_event *se; + int ret; + + ret = trace_run_command(cmd->seq.buffer, create_or_delete_synth_event); + if (ret) + return ret; + + se = find_synth_event(cmd->event_name); + if (WARN_ON(!se)) + return -ENOENT; + + se->mod = cmd->private_data; + + return ret; +} + +/** + * synth_event_cmd_init - Initialize a synthetic event command object + * @cmd: A pointer to the dynevent_cmd struct representing the new event + * @buf: A pointer to the buffer used to build the command + * @maxlen: The length of the buffer passed in @buf + * + * Initialize a synthetic event command object. Use this before + * calling any of the other dyenvent_cmd functions. + */ +void synth_event_cmd_init(struct dynevent_cmd *cmd, char *buf, int maxlen) +{ + dynevent_cmd_init(cmd, buf, maxlen, DYNEVENT_TYPE_SYNTH, + synth_event_run_command); +} +EXPORT_SYMBOL_GPL(synth_event_cmd_init); + +static inline int +__synth_event_trace_start(struct trace_event_file *file, + struct synth_event_trace_state *trace_state) +{ + int entry_size, fields_size = 0; + int ret = 0; + + memset(trace_state, '\0', sizeof(*trace_state)); + + /* + * Normal event tracing doesn't get called at all unless the + * ENABLED bit is set (which attaches the probe thus allowing + * this code to be called, etc). Because this is called + * directly by the user, we don't have that but we still need + * to honor not logging when disabled. For the the iterated + * trace case, we save the enabed state upon start and just + * ignore the following data calls. + */ + if (!(file->flags & EVENT_FILE_FL_ENABLED) || + trace_trigger_soft_disabled(file)) { + trace_state->disabled = true; + ret = -ENOENT; + goto out; + } + + trace_state->event = file->event_call->data; + + fields_size = trace_state->event->n_u64 * sizeof(u64); + + /* + * Avoid ring buffer recursion detection, as this event + * is being performed within another event. + */ + trace_state->buffer = file->tr->array_buffer.buffer; + ring_buffer_nest_start(trace_state->buffer); + + entry_size = sizeof(*trace_state->entry) + fields_size; + trace_state->entry = trace_event_buffer_reserve(&trace_state->fbuffer, + file, + entry_size); + if (!trace_state->entry) { + ring_buffer_nest_end(trace_state->buffer); + ret = -EINVAL; + } +out: + return ret; +} + +static inline void +__synth_event_trace_end(struct synth_event_trace_state *trace_state) +{ + trace_event_buffer_commit(&trace_state->fbuffer); + + ring_buffer_nest_end(trace_state->buffer); +} + +/** + * synth_event_trace - Trace a synthetic event + * @file: The trace_event_file representing the synthetic event + * @n_vals: The number of values in vals + * @args: Variable number of args containing the event values + * + * Trace a synthetic event using the values passed in the variable + * argument list. + * + * The argument list should be a list 'n_vals' u64 values. The number + * of vals must match the number of field in the synthetic event, and + * must be in the same order as the synthetic event fields. + * + * All vals should be cast to u64, and string vals are just pointers + * to strings, cast to u64. Strings will be copied into space + * reserved in the event for the string, using these pointers. + * + * Return: 0 on success, err otherwise. + */ +int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...) +{ + struct synth_event_trace_state state; + unsigned int i, n_u64; + va_list args; + int ret; + + ret = __synth_event_trace_start(file, &state); + if (ret) { + if (ret == -ENOENT) + ret = 0; /* just disabled, not really an error */ + return ret; + } + + if (n_vals != state.event->n_fields) { + ret = -EINVAL; + goto out; + } + + va_start(args, n_vals); + for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) { + u64 val; + + val = va_arg(args, u64); + + if (state.event->fields[i]->is_string) { + char *str_val = (char *)(long)val; + char *str_field = (char *)&state.entry->fields[n_u64]; + + strscpy(str_field, str_val, STR_VAR_LEN_MAX); + n_u64 += STR_VAR_LEN_MAX / sizeof(u64); + } else { + struct synth_field *field = state.event->fields[i]; + + switch (field->size) { + case 1: + *(u8 *)&state.entry->fields[n_u64] = (u8)val; + break; + + case 2: + *(u16 *)&state.entry->fields[n_u64] = (u16)val; + break; + + case 4: + *(u32 *)&state.entry->fields[n_u64] = (u32)val; + break; + + default: + state.entry->fields[n_u64] = val; + break; + } + n_u64++; + } + } + va_end(args); +out: + __synth_event_trace_end(&state); + + return ret; +} +EXPORT_SYMBOL_GPL(synth_event_trace); + +/** + * synth_event_trace_array - Trace a synthetic event from an array + * @file: The trace_event_file representing the synthetic event + * @vals: Array of values + * @n_vals: The number of values in vals + * + * Trace a synthetic event using the values passed in as 'vals'. + * + * The 'vals' array is just an array of 'n_vals' u64. The number of + * vals must match the number of field in the synthetic event, and + * must be in the same order as the synthetic event fields. + * + * All vals should be cast to u64, and string vals are just pointers + * to strings, cast to u64. Strings will be copied into space + * reserved in the event for the string, using these pointers. + * + * Return: 0 on success, err otherwise. + */ +int synth_event_trace_array(struct trace_event_file *file, u64 *vals, + unsigned int n_vals) +{ + struct synth_event_trace_state state; + unsigned int i, n_u64; + int ret; + + ret = __synth_event_trace_start(file, &state); + if (ret) { + if (ret == -ENOENT) + ret = 0; /* just disabled, not really an error */ + return ret; + } + + if (n_vals != state.event->n_fields) { + ret = -EINVAL; + goto out; + } + + for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) { + if (state.event->fields[i]->is_string) { + char *str_val = (char *)(long)vals[i]; + char *str_field = (char *)&state.entry->fields[n_u64]; + + strscpy(str_field, str_val, STR_VAR_LEN_MAX); + n_u64 += STR_VAR_LEN_MAX / sizeof(u64); + } else { + struct synth_field *field = state.event->fields[i]; + u64 val = vals[i]; + + switch (field->size) { + case 1: + *(u8 *)&state.entry->fields[n_u64] = (u8)val; + break; + + case 2: + *(u16 *)&state.entry->fields[n_u64] = (u16)val; + break; + + case 4: + *(u32 *)&state.entry->fields[n_u64] = (u32)val; + break; + + default: + state.entry->fields[n_u64] = val; + break; + } + n_u64++; + } + } +out: + __synth_event_trace_end(&state); + + return ret; +} +EXPORT_SYMBOL_GPL(synth_event_trace_array); + +/** + * synth_event_trace_start - Start piecewise synthetic event trace + * @file: The trace_event_file representing the synthetic event + * @trace_state: A pointer to object tracking the piecewise trace state + * + * Start the trace of a synthetic event field-by-field rather than all + * at once. + * + * This function 'opens' an event trace, which means space is reserved + * for the event in the trace buffer, after which the event's + * individual field values can be set through either + * synth_event_add_next_val() or synth_event_add_val(). + * + * A pointer to a trace_state object is passed in, which will keep + * track of the current event trace state until the event trace is + * closed (and the event finally traced) using + * synth_event_trace_end(). + * + * Note that synth_event_trace_end() must be called after all values + * have been added for each event trace, regardless of whether adding + * all field values succeeded or not. + * + * Note also that for a given event trace, all fields must be added + * using either synth_event_add_next_val() or synth_event_add_val() + * but not both together or interleaved. + * + * Return: 0 on success, err otherwise. + */ +int synth_event_trace_start(struct trace_event_file *file, + struct synth_event_trace_state *trace_state) +{ + int ret; + + if (!trace_state) + return -EINVAL; + + ret = __synth_event_trace_start(file, trace_state); + if (ret == -ENOENT) + ret = 0; /* just disabled, not really an error */ + + return ret; +} +EXPORT_SYMBOL_GPL(synth_event_trace_start); + +static int __synth_event_add_val(const char *field_name, u64 val, + struct synth_event_trace_state *trace_state) +{ + struct synth_field *field = NULL; + struct synth_trace_event *entry; + struct synth_event *event; + int i, ret = 0; + + if (!trace_state) { + ret = -EINVAL; + goto out; + } + + /* can't mix add_next_synth_val() with add_synth_val() */ + if (field_name) { + if (trace_state->add_next) { + ret = -EINVAL; + goto out; + } + trace_state->add_name = true; + } else { + if (trace_state->add_name) { + ret = -EINVAL; + goto out; + } + trace_state->add_next = true; + } + + if (trace_state->disabled) + goto out; + + event = trace_state->event; + if (trace_state->add_name) { + for (i = 0; i < event->n_fields; i++) { + field = event->fields[i]; + if (strcmp(field->name, field_name) == 0) + break; + } + if (!field) { + ret = -EINVAL; + goto out; + } + } else { + if (trace_state->cur_field >= event->n_fields) { + ret = -EINVAL; + goto out; + } + field = event->fields[trace_state->cur_field++]; + } + + entry = trace_state->entry; + if (field->is_string) { + char *str_val = (char *)(long)val; + char *str_field; + + if (!str_val) { + ret = -EINVAL; + goto out; + } + + str_field = (char *)&entry->fields[field->offset]; + strscpy(str_field, str_val, STR_VAR_LEN_MAX); + } else { + switch (field->size) { + case 1: + *(u8 *)&trace_state->entry->fields[field->offset] = (u8)val; + break; + + case 2: + *(u16 *)&trace_state->entry->fields[field->offset] = (u16)val; + break; + + case 4: + *(u32 *)&trace_state->entry->fields[field->offset] = (u32)val; + break; + + default: + trace_state->entry->fields[field->offset] = val; + break; + } + } + out: + return ret; +} + +/** + * synth_event_add_next_val - Add the next field's value to an open synth trace + * @val: The value to set the next field to + * @trace_state: A pointer to object tracking the piecewise trace state + * + * Set the value of the next field in an event that's been opened by + * synth_event_trace_start(). + * + * The val param should be the value cast to u64. If the value points + * to a string, the val param should be a char * cast to u64. + * + * This function assumes all the fields in an event are to be set one + * after another - successive calls to this function are made, one for + * each field, in the order of the fields in the event, until all + * fields have been set. If you'd rather set each field individually + * without regard to ordering, synth_event_add_val() can be used + * instead. + * + * Note however that synth_event_add_next_val() and + * synth_event_add_val() can't be intermixed for a given event trace - + * one or the other but not both can be used at the same time. + * + * Note also that synth_event_trace_end() must be called after all + * values have been added for each event trace, regardless of whether + * adding all field values succeeded or not. + * + * Return: 0 on success, err otherwise. + */ +int synth_event_add_next_val(u64 val, + struct synth_event_trace_state *trace_state) +{ + return __synth_event_add_val(NULL, val, trace_state); +} +EXPORT_SYMBOL_GPL(synth_event_add_next_val); + +/** + * synth_event_add_val - Add a named field's value to an open synth trace + * @field_name: The name of the synthetic event field value to set + * @val: The value to set the next field to + * @trace_state: A pointer to object tracking the piecewise trace state + * + * Set the value of the named field in an event that's been opened by + * synth_event_trace_start(). + * + * The val param should be the value cast to u64. If the value points + * to a string, the val param should be a char * cast to u64. + * + * This function looks up the field name, and if found, sets the field + * to the specified value. This lookup makes this function more + * expensive than synth_event_add_next_val(), so use that or the + * none-piecewise synth_event_trace() instead if efficiency is more + * important. + * + * Note however that synth_event_add_next_val() and + * synth_event_add_val() can't be intermixed for a given event trace - + * one or the other but not both can be used at the same time. + * + * Note also that synth_event_trace_end() must be called after all + * values have been added for each event trace, regardless of whether + * adding all field values succeeded or not. + * + * Return: 0 on success, err otherwise. + */ +int synth_event_add_val(const char *field_name, u64 val, + struct synth_event_trace_state *trace_state) +{ + return __synth_event_add_val(field_name, val, trace_state); +} +EXPORT_SYMBOL_GPL(synth_event_add_val); + +/** + * synth_event_trace_end - End piecewise synthetic event trace + * @trace_state: A pointer to object tracking the piecewise trace state + * + * End the trace of a synthetic event opened by + * synth_event_trace__start(). + * + * This function 'closes' an event trace, which basically means that + * it commits the reserved event and cleans up other loose ends. + * + * A pointer to a trace_state object is passed in, which will keep + * track of the current event trace state opened with + * synth_event_trace_start(). + * + * Note that this function must be called after all values have been + * added for each event trace, regardless of whether adding all field + * values succeeded or not. + * + * Return: 0 on success, err otherwise. + */ +int synth_event_trace_end(struct synth_event_trace_state *trace_state) +{ + if (!trace_state) + return -EINVAL; + + __synth_event_trace_end(trace_state); + + return 0; +} +EXPORT_SYMBOL_GPL(synth_event_trace_end); + +static int create_synth_event(int argc, const char **argv) +{ + const char *name = argv[0]; + int len; + + if (name[0] != 's' || name[1] != ':') + return -ECANCELED; + name += 2; + + /* This interface accepts group name prefix */ + if (strchr(name, '/')) { + len = str_has_prefix(name, SYNTH_SYSTEM "/"); + if (len == 0) + return -EINVAL; + name += len; + } + return __create_synth_event(argc - 1, name, argv + 1); +} + +static int synth_event_release(struct dyn_event *ev) +{ + struct synth_event *event = to_synth_event(ev); + int ret; + + if (event->ref) + return -EBUSY; + + ret = unregister_synth_event(event); + if (ret) + return ret; + + dyn_event_remove(ev); + free_synth_event(event); + return 0; +} + +static int __synth_event_show(struct seq_file *m, struct synth_event *event) +{ + struct synth_field *field; + unsigned int i; + + seq_printf(m, "%s\t", event->name); + + for (i = 0; i < event->n_fields; i++) { + field = event->fields[i]; + + /* parameter values */ + seq_printf(m, "%s %s%s", field->type, field->name, + i == event->n_fields - 1 ? "" : "; "); + } + + seq_putc(m, '\n'); + + return 0; +} + +static int synth_event_show(struct seq_file *m, struct dyn_event *ev) +{ + struct synth_event *event = to_synth_event(ev); + + seq_printf(m, "s:%s/", event->class.system); + + return __synth_event_show(m, event); +} + +static int synth_events_seq_show(struct seq_file *m, void *v) +{ + struct dyn_event *ev = v; + + if (!is_synth_event(ev)) + return 0; + + return __synth_event_show(m, to_synth_event(ev)); +} + +static const struct seq_operations synth_events_seq_op = { + .start = dyn_event_seq_start, + .next = dyn_event_seq_next, + .stop = dyn_event_seq_stop, + .show = synth_events_seq_show, +}; + +static int synth_events_open(struct inode *inode, struct file *file) +{ + int ret; + + ret = security_locked_down(LOCKDOWN_TRACEFS); + if (ret) + return ret; + + if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) { + ret = dyn_events_release_all(&synth_event_ops); + if (ret < 0) + return ret; + } + + return seq_open(file, &synth_events_seq_op); +} + +static ssize_t synth_events_write(struct file *file, + const char __user *buffer, + size_t count, loff_t *ppos) +{ + return trace_parse_run_command(file, buffer, count, ppos, + create_or_delete_synth_event); +} + +static const struct file_operations synth_events_fops = { + .open = synth_events_open, + .write = synth_events_write, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; + +static __init int trace_events_synth_init(void) +{ + struct dentry *entry = NULL; + struct dentry *d_tracer; + int err = 0; + + err = dyn_event_register(&synth_event_ops); + if (err) { + pr_warn("Could not register synth_event_ops\n"); + return err; + } + + d_tracer = tracing_init_dentry(); + if (IS_ERR(d_tracer)) { + err = PTR_ERR(d_tracer); + goto err; + } + + entry = tracefs_create_file("synthetic_events", 0644, d_tracer, + NULL, &synth_events_fops); + if (!entry) { + err = -ENODEV; + goto err; + } + + return err; + err: + pr_warn("Could not create tracefs 'synthetic_events' entry\n"); + + return err; +} + +fs_initcall(trace_events_synth_init); diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c index ab8b6436d53f..b8a928e925c7 100644 --- a/kernel/trace/trace_probe.c +++ b/kernel/trace/trace_probe.c @@ -1006,7 +1006,7 @@ int trace_probe_init(struct trace_probe *tp, const char *event, INIT_LIST_HEAD(&tp->event->class.fields); INIT_LIST_HEAD(&tp->event->probes); INIT_LIST_HEAD(&tp->list); - list_add(&tp->event->probes, &tp->list); + list_add(&tp->list, &tp->event->probes); call = trace_probe_event_call(tp); call->class = &tp->event->class; diff --git a/kernel/trace/trace_synth.h b/kernel/trace/trace_synth.h new file mode 100644 index 000000000000..ac35c45207c4 --- /dev/null +++ b/kernel/trace/trace_synth.h @@ -0,0 +1,36 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef __TRACE_SYNTH_H +#define __TRACE_SYNTH_H + +#include "trace_dynevent.h" + +#define SYNTH_SYSTEM "synthetic" +#define SYNTH_FIELDS_MAX 32 + +#define STR_VAR_LEN_MAX 32 /* must be multiple of sizeof(u64) */ + +struct synth_field { + char *type; + char *name; + size_t size; + unsigned int offset; + bool is_signed; + bool is_string; +}; + +struct synth_event { + struct dyn_event devent; + int ref; + char *name; + struct synth_field **fields; + unsigned int n_fields; + unsigned int n_u64; + struct trace_event_class class; + struct trace_event_call call; + struct tracepoint *tp; + struct module *mod; +}; + +extern struct synth_event *find_synth_event(const char *name); + +#endif /* __TRACE_SYNTH_H */ diff --git a/lib/bug.c b/lib/bug.c index 8c98af0bf585..7103440c0ee1 100644 --- a/lib/bug.c +++ b/lib/bug.c @@ -47,6 +47,7 @@ #include <linux/bug.h> #include <linux/sched.h> #include <linux/rculist.h> +#include <linux/ftrace.h> extern struct bug_entry __start___bug_table[], __stop___bug_table[]; @@ -153,6 +154,8 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs) if (!bug) return BUG_TRAP_TYPE_NONE; + disable_trace_on_warning(); + file = NULL; line = 0; warning = 0; diff --git a/tools/bootconfig/test-bootconfig.sh b/tools/bootconfig/test-bootconfig.sh index 81b350ffd03f..eff16b77d5eb 100755 --- a/tools/bootconfig/test-bootconfig.sh +++ b/tools/bootconfig/test-bootconfig.sh @@ -124,9 +124,16 @@ for i in samples/good-* ; do xpass $BOOTCONF -a $i $INITRD done + +echo +echo "=== Summary ===" +echo "# of Passed: $(expr $NO - $NG - 1)" +echo "# of Failed: $NG" + echo if [ $NG -eq 0 ]; then echo "All tests passed" else echo "$NG tests failed" + exit 1 fi diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc index 77be6e1f6e7b..e232059a8ab2 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-field-variable-support.tc @@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then exit_unsupported fi +if [ ! -f events/sched/sched_process_fork/hist ]; then + echo "hist trigger is not supported" + exit_unsupported +fi + echo "Test field variable support" echo 'wakeup_latency u64 lat; pid_t pid; int prio; char comm[16]' > synthetic_events diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc index f3eb8aacec0e..07cfcb8157b6 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-inter-event-combined-hist.tc @@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then exit_unsupported fi +if [ ! -f events/sched/sched_process_fork/hist ]; then + echo "hist trigger is not supported" + exit_unsupported +fi + echo "Test create synthetic event" echo 'waking_latency u64 lat pid_t pid' > synthetic_events diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc index d281f056f980..73e413c2ca26 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-multi-actions-accept.tc @@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then exit_unsupported fi +if [ ! -f events/sched/sched_process_fork/hist ]; then + echo "hist trigger is not supported" + exit_unsupported +fi + echo "Test multiple actions on hist trigger" echo 'wakeup_latency u64 lat; pid_t pid' >> synthetic_events TRIGGER1=events/sched/sched_wakeup/trigger diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc index a708f0e7858a..ebe0ad827f9f 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-action-hist.tc @@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then exit_unsupported fi +if [ ! -f events/sched/sched_process_fork/hist ]; then + echo "hist trigger is not supported" + exit_unsupported +fi + echo "Test create synthetic event" echo 'wakeup_latency u64 lat pid_t pid char comm[16]' > synthetic_events diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc index dfce6932d8be..2a2ef767249e 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmatch-onmax-action-hist.tc @@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then exit_unsupported fi +if [ ! -f events/sched/sched_process_fork/hist ]; then + echo "hist trigger is not supported" + exit_unsupported +fi + echo "Test create synthetic event" echo 'wakeup_latency u64 lat pid_t pid char comm[16]' > synthetic_events diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc index 0035995c2194..98d73bfb0296 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-onmax-action-hist.tc @@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then exit_unsupported fi +if [ ! -f events/sched/sched_process_fork/hist ]; then + echo "hist trigger is not supported" + exit_unsupported +fi + echo "Test create synthetic event" echo 'wakeup_latency u64 lat pid_t pid char comm[16]' > synthetic_events diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc index f546c1b66a9b..01b01b9c4e07 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-snapshot-action-hist.tc @@ -12,6 +12,11 @@ if [ ! -f set_event ]; then exit_unsupported fi +if [ ! -f events/sched/sched_process_fork/hist ]; then + echo "hist trigger is not supported" + exit_unsupported +fi + if [ ! -f snapshot ]; then echo "snapshot is not supported" exit_unsupported diff --git a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc index 8021d60aafec..c3baa486aeb4 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/inter-event/trigger-trace-action-hist.tc @@ -17,6 +17,11 @@ if [ ! -f synthetic_events ]; then exit_unsupported fi +if [ ! -f events/sched/sched_process_fork/hist ]; then + echo "hist trigger is not supported" + exit_unsupported +fi + grep -q "trace(<synthetic_event>" README || exit_unsupported # version issue echo "Test create synthetic event" |