SYSCTL(9) Kernel Developer's Manual SYSCTL(9)

NAME

sysctl - system variable control interfaces

SYNOPSIS

#include <sys/param.h>
#include <sys/sysctl.h>

Primary external interfaces:
void
sysctl_init(void);

int
sysctl_lock(struct lwp *l, void *oldp, size_t savelen);

int
sysctl_dispatch(const int *name, u_int namelen, void *oldp, size_t *oldlenp, const void *newp, size_t newlen, const int *oname, struct lwp *l, const struct sysctlnode *rnode);

void
sysctl_unlock(struct lwp *l);

int
sysctl_createv(struct sysctllog **log, int cflags, const struct sysctlnode **rnode, const struct sysctlnode **cnode, int flags, int type, const char *namep, const char *desc, sysctlfn func, u_quad_t qv, void *newp, size_t newlen, ...);

int
sysctl_destroyv(struct sysctlnode *rnode, ...);

void
sysctl_free(struct sysctlnode *rnode);

void
sysctl_teardown(struct sysctllog **);

int
old_sysctl(int *name, u_int namelen, void *oldp, size_t *oldlenp, void *newp, size_t newlen, struct lwp *l);

Core internal functions:
int
sysctl_locate(struct lwp *l, const int *name, u_int namelen, const struct sysctlnode **rnode, int *nip);

int
sysctl_lookup(const int *name, u_int namelen, void *oldp, size_t *oldlenp, const void *newp, size_t newlen, const int *oname, struct lwp *l, const struct sysctlnode *rnode);

int
sysctl_create(const int *name, u_int namelen, void *oldp, size_t *oldlenp, const void *newp, size_t newlen, const int *oname, struct lwp *l, const struct sysctlnode *rnode);

int
sysctl_destroy(const int *name, u_int namelen, void *oldp, size_t *oldlenp, const void *newp, size_t newlen, const int *oname, struct lwp *l, const struct sysctlnode *rnode);

int
sysctl_query(const int *name, u_int namelen, void *oldp, size_t *oldlenp, const void *newp, size_t newlen, const int *oname, struct lwp *l, const struct sysctlnode *rnode);

Simple “helper” functions:
int
sysctl_needfunc(const int *name, u_int namelen, void *oldp, size_t *oldlenp, const void *newp, size_t newlen, const int *oname, struct lwp *l, const struct sysctlnode *rnode);

int
sysctl_notavail(const int *name, u_int namelen, void *oldp, size_t *oldlenp, const void *newp, size_t newlen, const int *oname, struct lwp *l, const struct sysctlnode *rnode);

int
sysctl_null(const int *name, u_int namelen, void *oldp, size_t *oldlenp, const void *newp, size_t newlen, const int *oname, struct lwp *l, const struct sysctlnode *rnode);

DESCRIPTION

The SYSCTL subsystem instruments a number of kernel tunables and other data structures via a simple MIB-like interface, primarily for consumption by userland programs, but also for use internally by the kernel.

LOCKING

All operations on the SYSCTL tree must be protected by acquiring the main SYSCTL lock. The only functions that can be called when the lock is not held are sysctl_lock(), sysctl_createv(), sysctl_destroyv(), and old_sysctl(). All other functions require the tree to be locked. This prevents other users of the tree from moving nodes around during an add operation, or from destroying nodes or subtrees that are actively being used. The lock is acquired by calling sysctl_lock() with a pointer to the process's lwp l. NULL may be passed to all functions as the lwp pointer if no lwp is appropriate, though any changes made via sysctl_create(), sysctl_destroy(), sysctl_lookup(), or by any helper function will then be done with effective superuser privileges.

The oldp and savelen arguments are a pointer to and the size of the memory region the caller will be using to collect data from SYSCTL. These may also be NULL and 0, respectively.

The memory region will be locked via uvm_vslock() if it is a region in userspace. The address and size of the region are recorded so that when the SYSCTL lock is to be released via sysctl_unlock(), only the lwp pointer l is required.

LOOKUPS

Once the lock has been acquired, it is typical to call sysctl_dispatch() to handle the request. sysctl_dispatch() will examine the contents of name, an array of integers at least namelen long that must reside in kernel space, in order to determine which function to call to handle the specific request.

The following algorithm is used by sysctl_dispatch() to determine the function to call:

  • Scan the tree using sysctl_locate().
  • If the node returned has a “helper” function, call it.
  • If the requested node was found but has no function, call sysctl_lookup().
  • If the node was not found and name specifies one of sysctl_query(), sysctl_create(), or sysctl_destroy(), call the appropriate function.
  • If none of these options applies and no other error was yet recorded, return EOPNOTSUPP.
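The decision order in the list above can be modelled in userland as follows. This is only a sketch: the toy_ and TOY_ names, the reduced node structure, and the numeric stand-ins for the special MIB values are assumptions made for illustration and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the special MIB numbers in <sys/sysctl.h>. */
enum { TOY_CTL_QUERY = -2, TOY_CTL_CREATE = -3, TOY_CTL_DESTROY = -5 };

/* Which handler the dispatcher would pick (toy model, not kernel code). */
enum toy_action {
	TOY_HELPER, TOY_LOOKUP, TOY_QUERY, TOY_CREATE, TOY_DESTROY,
	TOY_EOPNOTSUPP
};

/* A node is reduced to the one property the algorithm inspects. */
struct toy_node {
	int has_helper;		/* non-zero if a helper function is attached */
};

/*
 * found:     most specific node located (never NULL; at worst the root)
 * remaining: MIB entries left unconsumed after the scan
 * next:      first unconsumed entry, meaningful only if remaining > 0
 */
enum toy_action
toy_dispatch_choice(const struct toy_node *found, size_t remaining, int next)
{
	assert(found != NULL);

	if (found->has_helper)
		return TOY_HELPER;	/* helper takes over the request */
	if (remaining == 0)
		return TOY_LOOKUP;	/* exact match, no helper */
	switch (next) {			/* node not found: special names */
	case TOY_CTL_QUERY:
		return TOY_QUERY;
	case TOY_CTL_CREATE:
		return TOY_CREATE;
	case TOY_CTL_DESTROY:
		return TOY_DESTROY;
	default:
		return TOY_EOPNOTSUPP;
	}
}
```

The helper check comes first, so a node with a helper receives the request even when MIB entries remain unconsumed.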

The oldp and oldlenp arguments to sysctl_dispatch(), as with all the other core functions, describe an area into which the current or requested value may be copied. oldp may or may not be a pointer into userspace (as dictated by whether l is NULL or not). oldlenp is a non-NULL pointer to a size_t. newp and newlen describe an area where the new value for the request may be found; newp may also be a pointer into userspace. The oname argument is a non-NULL pointer to the base of the request currently being processed. By simple arithmetic on name, namelen, and oname, one can easily determine the entire original request and its original namelen value, if needed. The rnode value, as passed to sysctl_dispatch(), represents the root of the tree into which the current request is to be dispatched. If NULL, the main tree will be used.
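The arithmetic on name, namelen, and oname can be sketched as below. The function name toy_original_namelen is hypothetical, and the sketch assumes that name points into the same array that begins at oname, which is what the description above implies.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Recover the length of the original request. Assumes that name
 * points into the array that begins at oname.
 */
size_t
toy_original_namelen(const int *oname, const int *name, size_t namelen)
{
	assert(oname != NULL && name >= oname);
	/* entries already consumed + entries still to be processed */
	return (size_t)(name - oname) + namelen;
}
```

The original request is then the array starting at oname with the returned length.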

The sysctl_locate() function scans a tree for the node most specific to a request. If the pointer referenced by rnode is not NULL, the tree indicated is searched, otherwise the main tree will be used. The address of the most relevant node will be returned via rnode and the number of MIB entries consumed will be returned via nip, if it is not NULL.
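A userland model of such a scan might look like the following. It is a simplified sketch, not the kernel routine: struct toy_mibnode and toy_locate are hypothetical, and the model returns the node directly rather than through a pointer argument as sysctl_locate() does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node: a MIB number plus an array of children (hypothetical layout). */
struct toy_mibnode {
	int num;
	const struct toy_mibnode *child;
	size_t nchild;
};

/*
 * Walk down from root, consuming one MIB entry per matching child.
 * Returns the deepest node matched and stores the number of entries
 * consumed in *nip when nip is not NULL.
 */
const struct toy_mibnode *
toy_locate(const struct toy_mibnode *root, const int *name, size_t namelen,
    size_t *nip)
{
	const struct toy_mibnode *node = root;
	size_t ni = 0;

	assert(root != NULL);
	while (ni < namelen) {
		const struct toy_mibnode *next = NULL;
		size_t i;

		for (i = 0; i < node->nchild; i++) {
			if (node->child[i].num == name[ni]) {
				next = &node->child[i];
				break;
			}
		}
		if (next == NULL)
			break;		/* no deeper match: stop here */
		node = next;
		ni++;
	}
	if (nip != NULL)
		*nip = ni;
	return node;
}
```

When no entry matches at all, the model returns the root with zero entries consumed, so a caller can tell an exact match from a partial one by comparing the consumed count with namelen.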

The sysctl_lookup() function takes the same arguments as sysctl_dispatch() with the caveat that the value for namelen must be zero in order to indicate that the node referenced by the rnode argument is the one to which the lookup is being applied.

CREATION AND DESTRUCTION OF NODES

New nodes are created and destroyed by the sysctl_create() and sysctl_destroy() functions. These functions take the same arguments as sysctl_dispatch() with the additional requirement that the namelen argument must be 1 and the name argument must point to an integer valued either CTL_CREATE or CTL_CREATESYM when creating a new node, or CTL_DESTROY when destroying a node.

The newp and newlen arguments should point to a copy of the node to be created or destroyed. If the create or destroy operation was successful, a copy of the node created or destroyed will be placed in the space indicated by oldp and oldlenp. If the create operation fails because of a conflict with an existing node, a copy of that node will be returned instead.

In order to facilitate the creation and destruction of nodes from a given tree by kernel subsystems, the functions sysctl_createv() and sysctl_destroyv() are provided. These functions take care of the overhead of filling in the contents of the create or destroy request, dealing with locking, locating the appropriate parent node, etc.

The arguments to sysctl_createv() are used to construct the new node. If the log argument is not NULL, a sysctllog structure will be allocated and the pointer referenced will be changed to address it. The same log may be used for any number of nodes, provided they are all inserted into the same tree. This allows for a series of nodes to be created and later removed from the tree in a single transaction (via sysctl_teardown()) without the need for any record keeping on the caller's part.

The cflags argument is currently unused and must be zero. The rnode argument must either be NULL or a valid pointer to a reference to the root of the tree into which the new node must be placed. If it is NULL, the main tree will be used. It is illegal for rnode to refer to a NULL pointer. If the cnode argument is not NULL, on return it will be adjusted to point to the address of the new node.

The flags and type arguments are combined into the sysctl_flags field, and the current value for SYSCTL_VERSION is added in. The following types are defined:

CTLTYPE_NODE
A node intended to be a parent for other nodes.
CTLTYPE_INT
A signed integer.
CTLTYPE_STRING
A NUL-terminated string.
CTLTYPE_QUAD
An unsigned 64-bit integer.
CTLTYPE_STRUCT
A structure.
CTLTYPE_BOOL
A boolean.

The namep argument is copied into the sysctl_name field and must be less than SYSCTL_NAMELEN characters in length. The string indicated by desc will be copied if the CTLFLAG_OWNDESC flag is set, and will be used as the node's description.

Two additional remarks:

  1. The CTLFLAG_PERMANENT flag can only be set from SYSCTL setup routines (see SETUP FUNCTIONS) as called by sysctl_init().
  2. If sysctl_destroyv() attempts to delete a node that does not own its own description (and is not marked as permanent), but the deletion fails, the description will be copied and sysctl_destroyv() will set the CTLFLAG_OWNDESC flag.

The func argument is the name of a “helper” function (see HELPER FUNCTIONS AND MACROS). If the CTLFLAG_IMMEDIATE flag is set, the qv argument will be interpreted as the initial value for the new “bool”, “int” or “quad” node. This flag does not apply to any other type of node. The newp and newlen arguments describe the data external to SYSCTL that is to be instrumented. One of func, qv and the CTLFLAG_IMMEDIATE flag, or newp and newlen must be given for nodes that instrument data, otherwise an error is returned.

The remaining arguments are a list of integers specifying the path through the MIB to the node being created. The list must be terminated by the CTL_EOL value. The penultimate value in the list may be CTL_CREATE if a dynamic MIB entry is to be made for this node. sysctl_createv() specifically does not support CTL_CREATESYM, since setup routines are expected to be able to use the in-kernel ksyms(4) interface to discover the location of the data to be instrumented. If the node to be created matches a node that already exists, a return code of 0 is given, indicating success.
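The convention of a variable-length integer path closed by a terminator value can be illustrated in userland with the sketch below. The names toy_collect_path and TOY_CTL_EOL, and the terminator's numeric value, are assumptions for illustration; the sketch does not reproduce how sysctl_createv() itself processes its arguments.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

#define TOY_CTL_EOL	(-1)	/* hypothetical terminator value */

/*
 * Copy a TOY_CTL_EOL-terminated list of ints into out[].
 * Returns the number of entries stored, or -1 if more than max
 * entries precede the terminator.
 */
int
toy_collect_path(int *out, size_t max, ...)
{
	va_list ap;
	size_t n = 0;
	int v, rc;

	assert(out != NULL);
	va_start(ap, max);
	while ((v = va_arg(ap, int)) != TOY_CTL_EOL) {
		if (n == max)
			break;		/* path too long for out[] */
		out[n++] = v;
	}
	rc = (v == TOY_CTL_EOL) ? (int)n : -1;
	va_end(ap);
	return rc;
}
```

Because a variadic function cannot know how many arguments it received, omitting the terminator makes it read past the supplied arguments; this is why the list must always end with the end-of-list value.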

When using sysctl_destroyv() to destroy a given node, the rnode argument, if not NULL, is taken to be the root of the tree from which the node is to be destroyed, otherwise the main tree is used. The rest of the arguments are a list of integers specifying the path through the MIB to the node being destroyed. If the node being destroyed does not exist, a successful return code is given. Nodes marked with the CTLFLAG_PERMANENT flag cannot be destroyed.

HELPER FUNCTIONS AND MACROS

Helper functions are invoked with the same common argument set as sysctl_dispatch() except that the rnode argument will never be NULL. It will be set to point to the node that corresponds most closely to the current request. Helpers are forbidden from modifying the node they are passed; they should instead copy the structure if changes are required in order to effect access control or other checks. The prototype and body of a “helper” function that ensures that a newly assigned value is within a certain range (presuming external data) would look like the following:

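static int sysctl_helper(SYSCTLFN_PROTO);

static int
sysctl_helper(SYSCTLFN_ARGS)
{
	struct sysctlnode node;
	int t, error;

	/* Work on a private copy of the current value. */
	t = *(int *)rnode->sysctl_data;

	/* Copy the node and point the copy at the private value. */
	node = *rnode;
	node.sysctl_data = &t;
	error = sysctl_lookup(SYSCTLFN_CALL(&node));
	if (error || newp == NULL)
		return (error);

	/* A new value was supplied; reject it if out of range. */
	if (t < 0 || t > 20)
		return (EINVAL);

	/* Commit the validated value to the instrumented data. */
	*(int *)rnode->sysctl_data = t;
	return (0);
}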

The use of the SYSCTLFN_PROTO, SYSCTLFN_ARGS, and SYSCTLFN_CALL macros ensures that all arguments are passed properly. The single argument to the SYSCTLFN_CALL macro is the pointer to the node being examined.

Three basic helper functions are available for use. sysctl_needfunc() will emit a warning to the system console whenever it is invoked and provides a simplistic read-only interface to the given node. sysctl_notavail() will forward “queries” to sysctl_query() so that subtrees can be discovered, but will return EOPNOTSUPP for any other condition. sysctl_null() specifically ignores any arguments given, sets the value indicated by oldlenp to zero, and returns success.

SETUP FUNCTIONS

Though nodes can be added to the SYSCTL tree at any time, in order to add nodes during the kernel bootstrap phase, a proper “setup” function must be used. Setup functions are declared using the SYSCTL_SETUP macro, which takes the name of the function and a short string description of the function as arguments. (See the SYSCTL_DEBUG_SETUP kernel configuration in options(4).) The address of the function is added to a list of functions that sysctl_init() traverses during initialization.

Setup functions do not have to add nodes to the main tree, but can set up their own trees for emulation or other purposes. Emulations that require use of a main tree but with some nodes changed to suit their own purposes can arrange to overlay a sparse private tree onto their main tree by making the e_sysctlovly member of their struct emul definition point to the overlaid tree.

Setup functions should take care to create all nodes from the root down to the subtree they are creating, since the order in which setup functions are called is arbitrary: it is determined only by the ordering of the object files as passed to the linker when the kernel is built.

MISCELLANEOUS FUNCTIONS

sysctl_init() is called early in the kernel bootstrap process. It initializes the SYSCTL lock, calls all the registered setup functions, and marks the tree as permanent.

sysctl_free() will unconditionally delete any and all nodes below the given node. Its intended use is for the deletion of entire trees, not subtrees. If a subtree is to be removed, sysctl_destroy() or sysctl_destroyv() should be used to ensure that nodes not owned by the sub-system being deactivated are not mistakenly destroyed. The SYSCTL lock must be held when calling this function.

sysctl_teardown() unwinds a sysctllog and deletes the nodes in the reverse of the order in which they were created.
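The unwinding order can be pictured with a small userland model. This is a sketch under stated assumptions: struct toy_log, its fixed capacity, and the two functions are hypothetical and say nothing about how struct sysctllog is actually laid out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LOG_MAX 16		/* hypothetical fixed capacity */

/* Toy log: remembers node ids in creation order. */
struct toy_log {
	int ids[TOY_LOG_MAX];
	size_t n;
};

/* Record a newly created node; returns 0 on success, -1 if full. */
int
toy_log_add(struct toy_log *log, int id)
{
	assert(log != NULL);
	if (log->n == TOY_LOG_MAX)
		return -1;
	log->ids[log->n++] = id;
	return 0;
}

/*
 * Unwind the log newest-first. The ids are written to destroyed[],
 * which must have room for TOY_LOG_MAX entries, in the order a real
 * teardown would delete them. Returns the number of ids written.
 */
size_t
toy_teardown(struct toy_log *log, int *destroyed)
{
	size_t count = 0;

	assert(log != NULL && destroyed != NULL);
	while (log->n > 0)
		destroyed[count++] = log->ids[--log->n];
	return count;
}
```

Removing the newest entries first means that a node recorded after its parent is deleted before that parent.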

old_sysctl() provides an interface similar to the old SYSCTL implementation, with the exception that access checks on a per-node basis are performed if the l argument is non-NULL. If called with a NULL l argument, the values for newp and oldp are interpreted as kernel addresses, and access is performed as for the superuser.

NOTES

It is expected that nodes will be added to (or removed from) the tree during the following stages of a machine's lifetime:

  • initialization -- when the kernel is booting
  • autoconfiguration -- when devices are being probed at boot time
  • “plug and play” device attachment -- when a PC-Card, USB, or other device is plugged in or attached
  • module initialization -- when a module is being loaded
  • “run-time” -- when a process creates a node via the sysctl(3) interface

Nodes marked with CTLFLAG_PERMANENT can only be added to a tree during the first or initialization phase, and can never be removed. The initialization phase terminates when the main tree's root is marked with the CTLFLAG_PERMANENT flag. Once the main tree is marked in this manner, no nodes can be added to any tree that is marked with CTLFLAG_READONLY at its root, and no nodes can be added at all if the main tree's root is so marked.

Nodes added by device drivers, modules, and at device insertion time can be added to (and removed from) “read-only” parent nodes.

Nodes created by processes can only be added to “writable” parent nodes. See sysctl(3) for a description of the flags that are allowed to be used when creating nodes.

SEE ALSO

sysctl(3)

HISTORY

The dynamic SYSCTL implementation first appeared in NetBSD 2.0.

AUTHORS

Andrew Brown <atatat@NetBSD.org> designed and implemented the dynamic SYSCTL implementation.
December 4, 2011 NetBSD 7.0