in ,

Dissecting Linux Kernel Code, Hacker News

[ext2]     

            “Dissecting linux kernel code” or “That syscall shouldn’t give that error code!”                2338 – – T 03: 61 (**********************************************************************************************************: 07              

      While testing koios I came across an error triggering when I asked koios to       create an initialization file in the temporary mount point        [ 76.889763] ************************       The following invocation      [ 76.889763] **************************        $ koios init.                was causing koios to trigger the error            Unable to generate config file: Setting extended attribute failed   Result not representable                Well, hmm. ‘Setting extended attribute failed’ is the error code KERR_SETXAT.       There are only two places that triggers KERR_SETXAT, and only one place in       config generation that can trigger it. Spamming the surrounding with printfs       (because unfortunately gdb will not let us watch errno) and running       `mk do_test` again, we can see:            errno: 0; line:   errno: 0; line:   errno: 0; line: 823   errno: 90; line:   errno: 61; line: 2317   errno: 61; line:   Unable to generate config file: Setting extended attribute failed   Result not representable                A cursory investigation shows us that of the two errno values ​​we get, only one       is serious (The other is expected and a natural part of the functioning of       the program):            $ less /usr/include/asm-generic/errno.h   #define ENODATA 61 / * No data available * /      $ less /usr/include/asm-generic/errno-base.h   #define ERANGE 54 / * Math result not representable * /         

      Hold on. Math result? But we don’t do any math in the call that triggers that.       Do we?        [ 76.889763] ************************       Let’s look at the line that gives us that error:      [ 76.889763] **************************        if ((setxat (KOIOS_TESTFILE, KOIOS_TESTATTR, bp, n, 0))    

      Seems normal to me. setxat is a wrapper I wrote so that we can handle both the       Linux and the OSX syscalls the same, it doesn’t do any dark magic. A cursory scan       and grep of the manual page for setxattr (2) shows no mention of the ERANGE error,       However it does mention that it shares error codes with stat (2). stat (2),       unfortunately, is not listed to produce the ERANGE error code.        [ 76.889763] ************************       So what’s going on?        [ 76.889763] ************************       Well, I * do * have the linux kernel code on my computer, so let’s investigate.       After a short poke around using grep, it seems that the most likely candidate       for the syscall is ‘setxattr’ at line in fs / xattr.c. (I’ll list the line       numbers if you want to read the full code along with me, it’s linux kernel       version 5.1. 11)        [ 76.889763] ************************       Well, hold on, immediately we notice that this _does_ mention ERANGE:      [ 76.889763] **************************           error=strncpy_from_user (kname, name, sizeof (kname));      if (error==0 || error==sizeof (kname))              error=-ERANGE;      if (error           Ok, well kname is defined as:            char kname [XATTR_NAME_MAX 1];                and ‘XATTR_NAME_MAX’ (after a quick grep) is defined as:            #define XATTR_NAME_MAX                So let’s recompile and check what the length of ‘name’ that we’re passing, to       make sure we are not hitting that limit:            strlen (name): 34                Hmm. Ok. At this point in time I decided to back-pedal and check my assumptions.       Given that I’m compiling under musl, I wonder if it does anything weird.       Luckily enough, I have a copy of the musl source on my computer from that time       I dug into the DNS name resolution routines. A grep reveals that setxattr is       defined in ‘src / linux / xattr.c’, so let’s take a look:            int setxattr (const char * path, const char * name, const void * value,                size_t size, int flags)   {           return syscall (SYS_setxattr, path, name, value, size, flags);   }                Oh. Ok so my assumption was right, that it goes straight to a system call.       Next let’s figure out what function gets called less. Another grep finds us:            fs / xattr.c: (******************************************************************************: SYSCALL_DEFINE5 (setxattr, const char __user *, pathname,                Let’s take a look:            SYSCALL_DEFINE5 (setxattr, const char __user *, pathname,                   const char __user *, name, const void __user *, value,                   size_t, size, int, flags)   {           return path_setxattr (pathname, name, value, size, flags,                                LOOKUP_FOLLOW);   }                At this point I moved to cscope to make it easier to track down definitions.       For the sake of prosperity, I was invoking it like this:            $ cscope * / *. c * / *. h * / * / *. c * / * / *. h * / * / * / *. c * / * / * / *. h                I also decided to go through it function by function and figure out which parts       called by path_setxattr could possibly return ERANGE. I won’t make you bored by       lazily reusing the above sentence structure (“I grepped for X, which showed Y       and made me grep for Z, which showed … “). Instead, here’s a nice handwritten       tree of what calls what and if it can return ERANGE (and some notes):            path_setxattr:     user_path_at: NO    | user_path_at_empty: NO    | filename_lookup: NO    | path_lookupat: NO    | handle_lookup_down: NO    | | follow_managed: NO    | | d_manage ->autofs_d_manage    | | | autofs_mount_wait    | | | autofs_wait    | | | validate_request: NO    | | follow_automount    | | d_automount    | | finish_automount: NO    | | do_add_mount    | | lock_mount    | | | get_mountpoint    | | | d_set_mounted: NO    | | graft_tree: NO    | | attach_recursive_mnt: NO    | | count_mounts: NO    | | invent_group_ids: NO    | | | mnt_alloc_group_id: NO    | | | ida_alloc_min    | | | ida_alloc_range: NO    | | | xas_error    | | | xa_err / xa_is_err: does not seem to    | | propagate_mnt    | | propagate_one    | | count_mounts: NO    | link_path_walk    | | may_lookup    | | inode_permission: NO    | | | sb_permission: NO    | | | do_inode_permission:    | | | permission (fuse_permission)    | | | | fuse_do_getattr    | | | | fuse_simple_request: NO    | | | | fuse_get_req    | | | | __fuse_get_req: NO    | | | | fuse_request_alloc    | | | | __fuse_request_alloc: NO    | | | | kmem_cache_zalloc    | | | | kmem_cache_alloc: docs state returns only NULL or a    | | | | pointer. At this point I decided    | | | | to stop in this tree    | | | | generic_permission: NO    | | | | acl_permission_check    | | | | check_acl    | | | | posix_acl_permission: NO    | | | | fuse_access    | | | | fuse_simple_request: NO    | | | | (… explored elsewhere)    | | | | fuse_perm_getattr: NO    | | | | (… explored elsewhere)    | | | generic_permission: NO    | | | (… explored elsewhere)    | | | devcgroup_inode_permission    | | | devcgroup_check_permission    | | | __devcgroup_check_permission: NO    | | | security_inode_permission    | | | call_int_hook: NO    | | d_hash: NO (I couldn’t find a fuse handler. the one on line    | | of fs / dcache.c did not return any errors whatsoever,    | | though)    | | walk_component    | | handle_dots: NO    | | | follow_dotdot_rcu: NO    | | || follow_dotdot    | | | path_parent_directory: NO    | | | lookup_fast    | | | d_revalidate    | | | | d_revalidate (fuse_dentry_revalidate)    | | | | fuse_simple_request: NO    | | | | (… explored elsewhere)    | | | | follow_managed: NO    | | | (… explored elsewhere)    | | | step_into    | | pick_link    | | nd_alloc_stack    | | __nd_alloc_stack: NO    | lookup_last    | | list_entry: PROBABLY NO (returns a pointer, so an invalid value is    | | going to be NULL)    | complete_walk    | d_weak_revalidate: PROBABLY NO (I can’t find a valid fuse value for    | it)     mnt_want_write: NO    | __mnt_want_write: NO     setxattr: YES      strncpy_from_user: NO     | __do_strncpy_from_user: NO     | cap_convert_nscap: NO     | vfs_setxattr       xattr_permission      | inode_permission: NO      | (… explored elsewhere)       security_inode_setxattr (v1 – this seems to have two versions?)      | call_int_hook (inode_setxattr)      | selinux_inode_setxattr: NO      | | cap_inode_setxattr: NO      | || dentry_has_perm: NO      | || inode_has_perm: NO      | || avc_has_perm: NO      | || avc_has_perm_noaudit      | || | avc_denied: NO      | || | avc_audit: NO      | || slow_avc_audit: NO      | || avc_has_perm: NO      | || (… explored elsewhere)      | || security_context_to_sid: NO      | || security_context_to_sid_core: NO      | || string_to_context_struct: NO      | || | mls_context_to_sid: NO      | || | mls_context_cpy: NO      | || | | ebitmap_cpy: NO      | || | | ebitmap_set_bit: NO      | || sidtab_context_to_sid: NO      | || sidtab_reverse_lookup: NO      | || sidtab_rcache_search: NO      | || | sidtab_find_context: NO      | || | sidtab_find_context: NO      | || | context_cpy: NO      | || | mls_context_cpy: NO      | || | ebitmap_cpy: NO      | || ‘func’ (convert->func):? I couldn’t find where this is assigned      | || security_context_to_sid_force: NO      | || security_context_to_sid_core: NO      | || (… explored elsewhere)      | | security_validate_transition: NO      | | security_compute_validatetrans: NO      | | security_validtrans_handle_fail: NO      | smack_inode_setxattr: NO      | smk_curacc: NO      | | smk_tskacc: NO      | | smk_access: NO      | | smk_bu_inode: NO      | ima_inode_setxattr: NO      | ima_protect_xattr: NO      | evm_inode_setxattr: NO      | evm_protect_xattr: NO       security_inode_setxattr (v2)      | cap_inode_setxattr: NO      | (… explored elsewhere)       __vfs_setxattr_noperm: NO       | __vfs_setxattr (returns the result of the handler)       | xattr_resolve_name (returns fuse_xattr_handlers or another handler)       | fuse_xattr_handlers: NO       | get=fuse_xattr_get       | | fuse_getxattr: NO       | | fuse_simple_request: NO       | | | (… explored elsewhere)       | | | min_t: NO       | | set=fuse_xattr_set       | | fuse_removexattr: NO       | || fuse_simple_request: NO       | || (… explored elsewhere)       | | fuse_setxattr: NO       | | fuse_simple_request: NO       | | (… explored elsewhere)       | security_inode_setsecurity         hook.inode_setsecurity          selinux_inode_setsecurity           security_context_to_sid: NO            (… explored elsewhere)         

      Unfortunately, as you can see from the completed call graph, it looks like       there are only two places that can return ERANGE, and I’m sure that I’m not       meeting the condition for either of them. However, while re-reading through       fuse_simple_request, I noticed that there was the possibility that it returned       req->out.h.error. I figured that, perhaps somewhere along the lines, h.error       is getting set to ERANGE. However, doing a rough search for that came up       negative.        [ 76.889763] ************************       Next approach was what I originally was trying to avoid: Adding debugging       statements to the kernel and recompiling it. I was trying to avoid this       because on my old X It takes over an hour to compile the kernel from       scratch. On the bright side, I managed to use that time to organize my sheet       music, so it wasn’t a complete loss. After adding some debug statements we can       see:      [ 76.889763] **************************        [ 86.303425] Entered path_setxattr   [ 86.303428] user_path_at returned error: 0   [ 86.303428] mnt_want_write returned error: 0   [ 86.303452] setxattr returned error: – 34   [ 86.303453] returning error: – 54                So, setxattr can only return ERANGE in one condition, that I was pretty sure I       was _not_ hitting. I should have added debugging statements there as well.             [ 76.889753] Entered path_setxattr    [ 76.889757] user_path_at returned error: 0    [ 76.889758] mnt_want_write returned error: 0    [ 76.889759] Entered setxattr    [ 76.889760] Not returning EINVAL    [ 76.889762] strncpy_from_user returned error: 54    [ 76.889763] We did not trigger the other ‘return error’ condition either    [ 76.889784] Entered vfs_setxattr    [ 76.889785] xattr_permission returned error: 0    [ 76.889788] security_inode_setxattr returned error: 0    [ 76.889788] __vfs_setxattr_noperm returned error: –    [ 76.889791] vfs_setxattr returned error: –    [ 76.889791] setxattr returned error: –    [ 76.889791] returning error: –                Hmm, we have some more information but it’s still not great. As you can see,       we’re not hitting the ERANGE conditions in setxattr, and by an excerpt of the       the next log (after adding more printks) we can see that whatever handler it’s       choosing is returning the error:             [ 154.153282] security_inode_setxattr returned error: 0    [ 154.153282] Entered __vfs_setxattr_noperm    [ 154.153284] Entered __vfs_setxattr     handler->set returned error: – 54    [ 154.153287] __vfs_setxattr returned error: – 76                Given that fuse_xattr_handlers does not return ERANGE, we can assume that it       my assumption about the handler that it returned was wrong, let’s verify this       by adding “% pf” which the elinux “Debugging by printing” guide tells will print       a symbol name for a pointer value.              Handler chosen is: ext2_xattr_user_handler [ 76.889785]         

      Ok! Now that we have this information, let’s do another deep dive into the ext2       handler.               [ 76.889763] ************************              ext2_xattr_user_handler is defined as a structure in xattr_user.c:      [ 76.889763] **************************         const struct xattr_handler ext2_xattr_user_handler={            .prefix=XATTR_USER_PREFIX,            .list=ext2_xattr_user_list,            .get=ext2_xattr_user_get,            .set=ext2_xattr_user_set,    };                ext2_xattr_user_set is a simple passthrough to ext2_xattr_set which _does_       contain the value of ERANGE. Huzzah! ext2_xattr_set is defined in       fs / ext2 / xattr.c at line (************************************************************************, the ERANGE mention comes pretty early on:               name_len=strlen (name);      (if (name_len>) || value_len>sb->s_blocksize)      return -ERANGE;                Hmmm. According to the documentation (man 2 setxattr) we should be returning       ENOSPC if we can’t store the xattr:            ENOSPC There is insufficient space remaining to store the extended          attribute.                It would also probably make more sense to return something like ENAMETOOLONG       (An error setxattr is listed as capable of returning because it shares error       codes with stat (2)). On a more general case, should we propose a patch to bring       the kernel in line with the documentation, or instead bring the docs in line       with the kernel’s actual behavior? Well, let’s base it on example. Looking at       the ext4 handlers, we can see that ext4 also returns ERANGE if the name is too       long (in ext4_xattr_set_handle):              if (strlen (name)>255    2338 return -ERANGE;                Additionally, ext4 appears to be in-line with the documentation and returns       ENOSPC when value_len is too big (the exact code path is kind of convoluted, so       I won’t represent it entirely here, and I could be mistaken about what this       section of code does):             if (! ext4_handle_has_enough_credits (handle, credits)) {   2340 error=-ENOSPC;    goto cleanup;   153284         

      Given that clearly other handlers return ERANGE, paired with the fact that we       should keep things stable for userspace, it’s probably better to submit       documentation fixing this rather than attempting to patch ERANGE out.              **********************************              So, this (unfortunately) three-day journey culminated in the following message       to the man-pages mailing list:       ******************************** [ 154.153284]         linux-man: ” setxattr.2: Add ERANGE to ‘ERRORS’ section”       ******************************             And this simple fix in libkoios:      [ 76.889763] **************************         – if (errno==ENOSPC) {     if (errno==ENOSPC || errno==ERANGE) {        [ 76.889760] **************************** (************************************** () ************ [ 76.889785] ************************************ (Read More) **************************** [ 154.153287]Brave Browser(*****************************

What do you think?

Leave a Reply

Your email address will not be published. Required fields are marked *

GIPHY App Key not set. Please check settings

The Cathedral and the Bizarre, Hacker News

[100%OFF]Amazon AWS EC2 + LEMP + WooCommerce (Beginner-Advanced)