Vstr – C string library designed to work optimally with vector I/O, Hacker News

Vstr is a string library designed to work optimally with readv()/writev() for input/output. This means that, for instance, you can readv() data to the end of the string and writev() data from the beginning of the string without having to allocate or move memory. It also means that the library is completely happy with data that has multiple zero bytes in it.

This design constraint means that, unlike most string libraries, Vstr does not have an internal representation of the string where everything can be accessed from a single (char *) pointer in C. Instead, the internal representation consists of multiple “blocks” or nodes, each carrying some of the data for the string. This model also means that as a string gets larger, Vstr memory usage only goes up linearly and there is no inherent copying (other string libraries grow the string via realloc(), so their memory usage can be triple the required size and growing can require a complete copy of the string).
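As a rough illustration of the block model described above, here is a minimal sketch of a string stored as a chain of fixed-size nodes. The type and function names (struct block, struct block_str, block_str_add, block_str_free) and the tiny BLOCK_CAP are assumptions made for this example only; they are not the real Vstr types or API.

```c
#include <stdlib.h>
#include <string.h>

#define BLOCK_CAP 16   /* deliberately tiny so the example spans several nodes */

/* Hypothetical node type: NOT the real Vstr layout, just the general idea. */
struct block {
    struct block *next;
    size_t len;               /* bytes used in data[] */
    char data[BLOCK_CAP];
};

struct block_str {
    struct block *head, *tail;
    size_t len;               /* total bytes across all nodes */
    size_t nodes;             /* number of nodes in the chain */
};

/* Append bytes by filling the tail node and chaining fresh nodes.
 * Bytes already stored are never moved or copied again, and zero bytes
 * are ordinary data. Returns 0 on success, -1 if allocation fails. */
int block_str_add(struct block_str *s, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        if (!s->tail || s->tail->len == BLOCK_CAP) {
            struct block *b = calloc(1, sizeof *b);
            if (!b) return -1;
            if (s->tail) s->tail->next = b; else s->head = b;
            s->tail = b;
            s->nodes++;
        }
        size_t room = BLOCK_CAP - s->tail->len;
        size_t n = len < room ? len : room;
        memcpy(s->tail->data + s->tail->len, p, n);
        s->tail->len += n;
        s->len += n;
        p += n;
        len -= n;
    }
    return 0;
}

/* Release every node and reset the string to empty. */
void block_str_free(struct block_str *s)
{
    struct block *b = s->head;
    while (b) {
        struct block *next = b->next;
        free(b);
        b = next;
    }
    s->head = s->tail = NULL;
    s->len = s->nodes = 0;
}
```

Starting from a zero-initialized struct block_str, each call to block_str_add() touches only the tail node and any new nodes, so the cost of growing the string does not depend on how much data it already holds.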

It also means that adding, substituting or moving data anywhere in the string can be optimized a lot, to require O(1) copying instead of O(n). Speaking of O(1), it’s worth remembering that if you have a Vstr string with caching, it is O(1) to get all the data to the writev() system call (the cat example below shows this: the write call is always constant time).
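To show why separately stored pieces suit writev(), the following sketch hands several buffers to the kernel in a single call without first joining them. It uses only the POSIX writev() interface; the helper name write_pieces and its fixed limit of 8 pieces are assumptions for this example and have nothing to do with how Vstr exports its nodes.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write up to 8 separately stored pieces with one system call.
 * Returns what writev() returns: the number of bytes written (which
 * may be short), or -1 on error or if count is out of range. */
ssize_t write_pieces(int fd, const char *const pieces[],
                     const size_t lens[], int count)
{
    struct iovec iov[8];

    if (count < 0 || count > 8) return -1;
    for (int i = 0; i < count; i++) {
        iov[i].iov_base = (void *)pieces[i];  /* writev() only reads from it */
        iov[i].iov_len = lens[i];
    }
    return writev(fd, iov, count);
}
```

Because each piece carries an explicit length, embedded zero bytes pass through untouched, and no piece is copied into a contiguous buffer before the call.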

As well as having features directly related to doing IO well it contains functions for:

  • a printf like function that is fully ISO 9899:1999 (C99) compliant, also having %m as standard and POSIX i18n parameter number modifiers. It also allows gcc warning compatible custom format specifiers (and includes pre-written custom format specifiers for ipv4 and ipv6 addresses, Vstr strings and more).
  • splitting of strings into parameter/record chunks (a la perl).
  • substituting data in a Vstr string.
  • moving data from one Vstr string to another (or within a Vstr string).
  • comparing strings (without regard for case, or taking into account version information).
  • searching for data in strings (with or without regard for case).
  • counting spans of data in a string (the equivalent of strspn() in ISO C).
  • converting data in a Vstr string (e.g. deleting/substituting unprintable characters or making a Vstr string lowercase/uppercase).
  • parsing data from a Vstr string (e.g. numbers, or ipv4 addresses).
  • easily parsing and wrapping outgoing data in netstrings, for fast and simple (and hence less error prone) network communication.
  • the ability to cache aspects of data about a Vstr string, to both simplify and speed up use of the string.
  • the ability to have empty data as part of the string. This is somewhat useful for representing file transfers as a string, as you can represent the file data as empty data in the string.
  • It also has a number of functions for exporting data from a Vstr string so you can easily use data generated with the Vstr outside of the library.
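For readers who have not met netstrings, which the list above mentions, the format is simply a decimal length, a colon, the raw bytes and a trailing comma. The sketch below encodes and decodes one in plain ISO C; netstr_encode and netstr_decode are hypothetical helper names for this example and are not Vstr functions. The decoder is simplified and does not reject leading zeros in the length.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encode len bytes as a netstring: "<decimal length>:<bytes>,".
 * Returns the encoded size, or 0 if out is too small. */
size_t netstr_encode(char *out, size_t cap, const void *data, size_t len)
{
    char head[32];
    int hlen = snprintf(head, sizeof head, "%zu:", len);

    if (hlen < 0 || (size_t)hlen >= sizeof head) return 0;
    size_t total = (size_t)hlen + len + 1;
    if (total > cap) return 0;
    memcpy(out, head, (size_t)hlen);
    memcpy(out + hlen, data, len);
    out[(size_t)hlen + len] = ',';
    return total;
}

/* Decode one netstring from in[0..in_len). On success returns a pointer
 * to the payload inside in and stores its size in *len; returns NULL if
 * the input is malformed or truncated. */
const char *netstr_decode(const char *in, size_t in_len, size_t *len)
{
    size_t i = 0, n = 0;

    while (i < in_len && in[i] >= '0' && in[i] <= '9') {
        if (n > (SIZE_MAX - 9) / 10) return NULL;  /* length would overflow */
        n = n * 10 + (size_t)(in[i] - '0');
        i++;
    }
    if (i == 0 || i == in_len || in[i] != ':') return NULL;
    i++;                                   /* skip ':' */
    if (in_len - i <= n) return NULL;      /* need n payload bytes plus ',' */
    if (in[i + n] != ',') return NULL;
    *len = n;
    return in + i;
}
```

Since the receiver learns the payload size before reading the payload, it never has to scan for a terminator, which is a large part of why the format is hard to get wrong.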

    The other unusual aspect of the Vstr string library is that it attaches a notion of a locale to the string configuration and not globally (as POSIX, and pretty much everything else, does). This means that you can do network I/O in the C locale and user I/O in the user’s locale.
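For contrast, this is roughly what pinning one formatting call to the C locale looks like when the locale is not attached to the string. The sketch uses the POSIX.1-2008 newlocale()/uselocale()/freelocale() calls and is not Vstr code; format_double_c is a hypothetical helper name, and platforms that declare these functions in a different header are not covered.

```c
#define _XOPEN_SOURCE 700
#include <locale.h>
#include <stdio.h>

/* Format a double using the "C" locale for this call only, whatever
 * the process-wide locale is. Returns the snprintf() result, or -1 if
 * the locale object cannot be created. */
int format_double_c(char *out, size_t cap, double value)
{
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0) return -1;

    locale_t old = uselocale(c_loc);   /* affects the calling thread only */
    int n = snprintf(out, cap, "%.2f", value);
    uselocale(old);                    /* restore the previous setting */
    freelocale(c_loc);
    return n;
}
```

Every call that must be locale-independent needs this save-and-restore dance, which is the bookkeeping that a per-configuration locale avoids.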

    For a look at the internal design of the Vstr string library, you can read this. For a look at the main security problems I wanted to solve, you can read this.

    While I’ve tried to make the API simple enough that you don’t have to do anything complicated to get things done, there might still be times when you do a bunch of calls that you aren’t sure are ok, or maybe you get some memory management wrong and pass invalid/NULL pointers to the Vstr API functions. The easiest way to find out what is going wrong is to compile without inline support and with debug support (e.g. the --enable-tst-noinline --enable-debug options to ./configure). Then, as you call the functions, almost all calls check input values for validity, and all calls that modify a Vstr will check the Vstr both before and after their operations. Finally, if you call vstr_exit(), the number of memory allocations/deallocations and mmap()/munmap() operations will be counted and assert() calls will be raised if data hasn’t been freed. NOTE: If you are using rpms, then there are already rpms for the debug build … and they should be accepted as “newer” than the normal rpms so you can just install them while you develop.

    As well as that, in all builds gcc attribute support is checked for the following attributes: nonnull, pure, const, format and malloc. The const and pure attributes let the optimiser do some things that can be surprising if you are trying to debug something (for example, if you call vstr_cmp() and don’t use the return value, gcc will never even do the call in the first place) … so you need to watch for that. The nonnull attribute should catch errors if you obviously pass NULL pointers to functions that don’t take them, and the format attribute will catch errors in the calls to printf() like functions. However, you may want to temporarily disable attributes due to the optimising problems (if so, define VSTR_COMPILE_ATTRIBUTES to be 0 before including the vstr.h header).
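The following sketch shows how such attributes are typically attached to declarations and switched off with a macro. The DEMO_ names are hypothetical stand-ins invented for this example (DEMO_COMPILE_ATTRIBUTES plays the role that VSTR_COMPILE_ATTRIBUTES plays for vstr.h); how the real header spells its macros is not shown here.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical switch: define it to 0 before this point to compile
 * the declarations below without any attributes. */
#ifndef DEMO_COMPILE_ATTRIBUTES
# define DEMO_COMPILE_ATTRIBUTES 1
#endif

#if DEMO_COMPILE_ATTRIBUTES && defined(__GNUC__)
# define DEMO_ATTR_PURE      __attribute__((pure))
# define DEMO_ATTR_NONNULL   __attribute__((nonnull))
# define DEMO_ATTR_FMT(a, b) __attribute__((format(printf, a, b)))
#else
# define DEMO_ATTR_PURE
# define DEMO_ATTR_NONNULL
# define DEMO_ATTR_FMT(a, b)
#endif

/* pure: the result depends only on the arguments and the memory they
 * point to, so the compiler may drop a call whose result is unused. */
DEMO_ATTR_PURE DEMO_ATTR_NONNULL
int demo_cmp(const char *a, const char *b);

/* format: the compiler checks the variadic arguments against fmt
 * (argument 3), starting at argument 4. */
DEMO_ATTR_NONNULL DEMO_ATTR_FMT(3, 4)
int demo_fmt(char *out, size_t cap, const char *fmt, ...);

int demo_cmp(const char *a, const char *b)
{
    return strcmp(a, b);
}

int demo_fmt(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out, cap, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attributes enabled, a statement such as demo_cmp(a, b); on its own line may be optimized away entirely, so a breakpoint inside demo_cmp() might never fire; compiling with the switch set to 0 makes the call behave like any other.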

    All operations are local to the object(s) they are manipulating, and no locking is done inside the library. Synchronization belongs above simple data type primitives like strings. That said, if you want to use the Vstr string library from multiple threads, then everything should mostly just work if you have a separate Vstr configuration for each thread and operate on strings created by those configurations local to that thread. Using vstr_conf_swap() you could have a pool of objects using Vstr strings and then localize them to a thread’s configuration as you want to operate on those objects.

    For all data that you wish to move between two Vstr strings that are “owned” by different threads, you will need to do some higher level locking around the copying. One caveat is that if you have a Vstr_ref node inside a Vstr string, and then copy that to a string owned by another thread (or do a VSTR_TYPE_ADD_BUF_REF or VSTR_TYPE_ADD_ALL_REF copy of any data), there will be unlocked reference counting on the Vstr_ref … so basically you can’t do that unless you really know what you are doing.

    For Vstr string operations you wish to do from a signal handler, life is more complicated, unless you’re using a malloc() implementation that is guaranteed to be reentrant safe (this is generally not the case, and not the same as a thread-safe malloc() … as you can be inside malloc() when you get a signal). The obvious way to get around this is to pre-allocate enough storage in the Vstr configuration to be used in the signal handler, i.e. call vstr_make_spare_nodes(). If you absolutely need to use a Vstr string in a signal handler that is also used outside a signal handler, you would need to block the signals it could be accessed in around each manipulation of it (or each access to it, if you manipulate it inside a signal handler). Yes, this will be slow; the solution is: do not do that.
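The "block the signals around each manipulation" approach can be sketched with plain POSIX calls as follows. A fixed char buffer stands in for the shared string, and append_blocked is a hypothetical helper for this example; no Vstr calls are used. In a multithreaded program pthread_sigmask() would be the call to use instead of sigprocmask().

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* Data shared between normal code and a handler for signo. */
static char shared_buf[64];
static size_t shared_len;

/* Append to the shared buffer with signo blocked, so a handler for
 * that signal can never observe a half-finished update.
 * Returns 0 on success, -1 on error or if the data does not fit. */
int append_blocked(int signo, const char *data, size_t len)
{
    sigset_t block, old;
    int ret = -1;

    sigemptyset(&block);
    sigaddset(&block, signo);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0) return -1;

    if (len <= sizeof shared_buf - shared_len) {
        memcpy(shared_buf + shared_len, data, len);
        shared_len += len;
        ret = 0;
    }

    /* Restore the previous mask; a pending signo is delivered now. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return ret;
}
```

Each update costs two extra system calls, which is where the slowness comes from.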

    For most sane uses of signals, the only time you want to do things with strings in the handler is from the SIGSEGV handler, so you can create some debugging information etc., at which point you can probably just do it.

    If you want to write a number to a string in C, you would normally write code such as …

      sprintf(my_str, "%d", num);

    ... and to append the same to a Vstr string it's a simple API change to ...

      vstr_add_fmt(my_vstr, my_vstr->len, "%d", num);

    ... however if you want to write an IPv4 address, a Vstr string or any other type that isn't in ISO 9899:1999 to a string, you have to resort to doing it by hand. And if you want to format that output you have to either convert it to a C style string and use the "%s" option to the printf() like function, or do all the formatting yourself. This is all pretty ugly, often unreliable, slow and takes significant programmer resources.
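To make the by-hand route concrete, here is a sketch that converts an IPv4 address to a temporary C string with the POSIX inet_ntop() call and then formats it through "%s". The helper name format_ipv4_by_hand is an assumption for this example.

```c
#define _XOPEN_SOURCE 700
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

/* Left-justify a dotted-quad IPv4 address in a field of width columns.
 * Returns what snprintf() returns, or -1 if the conversion fails. */
int format_ipv4_by_hand(char *out, size_t cap, int width,
                        const struct in_addr *addr)
{
    char tmp[INET_ADDRSTRLEN];   /* extra buffer needed just for the detour */

    if (!inet_ntop(AF_INET, addr, tmp, sizeof tmp)) return -1;
    return snprintf(out, cap, "%-*s", width, tmp);
}
```

Every new type needs its own conversion helper, temporary buffer and error path before printf() like formatting can be applied to it.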

    This is where custom formatters can help and give you back code clarity, reliability, speed and ease of use. Assuming you want to print an IPv4 address, then you can initialize the Vstr configuration like so ...

      vstr_sc_fmt_add_all(my_vstr->conf);
      vstr_cntl_conf(my_vstr->conf, VSTR_CNTL_CONF_SET_FMT_CHAR_ESC, '%');

    ... and then you can write ...

     struct sockaddr_in sa;
     struct in_addr ipv4;

     vstr_add_fmt(my_vstr, my_vstr->len, "%-{ipv4.p}", (void *)&ipv4);
     vstr_add_fmt(my_vstr, my_vstr->len, "%*{ipv4.p}", 31,
                  (void *)&sa.sin_addr.s_addr);

    ... and to add the Vstr string you do ...

      vstr_add_fmt(my_vstr, my_vstr->len, "%*.*{vstr}", 90, 80,
                   (void *)my_vstr, (size_t)1, my_vstr->len, VSTR_TYPE_ADD_DEF);

    ... all normal printf() like formatting options work as you would expect them to, including being able to use i18n format specifiers to easily change the order of output for different locales. However, if you try the above, you'll note that all of the calls to vstr_add_fmt() will produce warnings with gcc, because "%{" isn't the start of a valid formatting character under gcc's static printf() parsing rules. This deficiency makes custom formatters as used above mostly useless, as you have to either turn warnings off for format strings (which is basically insanity in C) or see at least one warning for every usage of a custom formatter.

    To deal with this, the Vstr custom formatter code allows you to work around the static checkers by using the following initialization code ...

      vstr_sc_fmt_add_all(my_vstr->conf);
      vstr_cntl_conf(my_vstr->conf, VSTR_CNTL_CONF_SET_FMT_CHAR_ESC, '$');

    ... you can then call the custom formatters, using code like ...

     struct sockaddr_in sa;
     struct in_addr ipv4;

     vstr_add_fmt(my_vstr, my_vstr->len, "$-{ipv4.p:%p}", (void *)&ipv4);
     vstr_add_fmt(my_vstr, my_vstr->len, "$*{ipv4.p:%d%p}", 40,
                  (void *)&sa.sin_addr.s_addr);

     vstr_add_fmt(my_vstr, my_vstr->len, "$*.*{vstr:%d%d%p%zu%zu%u}", 90, 80,
                  (void *)my_vstr, (size_t)1, my_vstr->len, VSTR_TYPE_ADD_DEF);

    ... which, although it isn't quite as nice as true support for custom formatting in static analyzers like gcc, does make sure that custom formatters will not do anything obviously stupid (without producing spurious warnings) and provides complete protection for non-custom formatter calls. One final note is that in all sane environments you don't need the cast to (void *), however it is "in theory" required to be conforming ISO 9899:1999 C.

    You may also want to look at the tutorial section on creating custom formatters.

    Note that some of these are explained in much more detail in the tutorial. To get a rough overview of how to use the library you can see the following heavily commented examples:
