
Thinking Psycopg3, Hacker News

I have been psycopg2 maintainer since 2020; before that, git says that my first contribution was in . At that time, psycopg (one) was the tool to use, and psycopg2 was an interesting experiment.

In several years of using psycopg2 daily, and of reading about the expectations, frustrations, and surprises experienced by other users, I have been making up my mind about a better system. If I had to release a version incompatible with the past, these are some of the things I would change.

Query parameters adaptation

psycopg2 composes queries (replacing the placeholders in the SQL string with the values supplied by the user) on the client, and sends a complete query to the server.

[Sequence diagram for psycopg2]

psycopg2 does this right, so it doesn't result in SQL injection vulnerabilities. However the correct way of doing it, as far as PostgreSQL is concerned, would be to send the query and its parameters separately (i.e. using the PQexecParams libpq function rather than PQexec).

Separating the statement from the parameters improves performance and memory usage at parsing time. However the behavior of the library would change slightly, which is the reason why server-side merging hasn’t been used so far. For instance:

• PQexecParams only supports one command at a time: it wouldn't be possible anymore to send a batch of queries separated by semicolons in a single execute() call.

• psycopg2 helps solving some type casting ambiguities by attaching casts to the types represented as strings (e.g. a Python date is currently merged to the query as a quoted string with an attached cast, '…'::date); using PQexecParams the trick wouldn't be available anymore. It might be possible to suggest the type in other ways, but if that's not the case then queries would have to be modified by applying explicit casts next to their placeholders (e.g. %s::date).

[Sequence diagram for psycopg3]
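By way of analogy, the stdlib sqlite3 driver already works the way described above: the statement and its parameters are passed to the database separately, so no quoting happens in Python at all. A minimal illustration (this is sqlite3, not psycopg code; the psycopg3 plan would do the equivalent through PQexecParams):

```python
import sqlite3

# The statement template and the value travel separately: the driver
# never merges the parameter into the SQL string on the client side.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE people (name TEXT)")
cur.execute("INSERT INTO people (name) VALUES (?)", ("O'Connor",))

# The single quote in the value cannot break out of the parameter,
# so there is no injection risk and no escaping step.
cur.execute("SELECT name FROM people WHERE name = ?", ("O'Connor",))
row = cur.fetchone()
print(row[0])
```

Note that, just as described for PQexecParams, sqlite3's execute() also refuses to run more than one statement per call.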

    Custom-defined adapters should be rewritten.

    The main difference between the old adaptation protocol (called ISQLQuote) and the new one (which doesn't have a name yet; it could probably be called ISQL) is the use of quotes, whereby the string O'Connor is passed to the query as 'O''Connor', with wrapping quotes and doubled-up quotes in the content. In ISQL the string wouldn't undergo the same transformation to add and escape quotes; other types would only have to be converted into strings in the PostgreSQL syntax, but wouldn't need to be wrapped in quotes to create a SQL literal. Using ISQL as the fundamental adaptation step there would be some interesting improvements.
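As a toy illustration (plain Python, not actual psycopg code; the function names are made up), the difference between the two protocols could be sketched like this:

```python
def isqlquote_adapt(value: str) -> str:
    # Old-style (ISQLQuote-like) adaptation: produce a complete SQL
    # literal, with wrapping quotes and doubled-up inner quotes.
    return "'" + value.replace("'", "''") + "'"

def isql_adapt(value: str) -> str:
    # New-style (ISQL-like) adaptation: only the PostgreSQL textual
    # representation; the value travels separately from the query,
    # so no quoting or escaping step is needed at all.
    return value

print(isqlquote_adapt("O'Connor"))  # 'O''Connor'
print(isql_adapt("O'Connor"))       # O'Connor
```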

Context managers and transactions

psycopg2 follows a de-facto standard (which was widely discussed but which never landed in the DBAPI specs), whereby connections used as context managers wrap a transaction, committing it if exiting on success or rolling it back in case of error. So the way to use it is something like:

    conn = connect(DSN)
    try:
        # these are two separate transactions
        with conn:
            do_something(conn)
        with conn:
            do_something_else(conn)
    finally:
        conn.close()

    This idea has its usefulness, but it is a very surprising behavior: developers usually expect the same resources released by close() to be released on context exit (as files, sockets, and even DBAPI cursors do). It also gets in the way of managing different life cycles on the connection, for instance if the connection is taken from a pool:

    with pool.getconn() as conn:
        do_something(conn)

and, in a stand-alone module:

    with connect(DSN) as conn:
        do_something(conn)

    This seems to ask for a different context than the connection-scope one. So maybe it would be useful to leave the latter to the management of the resource, releasing it on connection context exit, and to add an explicit method to start an atomic block (either a transaction or a savepoint, according to the current connection state):

    with connect(DSN) as conn:
        with conn.transaction():
            do_something(conn)
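A minimal sketch of how such an atomic-block method could choose between a transaction and a savepoint, depending on the current connection state (toy code with made-up names, not the actual psycopg3 API):

```python
class Connection:
    """Toy connection that records the SQL commands it would send."""

    def __init__(self):
        self.commands = []
        self._tx_depth = 0

    def transaction(self):
        return _Transaction(self)


class _Transaction:
    def __init__(self, conn):
        self.conn = conn

    def __enter__(self):
        # Outermost block opens a transaction; nested blocks open savepoints.
        if self.conn._tx_depth == 0:
            self.conn.commands.append("BEGIN")
        else:
            self.conn.commands.append(f"SAVEPOINT s{self.conn._tx_depth}")
        self.conn._tx_depth += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self.conn._tx_depth -= 1
        depth = self.conn._tx_depth
        if exc_type is None:
            self.conn.commands.append(
                "COMMIT" if depth == 0 else f"RELEASE s{depth}")
        else:
            self.conn.commands.append(
                "ROLLBACK" if depth == 0 else f"ROLLBACK TO s{depth}")
        return False  # let exceptions propagate


conn = Connection()
with conn.transaction():
    with conn.transaction():
        pass
print(conn.commands)  # ['BEGIN', 'SAVEPOINT s1', 'RELEASE s1', 'COMMIT']
```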

    Async from the ground up

    The DBAPI interface is synchronous and blocking by design: connect() blocks until a connection is open, execute() blocks until the query is completed, etc. This didn't stop psycopg2 from working asynchronously, exposing two different interfaces for it:

    • a purely asynchronous one, where the application must explicitly poll() to bring forward the connection/execution process: it breaks the DBAPI interface, but allows frameworks which can't make blocking calls anyway (such as the revered Twisted) to use it;

    • a coroutine-based one, where the DBAPI blocking interface is respected, but behind the scenes psycopg2 collaborates with coroutine libraries such as Eventlet and gevent.

      Needless to say, the world has moved forward, and Python now offers core support for asyncio in its syntax and standard library: this is the foundation that newer Python async frameworks, such as aiohttp, are built upon. A third-party wrapper, aiopg, currently helps to bridge the async mode of psycopg2 with the asyncio machinery, but of course there is some overhead, mostly caused by the fact that the C implementation of psycopg2 is a relatively thick layer, with the libpq async calls deeply buried under the C implementation of the Python connection and cursor objects and behind their interfaces.

      There is a chance now to rethink how thick the C libpq wrapper should be. We can reduce the C implementation to a minimal wrapper around the libpq (replaceable by a CFFI Python wrapper if compiling C is not possible on the client), using it as a foundation to build a familiar DBAPI blocking interface. A blocking behavior is not bad in itself: it allows writing most programs, the ones which don't need crazy concurrency, in a simple and familiar paradigm; the async layer would be available under the hood to squeeze the best performance out of programs which have embraced an asynchronous pattern and framework. Time for a picture:

      Proposed architecture for psycopg3
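The layering can be sketched in miniature: a single async core at the bottom, with a thin blocking facade driving it for synchronous users. This is toy code with made-up names, standing in for the real libpq-based engine:

```python
import asyncio

async def _execute_async(query: str) -> str:
    # Stand-in for the async core that would talk to the libpq through
    # its non-blocking calls; here it just pretends to wait on a socket.
    await asyncio.sleep(0)
    return f"result of {query!r}"

def execute(query: str) -> str:
    # Blocking facade: drives the async core to completion, giving
    # sync users the familiar DBAPI feel on top of the async engine.
    return asyncio.run(_execute_async(query))

print(execute("SELECT 1"))
```

asyncio users would call the core directly with await, while DBAPI users would never notice the async machinery underneath.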

      Other improvements

      I’ve written down some of the ideas to implement on a Trello board and in an issues milestone. As the board shows, some of the improvements weren’t actually incompatible with psycopg2 and over the years they have already landed in its releases; some others will have to wait.

      Some of what could be improved over psycopg2:

      • async COPY methods (see issue #);

      • better support for SQL_ASCII databases (which, unlike what their name suggests, are actually unencoded 8 bits; see issue #);

      • preparing cursors;

      • a return value on execute():

            for record in cur.execute(query):
                # ...

      • connection- and cursor-scoped adapters (currently you can scope the adaptation rules from Postgres to Python, but the rules from Python to Postgres can only be global);

      • reorganising the extensions and extras modules, which at the moment contain a heterogeneous assortment of objects, functions, constants…

                      I would be interested in starting the development of this project soon. If you have opinions, or interest in the project, please let us know on the Mailing List (you can subscribe here ).

      … and if psycopg2 has been useful so far for you and for your business, and if you would love to see a psycopg3 even more performant and easier to use, please consider sponsoring the development. Thank you!

                     
         
          