The sad state of personal data and infrastructure

Why can’t we have nice digital things?


TLDR: in this post I’m going to explore missed opportunities for engaging and interacting with your personal data and digital trace, and speculate on why it is that way and how to make it easier.

It might seem like a long rant, but I promise you I’m not the kind of person who whines and vents just for the sake of it!

In this particular post I’m just building up motivation and engaging you, and I do have some workarounds and suggestions. This article got long enough, so I will explain them in detail in the second part.

I also did not want to mix discussions on motivation (this one) and my take on implementation (which will follow).

1. Intro: your data is trapped

Note: for clarity, I will use ‘service’ to refer to anything holding and manipulating your data, whether it’s a website, phone app or a device (i.e. not necessarily something with an online presence).

On one hand, things are pretty great these days. For almost anything you wish to do on your computer or phone, you’d find several apps, platforms and ecosystems that would handle your task in one way or another.

On the other hand, typically, once the service has your data, it’s siloed and trapped. You are completely at the mercy of the service’s developers and management.

Within the same ecosystem (e.g. Google / Apple / Microsoft) you might get some integrations and interactions if the company spares them. Apart from these, integrations are virtually non-existent.

We have so much data, yet it just sits there doing nothing.

Now and then some startup pops up that connects a couple of APIs together for a fee. I don’t want to pick on startups, but typically it’s something trivial like displaying calories consumed from your food tracker app on the same plot as calories burnt from your fitness tracker. Trivial is okay, and I do acknowledge it’s way harder to implement than it looks (I even explore why later). The actually sad thing is that as a user, you’re lucky if you use the right kind of fitness tracker that the service supports, or if you agree with their analysis methodology. Otherwise, sorry!

There are also services like IFTTT which offer pretty primitive integrations and also require cooperation from all parties:

My Heroic and Lazy Stand Against IFTTT (Pinboard)

  • Google removing Gmail access from IFTTT
  • Often UIs have some inconveniences (or just plain suck), which are often fine for an average user (aka KPIs), but leave a number of users dissatisfied (and often these are power users).

    In essence, services fully control the way they present you information.

    Sure, it’s a free market, just switch to another / better service, right? Switching to new and unfamiliar tools is hard enough cognitively as it is, but what’s worse is that in most cases you have to leave behind all your old data. You’re lucky if you can do some sort of data import / export and if it works properly.

Personal data is in a sad state these days. Let me elaborate.

2. Why does that bother me?

To be fair, I don’t understand how that does not bother you!

To start with, allow me to quote:


    I consume lots of digital content (books, articles, Reddit, Youtube, Twitter, etc.) and most of it I find somewhat useful and insightful. I want to use that knowledge later, act and build on it. But there’s an obstacle: human brain.

It would be cool to be capable of always remembering and instantly recalling information you’ve interacted with, its metadata and your thoughts on it. Until we get augmented though, there are two options. The first is just to suck it up and live with it. You might have guessed this is not the option I’m comfortable taking.

The second option is compensating for your sloppy meaty memory: having the information you’ve read at hand, and a quick way of searching over it.

That sounds simple enough, but as with many simple things, in practice you run into obstacles. I’ll give some I’ve personally been overcoming as examples:

convenience of access, e.g.:

    to access highlights and notes on my Kobo ebook I need to actually reach my reader and tap through an e-ink touchscreen. Not much fun!

  • There is no easy way to quickly access all of your twitter favorites; people suggest using hacks like an autoscroll extension.

  • searching data, e.g.:
    the search function often just isn’t available at all; e.g. on Instapaper, you can’t restrict search to highlights. If it is available, it’s almost never incremental.

  • builtin browser search (Ctrl-F) sucks for the most part: it’s not very easy to navigate as you don’t get previews and you have to look through every match

  • sometimes you vaguely recall reading about something or seeing a link, but don’t remember where exactly. Was it on stackoverflow? Or in some github issue? Or in a conversation with a friend?
  • data ownership and liberation, e.g.:

    What happens if data disappears, or the service is down (temporarily / permanently) or banned by your government?

    You may think you live in a civilized country and that would never affect you. Well, in 2018, Instapaper was unavailable in Europe for several months (!) due to missing the GDPR deadline.

  • The vast majority of services don’t have support for offline mode. This may be just a small inconvenience if you’re on a train or something, but there is more to it. What if some sort of apocalypse happens and you lose all access to data? That depends on your paranoia level of course, and an apocalypse is bad enough as it is, but my take on it is that at least I’d have my data 🙂

  • if you delete a book on Kobo, not only can you not access its annotations anymore, they seem to get wiped from the database.

    So as you can see, my main frustrations are around the lack of the very basic things that computers can do extremely well: data retrieval and search.

    I’ll carry on, just listing some examples. Let’s see if any of them resonate with you:

    search and information access

    Why can’t I search over all of my personal chat history with a friend, whether it’s ICQ logs or Whatsapp logs?

    Why can’t I have incremental search over my tweets? Or browser bookmarks? Or over everything I’ve ever typed / read on the Internet?

    Why can’t I search across watched youtube videos, even though most of them have subtitles and hence allow for full text search?


  • Why can’t my Google Home add shopping list items to Google Keep? Let alone other todo-list apps.

    Instead, it puts them in a completely separate product, Shopping list. If any of these had an API, any programmer could write a script to synchronize them in a few hours.
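To make the “few hours” claim concrete, here is a minimal sketch of such a one-way sync in Python. The plain-string item lists and the `add_to_dest` callback are hypothetical stand-ins for whatever the real APIs would expose:

```python
# Sketch of a one-way list sync: anything in the source list that's missing
# from the destination gets pushed over. The item shape (plain strings) and
# the add_to_dest callback are assumptions, not real API calls.

def missing_items(source: list, dest: list) -> list:
    """Items present in the source list but not yet in the destination."""
    dest_normalized = {item.strip().lower() for item in dest}
    return [item for item in source if item.strip().lower() not in dest_normalized]

def sync(source: list, dest: list, add_to_dest) -> int:
    """One-way sync; add_to_dest would be e.g. a POST against the other API."""
    to_add = missing_items(source, dest)
    for item in to_add:
        add_to_dest(item)
    return len(to_add)
```

The hard part is never this logic; it’s getting both lists out of the silos in the first place.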

    Why can’t I create a task in my todo list or calendar from a conversation on Facebook Messenger / Whatsapp / Telegram?

    Often, a friend recommends you a book and you want to add it to your reading list. Or they ask you for something and you want to schedule yourself a reminder.

    Instead, these apps actively prevent me from using the builtin Android share functions (presumably because it means leaving the app).

    journaling and history

    Why do I have to lose all my browser history if I decide to switch browsers?

    Even when you switch between major ones like Chrome / Firefox. Let alone for less common alternatives.

    Why can’t I see all the places I traveled to on a single map and photos alongside?

    I have location tracking and my photos have GPS and timestamps.

    Why can’t I see my heart rate (i.e. excitement) and speed side by side with the video I recorded on GoPro while skiing?

    I’ve used HR tracking and location tracking, surely that’s possible?

    Why can’t I easily transfer all my books and metadata if I decide to switch from Kindle to PocketBook or vice versa?

    consuming digital content

    Why can’t I see stuff I highlighted on Instapaper as an overlay on top of the web page?

    Hypothes.is does it, so it’s totally possible, right?

    Why can’t I have a ‘read it later’ list unifying things saved on Reddit / Hackernews / Pocket?

  • Why can’t I use my todo app instead of ‘Watch later’ playlist in youtube?

    ‘Watch later’ is fine for short videos that I can watch over dinner or on my commute. Longer videos like talks and lectures need proper time commitment hence prioritizing.

    Why can’t I ‘follow’ some user on Hackernews?

    It’s just a matter of regularly fetching new stories / comments by a person and showing new items, right?

    Why can’t I see whether I’ve already run across a Youtube video because my friend sent me a link months ago?

    The links are there in chat history, surely it’s a trivial task to find it?
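Indeed, once the chat logs are local files this really is nearly trivial. A sketch, assuming a plain-text export with one message per line (the export format itself is an assumption):

```python
import re

# Scan an exported chat log for youtube links and normalize them to video
# ids, so "have I seen this before?" becomes a set-membership test.
# The regex covers the common youtube.com/watch?v= and youtu.be/ forms.

YOUTUBE_RE = re.compile(r"(?:youtube\.com/watch\?v=|youtu\.be/)([A-Za-z0-9_-]{11})")

def seen_video_ids(chat_log: str) -> set:
    """All video ids mentioned anywhere in the log."""
    return set(YOUTUBE_RE.findall(chat_log))

def already_seen(url: str, chat_log: str) -> bool:
    """True if the video behind `url` appears in the chat history."""
    ids = YOUTUBE_RE.findall(url)
    return bool(ids) and ids[0] in seen_video_ids(chat_log)
```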

    Why can’t I have uniform music listening stats based on my listening history across services?

    Why am I forced to use Spotify’s music recommendation algorithm, without an option to try something else?

    Why can’t I easily see the books / music / art recommended by my friends or by specific Twitter / Reddit / Hackernews users?


    health and body maintenance

    Why can’t I tell if I was more sedentary than usual during the past week and whether I need to compensate by doing a bit more exercise?

    I have all my location data (hence step data), so what’s the issue?

    Why can’t I see the impact of aerobic exercise on my resting HR?

    I use an HR tracker and a sleep tracker, so all the necessary data is there.

    Why can’t I have a dashboard for all of my health: food, exercise and sleep, and see the baselines and trends?

    Why do I need to rely on some startup to implement this and trust them with my data?

    Why can’t I see the impact of temperature or CO2 concentration in the room on my sleep?

    My sensors have Bluetooth and Android apps, so why can’t they interact with my sleep data?

    Why can’t I see how holidays (as in, not going to work) impact my stress levels?

    It’s trivial to infer workdays from my location data.

    Why can’t I take my Headspace app data and see how / if meditation impacts my sleep?

    Why can’t I run a short snippet of code to check some random piece of health advice on the Internet against my health data?

    personal finance

    Why am I forced to manually copy transactions from different banking apps into a spreadsheet?

    Why can’t I easily match my Amazon / Ebay orders with my bank transactions?

    offline

    Why can’t I do anything when I’m offline or have a wonky connection?

    On one hand it’s less and less of an issue as the Internet gets more reliable. On the other, if you start relying on it too much, it becomes more and more of a single point of failure.

    tools for thinking and learning

    Why, when something like a ‘mind palace’ is literally possible with VR technology, don’t we see it in any use?

    Why can’t I easily convert select Instapaper highlights or new foreign words I encountered on my Kindle into Anki flashcards?

    mediocre interfaces

    Why do I have to suffer from poor management and designer decisions on changing UIs, even if the interface is not the main reason I’m using the product?

    Why can’t I leave notes on my saved Reddit / Hackernews items?

    I’ve got too many saved things to read them linearly, and I’ll probably never read them all. I’ve also got other things to read and do in general; why can’t I have a unified queue for consuming content?

    Why can’t I leave private notes on Deliveroo restaurants / dishes, so I’d remember what to order / not to order next time?

    Why do people have to suffer from the Google Inbox shutdown?

    Not to undervalue the Inbox developers, but fundamentally it’s just a different interface. I’m sure there are plenty of engineers who would happily support it in their spare time if only they had access to the APIs.


    communication and collaboration

    Why can’t I easily share my web or book highlights with a friend? Or just make my highlights in select books public?

    Why can’t I easily find out another person’s expertise without interrogating them, just by looking at the things they read instead?

    backups

    Why do I have to think about it and actively invest time and effort?

    What about regular people who have no idea how unreliable computers can be and might find that out the hard way?


    I think all of this is pretty sad. Note that I haven’t mentioned any mad science fiction stuff like tapping directly into the brain (as much as I wish that was possible). All these things are totally doable with the technology we already possess.

    I wonder what computing pioneers like Douglas Engelbart (e.g. see Augmenting Human Intellect) and Alan Kay thought / think about it, and whether they’d share my disappointment. So many years have passed since computing (and personal computers) spread, and we’re still not quite there. And companies are actively promoting these silos.

    Imagine if all of this was at your fingertips? If you did not have to think about how and where to find information and could just access and interact with it? If you could let computers handle the boring bits of your life and spend your time on fun and creative things?

3. Your data is vanishing

    The things I listed above are frustrating enough as they are. But there is another aspect to this: your data is slipping away.

    Privacy concerns are important and it’s understandable when people are pissed about services keeping hold of their data instead of properly wiping it.

    However, often the opposite is the case and you find that your data is gone or very hard to access:

    Google Takeout data, that is, all your browser activity, Youtube watch history, etc., is only kept by Google for a few years.

    If you were only exporting it now and then and haven’t kept the old archives, chances are you’ve lost some of your history.

  • Chrome browser deletes history older than 90 days

  • Firefox browser expires history based on some magic algorithm

  • Reddit API limits your requests to the latest 1000 results only
  • Twitter API would only give you the latest 3200 tweets.

    You can get the rest of your tweets via a manual export, but then you’re going to have to integrate two different ways of accessing the data.

  • Monzo API only allows fetching all of your transactions within 5 minutes of authentication.

    I understand that it’s a security measure, but my frustration still stands.
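The Twitter split above (recent tweets via the API, the rest via manual export) is at least mendable with a small merge step, sketched here assuming each tweet is a dict with an `id` (the real export formats differ):

```python
# Dedupe two sources of the same data by id, preferring the API copy
# (fresher metadata) when both sources have the same tweet.
# The dicts-with-"id" shape is an assumption for the sketch.

def merge_tweets(api_tweets, archive_tweets):
    by_id = {t["id"]: t for t in archive_tweets}
    by_id.update({t["id"]: t for t in api_tweets})  # API copy wins
    return sorted(by_id.values(), key=lambda t: t["id"])
```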

    The problems above are sort of technical and, in theory, can be solved by some engineering. There is another side to vanishing data:

  • information is generally rotting away from the Internet

  • comments / posts / tweets you’ve interacted with get deleted by their authors

    While people have the right to delete their data from the Internet, arguably that right doesn’t extend to derived content like the comments or thoughts you had on it.

    And a bit more:

    Jawbone UP has gone bust:

    “In July 2018 Jawbone announced it would liquidate its assets. Since the app is still available for at least some phones (Android) and the servers seem to be running, it is unclear who has access to collected personal data.”

    Sweet. In addition, the API does not work anymore either, so if you haven’t been exporting data, it’s basically gone.

  • ‘My GitHub account has been restricted due to US sanctions as I live in Crimea’
  • Verizon / Yahoo Blocking Attempts to Archive Yahoo Groups
    This one is particularly bad.

    If you consider your digital trace part of yourself, this is completely unacceptable. But sadly it’s happening all the time. You can’t rely on third parties to keep it safe.


4. What do I want?

    I want all these inconveniences somehow solved, but I live in the real world and that’s not gonna magically happen. So let me be more specific: I argue that one major reason the tools and integrations I want don’t exist is that people don’t have easy uniform access to their data in the first place.

    “Easy” is used here in two senses:

    easy for humans to look at and browse through

    This bit is hard in practice, as (typically) the more machine friendly something is, the less human friendly it is.

  • easy for programmers to manipulate, analyze and interact with

    Let’s concentrate on this part for now. If this is solved, it automatically enables programmers to develop human-friendly tools.

    So how would ‘easy access to data’ look in an ideal world? Let me present my speculations on it; I would be happy to hear your opinions!

    I want an API that I can query to get any of my personal data. In an ideal world it wouldn’t really matter where the data is, and it could be a web API.

    However, realistically, as of today, the easiest way to quickly access your data and, more importantly, play with it, is when it’s already on your filesystem.

    Whether it’s plaintext, sqlite or some sort of binary data does not matter; already having it locally saves you from a whole class of problems (which I’m about to pinpoint in the following section).

    As you probably noticed, it’s almost never the case that you have your personal data locally at hand. You need to spend extra effort to achieve this.

5. So what’s the problem?

    Okay, so hopefully we can agree that the current situation isn’t so great. But I am a software engineer, and chances are that if you’re reading this, you’re very likely a programmer as well. Surely we can deal with that and implement it ourselves, right?

    Kind of, but it’s really hard to retrieve the data created by you.

    Recommended soundtrack for rest of the section: The World’s Smallest Violin, playing for us software engineers.

    At first glance it doesn’t look like a big deal. It’s just data, right? Every programmer should be capable of getting it from the API, right?

    This is until you realize you’re probably using at least ten different services, and they all have different purposes, kinds of data, endpoints and restrictions.

    Even if you have the capacity and willingness to do it, it’s still damn hard:

    You’re gonna have to deal with the following problems:

    authorization

    That’s where it all starts, and it’s a mess.

    easiest scenario: the service lets you generate an API token from its settings and you can just use it after that. Example: Pinboard.

  • typical scenario: you need to do the whole OAuth thing.

    That involves creating a client app, getting a client id, dealing with scopes and redirect urls, etc. Pretty tedious, and you certainly can’t expect a nonprogrammer to be able to follow these steps.

    Examples: almost every service with an API out there: Twitter / Instapaper / Pocket / Github / etc.

  • worst scenario: the service does not even offer a public API. That also has different grades of horrible:

    best worst: the service uses a private API and you can spy on the token the web app is using in browser dev tools.

    Not too bad, but a bit dubious.

    Example: the Pocket API doesn’t give away your highlights unless you mess with it.

  • typical worst: no private API, so you need to scrape the data. Sometimes you can grab the cookies from browser dev tools and use them to access your data.

    Scraping is orders of magnitude flakier: it involves nasty parsing and is obviously fragile. Also, some services might even actively prevent you from doing so by banning unusual user agents.

    Examples: Facebook Messenger

  • worst worst: you need to scrape the data and cookies don’t work or expire often.

    Basically that means you need to use your username / password. Bonus points if there is 2-factor auth involved.

    Potentially that means you’re going to store your password somewhere, which is way less secure than using a token.

    Example: Google Takeout exports are not only asynchronous, but also don’t have an API, so you have to log in in order to export.

    All the ‘worst’ scenarios are extremely flaky and basically impossible for nonprogrammers to use.


    pagination

    Whether you’re using an API or not, typically you’ll have to retrieve multiple chunks of data and merge them afterwards.

    It’s not hard to implement in principle, on a one-off basis, but it’s unclear how to do it in some universal way because there is no common standard.

    Pages might be addressed by page numbers and counts, offsets from start / end of data, before or after with respect to ids or timestamps, etc.

    It’s quite error prone: content might change under your feet, and if the API developers (or you) are not careful, you might end up with missing data or even some logical corruption.
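For a single service the stitching itself is simple; the sketch below assumes a cursor-style API where one call returns `(items, next_cursor)` and `None` means the end (just one of the many conventions mentioned above):

```python
# Generic cursor-style pagination helper: `fetch_page` stands in for whatever
# one call to the service looks like; this just stitches pages together so
# the mess lives in one place. The (items, next_cursor) shape is an assumption.

def fetch_all(fetch_page, cursor=None):
    """Exhaust a paginated source and return all items as one list."""
    items = []
    while True:
        page, cursor = fetch_page(cursor)
        items.extend(page)
        if cursor is None:
            return items
```

Offset-based and timestamp-based schemes need different loops, which is exactly why there is no universal solution.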

    consistency

    If you simply start fetching json and writing it to disk, you’d very quickly end up with a corrupt file on the first network failure. You’ve gotta be really careful and ensure atomic writing and updating.

    Even if you work around the atomicity issues, chances are you won’t be able to guarantee atomic snapshotting, as you’re fetching your data over multiple requests and the data is changing as you retrieve it.
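The standard trick for the first problem is write-to-temp-then-rename, since `os.replace` is atomic when source and target are on the same filesystem. A minimal sketch:

```python
import json
import os
import tempfile

# Avoid the corrupt-file-on-network-failure problem: write to a temp file in
# the same directory, then atomically rename over the target. A reader always
# sees either the old complete file or the new complete file, never a half.

def atomic_write_json(path: str, data) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # atomic swap on the same filesystem
    except BaseException:
        os.unlink(tmp)  # clean up the partial temp file on any failure
        raise
```

The snapshotting problem is harder: it needs cooperation from the service, which is part of the argument for the data mirror idea later on.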

    rate limiting

    No one likes their API hammered, fair enough. However, rate limits often vary from endpoint to endpoint and are inherently tedious to get right.

    If you’re not using the API, you might get banned by DDOS prevention (e.g. Cloudflare) if you’re not careful.

    Overall, painful and not fun to implement.
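The simplest defensible shape is a self-imposed minimum interval between requests; real APIs publish per-endpoint budgets, so treat this as a sketch of the idea only:

```python
import time

# Minimal politeness throttle: guarantee at least `min_interval` seconds
# between calls by sleeping off the remainder. Per-endpoint budgets, burst
# allowances and Retry-After handling are left out of this sketch.

class Throttle:
    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self) -> None:
        """Block until it is polite to make the next request."""
        delta = time.monotonic() - self._last
        if delta < self.min_interval:
            time.sleep(self.min_interval - delta)
        self._last = time.monotonic()
```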

    error handling

    Authorization, network, serializing, parsing, storing, synchronizing: these are among the most common error sources (as in, actual unrecoverable errors, not necessarily bugs) in software engineering generally, and getting them right is required for reliably retrieving your data.

    In addition, you want to be somewhat semi-defensive, and this is the hardest kind of error handling:

    you want to progress slowly but surely

  • you want to make sure it only fails in completely unrecoverable scenarios, otherwise it’s going to require constant tending
  • and you want to somehow let the user know of problems / suspicious data
  • (****************************************************************************************************************************************************************************************************************** documentation and discovery

    If you want all your data, you have to look carefully through the whole documentation and make sure you’ve got it all covered.

    If the service adds some new endpoints, you might never find that out.


    scraping

    For the most part not an issue, but some websites do not offer an API, so you’ve got no choice but to scrape and parse HTML.

    Notorious example: some Hackernews (!) endpoints like ‘favorites’ are not exposed via the API.

    abstract representation

    Having a raw export of the data (e.g. sqlite database / json file / etc.) is nice, but to actually use it you need an abstract representation. You basically have to reinvent whatever the service developers already do on the backend.

    Notable examples:

    unclear which data types to choose: nullable / non-nullable, string or integer for ids, float or integer for amounts

  • timestamps: figuring out whether it was seconds or milliseconds, UTC or local timezone; and zillions of string formats which you need to parse (I had to do it so often that I even memorized the weird argument order in datetime.strptime)

  • which assumptions are valid, e.g. can an id be used as a dictionary key, can you assume ids are increasing, etc.
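The seconds-vs-milliseconds part at least yields to a magnitude heuristic: no plausible date in a personal export exceeds ~1e11 seconds (roughly the year 5000), so bigger values are almost certainly milliseconds. A sketch:

```python
from datetime import datetime, timezone

# Heuristic disambiguation of epoch timestamps: values above ~1e11 cannot be
# seconds for any plausible export date, so treat them as milliseconds.
# (Millisecond timestamps before 1973 would be misclassified; fine for
# personal-data exports.)

def parse_timestamp(ts: float) -> datetime:
    if ts > 1e11:
        ts /= 1000
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```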

    no access to data

    Sometimes you have no way to access your data at all:

    you are offline. ’Nuff said.

  • app data on your phone

    Very few apps support data exports, and even fewer support them in an automatic and regular way. Normally, internally, apps keep their data in sqlite databases, which is even more convenient than a plaintext / csv export.

    However, there are caveats: e.g. on Android, app data lives in the /data/data/ directory, which by default is not accessible unless you root the phone.

  • devices that have no means of synchronizing
    Kobo doesn’t seem to support cloud sync for annotations. I was considering syncing the database wirelessly, as there are some SSH modules for its firmware, but people report it may break wifi.

    Some devices / apps and formats are vendor locked.

    Now, remember when I said it was tedious for programmers? Constant boilerplate, shitty APIs (you’re lucky if the service offers one at all), latency, flakiness, having to code defensively, etc.

    Now think about ordinary people who have no clue what ‘API’ is. They deserve to use their data too.

6. How to make it easier: data mirror

    The way I see it, ideally the service you’re using provides you with:

    a data mirror app

    The best case scenario is if the service is local-first in the first place. However, this may be a long way ahead, and there are certain technical difficulties associated with such designs.

    I’m suggesting a data mirror app that merely runs in the background on the client side and continuously / regularly syncs backend data to the latest state.

    Ideally this would be exactly the same state that backend uses, although in practice it would be hard at the very least for efficiency reasons (e.g. it’s faster for the backend to keep data in the same database instead of separate databases for each user).

    It shouldn’t be too resource demanding for the backend; e.g. data sync via push notifications basically already does that, albeit in a less efficient way.

    The data mirror should dump data in an open machine-friendly format like json / a sqlite database / etc.

    This solves:

    authorization: however tedious it is to implement, it can be handled by the service’s developers.

    They can make it as secure as necessary (e.g. 2FA / etc.); as long as you log into it once, it keeps the token.

  • pagination / consistency / rate limiting: non-problems essentially, especially considering that it’s easier for service’s developers to implement incremental data fetching
  • error handling: also developer’s responsibility, and they would know better which situations are programming bugs and which have to be handled carefully
  • documentation and discovery: hopefully developers are better suited to keep their internal representations and exports consistent (even incentivised as it allows to write less code)
  • backups: will still have to be done by external means, but it simplifies the task massively, you just need to point you backup tool at your data storage
  • minimalistic data bindings in some reasonable programming language that represent all of this data

    Hopefully, the specific language doesn’t matter: mapping data from one language to another is a trivial, almost automatic task.

    This solves:

  • abstract representation: would massively lower the barrier for integrating and interacting with data
  • offline: if you have all the data locally, you get efficient access without latency or the need for extra error handling
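    To illustrate, a minimal sketch of what such bindings might look like, assuming a hypothetical json export of saved bookmarks (the file layout and field names here are invented for the example):

    ```python
    import json
    from dataclasses import dataclass
    from datetime import datetime
    from typing import Iterator

    @dataclass
    class Bookmark:
        url: str
        title: str
        saved_at: datetime

    def bookmarks(export_file: str) -> Iterator[Bookmark]:
        # merely maps the raw export onto simple datatypes -- no logic beyond that
        with open(export_file) as f:
            for raw in json.load(f):
                yield Bookmark(
                    url=raw["url"],
                    title=raw["title"],
                    saved_at=datetime.fromtimestamp(raw["saved_at"]),
                )
    ```

    The point is how thin this layer is: anything smarter (querying, merging, analysis) builds on top of it, in whatever tool the user prefers.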
    That’s perhaps a naive and oversimplified view. But to be honest, we’re so far from it that even some small steps in that direction would be real progress.

    These suggestions would decouple data from UI and let the community develop better tools for consuming and working with it.


    (¶) potential caveats

  • this might be hard to support for everyone

    On the other hand, service developers would have more control on data access patterns, so in a way it might work better.

    It would definitely be more efficient than third parties writing kludgy tools to export and backup data.

    In addition, for some services and scenarios, it would give better data locality and lower latencies.

  • ‘average’ users often are not motivated enough to demand such things

    In particular, not everyone has, or is willing to set up, the necessary infrastructure to run all these things.

    However, if implemented properly, there is absolutely nothing preventing running data mirror on your laptop or even phone. It really doesn’t require much CPU or bandwidth if you support incremental updates.

  • services have little motivation to promote this, since silos benefit them

    Having monopoly on the client interface (e.g. web UI) keeps users on your platform even if you suck.

    If anyone could implement a better interface, there would be little opportunity for stuff like ads, and the only way for the service to make money would be to charge a fee for data collection and serving (which I personally would be happy to pay).

    Hopefully all of these issues would be solved by distributed / federated services, but we’re pretty far from that.

    (¶) unsolved problems

  • deleted content

    E.g. imagine you liked someone’s post on Facebook, that fact got mirrored locally, and then the author removed the post.

    What’s the right thing to do for the data mirror app? Should it erase just the post you liked from your data mirror? Should it keep the fact that you liked it at all?

    You may disagree with such a policy imposed by the service and hence implement additional logic to keep more data; at that point it seems like a matter for legal debate.

  • synchronizing data

    If you want to access data from multiple devices, you either have to run multiple mirrors, which would be a bit of a hassle, or use some continuous sync service like Dropbox or Syncthing.

    That, however, might not be atomic, depending on the way data is kept on disk, since files might be pulled in random or lexicographic order, depending on sync configuration.
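    One possible mitigation (a sketch, not tied to any particular sync tool): have the mirror write its dump as a single file, atomically, via the classic write-to-temp-then-rename trick, so the sync service only ever propagates complete snapshots:

    ```python
    import json
    import os
    import tempfile

    def atomic_dump(data, dest):
        """Write `data` as json to `dest` atomically: dump into a temporary
        file in the same directory, then rename it over the destination."""
        dest = os.path.abspath(dest)
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest))
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(data, f)
                f.flush()
                os.fsync(f.fileno())  # ensure bytes hit the disk before the rename
            os.replace(tmp, dest)  # atomic on both POSIX and Windows
        except BaseException:
            os.unlink(tmp)  # clean up the half-written temp file
            raise
    ```

    A reader of `dest` (including a sync daemon) then sees either the old snapshot or the new one, never a half-written state — though this only helps if the export fits in one file; a multi-file export still needs care.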

  • protecting the data

    Even if you don’t trust your average startup to secure your data, it might be even less safe on the average user’s disk.

  • this assumes that tools / integrations are open source and running on computers you own

    Realistically, closed source tools do exist and it’s understandable when people want money for their efforts.

    From the user’s perspective, not everyone wants the hassle of running things locally either, and many people are happy with online services for the most part.

    (7) What do I do?

    Of course, I’m not expecting someone to come and implement all of this for me. I could start some sort of movement to demand it from services and platforms, but I hardly see that working.

    So I’ve put effort into exporting, integrating and utilizing my data on my own, following the suggestions I formulated. Putting this in writing helped me motivate and summarize many technical and infrastructural decisions.

    I’ll be describing my setup in more detail in future posts; however, here are some bits and pieces:

    (¶) regular data exports

    This corresponds to the ‘data mirror’ bit.

    I exported / scraped / reverse engineered pretty much all of my digital trace and figured out automation and infrastructure that works for me.

    I’ve shared some of my personal export scripts and tools.

    I also have some helper scripts to keep individual exporter’s code as clean as possible while ensuring exports are reliable.

    As I mentioned, I’ll share all of this later in a separate post.

    (¶) python package to access data

    Each data exporter comes with minimal bindings that merely map json / sqlite export into simple datatypes and data classes.

    That way anyone who wishes to use data can kick off some reasonable representation, which is not overfitted to my specific needs.

    Higher-level querying and access, specific to myself, is implemented in my.package (note that this post is still in draft stage).

    (¶) how do I use the data?

    Finally, some tools and scripts I’ve implemented to enable the interactions that I want:

  • a personal search engine for quick incremental search in my data and digital trace

  • orger: a tool to convert data into org-mode views for quick search and overview; also used for prioritizing content consumption (e.g. processing Reddit saves), for populating my spaced repetition queue, and for creating TODOs straight from Telegram messages

  • grasp: a browser extension to clip links straight into my org-mode notes

  • promnesia: a browser extension to escape silos, unifying annotations and browsing history from different data sources (still somewhat WIP and needs final touches, but planning to release soon)

  • a personal sleep and exercise dashboard, taking into account all possible data sources. I’m in the process of making it public; you can see some screenshots here

  • data availability

    I’m synchronizing everything across my computers with syncthing.


  • backups

    I’m simply using borg backup against the exported data, whether it’s kept in json files or sqlite databases.


    (9) Feedback

    I’d be interested to know your opinion or questions, whether on my motivation, or particularities of my suggestions or implementation.

    Let me know if you can think of any other data integrations you are missing and perhaps we can think of something together!


