Anticorruption Principles For Public Sector Information Systems
As a public official, I’ve put a lot of thought on how to make the current and upcoming public government information systems resistant to corruption. And I can list several main principles, some of them very technical, which, if followed, would guarantee that the information systems themselves achieve two properties:
- they prevent paper-based corruption
- they do not generate additional risk for corruption
So here are the principles that each information system should follow:
- Auditability – the software must allow for proper external audits. This means having the up-to-date source code available, especially for custom-built software. If it’s proprietary, it means “code available” contract clauses. This also means availability of documentation – what components it has, what integrations exist, what network and firewall rules are needed. If you can’t audit a system, it surely generates corruption
- Traceability – every meaningful action, performed by users of the system, should be logged. This means a full audit log not just for the application, but also for the underlying database as well as servers. If “delete entry” is logged at the application, but DELETE FROM is not logged by the database, we are simply shifting the corruption motives to more technically skilled people. I’ve seen examples of turned-off DB audit logs, and systems that (deliberately?) miss to log some important user actions. Corruption is thus built in the system or the configuration of its parts.
- Tamper-evidence – audit logs and in some cases core data should be tamper-evident. That means that any modification to past data should be detectable upon inspection (included scheduled inspections). One of the strong aspects of blockchain is the markle trees and hash chains it uses to guarantee tamper-evidence. A similar cryptographic approach must be applied to public systems, otherwise we are shifting the corruption incentive to those who can alter the audit log.
- Legally sound use of cryptography – merkle trees are not legally defined, but other cryptographic tools are – trusted timestamps and digital signatures. Any document (or data) that carries legal meaning should be timestamped with the so called “(qualified) timestamp” according to the eIDAS EU regulation. Every document that needs a signature should be signed by an electronic signature (which is the legal name for the cryptographic term “digital signatures”). Private keys should always be stored on HSMs or smartcards to make sure they cannot leak. This prevents corruption as you can’t really spoof singatures or backdate documents. Backdating in particular is a common theme in corruption schemes, and a trusted cryptographic timestamp prevents that entirely.
- Identity and access management – traceability is great if you are sure you are “tracing” the right people. If identity and access management isn’t properly handled, impersonation, bruteforce or leaked credentials can make it easier for malicious internal (or external) actors to do improper stuff and frame someone else. It’s highly recommended to use 2FA, and possibly hardware tokens. For sysadmins it’s a must to use a privileged access management system (PAM).
- Data protection (encryption, backup management) – government data is sometimes sensitive – population registers, healthcare databases, taxes and customs databases, etc. They should not leak (captain obvious). Data leak prevention is a whole field, but I’d pinpoint two obvious aspects. The first is live data encryption – if you encrypt data granularly, and require decryption on the fly, you can centralize data access and therefore log every access. Otherwise, if the data in the database is in plaintext, there’s always a way to get it out somehow (Database activity monitoring (DAM) tools may help, of course). The second aspect is backup management – even if your production data is properly protected, encrypted, DAM’ed, your backup may leak. Therefore backup encryption is also important, and the decryption keys should be kept securely (ideally, wrapped by an HSM). How is data protection related to corruption? Well, these databases are sold on the black market, “privileged access” to sensitive data may be sold to certain people.
- Transparency – every piece of data that should not be protected, should be public. The more open data and public documents there are, the less likely it is for someone to try to manipulate data. If the published data says something, you can’t go and remove it, hoping nobody would see it.
- Randomness – some systems rely on randomness for a core feature – assigning cases. This is true for courts and for agencies who do inspections – you should randomly select a judge, and randomly assign someone to do an inspection. If you don’t have proper, audited, secure randomness, this can be abused (and it has been abused many times), e.g. to get the “right” judge in a sensitive case. We are now proposing a proper random case assignment system for the judiciary in my country. It should be made sure that /dev/random is not modified, and a distributed, cryptographically-backed random-generation system can be deployed. It sounds like too much complexity just for a RNG, but sometimes it’s very important to rely on non-controlled randomness (even if it’s pseudorandomness)
- Data validation – data should be subject to the maximum validation on entry. Any anomalies should be blocked from even getting into the database. Because the option for creating confusion helps corruption. For example there’s the so called “corruption cyrillic” – in countries that use the cyryllic alphabet, malicious users enter identically-looking latin charcter to hide themselves from searches and reports. Another example – in the healthcare system, reimbursement requests used to be validated post-factum. This creates incentives for corruption, for “under the table” correction of “technical mistakes” and ultimately, schemes for draining funds. If input data is validated not just a simple form inputs, but with a set of business rules, it’s less likely for deliberately incorrect data to be entered and processes
- Automated risk analysis – after data is entered (by civil servants, by external parties, by citizens), in some cases risk analysis should be done. For example, we are now proposing online registration of cars. However, some cars are much more likely to be stolen than others (based on price, ease of unlocking, currently operating criminals skillset, etc.). So the registration system should take into account all known factors and require the car to be presented at the traffic police for further inspection. Similarly for healthcare – some risk analysis on anomalous events (e.g. high-price medicines sold in unlikely succession) should be flagged automatically and inspected. That risk analysis should be based on carefully crafted methodologies, put into the system with something like a rules engine (rather than hardcoded, which I’ve also seen).
Throughout the years others and myself have managed to put some of those in laws and bylaws in Bulgaria, but there hasn’t been a systematic approach to ensuring that they are all followed, and followed properly. Which is the hard part, of course. Many people know the theory, it’s just not that easy to put in in practice in a complex environment. But these principles (and probably others that I miss) need to be the rule, rather than the exception in public sector information systems if we want to reduce corruption risks.
As a public official, I’ve put a lot of thought on how to make the current and upcoming public government information systems resistant to corruption. And I can list several main principles, some of them very technical, which, if followed, would guarantee that the information systems themselves achieve two properties:
- they prevent paper-based corruption
- they do not generate additional risk for corruption
So here are the principles that each information system should follow:
- Auditability – the software must allow for proper external audits. This means having the up-to-date source code available, especially for custom-built software. If it’s proprietary, it means “code available” contract clauses. This also means availability of documentation – what components it has, what integrations exist, what network and firewall rules are needed. If you can’t audit a system, it surely generates corruption
- Traceability – every meaningful action, performed by users of the system, should be logged. This means a full audit log not just for the application, but also for the underlying database as well as servers. If “delete entry” is logged at the application, but DELETE FROM is not logged by the database, we are simply shifting the corruption motives to more technically skilled people. I’ve seen examples of turned-off DB audit logs, and systems that (deliberately?) miss to log some important user actions. Corruption is thus built in the system or the configuration of its parts.
- Tamper-evidence – audit logs and in some cases core data should be tamper-evident. That means that any modification to past data should be detectable upon inspection (included scheduled inspections). One of the strong aspects of blockchain is the markle trees and hash chains it uses to guarantee tamper-evidence. A similar cryptographic approach must be applied to public systems, otherwise we are shifting the corruption incentive to those who can alter the audit log.
- Legally sound use of cryptography – merkle trees are not legally defined, but other cryptographic tools are – trusted timestamps and digital signatures. Any document (or data) that carries legal meaning should be timestamped with the so called “(qualified) timestamp” according to the eIDAS EU regulation. Every document that needs a signature should be signed by an electronic signature (which is the legal name for the cryptographic term “digital signatures”). Private keys should always be stored on HSMs or smartcards to make sure they cannot leak. This prevents corruption as you can’t really spoof singatures or backdate documents. Backdating in particular is a common theme in corruption schemes, and a trusted cryptographic timestamp prevents that entirely.
- Identity and access management – traceability is great if you are sure you are “tracing” the right people. If identity and access management isn’t properly handled, impersonation, bruteforce or leaked credentials can make it easier for malicious internal (or external) actors to do improper stuff and frame someone else. It’s highly recommended to use 2FA, and possibly hardware tokens. For sysadmins it’s a must to use a privileged access management system (PAM).
- Data protection (encryption, backup management) – government data is sometimes sensitive – population registers, healthcare databases, taxes and customs databases, etc. They should not leak (captain obvious). Data leak prevention is a whole field, but I’d pinpoint two obvious aspects. The first is live data encryption – if you encrypt data granularly, and require decryption on the fly, you can centralize data access and therefore log every access. Otherwise, if the data in the database is in plaintext, there’s always a way to get it out somehow (Database activity monitoring (DAM) tools may help, of course). The second aspect is backup management – even if your production data is properly protected, encrypted, DAM’ed, your backup may leak. Therefore backup encryption is also important, and the decryption keys should be kept securely (ideally, wrapped by an HSM). How is data protection related to corruption? Well, these databases are sold on the black market, “privileged access” to sensitive data may be sold to certain people.
- Transparency – every piece of data that should not be protected, should be public. The more open data and public documents there are, the less likely it is for someone to try to manipulate data. If the published data says something, you can’t go and remove it, hoping nobody would see it.
- Randomness – some systems rely on randomness for a core feature – assigning cases. This is true for courts and for agencies who do inspections – you should randomly select a judge, and randomly assign someone to do an inspection. If you don’t have proper, audited, secure randomness, this can be abused (and it has been abused many times), e.g. to get the “right” judge in a sensitive case. We are now proposing a proper random case assignment system for the judiciary in my country. It should be made sure that /dev/random is not modified, and a distributed, cryptographically-backed random-generation system can be deployed. It sounds like too much complexity just for a RNG, but sometimes it’s very important to rely on non-controlled randomness (even if it’s pseudorandomness)
- Data validation – data should be subject to the maximum validation on entry. Any anomalies should be blocked from even getting into the database. Because the option for creating confusion helps corruption. For example there’s the so called “corruption cyrillic” – in countries that use the cyryllic alphabet, malicious users enter identically-looking latin charcter to hide themselves from searches and reports. Another example – in the healthcare system, reimbursement requests used to be validated post-factum. This creates incentives for corruption, for “under the table” correction of “technical mistakes” and ultimately, schemes for draining funds. If input data is validated not just a simple form inputs, but with a set of business rules, it’s less likely for deliberately incorrect data to be entered and processes
- Automated risk analysis – after data is entered (by civil servants, by external parties, by citizens), in some cases risk analysis should be done. For example, we are now proposing online registration of cars. However, some cars are much more likely to be stolen than others (based on price, ease of unlocking, currently operating criminals skillset, etc.). So the registration system should take into account all known factors and require the car to be presented at the traffic police for further inspection. Similarly for healthcare – some risk analysis on anomalous events (e.g. high-price medicines sold in unlikely succession) should be flagged automatically and inspected. That risk analysis should be based on carefully crafted methodologies, put into the system with something like a rules engine (rather than hardcoded, which I’ve also seen).
Throughout the years others and myself have managed to put some of those in laws and bylaws in Bulgaria, but there hasn’t been a systematic approach to ensuring that they are all followed, and followed properly. Which is the hard part, of course. Many people know the theory, it’s just not that easy to put in in practice in a complex environment. But these principles (and probably others that I miss) need to be the rule, rather than the exception in public sector information systems if we want to reduce corruption risks.
Lmao