Differences between “soft delete” and “hard delete” in the Databases, their positives and negatives

What are “soft delete” and “hard delete”?

When we use the delete command for deleting records this is the physical delete or hard delete.

When the real data is not deleted from storage, but simply marked as deleted using the update command, this is logical delete or soft delete. In the data that should be marked for deletion, for example, the boolean field isDeleted.

Before I started, I did little research about these methods on the internet. I realized that the main reason why developers use soft delete is that it is can be easily recovered deleted records and can be easily displayed the history of deleted records on the screen of users. Already soft delete has only one advantage. These are easily recovered records. But has many disadvantages. Now I will explain to you in detail these disadvantages

– Soft Delete

  • If the table data will be frequently deleted, then in a few years your tables will be full of unnecessary deleted records, maybe even more than a million records. Table sizes and index sizes will be increasing. So, performance will be down. If the table records exceed one billion, then some SQL developers (or DB Admins) generally planning used partition table technology, because the partition table mechanism physically stores some of the table records selected by using our condition in other tables on the internal DB. The goal is to decrease the amount of data on the table for increasing performance. But, by using soft delete we are increasing the amount of data using unnecessary records.
  • You must be added the condition “where isDelete = false” to your all queries. So, performance will be down again.
  • Sometimes for pagination or for some other process, we need to get a count of records for large tables, which has million or billion records. This process will work too late and will use large memory of the Database server if we write “select count(*) from this table”. DB will be used a full table scan when we use the count function. In such cases, some SQL developers can use the statistic values of DB, which are stored in system tables (information schema) of DB. There are many useful pieces of information in the system tables. One of them records the count of all Database tables. Using this method we get very very high performance, so gettable records to count cost us approximately 1-5 milliseconds. But from the system tables, we can get a count of only full records of tables, without any condition. (P.S. Records count of tables if getting from the system schema may not be correct, so this is approximate value) So you can not use this method too if needed. Because we have a condition “where isDelete = false” in our queries.
  • You can’t use a unique index on your tables. When users perform delete operations, your records are never deleted from tables, but users may think so the record is deleted. So the user will want to add the same username value or email value. There is only one way to solve this problem, which is to you can write your own codes for controlling unique values on DB.
  • You can not use Foreign-Key features too on your tables. Using Foreign-Key you can delete cascade relation records in other tables or are not be allowed to delete records used in other tables. Most commonly used the second mode, not allowed to delete records. Your data will be securely controlled via DB automatically when using Foreign Keys. If you can not use Foreign-Key then you will have to write all these controlled mechanisms manually, but it is very difficult and time-consuming to write it.
  • If you suddenly need to use the delete trigger then you will not be able to use it, so we don’t use the delete command. But when you use update trigger then you must write additional scripts for separating really updated data and deleting data via updating.

Given all this, as an experienced SQL Developer, I can say that it is better to use hard delete. Soft delete is often a choice for developers who don’t have much deep knowledge of Database skills. But professional DB Admins and SQL Developers recommend using a hard delete.

Many people think that, it is impossible to recover data deleted during hard delete. It is a wrong idea. A professional approach to work is that we must store old values and new values of all tables for the delete or update process. This is a logging mechanism. Actually, large projects should have a detailed logging module. So, we need to create a detailed logging process that can log all user operations too, such as user login, user searching, viewing reports, and other operations. When using soft delete, we can recover only deleted records, but now we will recover deleted and updated records too. Now but this idea has been very nice and very professional. Use hard delete and create a logging mechanism on DB this is the best way. Maybe most developers do not have knowledge about the logging mechanisms on DB. Therefore, they are used soft delete.

Modern Databases have a very good opportunity to log data. One of the is PostgreSQL. PostgreSQL has JSON and JSONB types and has a special indexing mode for these types. Another Database has a JSON type too. But PostgreSQL has many functions for converting data to JSON, rows to JSON functions for extracting JSON values and parameters, generating JSON types from other types, converting JSON to other types and more. And the same time PostgreSQL is very faster than other Databases. It is the only database that can overtake almost MongoDB speed in NoSQL capabilities. But PostgreSQL is an advanced, enterprise-class, and open-source relational database system which supports both SQL (relational) and JSON (non-relational) querying.