The following post was created for the second Cohort in MuseMatrix. If you want to see the Spanish version, click here.
---
Before delving into the concept of Decentralized Computing, it's crucial to analyze the current landscape of cloud infrastructure. According to Statista data, in the fourth quarter of 2023, 66% of the cloud infrastructure market was held by just three tech giants: Amazon Web Services (31%), Microsoft Azure (24%), and Google Cloud Platform (11%) [1].
This massive concentration of computing power in the hands of so few companies should set off alarms for several reasons:
- Data Centralization: The fact that roughly two-thirds of cloud infrastructure is controlled by three companies raises serious concerns about the privacy and handling of our data.
- Misuse of Information: This data is frequently used in artificial intelligence models to generate detailed advertising profiles, which are then sold to large companies. The final destination and use of this personal information often remain uncertain.
- Vulnerability to Failures: Centralization also implies that a failure in one of these providers could have catastrophic consequences on a global scale, affecting millions of services and users simultaneously.
- Monopolistic Control: This situation gives these companies disproportionate power over global digital infrastructure, which can lead to anti-competitive practices and limit innovation.
- Censorship and Information Control: With such a concentration of power, these companies could potentially influence the flow of information and freedom of expression on the internet.
In the face of this scenario, Decentralized Computing emerges as a promising alternative.
Decentralized computing is a cloud architecture style characterized by the absence of a central point of authority to control access to data or logic residing within the network. Instead, authority is determined by algorithms shared by all nodes that are part of this distributed network [2].
Although the term sounds very current, the underlying ideas have been with us for decades. They go back at least to the mix networks proposed by David Chaum in his 1981 paper "Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms," which describes a routing protocol that makes it harder to trace which sender is communicating with which receiver [3]. Since then, networks such as Tor and BitTorrent, and later blockchains, have paved the way for these types of tools.
After the blockchain boom, we realized one thing: while we can have a decentralized database, the tooling around it is still centralized. What's the use of having contracts that interact with a web page when your hosting provider can pull the plug at any time? What's the point of permanently recording a reference to data in a smart contract when the data itself lives in a service that may eventually shut down?
Decentralized Storage
Before covering more of the broad field of decentralized computing, let's talk about the answer several projects have given to the questions from the previous section: decentralized storage. In short, it's a network of nodes where each node holds a replica or a fragment of the data, typically encrypted so that the node storing it cannot read its contents. This ensures that not only the "receipt" for certain data benefits from decentralization, but also the data itself. This approach addresses several of the problems mentioned earlier: it provides greater resistance to censorship, improves privacy, and reduces vulnerability to a single point of failure. Services like IPFS and Arweave were designed from the start as decentralized storage solutions.
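The idea of nodes each holding a replica or fragment can be illustrated with a small sketch. This is a toy model, not the IPFS or Arweave API: the `ToyNetwork` and `Node` classes, the chunk size, and the replica placement rule are all assumptions made for illustration, and the payload is assumed to have been encrypted by the client beforehand.

```python
import hashlib
from dataclasses import dataclass, field


def content_id(data: bytes) -> str:
    """Address a chunk by the SHA-256 hash of its bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Node:
    """Hypothetical storage node holding chunks keyed by content ID."""
    name: str
    chunks: dict = field(default_factory=dict)


class ToyNetwork:
    """Hypothetical network that splits data into chunks and replicates them."""

    def __init__(self, node_names, replicas=2, chunk_size=8):
        self.nodes = [Node(n) for n in node_names]
        self.replicas = replicas  # assumed to be <= number of nodes
        self.chunk_size = chunk_size

    def put(self, data: bytes) -> list:
        """Split data into chunks, replicate each one, return the list of IDs."""
        manifest = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            cid = content_id(chunk)
            # Pick `replicas` consecutive nodes, starting from one derived from the ID.
            start = int(cid, 16) % len(self.nodes)
            for i in range(self.replicas):
                self.nodes[(start + i) % len(self.nodes)].chunks[cid] = chunk
            manifest.append(cid)
        return manifest

    def get(self, manifest) -> bytes:
        """Rebuild the data from any node holding a valid copy of each chunk."""
        out = b""
        for cid in manifest:
            for node in self.nodes:
                chunk = node.chunks.get(cid)
                # Re-hash the chunk so a corrupted copy is never accepted.
                if chunk is not None and content_id(chunk) == cid:
                    out += chunk
                    break
            else:
                raise KeyError(f"chunk {cid} unavailable")
        return out


network = ToyNetwork(["a", "b", "c", "d"], replicas=2)
payload = b"ciphertext produced by the client"
manifest = network.put(payload)
network.nodes[0].chunks.clear()  # simulate one node going offline
assert network.get(manifest) == payload
```

Because every chunk is addressed by its own hash, a reader can fetch it from whichever node is still online and check that it has not been altered, which is where the resistance to single-node failures in this sketch comes from.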
Computation on Data
Once storage was addressed, several protocols noticed that they needed not only to store data but also to create, edit, or compute on it, and the problems described earlier still applied. There were several on-chain proposals to solve this, but many compromised the security, scalability, or decentralization of those networks. This led to a different proposal: what if the data is processed externally, in a decentralized manner, without burdening blockchains in the process? Computation on data means running computational processes directly on stored data, which reduces the workload of data management and keeps storage and access control consistent for the user's dataset.
One of the main projects in this space is Bacalhau, created by Protocol Labs, the same team behind IPFS. It publishes hash proofs of the computations performed, or republishes the output produced (either a file or serialized data), on that storage layer.
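A minimal sketch of the hash-proof idea follows. It is not the Bacalhau API: `run_job`, `verify`, and `word_count` are hypothetical names, and the sketch assumes the job is deterministic so that independent nodes produce identical output.

```python
import hashlib
import json


def run_job(job, stored_data: bytes) -> dict:
    """Run `job` next to the data and return the output together with its hash."""
    output = job(stored_data)
    # Serialize with sorted keys so equal results always yield equal bytes.
    serialized = json.dumps(output, sort_keys=True).encode()
    return {
        "input_hash": hashlib.sha256(stored_data).hexdigest(),
        "output": serialized,
        "output_hash": hashlib.sha256(serialized).hexdigest(),
    }


def verify(receipt: dict) -> bool:
    """Check that the output in a receipt matches the hash published with it."""
    return hashlib.sha256(receipt["output"]).hexdigest() == receipt["output_hash"]


def word_count(data: bytes) -> dict:
    """Example job: count how often each word appears in the stored text."""
    counts = {}
    for word in data.decode().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


data = b"to be or not to be"
receipt_a = run_job(word_count, data)  # as if executed on one node
receipt_b = run_job(word_count, data)  # as if executed on another node
assert verify(receipt_a)
assert receipt_a["output_hash"] == receipt_b["output_hash"]
```

In this sketch only the short hashes would need to be recorded elsewhere, while the output itself can live in decentralized storage and be checked against the hash when it is retrieved.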
According to Dhruv Malik, this approach addresses several challenges faced by typical web2-based computing platforms [4]:
- It reduces the need for initial Extract-Transform-Load (ETL) stages, since job execution is atomic and requires no separate data transformation. The entire operation runs as a single computation that streams data from the data source.
- It does not take access to the data away from the user at any point in the operation's lifecycle.
- It uses decentralized/distributed data stores that are resistant to data corruption and cost less.
- It is scalable, as it can orchestrate multiple computation jobs. However, this depends on how your framework handles job synchronization issues.
In conclusion, decentralized computing emerges as a crucial response to the challenges posed by the centralization of digital power. From distributed storage to computation on data, these technologies promise a more secure, private, and censorship-resistant internet. Although we face challenges in their implementation and adoption, the potential of decentralization to transform our digital infrastructure is undeniable. As we move forward, it is vital to continue innovating and educating about these systems, as they represent not only a technological evolution but a step towards a more equitable and free digital future for all.
Bibliography
- [1] “Cloud Infrastructure Services Vendor Share 2023.” Statista, https://www.statista.com/statistics/477277/cloud-infrastructure-services-market-share/. Accessed 18 Sept. 2024.
- [2] Reselman, Bob. “Smart Contracts, Blockchain and Decentralized Computing.” TheServerSide, https://www.theserverside.com/tip/Smart-contracts-blockchain-and-decentralized-computing. Accessed 18 Sept. 2024.
- [3] Chaum, David L. “Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms.” Communications of the ACM, vol. 24, no. 2, Feb. 1981, pp. 84–90. ACM Digital Library, https://doi.org/10.1145/358549.358563.
- [4] Malik, Dhruv. “Developing Compute-over-Data for Geospatial Data Processing: An Overview.” Circum Protocol, 14 Aug. 2023, https://medium.com/circum-protocol/developing-compute-over-data-for-geospatial-data-processing-an-overview-41f6f6bd2481.