Content-addressable storage (CAS), also referred to as associative storage, is a mechanism for storing information that can be retrieved based on its content, not its storage location. It is typically used for high-speed storage and retrieval of fixed content, such as documents stored for compliance with government regulations.
Roughly speaking, content-addressable storage is the permanent-storage analogue to content-addressable memory. Because content addresses are usually derived from a hash of the data, a weak hash function could leave this method subject to collisions in an adversarial environment, with different documents returning the same hash. When contrasted with content-addressed storage, a typical local or networked storage device is referred to as location-addressed. In a location-addressed storage device, each element of data is stored on the physical medium, and its location is recorded for later use.
The storage device often keeps a list, or directory, of these locations. When a future request is made for a particular item, the request includes only the location of the data (for example, its path and file name). The storage device can then use this information to locate the data on the physical medium and retrieve it. When new information is written into a location-addressed device, it is simply stored in some available free space, without regard to its content.
The information at a given location can usually be altered or completely overwritten without any special action on the part of the storage device. Within the scope of this discussion, a good way to think of this is as container-addressed storage: the address names the container, not what it holds. In some early associative-storage designs, the search logic was incorporated into the disk controller. A query expressed in a high-level query language could be compiled into a search specification that was then sent to the disk controller for execution.
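The location-addressed model described above can be illustrated with a toy in-memory store. This is a minimal sketch, not the interface of any real device; the class name `LocationAddressedStore` and the example path are hypothetical. It shows the two properties the text stresses: a request names only a location, and a write ignores content and silently replaces whatever was there.

```python
class LocationAddressedStore:
    """Hypothetical toy store: data is found by where it lives (a path),
    and writes never look at the content itself."""

    def __init__(self) -> None:
        # The "directory": maps a location (path) to the stored bytes.
        self._directory: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        # Stored without regard to content; an existing entry is overwritten.
        self._directory[path] = data

    def read(self, path: str) -> bytes:
        # The request carries only the location, never the content.
        return self._directory[path]


store = LocationAddressedStore()
store.write("/reports/q1.txt", b"draft")
store.write("/reports/q1.txt", b"final")  # overwrite needs no special action
print(store.read("/reports/q1.txt"))      # b'final'
```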
In contrast, when information is stored in a CAS system, the system records a content address, which is an identifier uniquely and permanently linked to the information content itself.
A request to retrieve information from a CAS system must provide the content identifier, from which the system can determine the physical location of the data and retrieve it. Because the identifiers are based on content, any change to a data element will necessarily change its content address. In nearly all cases, a CAS device will not permit editing information once it has been stored.
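A content-addressed counterpart can be sketched in a few lines. This assumes SHA-256 as the hash function purely for illustration (real CAS products choose their own), and the class name `ContentAddressedStore` is hypothetical. The caller gets back an address computed from the bytes, and an "edit" can only produce a new object with a new address, leaving the original untouched.

```python
import hashlib


class ContentAddressedStore:
    """Hypothetical toy CAS: each item's address is a SHA-256 digest of its
    bytes (SHA-256 is an assumption made for this sketch)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    @staticmethod
    def address_of(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        address = self.address_of(data)
        # Never overwrite: a given address can only ever hold this content.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        # The request carries the content identifier, not a location.
        return self._blobs[address]


cas = ContentAddressedStore()
a1 = cas.put(b"invoice #1001, total 250.00")
a2 = cas.put(b"invoice #1001, total 260.00")  # an "edit" is really a new object
print(a1 != a2)          # True: changed content, changed address
print(cas.get(a1)[-6:])  # b'250.00': the original is still intact
```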
Whether it can be deleted is often controlled by a policy. While the idea of content-addressed storage is not new, production-quality systems were not readily available until roughly …. CAS works most efficiently on data that does not change often. It is of particular interest to large organizations that must comply with document-retention laws, such as Sarbanes-Oxley.
In these corporations, a large volume of documents will be stored for as much as a decade, with no changes and infrequent access. CAS is designed to make searching for a given document by its content very quick, and provides an assurance that the retrieved document is identical to the one originally stored.
If the documents were different, their content addresses would differ. In addition, since data is stored in a CAS system according to what it contains, there is never a situation where more than one copy of an identical document exists in storage. By definition, two identical documents have the same content address and so point to the same storage location. For data that changes frequently, CAS is not as efficient as location-based addressing.
In these cases, the CAS device would need to continually recompute the address of data as it changes, and client systems would be forced to continually update their records of where a given document exists. For random-access systems, a CAS would also need to handle the possibility of two initially identical documents diverging, requiring a copy of one document to be created on demand.
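Deduplication and divergence can both be seen in a small sketch. Here the `names` table is a hypothetical stand-in for the client-side records mentioned above (which document lives at which address), and SHA-256 is again an assumed hash. Two identical documents share one stored object; when one of them changes, a new object is written and only that client's record is re-pointed, which is the extra bookkeeping that makes frequently changing data a poor fit for CAS.

```python
import hashlib


class DedupStore:
    """Hypothetical toy CAS showing deduplication and divergence."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # content address -> one stored copy
        self.names: dict[str, str] = {}    # hypothetical client name -> address

    def put(self, name: str, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(address, data)  # identical content stored once
        self.names[name] = address            # client must track the address
        return address

    def get(self, name: str) -> bytes:
        return self.blobs[self.names[name]]


store = DedupStore()
store.put("alice/contract.pdf", b"signed contract v1")
store.put("bob/contract.pdf", b"signed contract v1")
print(len(store.blobs))  # 1: both names point at the same stored object

# Bob's copy diverges: a new object is written and only his record moves.
store.put("bob/contract.pdf", b"signed contract v1 + annex")
print(len(store.blobs))                 # 2
print(store.get("alice/contract.pdf"))  # b'signed contract v1' (unchanged)
```

Note that the old object is not deleted here; whether and when it may be removed is exactly the kind of decision the text says is left to policy.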
FilePool was acquired in … and became the underpinnings of the first commercially available CAS system, which was introduced as EMC's Centera platform. In Centera, the access nodes maintain a synchronized directory of content addresses and the corresponding storage node where each address can be found. When a new data element, or blob (binary large object), is added, the device calculates a hash of the content and returns this hash as the blob's content address. If the content already exists, the device does not need to perform any additional steps; the content address already points to the proper content.
Otherwise, the data is passed off to a storage node and written to the physical media. When a content address is provided to the device, it first queries the directory for the physical location of the specified content address.
The information is then retrieved from a storage node, and the actual hash of the data recomputed and verified. Once this is complete, the device can supply the requested data to the client. Within the Centera system, each content address actually represents a number of distinct data blobs, as well as optional metadata. Whenever a client adds an additional blob to an existing content block, the system recomputes the content address.
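The write and read paths described above can be modeled roughly as follows. This is a toy under loudly stated assumptions: `StorageNode`, `AccessNode`, the use of SHA-256, and the rule for choosing a storage node are all hypothetical and do not reflect Centera's real interfaces or hash function. `composite_address` illustrates, in the same hedged spirit, why adding a blob to an existing content block changes its address; its JSON encoding is likewise an assumption.

```python
import hashlib
import json


class StorageNode:
    """Hypothetical storage node: holds blobs keyed by content address."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, bytes] = {}


class AccessNode:
    """Hypothetical access node with a directory: address -> storage node."""

    def __init__(self, nodes: list[StorageNode]) -> None:
        self.nodes = nodes
        self.directory: dict[str, StorageNode] = {}

    def write(self, blob: bytes) -> str:
        address = hashlib.sha256(blob).hexdigest()
        if address in self.directory:
            return address  # content already present: no further steps
        # Placement rule is an assumption: derive the node from the address.
        node = self.nodes[int(address, 16) % len(self.nodes)]
        node.data[address] = blob
        self.directory[address] = node
        return address

    def read(self, address: str) -> bytes:
        node = self.directory[address]  # 1. look up the physical location
        blob = node.data[address]       # 2. fetch from the storage node
        # 3. recompute the hash and verify before returning data to the client
        if hashlib.sha256(blob).hexdigest() != address:
            raise OSError(f"integrity check failed on {node.name}")
        return blob


def composite_address(blob_addresses: list[str], metadata: dict[str, str]) -> str:
    # Canonical encoding (an assumption): sorted-key JSON of blobs + metadata.
    payload = json.dumps({"blobs": blob_addresses, "meta": metadata}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


access = AccessNode([StorageNode("sn1"), StorageNode("sn2")])
addr = access.write(b"scanned form 17")
same = access.write(b"scanned form 17")  # duplicate: directory hit, nothing stored
print(addr == same)                      # True
print(access.read(addr))                 # b'scanned form 17'

extra = access.write(b"attachment")
before = composite_address([addr], {"retention": "10y"})
after = composite_address([addr, extra], {"retention": "10y"})
print(before != after)                   # True: adding a blob changes the address
```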
To provide additional data security, the Centera access nodes, when no read or write operation is in progress, constantly communicate with the storage nodes, checking the presence of at least two copies of each blob as well as their integrity.
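The background check described above (at least two copies of each blob, each intact) can be sketched as a scrubbing pass. The flat `replicas` mapping and the `min_copies=2` default are assumptions made for illustration; a real system would gather copies from its storage nodes and then trigger repair for whatever this returns.

```python
import hashlib


def scrub(replicas: dict[str, list[bytes]], min_copies: int = 2) -> list[str]:
    """Return the content addresses that need repair.

    `replicas` maps each content address to every copy found on storage
    nodes; this layout is a hypothetical simplification.
    """
    needs_repair = []
    for address, copies in replicas.items():
        # A copy counts as healthy only if its recomputed hash matches.
        healthy = [c for c in copies if hashlib.sha256(c).hexdigest() == address]
        if len(healthy) < min_copies:
            needs_repair.append(address)
    return needs_repair


good, bad = b"record A", b"record B"
addr_good = hashlib.sha256(good).hexdigest()
addr_bad = hashlib.sha256(bad).hexdigest()
report = scrub({
    addr_good: [good, good],        # two intact copies: fine
    addr_bad: [bad, b"record B?"],  # one copy has been corrupted
})
print(report == [addr_bad])  # True
```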
Additionally, they can be configured to exchange data with a different (e.g., …) system. This provides additional flexibility in disaster-recovery situations, as well as the ability to reduce storage costs by moving data off disk to tape.
Another implementation is iCAS from iTernity. The concept of iCAS is based on containers. Each container is addressed by its hash value and holds a variable number of fixed-content documents. A container cannot be changed, and its hash value is fixed after the write process.
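The container idea can be sketched with a frozen dataclass. The hashing scheme here (SHA-256 over length-prefixed documents) is an assumption chosen so that document boundaries are unambiguous; it is not iTernity's actual format, and the class name `Container` is hypothetical. The address is computed once when the container is created, and the object rejects later modification.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Container:
    """Hypothetical immutable container of fixed-content documents,
    addressed by a hash over its contents (hash scheme is an assumption)."""
    documents: tuple[bytes, ...]
    address: str = field(init=False)

    def __post_init__(self) -> None:
        h = hashlib.sha256()
        for doc in self.documents:
            # Length-prefix each document so boundaries are unambiguous.
            h.update(len(doc).to_bytes(8, "big"))
            h.update(doc)
        # frozen=True blocks normal assignment, so set the field once here.
        object.__setattr__(self, "address", h.hexdigest())


c = Container((b"policy.pdf contents", b"claim.tif contents"))
print(len(c.address))  # 64 hex characters
try:
    c.documents = ()
except AttributeError:  # dataclasses.FrozenInstanceError is a subclass
    print("container cannot be changed after it is written")
```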
One of the very first content-addressed storage servers, Venti was originally developed for Plan 9 from Bell Labs and is now also available for Unix-like systems as part of Plan 9 from User Space. Git is a userspace CAS filesystem, although it is primarily used as a source code control system. git-annex, a file-synchronization tool built on Git, relies on Git and symbolic links to index the filesystem location of the files it manages. Bitcache is an open source distributed implementation of CAS written in Ruby. Perkeep is a recent project to bring the advantages of content-addressable storage "to the masses".
It is intended to be used for a wide variety of use cases, including distributed backup; a snapshotted-by-default, version-controlled filesystem; and decentralised, permission-controlled filesharing. Irmin is an OCaml "library for persistent stores with built-in snapshot, branching and reverting mechanisms", built on the same design principles as Git. Arvados Keep is an open source content-addressable distributed storage system.
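Git, listed among the implementations above, makes the content-addressing step easy to reproduce. For repositories using the default SHA-1 object format, the ID of a file stored as a blob is the SHA-1 of a short header (`blob`, a space, the size in bytes, and a NUL byte) followed by the raw content. The function name `git_blob_id` is hypothetical; the two expected IDs are the values `git hash-object` reports for an empty file and for `hello world` plus a newline. Repositories using the newer SHA-256 object format produce different IDs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to file content stored as a blob (SHA-1 format).

    The hashed input is b"blob <size>", a NUL byte, then the raw bytes.
    """
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the header includes only the type and size, two files with identical bytes get the same ID regardless of their names or paths, which is how Git stores each distinct file content once.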
Infinit is a content-addressable and decentralized peer-to-peer storage platform that was acquired by Docker Inc.
From Wikipedia, the free encyclopedia.