Item Metadata API: Read
On this page
Item Metadata API: Read#
Item Metadata API (MDAPI) Read fetches an item’s metadata.
(NOTE: In this document, JSON is pretty-printed for legibility.)
An item’s metadata may be fetched by making an HTTP GET request to
https://archive.org/metadata/{identifier}
where {identifier}
is the exact item identifier. MDAPI will return the item’s metadata record in JSON format.
An example request:
GET /metadata/xfetch HTTP/1.1
Host: archive.org
And a successful response (some fields trimmed for compactness):
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
{
"created": 1616004182,
"d1": "ia600308.us.archive.org",
"d2": "ia800308.us.archive.org",
"dir": "/21/items/xfetch",
"files": [
{
"name": "xfetch.pdf",
"source": "original",
"format": "Text PDF",
"mtime": "1479169618",
"size": "419170",
"md5": "ee68c235c2cfc1007a5ab998d21d643c",
"crc32": "4db3022d",
"sha1": "abceccfc74a577ed06a5aeb7bebe365e9ff8946d"
},
{
"name": "xfetch_files.xml",
"source": "original",
"format": "Metadata",
"md5": "d6ea3edc409f11ecfd69c061719e9368",
"summation": "md5"
},
// ... additional files ...
],
"files_count": 13,
"item_last_updated": 1613804036,
"item_size": 10682344,
"metadata": {
"mediatype": "texts",
"collection": [
"opensource",
"community"
],
"title": "XFETCH: Optimal Probabilistic Cache Stampede Prevention",
// ... additional key-value metadata fields ...
},
"server": "ia800308.us.archive.org",
"uniq": 267404775,
"workable_servers": [
"ia800308.us.archive.org",
"ia600308.us.archive.org"
]
}
Most of the top-level fields (d1
, dir
, files_count
, etc.) are available across all items. Some are optional and not always available. Others depend on the content within the item itself.
A full description of each item record field may be found here.
Partial reads#
Partial MDAPI records may be fetched by addressing the field (or even a sub-field) directly. For example, to fetch the server
field, call https://archive.org/metadata/xfetch/server
.
A partial fetch successful response:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
{
"result": "ia800308.us.archive.org"
}
Or, to fetch the first file in the files
list, call https://archive.org/metadata/xfetch/files/0
:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
{
"result": {
"name": "__ia_thumb.jpg",
"source": "original",
"mtime": "1613804036",
"size": "14393",
"format": "JPEG Thumb",
"rotation": "0",
"md5": "1fe0b765283352b5556e6dea4e0fb4ae",
"crc32": "cd8a1de7",
"sha1": "f39cb5dec565af57de5cca03a77edf6231aa1dd3"
}
}
When using partial reads, two query parameters are available for slicing elements out of a large array:
start
- index to start slicecount
- length of slice
For example, if an item has hundreds of files, files from 100 to 104 may be retrieved as so:
https://archive.org/metadata/gov.uspto.patents.application.10743335/files?start=100&count=5
Errors#
If the item identifier does not exist or cannot be found, MD Read returns an empty array.
For other errors, MDAPI returns an error
field with a human-readable string explaining the problem.
If a partial read is performed (e.g., https://archive.org/metadata/xfetch/files/0
), MD Read will return an error
if the item cannot be found, not an empty array.
PHP interface#
PHP code internal to Petabox may also access MDAPI via its PHP library. The following are the recommended calls for reading metadata.
Common options#
Read functions accept an array of options (where the option name is an array key). Many of these options are common to all or most read functions:
primaryonly
(bool, default: false) - Only return the item’s PRIMARY server in theserver
fieldrecache
(bool, default: false) - Force a recompute of the metadata recordlookahead
(bool, default: true) - Enable lookaheadauthed
(bool, default: false) - Authorizes all JSON file readsfiles_limit
(false|int, default: false) - Ifint
, max. number of files to return infiles
fieldget_timeout
(false|int, default: false) - Ifint
, timeout (in seconds) for reading user JSON filesdontcache
(bool, default: false) - If record is recomputed, do not write it to the record serveroffline_ok
(bool, default: false) -server
may include OFFLINE datanodeextended_err
(bool, default: false) - Enable extended errors
Some options are unavailable with a function for specific reasons. For example, get_obj_minimal()
doesn’t accept lookahead
because none of the fields it returns require lookahead.
Other options are specific to one or two functions, and are not listed above. They are detailed in the function information below.
Metadata::get_obj()#
Metadata::get_obj(string $item_id, array $options = []): array
Options:
primaryonly
(bool)recache
(bool)lookahead
(bool)authed
(bool)files_limit
(false|int)get_timeout
(false|int)dontcache
(bool)offline_ok
(bool)extended_err
(bool)
Returns the item’s entire metadata record as a PHP associative array.
Metadata::get_obj_part()#
Metadata::get_obj_part(string $item_id, string $part, array $options = []): array
Options:
start
(int|bool) - Start index of$part
slicecount
(int) - Count of$part
sliceauthed
(bool)get_timeout
(false|int)recache
(bool)dontcache
(bool)offline_ok
(bool)extended_err
(bool)
Returns only the specified field ($part
) of the metadata record as a PHP associative array. The array will have a result
key holding the field.
$part
should be a JSON Pointer, e.g., metadata
or /metadata/collection
.
If only one field is required by the caller, this call can potentially be more efficient than a full get_obj()
fetch.
Unlike other calls, if the item doesn’t exist or cannot be found, this call returns an error
with explanatory text.
Metadata::get_obj_minimal()#
Metadata::get_obj_minimal(string $item_id, array $options = []): array
Options:
primaryonly
(bool)get_timeout
(false|int)dontcache
(bool)offline_ok
(bool)extended_err
(bool)
Returned fields:
d1
,d2
,dir
,solo
files_count
,item_size
server
,workable_servers
,servers_unavailable
is_dark
,nodownload,
is_collection
conflict
Returns a smaller number of fields, which is potentially more efficient than a full get_obj()
fetch.
Metadata::get_obj_fast()#
Metadata::get_obj_fast(string $item_id, array $options = []): array
Options:
primaryonly
(bool)offline_ok
(bool)recache
(bool)extended_err
(bool)
Returned fields:
d1
,d2
,dir
,solo
server
,workable_servers
,servers_unavailable
is_dark
,nodownload
,is_collection
conflict
Returns an ultra-minimal number of fields for the item. Essentially, it only provides the item’s location and a handful of relevant flags.
Extended errors#
To simplify error-handling, extended errors can be enabled. This is a newer feature of MDAPI and is not enabled by default for backwards compatibility.
When enabled, the returned value is different in two important ways:
For a certain class of errors, an
errcode
value is returned along witherror
. This integer code is machine-parsable. SeeMetadata\ExtendedError
for a full list of codes.If an item cannot be found, rather than returning an empty array,
errcode
anderror
are returned.
Not all MDAPI errors have been adapted to extended errors at this time. No errcode
will be present for these errors.
Extended errors are currently unavailable to the /metadata/
endpoint.