ArcticDB_demo_snapshots
Snapshots: how to use them and why they are useful

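An Introduction to Snapshots¶
To understand snapshots, we first need to be clear about versions.
In ArcticDB, every change to a symbol creates a new version, so each symbol has a sequence of versions through time.
A library typically holds many symbols, each with many versions.
Suppose we reach a point where we want to record the current state of the data in the library. This is exactly what a snapshot is for.
A snapshot records the current version of every symbol in the library (or a custom set of versions, see below).
The data recorded in a snapshot can then be read back by passing the as_of parameter to a read.
Versions that are part of a snapshot are protected from deletion, even if their symbol is deleted.
Below is a simple example that demonstrates snapshots in action.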
Installs and Imports¶
In [1]
!pip install arcticdb
In [2]
import pandas as pd
import logging
import arcticdb as adb
In [3]
lib_name = 'demo'
arctic = adb.Arctic("lmdb://arcticdb_snapshot_demo")
# start from a clean library so the demo is repeatable
if lib_name in arctic.list_libraries():
    arctic.delete_library(lib_name)
lib = arctic.get_library(lib_name, create_if_missing=True)
Create some symbols¶
In [4]
num_symbols = 4
symbols = [f"sym_{idx}" for idx in range(num_symbols)]
half_symbols = symbols[:num_symbols // 2]
print(symbols)
print(half_symbols)
['sym_0', 'sym_1', 'sym_2', 'sym_3']
['sym_0', 'sym_1']
In [5]
# write data for each symbol
for idx, symbol in enumerate(symbols):
    lib.write(symbol, pd.DataFrame({"col": [idx]}))
In [6]
# write data only for the first half of the symbols
for idx, symbol in enumerate(half_symbols):
    lib.write(symbol, pd.DataFrame({"col": [idx+10]}))
Create a snapshot¶
The metadata is optional.
In [7]
lib.snapshot("snapshot_0", metadata="this is the core of the demo")
Functions for discovering and inspecting snapshots¶
In [8]
# list all snapshots
lib.list_snapshots()
Out[8]
{'snapshot_0': 'this is the core of the demo'}
In [9]
# list the symbols in a snapshot
lib.list_symbols(snapshot_name="snapshot_0")
Out[9]
['sym_2', 'sym_1', 'sym_0', 'sym_3']
In [10]
# list the versions in a snapshot
lib.list_versions(snapshot="snapshot_0")
Out[10]
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_0']), sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_0']), sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_0']), sym_0_v1: (date=2023-11-20 10:24:45.413203317+00:00, snapshots=['snapshot_0'])}
In [11]
# list all versions in the library, with associated snapshots
lib.list_versions()
Out[11]
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_0']), sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_0']), sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_0']), sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00), sym_0_v1: (date=2023-11-20 10:24:45.413203317+00:00, snapshots=['snapshot_0']), sym_0_v0: (date=2023-11-20 10:24:45.041944641+00:00)}
Reading the snapshot version of a symbol¶
In [12]
vit = lib.read("sym_0", as_of="snapshot_0")
print(vit)
print(vit.data)
VersionedItem(symbol='sym_0', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=1, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')
   col
0   10
In [13]
vit = lib.read("sym_3", as_of="snapshot_0")
print(vit)
print(vit.data)
VersionedItem(symbol='sym_3', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=0, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')
   col
0    3
Demonstrate that snapshot versions are protected from deletion¶
In [14]
# delete the symbol sym_0
lib.delete("sym_0")
In [15]
# show that sym_0 has been deleted
lib.list_symbols()
Out[15]
['sym_2', 'sym_1', 'sym_3']
In [16]
# sym_0 does not appear in the current library versions
lib.list_versions()
Out[16]
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_0']), sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_0']), sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_0']), sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00)}
In [17]
# however we can still read the version of sym_0 that was recorded in the snapshot
vit = lib.read("sym_0", as_of="snapshot_0")
print(vit)
print(vit.data)
VersionedItem(symbol='sym_0', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=1, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')
   col
0   10
Although it is possible, we don't recommend reading snapshot versions directly by version number¶
These versions exist only because they are part of a snapshot, so code that accesses them through the snapshot is clearer.
Code that reads a snapshot-protected version by its version number will fail once the snapshot is deleted, and the failure is hard to diagnose.
In [18]
vit = lib.read("sym_0", as_of=1)
print(vit)
print(vit.data)
VersionedItem(symbol='sym_0', library='demo', data=<class 'pandas.core.frame.DataFrame'>, version=1, metadata=None, host='LMDB(path=/users/isys/nclarke/jupyter/arctic/demos/arcticdb_snapshot_demo)')
   col
0   10
In [19]
# version 0 was not in the snapshot, so it has been removed
try:
    vit = lib.read("sym_0", as_of=0)
    print(vit)
    print(vit.data)
except adb.exceptions.NoSuchVersionException:
    logging.error("Version not found")
ERROR:root:Version not found
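One way to follow this recommendation is to wrap snapshot reads in a small helper that checks the snapshot exists before reading, so a missing snapshot produces an explicit message instead of a confusing version error. The sketch below is only an illustration: `read_from_snapshot` is a hypothetical name, not part of ArcticDB, and it assumes only that `lib` offers `list_snapshots()` and `read(symbol, as_of=...)` as used in the cells above.

```python
def read_from_snapshot(lib, symbol, snapshot_name):
    """Read `symbol` as recorded in `snapshot_name`, failing early and clearly.

    Hypothetical helper, not part of ArcticDB. `lib` only needs the two
    methods used in this notebook: list_snapshots() and read(symbol, as_of=...).
    """
    # list_snapshots() returns a mapping of snapshot name -> metadata
    if snapshot_name not in lib.list_snapshots():
        raise LookupError(
            f"Snapshot {snapshot_name!r} does not exist, "
            f"so {symbol!r} cannot be read from it"
        )
    return lib.read(symbol, as_of=snapshot_name)
```

With the library from this demo, `read_from_snapshot(lib, "sym_0", "snapshot_0")` would return the same VersionedItem as the direct read in In [17].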
Removing a snapshot¶
When we delete a snapshot, any versions that are referenced only by that snapshot are deleted as well.
In [20]
lib.delete_snapshot("snapshot_0")
In [21]
lib.list_snapshots()
Out[21]
{}
In [22]
# version 1, which was kept as part of the snapshot, has now been deleted
try:
    vit = lib.read("sym_0", as_of=1)
    print(vit)
    print(vit.data)
except adb.exceptions.NoSuchVersionException:
    logging.error("Version not found")
ERROR:root:Version not found
In [23]
lib.list_versions()
Out[23]
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00), sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00), sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00), sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00)}
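The retention behaviour seen so far can be pictured with a toy in-memory model: versions of a live symbol are kept, while versions of a deleted symbol survive only as long as some snapshot references them. This is a simplified sketch of the observable behaviour in this notebook, not ArcticDB's implementation; the `ToyLibrary` class and all of its attributes are made up for illustration.

```python
class ToyLibrary:
    """Toy model of versions and snapshots (not ArcticDB's implementation)."""

    def __init__(self):
        self.versions = {}      # symbol -> {version number: data}
        self.next_version = {}  # symbol -> next version number to assign
        self.live = set()       # symbols that have not been deleted
        self.snapshots = {}     # snapshot name -> {symbol: version number}

    def write(self, symbol, data):
        number = self.next_version.get(symbol, 0)
        self.versions.setdefault(symbol, {})[number] = data
        self.next_version[symbol] = number + 1
        self.live.add(symbol)

    def snapshot(self, name):
        if name in self.snapshots:
            raise ValueError(f"Snapshot with name {name} already exists")
        # record the latest version of every live symbol
        self.snapshots[name] = {s: max(self.versions[s]) for s in self.live}

    def delete(self, symbol):
        self.live.discard(symbol)
        self._prune()

    def delete_snapshot(self, name):
        del self.snapshots[name]
        self._prune()

    def _prune(self):
        # drop versions of deleted symbols that no snapshot references
        for symbol in list(self.versions):
            if symbol in self.live:
                continue
            kept = {snap[symbol] for snap in self.snapshots.values() if symbol in snap}
            self.versions[symbol] = {
                n: d for n, d in self.versions[symbol].items() if n in kept
            }


toy = ToyLibrary()
toy.write("sym_0", 0)
toy.write("sym_0", 10)
toy.snapshot("snapshot_0")
toy.delete("sym_0")
print(sorted(toy.versions["sym_0"]))  # [1]: only the snapshotted version survives
toy.delete_snapshot("snapshot_0")
print(sorted(toy.versions["sym_0"]))  # []: nothing references version 1 any more
```

The two printed lists mirror the reads in In [18], In [19] and In [22]: version 1 is readable while snapshot_0 exists and is gone once the snapshot is deleted.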
Snapshot names must be unique¶
Creating a snapshot with the same name as an existing snapshot raises an error.
In [24]
lib.snapshot("snapshot_1", metadata="demo snapshot names need to be unique")
In [25]
try:
    lib.snapshot("snapshot_1")
except Exception as e:
    logging.error(e)
ERROR:root:E_ASSERTION_FAILURE Snapshot with name snapshot_1 already exists
In [26]
lib.list_snapshots()
Out[26]
{'snapshot_1': 'demo snapshot names need to be unique'}
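Because a name clash raises an error, scheduled jobs that snapshot repeatedly need some naming scheme. A minimal sketch, assuming a timestamp suffix is acceptable for your naming convention: `unique_snapshot_name` below is a hypothetical helper, not an ArcticDB function, and works on any collection of existing names such as the dict returned by `list_snapshots()`.

```python
from datetime import datetime, timezone


def unique_snapshot_name(prefix, existing_names, now=None):
    """Return a name of the form <prefix>_<UTC timestamp> that is not yet taken.

    Hypothetical helper, not part of ArcticDB. `existing_names` can be any
    container supporting `in`, e.g. the dict returned by lib.list_snapshots().
    """
    now = now or datetime.now(timezone.utc)
    base = f"{prefix}_{now:%Y%m%dT%H%M%S}"
    candidate = base
    counter = 1
    # append a numeric suffix if the timestamped name is already in use
    while candidate in existing_names:
        candidate = f"{base}_{counter}"
        counter += 1
    return candidate


fixed = datetime(2023, 11, 20, 10, 24, 45, tzinfo=timezone.utc)
print(unique_snapshot_name("daily", {}, now=fixed))
# daily_20231120T102445
print(unique_snapshot_name("daily", {"daily_20231120T102445": None}, now=fixed))
# daily_20231120T102445_1
```

In this notebook it could be used as `lib.snapshot(unique_snapshot_name("daily", lib.list_snapshots()))`. Note that the check and the snapshot call are separate steps, so two concurrent writers could still collide.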
Snapshot creation modifiers: exclude or include symbols¶
In [27]
# exclude sym_1 from snapshot
lib.snapshot("snapshot_2", skip_symbols=["sym_1"], metadata="demo skip_symbols")
In [28]
lib.list_versions()
Out[28]
{sym_3_v0: (date=2023-11-20 10:24:45.103129257+00:00, snapshots=['snapshot_1', 'snapshot_2']), sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_1', 'snapshot_2']), sym_1_v1: (date=2023-11-20 10:24:45.431966093+00:00, snapshots=['snapshot_1']), sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00)}
In [29]
# include specific versions of sym_1 and sym_2 in the snapshot
lib.snapshot("snapshot_3", versions={"sym_1": 0, "sym_2": 0}, metadata="demo versions")
In [30]
lib.list_versions(snapshot="snapshot_3")
Out[30]
{sym_2_v0: (date=2023-11-20 10:24:45.086132551+00:00, snapshots=['snapshot_1', 'snapshot_2', 'snapshot_3']), sym_1_v0: (date=2023-11-20 10:24:45.066268214+00:00, snapshots=['snapshot_3'])}
In [31]
lib.list_snapshots()
Out[31]
{'snapshot_1': 'demo snapshot names need to be unique', 'snapshot_2': 'demo skip_symbols', 'snapshot_3': 'demo versions'}
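The effect of the two modifiers, as inferred from Out[28] and Out[30], can be summarised in a small pure-Python function: with no modifier the snapshot takes the latest version of every symbol, `skip_symbols` removes symbols from that set, and `versions` replaces the set with exactly the listed symbol/version pairs. This is a sketch of the observed behaviour only; `snapshot_contents` is a hypothetical name, and since the notebook does not show what happens when both arguments are passed together, the sketch rejects that case rather than guess.

```python
def snapshot_contents(latest_versions, skip_symbols=None, versions=None):
    """Model which symbol -> version pairs a snapshot would record.

    Hypothetical illustration, not part of ArcticDB.
    latest_versions: {symbol: latest version number} for the live symbols.
    """
    if skip_symbols and versions:
        # assumption: combining both modifiers is not demonstrated in this notebook
        raise ValueError("this sketch does not model skip_symbols together with versions")
    if versions is not None:
        # an explicit versions dict defines the snapshot contents exactly
        return dict(versions)
    skipped = set(skip_symbols or [])
    return {s: v for s, v in latest_versions.items() if s not in skipped}


latest = {"sym_1": 1, "sym_2": 0, "sym_3": 0}
print(snapshot_contents(latest))                                   # like snapshot_1
print(snapshot_contents(latest, skip_symbols=["sym_1"]))           # like snapshot_2
print(snapshot_contents(latest, versions={"sym_1": 0, "sym_2": 0}))  # like snapshot_3
```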
For more information¶
For full descriptions of the functions used above, see the ArcticDB documentation:
snapshot()
https://docs.arcticdb.cn/latest/api/library/#arcticdb.version_store.library.Library.snapshot
list_snapshots()
https://docs.arcticdb.cn/latest/api/library/#arcticdb.version_store.library.Library.list_snapshots
list_versions()
https://docs.arcticdb.cn/latest/api/library/#arcticdb.version_store.library.Library.list_versions