Python SDK
Official Python client for GVDB. Full CRUD, hybrid search, streaming inserts, per-vector TTL, and bulk import from Parquet, NumPy, pandas, CSV, and AnnData.
Install
pip install gvdb
# With bulk import extras (Parquet, NumPy, pandas, progress bar)
pip install "gvdb[import]"
# Everything, including AnnData for single-cell workflows
pip install "gvdb[import-all]"
See client API for the full method reference and bulk import for loading large datasets.
Optional dependency extras
| Extra | Dependencies | For |
|---|---|---|
| gvdb[parquet] | pyarrow | import_parquet |
| gvdb[numpy] | numpy | import_numpy |
| gvdb[pandas] | pandas, pyarrow | import_dataframe, import_csv |
| gvdb[h5ad] | anndata, numpy | import_h5ad |
| gvdb[progress] | tqdm | Progress bars during bulk imports |
| gvdb[import] | All of the above except anndata | Common ML workflows |
| gvdb[import-all] | Everything + polars | All formats |
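To see which extras are already satisfied in the current environment, a small check along the following lines may help. It is a sketch, not part of the gvdb package: `EXTRA_MODULES` and `missing_modules` are hypothetical names, the mapping is copied from the table above, and it assumes each dependency is imported under the same name it is installed under.

```python
from importlib.util import find_spec

# Extra name -> modules it pulls in, copied from the table above.
# Assumption: each package is importable under its distribution name.
EXTRA_MODULES = {
    "parquet": ["pyarrow"],
    "numpy": ["numpy"],
    "pandas": ["pandas", "pyarrow"],
    "h5ad": ["anndata", "numpy"],
    "progress": ["tqdm"],
}


def missing_modules(extra, is_installed=None):
    """Return the modules required by `extra` that cannot be found."""
    if is_installed is None:
        # Default probe: look the module up without importing it.
        def is_installed(name):
            return find_spec(name) is not None
    return [m for m in EXTRA_MODULES[extra] if not is_installed(m)]


if __name__ == "__main__":
    for extra in EXTRA_MODULES:
        missing = missing_modules(extra)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"gvdb[{extra}]: {status}")
```

The `is_installed` parameter exists only so the lookup can be swapped out, for example in tests; in normal use it can be left at its default.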
Quick start
from gvdb import GVDBClient
client = GVDBClient("localhost:50051", api_key="your-key") # api_key optional
# Create a collection
client.create_collection("my_vectors", dimension=768)
# Insert vectors with metadata (so hybrid search has a BM25 field)
client.insert(
    "my_vectors",
    ids=[1, 2],
    vectors=[[0.1]*768, [0.3]*768],
    metadata=[{"description": "running shoes"}, {"description": "kitchen knives"}],
)
# Search
results = client.search("my_vectors", query_vector=[0.1]*768, top_k=10)
for r in results:
    print(f"ID: {r.id}, distance: {r.distance}")
# Hybrid search (BM25 + vector)
results = client.hybrid_search(
    "my_vectors",
    query_vector=[0.1]*768,
    text_query="running shoes",
    text_field="description",
    top_k=10,
    return_metadata=True,
)
# Clean up
client.drop_collection("my_vectors")
client.close()
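In the quick start, `client.close()` is skipped if any earlier call raises. One way to guarantee the close is the standard library's `contextlib.closing`, sketched below. The helper name `with_client` is hypothetical and not part of gvdb; the only thing it assumes about the client is the `close()` method shown above.

```python
from contextlib import closing


def with_client(make_client, work):
    """Run work(client) and close the client afterwards, even on error.

    make_client: zero-argument callable returning an object with close().
    work: callable that receives the client and returns a result.
    """
    with closing(make_client()) as client:
        return work(client)


# Usage with the client from the quick start (requires gvdb to be installed):
#
#   from gvdb import GVDBClient
#   results = with_client(
#       lambda: GVDBClient("localhost:50051"),
#       lambda c: c.search("my_vectors", query_vector=[0.1]*768, top_k=10),
#   )
```

Passing a factory rather than a ready-made client keeps construction inside the helper, so a client that is created is always paired with a close.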
Next
- Client API — every method and its parameters
- Bulk import — Parquet, NumPy, pandas, CSV, h5ad
- Examples — runnable scripts