Efficient GPU Training
======================

To train deep detectors efficiently, we usually use CUDA to accelerate training on a GPU. PyGOD provides a ``gpu`` parameter for ``DeepDetector``. During initialization, we can set ``gpu`` to the index of an available GPU. By default, ``gpu=-1``, which means the detector is trained on the CPU. Here is an example of initializing ``DOMINANT`` on the first GPU (index ``0``):

.. code:: python

    from pygod.detector import DOMINANT
    DOMINANT(gpu=0)

However, training deep detectors on large-scale graphs can be memory-intensive, especially for detectors that rely on adjacency matrix reconstruction. In that case, full-batch training may result in an out-of-memory (OOM) error. To avoid this, we divide the large graph into minibatches and train the detector on each batch. PyGOD provides a ``batch_size`` parameter for ``DeepDetector``, which lets users adjust the size of each batch to fit the available GPU memory. We recommend setting ``batch_size`` to the largest value that does not cause an OOM error. For instance, to train ``DOMINANT`` with batches of 64 nodes:

.. code:: python

    DOMINANT(gpu=0, batch_size=64)

Unlike in other data modalities, the output of each node in a graph relies on its neighbors. In its implementation, PyGOD uses the PyG data loader ``torch_geometric.loader.NeighborLoader`` to load both the center nodes and their neighbor nodes for each minibatch. However, the computation on neighbor nodes adds significant overhead and reduces training efficiency. Thus, neighbor sampling is crucial to reduce this overhead. PyGOD provides a ``num_neigh`` parameter for ``DeepDetector``, which specifies how many neighbors are sampled at each layer of the detector. The default value of ``num_neigh`` is ``-1``, which means all neighbors of the center node are sampled. If we want to sample 5 neighbors at each layer, we can initialize ``DOMINANT`` like this:

.. code:: python

    DOMINANT(gpu=0, batch_size=64, num_neigh=5)

We can also sample a different number of neighbors at each layer by setting ``num_neigh`` to a list, but the length of the list has to match the number of layers ``num_layers``:

.. code:: python

    DOMINANT(gpu=0, batch_size=64, num_layers=2, num_neigh=[5, 3])

To learn more, read PyG's tutorial on Scaling GNNs via Neighbor Sampling.
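
The rule that a ``num_neigh`` list must match ``num_layers`` can be sketched as a small validation helper. The function name ``expand_num_neigh`` is hypothetical and is not part of PyGOD. It only illustrates, under that assumption, how a single integer or a per-layer list might be checked before a detector is constructed.

```python
from typing import List, Union


def expand_num_neigh(num_neigh: Union[int, List[int]], num_layers: int) -> List[int]:
    """Hypothetical helper (not a PyGOD API): one sample size per layer."""
    if isinstance(num_neigh, int):
        # A single integer is used for every layer; -1 means "all neighbors".
        return [num_neigh] * num_layers
    if len(num_neigh) != num_layers:
        # A list must provide exactly one entry per layer.
        raise ValueError(
            f"num_neigh has {len(num_neigh)} entries, "
            f"but num_layers is {num_layers}"
        )
    return list(num_neigh)


per_layer = expand_num_neigh([5, 3], num_layers=2)
uniform = expand_num_neigh(5, num_layers=2)
```

Checking the list length up front, before calling something like ``DOMINANT(gpu=0, batch_size=64, num_layers=2, num_neigh=per_layer)``, should surface a mismatch before any training starts.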