Hello foxclab01, welcome to Bleeping Computer.
In case anyone else reading this topic wonders what the answer to your question is, I'd like to briefly summarize what has already been explained to you, in different words, elsewhere.
The CPU doesn't have to issue an individual instruction for every bit (or, more accurately, every byte or word) that is transferred. That way of operating is called Programmed I/O (PIO), and it has long been superseded by Direct Memory Access (DMA). DMA allows peripheral devices to move data across buses and interfaces themselves, without the CPU's intervention. The CPU only has to issue instructions specifying where the data is located and how much of it is to be transferred. Assuming this is "burst data" (data occupying consecutive memory locations), the transfer can then proceed at whatever speed the combination of transmitting device, interface/bus and receiving device can achieve. The CPU's attention is needed again only when the device controlling the transfer (the bus master) signals the CPU, usually with an interrupt, that the transaction is complete.
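For anyone who finds code easier to follow, here is a toy model of the DMA sequence described above, written in Python. It is only a sketch under stated assumptions: `DmaDescriptor`, `ToyBusMaster` and `cpu_dma_transfer` are hypothetical names, a single `bytearray` stands in for system memory, and a callback stands in for the completion interrupt. What it shows is that the CPU's share of the work (program the address and length, then handle completion) stays the same no matter how many bytes the device moves.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DmaDescriptor:
    """What the CPU hands the device: where the data is and how much to move."""
    src: int     # start offset of the source data (hypothetical flat address space)
    dst: int     # start offset of the destination
    length: int  # number of bytes; assumed contiguous ("burst data")


class ToyBusMaster:
    """Hypothetical bus-master device that moves data without the CPU."""

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory

    def start(self, desc: DmaDescriptor,
              on_complete: Callable[[DmaDescriptor], None]) -> None:
        end = max(desc.src, desc.dst) + desc.length
        if min(desc.src, desc.dst, desc.length) < 0 or end > len(self.memory):
            raise ValueError("transfer outside memory")
        # The device copies the whole contiguous block itself; the CPU
        # executes no instruction per byte.
        block = self.memory[desc.src:desc.src + desc.length]
        self.memory[desc.dst:desc.dst + desc.length] = block
        # Stand-in for the completion interrupt: only now is the CPU involved again.
        on_complete(desc)


def cpu_dma_transfer(device: ToyBusMaster, src: int, dst: int,
                     length: int) -> List[str]:
    """Start one DMA transfer and return the list of steps the CPU performed."""
    cpu_steps: List[str] = ["program descriptor (address + length)"]
    device.start(DmaDescriptor(src, dst, length),
                 lambda d: cpu_steps.append(f"handle completion ({d.length} bytes)"))
    return cpu_steps


if __name__ == "__main__":
    memory = bytearray(range(8)) + bytearray(8)   # 8 bytes of data, 8 empty
    steps = cpu_dma_transfer(ToyBusMaster(memory), src=0, dst=8, length=8)
    print(steps)          # two CPU steps, regardless of transfer length
    print(memory[8:16])   # the bytes the device copied
```

Running the same call with a 2048-byte transfer still produces exactly two CPU steps; only the device's copy gets longer.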
A device operating in PIO mode (e.g. a very old hard drive, or a UDMA drive that the OS has dropped back to PIO mode due to errors) will cause high CPU utilization and slow operation, just as you originally postulated.
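To see why PIO is so CPU-hungry, here is a toy model, again in Python and with hypothetical names (`ToyPioPort`, `cpu_pio_transfer`), in which the CPU itself has to execute a read-and-store step for every word that passes through the device's data register. Real PIO uses I/O port or memory-mapped register accesses rather than method calls, so treat this only as an illustration of how the CPU's work grows in proportion to the amount of data.

```python
from typing import List, Tuple


class ToyPioPort:
    """Hypothetical device data register that yields one word per read."""

    def __init__(self, words: List[int]) -> None:
        self._words = words
        self._pos = 0

    def read_word(self) -> int:
        word = self._words[self._pos]
        self._pos += 1
        return word


def cpu_pio_transfer(port: ToyPioPort, count: int) -> Tuple[List[int], int]:
    """Move `count` words from the port into memory under CPU control.

    Returns the received words and the number of CPU-executed transfer steps.
    """
    memory: List[int] = []
    cpu_steps = 0
    for _ in range(count):
        # In PIO the CPU itself reads each word from the device and stores it.
        memory.append(port.read_word())
        cpu_steps += 1
    return memory, cpu_steps


if __name__ == "__main__":
    for size in (16, 512, 65536):
        received, steps = cpu_pio_transfer(ToyPioPort(list(range(size))), size)
        print(f"{size} words -> {steps} CPU steps")
```

The CPU step count scales one-for-one with the transfer size, which is why a drive stuck in PIO mode shows high CPU usage while it is busy.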
As far as the PCI Express bus goes, the processors on a video card aren't involved in operating the bus itself; they process the data that the bus transfers, and in modern GPUs with massively parallel architectures there can indeed be hundreds of processing units.
Edited by Platypus, 12 August 2011 - 08:50 AM.