How does memory interleaving work to increase the speed of a
video card?
[From: Sam Goldwasser (sam@stdavids.picker.com) (with a bit by M. Scott)]
Memory interleaving using multiple banks of memory results in increased
video speed because the video processor is able to access data much
faster than possible with a single bank of video memory.
Interleaving means that the processor staggers memory read/writes between
2 or more banks of RAM, effectively multiplying the bandwidth of video
memory. If the processor requests a read/write at a given memory
location, the amount of time required to complete the operation is
limited by the memory's speed. For example, 70 ns RAM will complete a
read/write request in no less than 70 ns. If, however, the processor can
handle data at, say, twice that rate (35 ns per clock cycle, or about
29 MHz), then it is effectively working at half speed for memory transfers.
If the processor
_interleaves_ memory accesses between two banks of RAM, then it can read/
write to bank 0 during one clock cycle, then instead of waiting for that
read/write cycle to finish before sending the next read/write request, it
immediately accesses bank 1 in the next clock cycle.
Interleaving is usually based on the low order (word) address bits.
Two-way interleaved 32-bit memory will thus select one of two banks based
on address bit 2 (bits 0 and 1 select a byte within the 32-bit word).
Four-way interleaved 32-bit memory will use both bits 2 and 3 to select
one of four banks.
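
The bank selection just described can be sketched in a few lines of Python.
This is only an illustration: the function name bank_select and its
parameters are made up for this example, and it assumes byte addresses, a
power-of-two word size, and a power-of-two number of banks.

```python
def bank_select(addr: int, n_banks: int, word_bytes: int = 4) -> int:
    """Return the bank index for a byte address (illustrative helper).

    The lowest bits pick a byte inside the word; the next bits pick the bank.
    """
    if n_banks < 1 or n_banks & (n_banks - 1):
        raise ValueError("n_banks must be a power of two")
    if word_bytes < 1 or word_bytes & (word_bytes - 1):
        raise ValueError("word_bytes must be a power of two")
    byte_bits = word_bytes.bit_length() - 1   # 2 bits for a 32-bit word
    return (addr >> byte_bits) & (n_banks - 1)


# Consecutive 32-bit words alternate between the two banks (address bit 2).
two_way = [bank_select(a, 2) for a in (0x00, 0x04, 0x08, 0x0C)]
# Four-way interleaving uses address bits 2 and 3.
four_way = [bank_select(a, 4) for a in (0x00, 0x04, 0x08, 0x0C, 0x10)]
```

Here two_way comes out as [0, 1, 0, 1] and four_way as [0, 1, 2, 3, 0], so
a sequential fill touches each bank in turn.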
Note that interleaved memory is most easily implemented for write cycles
since these can be 'posted' - issued and forgotten about. Reads, on the
other hand, require that the processor keep track of the fact that one or
more read requests will be outstanding while it is issuing new ones.
This is a form of pipelining and not all processors are capable of
dealing with the necessary timing.
As an example of interleaving consider the following case of a processor
with a 33 MHz system bus accessing 70 ns memory:
Each clock cycle takes about 30 ns. If the processor wants to write to
a block of RAM, it will have to insert 2 wait states between consecutive
writes, meaning that for this example it will take 90 ns for each write
(3 clock cycles, the smallest whole number of cycles that covers 70 ns).
However, if each of 2 banks of RAM can be accessed separately, then
instead of inserting 2 wait states and leaving the processor idle, it
will access the second bank of memory during the next clock cycle and
insert only one wait state, completing two writes in the same 90 ns.
This has effectively doubled memory throughput.
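
The arithmetic in this example can be checked with a short Python sketch.
It assumes a simplified model that is not spelled out in the text above: a
bank stays busy for a whole number of clock cycles, the processor issues at
most one write per clock, and writes go to the banks in strict rotation.
The function names are hypothetical.

```python
import math


def bank_busy_cycles(mem_ns: float, clock_ns: float) -> int:
    """Whole clock cycles needed to cover one memory access."""
    return math.ceil(mem_ns / clock_ns)


def avg_ns_per_write(mem_ns: float, clock_ns: float, n_banks: int) -> float:
    """Average time per write for a long sequential block write."""
    busy = bank_busy_cycles(mem_ns, clock_ns)
    # One write goes to each bank on consecutive clocks; the round cannot
    # restart until the first bank has finished, so a round of n_banks
    # writes takes max(busy, n_banks) cycles.
    round_cycles = max(busy, n_banks)
    return round_cycles * clock_ns / n_banks


single_bank = avg_ns_per_write(70, 30, 1)   # 3 cycles per write
two_banks = avg_ns_per_write(70, 30, 2)     # 3 cycles per pair of writes
```

With 70 ns memory and a 30 ns clock this gives 90 ns per write for one bank
and 45 ns per write for two banks, matching the doubling described above.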
For this to work, the memory logic must latch the address, data, and
control signals so that as the processor moves on, the memory still
knows what to do. In addition, since not all memory accesses will be
to alternate banks, the system must know to insert wait states if
successive accesses are to the same bank.
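
A rough simulation makes the same-bank case concrete. This is a sketch
under stated assumptions, not a model of any particular video chip: writes
are posted (the processor moves on one clock after issuing), each bank is
busy for 3 clocks per write (70 ns memory on a 30 ns clock), and the bank
is chosen from the low order word address bits. The name simulate_writes
and its parameters are invented for this example.

```python
def simulate_writes(addresses, n_banks, busy_cycles=3, word_bytes=4):
    """Return (cycles_to_issue_all_writes, wait_states_inserted)."""
    ready_at = [0] * n_banks      # first cycle each bank can accept a write
    cycle = 0
    wait_states = 0
    for addr in addresses:
        bank = (addr // word_bytes) % n_banks
        if ready_at[bank] > cycle:
            # Target bank is still busy: stall until it is free.
            wait_states += ready_at[bank] - cycle
            cycle = ready_at[bank]
        ready_at[bank] = cycle + busy_cycles   # bank latches the request
        cycle += 1                             # processor moves on
    return cycle, wait_states


sequential = [4 * i for i in range(8)]   # alternates between two banks
same_bank = [8 * i for i in range(8)]    # every write lands in bank 0

interleaved = simulate_writes(sequential, n_banks=2)
conflicting = simulate_writes(same_bank, n_banks=2)
one_bank = simulate_writes(sequential, n_banks=1)
```

In this model the eight sequential writes are issued in 11 clocks with 3
wait states, while the eight same-bank writes take 22 clocks with 14 wait
states, which is no better than having a single bank.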
Since frame buffer writes are often to large blocks (pixblts and fills),
interleaving can achieve almost an ideal n:1 speedup where n is the
interleave factor. The maximum practical value of n is limited by the
duration of a video processor clock cycle compared to RAM speed. The cost
in terms of hardware depends on the memory organization since the n banks
must be in separate memory chips. The efficiency of the system depends on
the memory technology as well as the size of the frame buffer. This is
one reason why a frame buffer which is not fully populated with memory
chips may not be able to take full advantage of its accelerated
capabilities. In addition to the memory, some modest amount of additional
logic is required for controlling each bank of memory and generating the
timing including the insertion of wait states where needed.
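
The limit on n can be illustrated with the same kind of simplified model
(an assumption of this sketch, not a statement about real hardware): if a
bank is busy for a whole number of clock cycles per access and the
processor issues at most one access per clock, then adding banks beyond
that number of cycles buys nothing. The function name ideal_speedup is
hypothetical.

```python
import math


def ideal_speedup(n_banks: int, mem_ns: float, clock_ns: float) -> int:
    """Best-case speedup over a single bank for long sequential accesses."""
    busy = math.ceil(mem_ns / clock_ns)   # clocks one bank is tied up
    # Once there are as many banks as busy cycles, the processor can issue
    # on every clock and extra banks sit idle.
    return min(n_banks, busy)


# 70 ns memory on a 30 ns clock: the gain levels off at 3 banks.
table_30ns = [ideal_speedup(n, 70, 30) for n in (1, 2, 3, 4, 8)]
# 70 ns memory on a 35 ns clock: the gain levels off at 2 banks.
table_35ns = [ideal_speedup(n, 70, 35) for n in (1, 2, 3, 4, 8)]
```

Under these assumptions table_30ns is [1, 2, 3, 3, 3] and table_35ns is
[1, 2, 2, 2, 2].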
Other approaches which may be used by themselves or in conjunction with
interleaving include 'page mode' accesses and the use of VRAM instead of
DRAM.