The DMAC has the requirement that the length of the transfer is aligned to
the widest interface width. E.g. if the widest interface is 256 bit or 32
bytes the length of the transfer needs to be a multiple of 32.
This restriction can be relaxed for the memory mapped interfaces. This is
done by partially ignoring data of a beat from/to the MM interface.
For write access the stb bits are used to mask out bytes that do not
contain valid data.
For read access a full beat is read but part of the data is discarded. This
works fine as long as the read access is side effect free. I.e. this method
should not be used to access data from memory mapped peripherals like a
FIFO.
This means that for example the length alignment requirement of a DMA
configured for a 64-bit memory and a 16-bit streaming interface is now only
2 bytes instead of 8 bytes as before.
Note that the address alignment requirement is not affected by this. The
address still needs to be aligned to the width of the MM interface that it
belongs to.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
FPGAs support different widths for the read and write port of the block
SRAM cells. The DMAC can make use of this feature when the source and
destination interface have a different width to up-size/down-size the data
bus.
Using memory cells with asymmetric port width consumes the same amount of
SRAM cells, but allows to bypass the re-size blocks inside the DMAC that
are otherwise used for up- and down-sizing. This reduces overall resource
usage and can improve timing.
If the ratio between the destination and source port is too larger to be
handled by SRAM alone the SRAM block will be configured to do partial up-
or down-sizing and a resize block will be inserted to take care of the
remaining up-/down-sizing. E.g. if a 256-bit interface is connected to a
32-bit interface the SRAM will be used to do an initial resizing of 256 bit
to 64 bit and a resize block will be used to do the remaining resizing from
64 bit to 32 bit.
Currently this feature is disabled for Intel FPGAs since Quartus does not
properly infer a block RAM with different read and write port widths from
the current ad_asym_mem module. Once that has been resolved support for
asymmetric memories can also be enabled in the DMAC.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The DMA_LENGTH_ALIGN LSBs of all length For the most part the tools are
able to deduce this using constant propagation.
But this propagation does not work across the asynchronous meta data FIFO
in the burst memory module.
Add a DMA_LENGTH_ALIGN parameter to the burst_memory module which is used
to explicitly keep the LSBs of length registers on the destination side
fixed at 1'b1. This reduces resource use and improves timing by allowing
better constant propagation and unused logic elimination.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
For consistent simulation behavior it is recommended to annotate all source
files with a timescale. Add it to those where it is currently missing.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Length of partial transfers are stored in a queue for SW reads.
The presence of partial transfer is indicated by a status bit.
The reporting can be enabled by a control bit.
The progress of any transfer can be followed by a debug register.
Reduce the width of ID signals to avoid size mismatches in Arria 10 SoC
projects where the ID width of the hard IP is 4.
The width of ID that reaches the slave can be increased by the interconnect if
multiple masters access the slave so we end up with mismatches.
Since these signals are unused it is safe to reduce them to minimum width and
let the interconnect zero-extend them as required.
This change adds a diagnostic interface to the DMAC core.
The interface exposes internal information about the core,
information which can't be exposed through AXI registers
due the latency and update rate.
Such information is the fullness of the internal buffer.
For this is exposed in bursts and is driven from the destination
clock domain, as this is reflected in its name.
The signal has a fixed size and is dimensioned by
taking in account the supported maximum number of bursts of 128.
In its current implementation the DMAC requires that the length of a
transfer is aligned to the widest interface. E.g. if the widest interface
is 128 bits wide the length of the transfer needs to be a multiple of 16
bytes.
If the requested length is not aligned to the interface width it will be
rounded up.
This works fine as long as both interfaces have the same width. If they
have different widths it is possible that the length is rounded up to
different values on the source and destination side. In that case the DMA
will deadlock because the transfer lengths don't match and either not enough
of too much data is delivered from the source to the destination side.
Currently it is up to software to make sure that such an invalid
configuration is not possible.
Also enforce this requirement in the DMAC itself by setting the LSBs of the
transfer length to a fixed 1 so that the length is always aligned to the
widest interface.
Software can also use this to discover the length alignment requirement, by
first writing a zero to the length register and then reading the register
back. The LSBs of the read back value will be non-zero indicating the
alignment requirement.
In a similar way the stride needs to be aligned to the width of its
respective interface, so the generated addresses stay aligned. Enforce this
in the same way by keeping the LSBs cleared.
Increment the minor version number to reflect these changes.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
For the memory-mapped AXI read interface the slave asserts rlast for the
last beat in a burst.
This means we don't have to count the number of beats to know when the
burst is completed but instead can use rlast. This slightly reduces the
amount of resources needed for the MM-AXI source module and given that the
beat_counter is often the bottleneck timing wise this should also improve
the timing.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The DMAC allows a transfer to be aborted. When a transfer is aborted the
DMAC shuts down as fast as possible while still completing any pending
transactions as required by the protocol specifications of the port. E.g.
for AXI-MM this means to complete all outstanding bursts.
Once the DMAC has entered an idle state a special synchronization signal is
send to all modules. This synchronization signal instructs them to flush
the pipeline and remove any stale data and metadata associated with the
aborted transfer. Once all data has been flushed the DMAC enters the
shutdown state and is ready for the next transfer.
In addition each module has a reset that resets the modules state and is
used at system startup to bring them into a consistent state.
Re-work the shutdown process to instead of flushing the pipeline re-use the
startup reset signal also for shutdown.
To manage the reset signal generation introduce the reset manager module.
It contains a state machine that will assert the reset signals in the
correct order and for the appropriate duration in case of a transfer
shutdown.
The reset signal is asserted in all domains until it has been asserted for
at least 4 clock cycles in the slowest domain. This ensures that the reset
signal is not de-asserted in the faster domains before the slower domains
have had a chance to process the reset signal.
In addition the reset signal is de-asserted in the opposite direction of
the data flow. This ensures that the data sink is ready to receive data
before the data source can start sending data. This simplifies the internal
handshaking.
This approach has multiple advantages.
* Issuing a reset and removing all state takes less time than
explicitly flushing one sample per clock cycle at a time.
* It simplifies the logic in the faster clock domains at the expense of
more complicated logic in the slower control clock domain. This allows
for higher fMax on the data paths.
* Less signals to synchronize from the control domain to the data domains
The implementation of the pause mode has also slightly changed. Pause is
now a simple disable of the data domains. When the transfer is resumed
after a pause the data domains are re-enabled and continue at their
previous state.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Move the transfer logic, including the 2d module, into its own sub-module.
This allows testing of the full transfer logic independently of the
register map logic.
The top-level module now only instantiates the register map and transfer
module, but does not have any logic on its own.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
A larger store-and-forward memory provides better protection against worst
case memory interface latencies by being able to store more data before
over-/underflowing.
Based on empirical testing it was found that using a size of 4 bursts can
still result in underflows/overflows under certain conditions. These do not
happen when using a size of 8 bursts.
This change does not significantly increase resource consumption. Both on
Intel and Xilinx the block RAM has a minimum depth of 512 entries. With a
default burst length of 16 beats that allows for up to 32 bursts without
requiring additional block RAM.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Commit e6aacd2f56 ("axi_dmac: Better support debug IDs when ID_WIDTH !=
3") managed to get the order of the IDs in the debug register wrong.
Restore the original order.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Split the register map code into a separate sub-module instead of having it
as part of the top-level axi_dmac.v file.
This makes it easier to component test the register map behavior
independently from the DMA transfer logic.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The MAX_BYTES_PER_BURST option allows to configure the maximum bytes that
are part of a burst. This can be an arbitrary value.
At the same time there is a limit of how many bytes can be supported by the
memory buses. A AXI3 interface supports a maximum of 16 beats per burst
and a AXI4 interface supports a maximum of 256 beats per burst.
At the moment the it is possible to specify a MAX_BYTES_PER_BURST value
that exceeds what can be supported by the AXI memory-mapped bus. If that is
the case undefined behavior will occur and the DMAC will function
incorrectly.
To avoid this make sure that the MAX_BYTES_PER_BURST value does not exceed
the maximum that can be supported by the interfaces.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The width of the AXI burst length field depends on the AXI standard
version. For AXI3 the width is 4 bits allowing a maximum burst length of 16
beats, for AXI4 it is 8 bits wide allowing a maximum burst length of 256
beats.
At the moment the width of the length signals are determined by type of the
source AXI interface, even if the source interface type is not AXI. This
means if the source interface is set to AXI3 and the destination interface
is set to AXI4 the internal width of the signal for all interfaces will be
4 bits. This leads to a truncation of the destination bus length field,
which is supposed to be 8 bits.
If burst are generated that are longer than 16 beats the upper bits of the
length signal will be truncated. The result of this will be that the
external AXI slave interface (e.g. the DDR memory) and the internal logic
in the DMA disagree about burst length. The DMA will eventually lock up
when its internal buffers are full.
To avoid this issue have different configuration parameters for the source
and destination interface that configure the AXI bus length field width.
This way one of the interfaces can be configured for AXI3 and the other for
AXI4 without interfering with each other.
Fixes: commit 495d2f3056 ("axi_dmac: Propagate awlen/arlen width through the core")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Exposed AXI3 interface on the Intel version of the IP for UI and feature consistency.
Some of the signals that are defined as optional in the AMBA standard
are marked as mandatory in Qsys in case of AXI3. Because of this such signals
were added to the interface of the DMAC and driven with default values.
For Xilinx in order to keep existing behavior the newly added signals
are hidden from the interface.
New parameters are added to define the width of the AXI transaction IDs;
these are hidden from the UI; We can add them to the UI if the fixed size
of the IDs will cause port incompatibility issues.
The primary use-case of the DMA controller is in non-2D mode. Make this the
default, since allows projects to instantiate the controller with the
default configuration without having to explicitly disable 2D support.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Add some limit TLAST support for the streaming AXI source interface. An
asserted TLAST signal marks the end of a packet and the following data beat
is the first beat for the next packet.
Currently the DMAC does not support for completing a transfer before all
requested bytes have been transferred. So the way this limited TLAST
support is implemented is by filling the remainder of the buffer with 0x00.
While the DMAC is busy filling the buffer with zeros back-pressure is
asserted on the external streaming AXI interface by keeping TREADY
de-asserted.
The end of a buffer is marked by a transfer that has the last bit set in
the FLAGS control register.
In the future we might add support for transfer completion before all
requested bytes have been transferred.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The current layout of the debug ID register assumes that the ID_WIDTH is 3.
Change things so that the padding 0 width depends on the ID_WIDTH
parameter so that we end up with the same register layout regardless of the
value of ID_WIDTH.
Also split things into two registers, this allows for an ID_WIDTH up to 8
(which should hopefully be enough for all practical applications).
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The AXI specification that the minimum address space size is 4k, make sure
the axi_dmac adheres to this.
Internally the register space is still 2k. This means the upper and lower
2k of the axi4lite register space will map to the same internal registers.
Software must not rely on this and only access the lower 2k to enable
compatibility in case the internal space grows in the future.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Make sure that the right hand side expression of assignments is not wider
than the target signal. This avoids warnings about implicit truncations.
None of these changes affect the behaviour, just fixes some warnings about
implicit signal truncation.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
All the hdl (verilog and vhdl) source files were updated. If a file did not
have any license, it was added into it. Files, which were generated by
a tool (like Matlab) or were took over from other source (like opencores.org),
were unchanged.
New license looks as follows:
Copyright 2014 - 2017 (c) Analog Devices, Inc. All rights reserved.
Each core or library found in this collection may have its own licensing terms.
The user should keep this in in mind while exploring these cores.
Redistribution and use in source and binary forms,
with or without modification of this file, are permitted under the terms of either
(at the option of the user):
1. The GNU General Public License version 2 as published by the
Free Software Foundation, which can be found in the top level directory, or at:
https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
OR
2. An ADI specific BSD license as noted in the top level directory, or on-line at:
https://github.com/analogdevicesinc/hdl/blob/dev/LICENSE
up_rdata is qualified by the up_rack signal. There is no need to reset it
since by the time the signal is read the reset value has already been
overwritten anyway.
Also gate the up_rdata registers if no read operation is in progress. In
this case any changes would be ignored anyway.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The AXI DMAC peripheral only uses 11-bit of the register map interface
address. Reducing the signal width to this value allows the scripts to
correctly infer the size of the register map.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Currently the AXI address width of the DMA is always 32-bit. But not all
address spaces are so large that they require 32-bit to address all memory.
Extract the size of the address space that the DMA is connected too and
configure reduce the address size to the minimum required to address the
full address space.
This slightly reduces utilization.
If no mapped address space can be found the default of 32 bits is used for
the address.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The debug registers are useful during development but are rarely used in a
production design. Add a option that allows to disable them, this reduces
the resource utilization of the DMAC.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Depending on whether the core is configured for AXI4 or AXI3 mode the width
of the awlen/arlen signal is either 8 or 4 bit. At the moment this is only
considered in top-level module and all other modules use 8 bit internally.
This causes warnings about truncated signals in AXI3 mode, to resolve this
forward the width of the signal through the core.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Add a register to the AXI DMAC register map which functions has a
identification register. The register contains the unique value of "DMAC"
(0x444d4143) and allows software to identify whether the peripheral mapped
at a certain address is an axi_dmac peripheral.
This is useful for detecting cases where the specified address contains an
error or is incorrect.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The address width for the AXI-Lite configuration bus for the core is only
14 bit. Remove the upper unused bits from the public interface.
This allows infrastructure code to know about this and it might be able to
perform optimizations of the interconnect based on this.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Move the clock and reset signals of the m_axi_src interface next to the
other signals in the module definition.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
This control signal can be overwritten by the up_axis_xlast/up_axis_xlast_en bits, in order to create a single stream, which is contains multiple streams.
This can be use to fill up the DACFIFO module.