The output data of the decimation block is 16-bit signed. Properly sign
extend the 12-bit input signal when the filter is bypassed.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The minimum number of bits required for the adders in a CIC filter depends
on the decimation rate. Higher decimation factors require more bits. This
means for a multirate filter the size of the logic structures is determined
by the highest supported rate.
The current implementation of the filter always uses all bits of the
structure to compute the results, that means even when running with the
lowest decimation factor all the bits that are required for the highest
decimation factor are used. This will work fine as additional bits do not
affect the output of the filter.
This patch implements dynamic partial gating of the filter structure based
on the selected decimation factor. Bits that are not required for a certain
rates are gated and the carry bits are masked from propagating through the
adder chain. This results in significant power savings at smaller
decimation factors.
This means that the filter itself is now using more power the higher the
decimation rate. But this is offset by the reduced data output rate running
subsequent processing stages at a lower rate and reducing power consumption
there. This results in a more or less flat power profile regardless of
decimation factor.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The minimum decimation rate of the CIC block is five, this means data
arrives at the FIR filter at most every five clock cycles. The decimation
rate of the filter is two so the filter produces an output at most every
ten clock cycles. This allows for ten clock cycles to compute the result.
The current implementation of the filter uses a fully pipelined
architecture with one multiplier for each coefficient. Which then do work
for one clock cycle and sit idle for the next nine clock cycles.
Rework the filter to be sequential reducing the number of required
multipliers to one. In addition exploit the symmetric structure of the
filter to make use of the preadder reducing the required multiply
operations by two.
This significantly reduces the logic utilization of the filter as well as
moderately reduces power consumption.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The minimum decimation of the CIC block is 5. This means new data arrives
at the comb stages at most every 5 clock cycles. Rather than letting the
logic sit idle during those 4 extra cycles use it to sequentially process
the comb stages of the filter. This reduces the logic utilization of the
filter by quite a bit.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
The output data mux is used to bypass the filter when it is not used. Which
setting is used for the mux depends on the 3-bit filter_mask signal.
Registering the control logic into a single bit signal reduces the amount
of routing resources required. Since changing the filter_mask settings is
asynchronous to the processing anyway the extra clock cycle delay
introduced by this change does not affect behaviour.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Re-implement the CIC using the basic building blocks from the util_cic
library.
This new implementation is structurally equivalent to the previous version,
but will be used as a platform for implementing changes that will improve
area and power consumtion of the filter
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Move the processing pipeline of the axi_adc_decimate core to its own
sub-module. This makes it easier to simulate the processing independent of
the register map.
Xilinx recommends that all synchronizer flip-flops have
their ASYNC_REG property set to true in order to preserve the
synchronizer cells through any logic optimization during synthesis
and implementation.