Commit Graph

3927 Commits (3f92381bd0a9aa7f8955470e27ac8348161e96a1)

Author SHA1 Message Date
Lars-Peter Clausen 5f83e20d33 m2k: standalone: Rework PS7 clocking
At the moment the PS7 is using three PLLs to generate its clocking tree.
One for the DDR, one for the ARM and one for the IO. This allows to run all
components at their respective maximum clock and extract maximum
performance from all components.

With some slight modifications it is possible to trade maximum performance
for a reduction in power consumption by using the same PLL for all three
sets of components and disabling the other two PLLs.

The CPU is now running at 500MHz rather than 666MHz and the DDR memory at
500MHz rather than 533MHz. This reduces power consumption by ~125mW.

This is OK since neither of them is a bottleneck for overall system
performance.

In addition software will downclock the CPU to 250MHz when full performance
is not required.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen 3b748d8252 m2k: standalone: Disable 200MHz clock
The 200 MHz clock was only used as the IODELAY controller clock. Since the
design does not use any IODELAYs anymore this clock can be removed.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen 63dfec25c1 m2k: standalone: Disable 2nd PS7 reset port
This reset is not used in the design, so remove it.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Adrian Costina 189e9194c8 m2k: Standalone, enabled power optimization 2017-04-18 12:17:40 +02:00
Adrian Costina 8b76c34dea scripts: Created ADI_POWER_OPTIMIZATION parameter for enabling power optimizations in the implementation stage 2017-04-18 12:17:40 +02:00
Lars-Peter Clausen 508a783f39 axi_dac_interpolate: Register output mux signal
The output data mux is used to bypass the filter when it is not used. Which
setting is used for the mux depends on the 3-bit filter_mask signal.
Registering the control logic into a single bit signal reduces the amount
of routing resources required. Since changing the filter_mask settings is
asynchronous to the processing anyway the extra clock cycle delay
introduced by this change does not affect behaviour.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen 834fcb7e27 axi_dac_interpolate: Reduce filter_mask signal width
Only the lower 3 bits of the filter_mask signal are used, no need to keep
the other bits around.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen 9c2c50728c axi_dac_interpolate: Move processing pipeline to own sub-module
Move the processing pipeline of the axi_adc_decimate core to its own
sub-module. This makes it easier to simulate the processing independent of
the register map.

Also since the filter is two instances of the same logic, one for each
channel, let the new sub-module model one channel and instantiate it twice.
This allows to change the implementation without having to change the same
code twice.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen a02a763139 axi_adc_decimate: Do proper sign extension in bypass mode
The output data of the decimation block is 16-bit signed. Properly sign
extend the 12-bit input signal when the filter is bypassed.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen 19ca0b3073 axi_adc_decimate: Gate unused filter parts
The minimum number of bits required for the adders in a CIC filter depends
on the decimation rate. Higher decimation factors require more bits. This
means for a multirate filter the size of the logic structures is determined
by the highest supported rate.

The current implementation of the filter always uses all bits of the
structure to compute the results, that means even when running with the
lowest decimation factor all the bits that are required for the highest
decimation factor are used. This will work fine as additional bits do not
affect the output of the filter.

This patch implements dynamic partial gating of the filter structure based
on the selected decimation factor. Bits that are not required for a certain
rates are gated and the carry bits are masked from propagating through the
adder chain. This results in significant power savings at smaller
decimation factors.

This means that the filter itself is now using more power the higher the
decimation rate. But this is offset by the reduced data output rate running
subsequent processing stages at a lower rate and reducing power consumption
there. This results in a more or less flat power profile regardless of
decimation factor.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen 17dff9ce90 util_cic: Allow partial gating of CIC comb and int stages
Allow to split a CIC int or comb block into multiple stages and be able to
dynamically gate some of the stages. Also prevent carry propagation in
gated stages to keep the adder output constant.

This is useful for multi-rate filter where not all bits are needed all the
time.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen 3e7325b29a axi_adc_decimate: Re-implemented FIR filter
The minimum decimation rate of the CIC block is five, this means data
arrives at the FIR filter at most every five clock cycles. The decimation
rate of the filter is two so the filter produces an output at most every
ten clock cycles. This allows for ten clock cycles to compute the result.

The current implementation of the filter uses a fully pipelined
architecture with one multiplier for each coefficient. Which then do work
for one clock cycle and sit idle for the next nine clock cycles.

Rework the filter to be sequential reducing the number of required
multipliers to one. In addition exploit the symmetric structure of the
filter to make use of the preadder reducing the required multiply
operations by two.

This significantly reduces the logic utilization of the filter as well as
moderately reduces power consumption.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen 737418a1b0 axi_adc_decimate: Use sequential processing for CIC comb stage
The minimum decimation of the CIC block is 5. This means new data arrives
at the comb stages at most every 5 clock cycles. Rather than letting the
logic sit idle during those 4 extra cycles use it to sequentially process
the comb stages of the filter. This reduces the logic utilization of the
filter by quite a bit.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen a64e94a109 axi_adc_decimate: Register output mux control signal
The output data mux is used to bypass the filter when it is not used. Which
setting is used for the mux depends on the 3-bit filter_mask signal.
Registering the control logic into a single bit signal reduces the amount
of routing resources required. Since changing the filter_mask settings is
asynchronous to the processing anyway the extra clock cycle delay
introduced by this change does not affect behaviour.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen 00c215ae69 axi_adc_decimate: Re-implement CIC filter
Re-implement the CIC using the basic building blocks from the util_cic
library.

This new implementation is structurally equivalent to the previous version,
but will be used as a platform for implementing changes that will improve
area and power consumtion of the filter

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen 450c5ac74a Add CIC filter helper module
Add a helper module that provides the building blocks of a CIC filter.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen f1edcb02ac axi_adc_decimate: Reduce filter_mask register size
Only the 3 lower bits of the filter_mask register are used. No need to keep the other bits around.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:40 +02:00
Lars-Peter Clausen 927508404e axi_adc_decimate: Move processing pipeline to own sub-module
Move the processing pipeline of the axi_adc_decimate core to its own
sub-module. This makes it easier to simulate the processing independent of
the register map.
2017-04-18 12:17:40 +02:00
Adrian Costina 8ba86cb75c axi_logic_analyzer: Allow changing data pins direction to output only after data is available from the DMA or if the output is set from a register for that specific pin 2017-04-18 12:17:40 +02:00
Adrian Costina 07e52b4566 m2k: Connect logic_analyzer path to clk_out instead of clk
- this allows for the clock switching to be done inside axi_logic_analyzer core
2017-04-18 12:17:39 +02:00
Adrian Costina 8476d9d59a axi_logic_analyzer: Allow only data[0] to be used as alternative clock.
- drive all logic on clk_out instead of clk
2017-04-18 12:17:39 +02:00
Adrian Costina 3c13aa49eb axi_ad9963: Changed TX path from serdes to ddr.
- remove delay control related logic
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen 610cc3affa m2k: standalone: Disable DMA debug registers
The debug register logic for the DMA take up a fair amount of resources.
Disabling them frees up space in the FPGA and also helps a bit with power.
Since those registers are mainly useful in development and not so much in
production the change shouldn't have any visible external effects.

It is possible to re-enable the debug registers by setting DEBUG_BUILD=1.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen 77b453ac0d axi_dmac: Make debug register optional
The debug registers are useful during development but are rarely used in a
production design. Add a option that allows to disable them, this reduces
the resource utilization of the DMAC.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen c9f863f189 m2k: standalone: Set static switching activity for the reset signals
The global reset signals are only asserted for a short moment during system
startup and deasserted during normal operation, which is the case we care
about for power analysis. Giving them a static switching probability
indicating that they are always de-asserted will yield better results for
power analysis.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen b206f8f37a m2k: Disable AD9963 core RX datapath again
The RX datapath has a lot of things (IQ correction, DC filter, ...) that
take up a lot of space which are all not really needed in this project. So
disable the RX datapath.

It was previously enabled because the ad9963 core did not perform
sign-extension on the ADC data signal when the datapath was disabled. But
this has now been addressed.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen 547dc04857 axi_ad9963: Sign extend ADC data when processing is bypassed
Match the behaviour of the processing data path and sign extend the output
data to 16-bit.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen a0e30a2211 util_axis_fifo: Improve clock gating of registers and BRAM
Currently the BRAM and data registers in the util_axis_data are ungated
when the FIFO is ready to receive data. This good for high-performance
since it reduces the number of control signals. But it is bad from a power
point of view since it causes additional reads and writes.

Change the core gate the BRAM and data register if either the consumer is
not ready to accept data or the producer has no data to offer.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen 72cdd846b0 axi_ad9963: Allow to disable the IDELAYs on the ADC data path
Not all designs need the IDELAYs. Disabling them can reduce power consumption of the system.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen 09ffe42603 ad_lvds_in: Allow to disable IDELAY
The IDELAY is not always required, but it eats up power when instantiated. Allow to disable it.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen 45f87b46c2 ad_lvds_in: Use "SAME_EDGE" mode
Currently the IDDRs are configured in SAME_EDGE_PIPELINED mode, but then
the negative data is delayed by an additional clock cycle. This is the same
behaviour as using the IDDR in SAME_EDGE mode.

Switching to SAME_EDGE mode removes extra pipelining registers while
maintaining the same behaviour.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen c1ba57f808 m2k: Rework clocking domains
At the moment the register map fabric and DMA system memory side are
clocked by the 100MHz sys_cpu_clk. While this works fine that is a lot
faster than the clock has to run. There are only a few 100 register map
accesses per seconds at most and they are not on timing critical paths. The
penalty from clocking them at a lower rate is negligible for the overall
system performance.

The maximum clock rate for the DMAs is determined by the throughput
requirements. This is 200 Mbytes/s for the logic analyzer, pattern
generator and each of the DAC DMAs and 400 Mbytes/s for the ADC DMA.

The DMA datapath width is 64-bit so the required clock rates are 25MHz and
50MHz respectively. Some headroom is required to accommodate for occasional
bubble cycles on the data bus and the difference in reference clocks for
the converter and processing system.

The sys_cpu_clk is reduced to 27.8MHz which is fast enough for all but the
ADC DMA. For the ADC DMA a new clock domain running at 55.6 MHz is
introduced.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen 2eaf931e07 m2k: Replace logic analyzer MMCM
The MMCM generating the logic analyzer clock unfortunately consumes a
disproportionately large amount of power compared to the rest of the
design.

Replace it by sourcing the logic analyzer clock from one of the Zynq FCLKs.
The IO PLL is running anyway so the power requirement is much lower.

For the time being this means we loose the ability to source the clock from
an external pin. But that feature is not supported by software at the
moment anyway. We'll bring it eventually when required.

This changes reduces power consumption by roughly 100mW.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen e616c8da0b m2k: Remove channel pack core for now
We always have both ADC channels enabled and the cpack core takes up a fair
amount of space, so remove it for now. Might come back later when we really
need it.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen 0e2b47e517 axi_adc_trigger: Temporarily disable trigger reporting in register map
The current implementation doesn't quite work right when the interface
clock is slower than the trigger clock and also causes timing issues.
Disable it temporarily until a proper CDC transfer is implemented.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Adrian Costina efa2d0c006 m2k: Connected adc/dac resets to decimation/interpolation cores 2017-04-18 12:17:39 +02:00
Adrian Costina be6fa287fa axi_dac_interpolate: Make dac_reset external 2017-04-18 12:17:39 +02:00
Adrian Costina 7227e74444 axi_adc_decimate: Make adc_reset external 2017-04-18 12:17:39 +02:00
Adrian Costina 500112f79b m2k: Renamed l_clk to adc_clk and rst to adc_rst 2017-04-18 12:17:39 +02:00
Adrian Costina 094872619d axi_ad9963: Separated adc/dac clock and reset 2017-04-18 12:17:39 +02:00
Adrian Costina 6a49aefb6c m2k: Updated project to use new tx path with serdes 2017-04-18 12:17:39 +02:00
Adrian Costina 9f8fd5c922 axi_ad9963: updated tx path
- removed pll for power saving, added serdes circuitry instead
2017-04-18 12:17:39 +02:00
Adrian Costina fc7f2ef11b ad_serdes_out: allow selection between DDR/SDR configuration and output single ended data 2017-04-18 12:17:39 +02:00
Adrian Costina 166a4c53d5 ad_serdes_clk: allow for single ended clock input, made BUFR_DIVIDE configurable 2017-04-18 12:17:39 +02:00
Lars-Peter Clausen b58a5c37eb m2k: Reduce AXI interconnect utilization
Use the new axi_rd_wr_combiner module to ... the read and write DMA
interfaces into a single interface. This allows the AXI interconnect
completely optimize itself away and reduce the overall resource utilization
of the project.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen 71469490c6 Add a helper module to combine a AXI read-only and a AXI write-only interface into a read-write interface
The read and write interfaces of a AXI bus are independent other than that
they use the same clock. Yet when connecting a single read-only and a
single write-only interface to a Xilinx AXI interconnect it instantiates
arbitration logic between the two interfaces. This is dead logic and
unnecessarily utilizes the FPGAs resources.

Introduce a new helper module that takes a read-only and a write-only AXI
interface and combines them into a single read-write interface. The only
restriction here is that all three interfaces need to use the same clock.

This module is useful for systems which feature a read DMA and a write DMA.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen 3e0b337eae axi_ad9963: Remove extra pipeline stages on register read path
The register read logic is not that complicated that it needs two extra
pipeline stages. It can easily be condensed into a single combinatorial and
still meet timing with large margins.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen 64dfa0432d axi_ad9963: Disable unused features of the register map
Disable registers in the register map which are not needed for this core.
This reduces the utilization of the core.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen 957730c421 up_dac_common: Allow to disable GPIO registers
Not all peripherals use the GPIO register settings, but the registers still
take up a fair amount of space in the register map. Add options to allow to
disable them when not needed. This helps to reduce the utilization for
peripherals where these features are not needed.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:39 +02:00
Lars-Peter Clausen 0ae0da488b up_adc_common: Allow to disable GPIO and START_CODE registers
Not all peripherals use the GPIO and START_CODE register settings, but the
registers still take up a fair amount of space in the register map. Add
options to allow to disable them when not needed. This helps to reduce the
utilization for peripherals where these features are not needed.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
2017-04-18 12:17:38 +02:00