Until you come into contact with reality, you might think that implementing some simple arithmetic in hardware is merely a matter of rewriting it in another language (as you would in Java, Perl ...).
And you aren't really wrong: you do get a running solution.
The problem shows up when the design is synthesized and you find out that your purely combinational circuit can only be clocked at 2 MHz...
That's the point where you start thinking about pipeline stages and registering intermediate results.
And thinking in pipelines is not something that comes naturally to the human mind.
If you don't pay close attention, the simple task of summing two signals becomes an exciting source of surprises: the place where you insert the pipeline register isn't the source of errors. It's all the other signals, which you have to delay by the same amount if you want to keep working with the result.
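To make the surprise concrete, here is a minimal sketch in Python (not VHDL) that models register stages as shifted sample lists. The signal names `a`, `b` and `valid` and the helper `delay` are made up for illustration; they are assumptions and do not come from a real design.

```python
def delay(samples, n, fill=0):
    """Model n register stages: the output is the input shifted by n clocks."""
    return [fill] * n + list(samples[:len(samples) - n])

# Three signals that belong to the same clock cycle (hypothetical test data).
a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
valid = [1, 0, 1, 0, 1]  # marks the clocks in which a + b is meaningful

# One pipeline register behind the adder: the sum shows up one clock later.
sum_reg = delay([x + y for x, y in zip(a, b)], 1)

# Wrong: the registered sum is combined with the undelayed side signal.
misaligned = [s for s, v in zip(sum_reg, valid) if v]

# Right: the side signal is postponed by the same number of stages.
aligned = [s for s, v in zip(sum_reg, delay(valid, 1)) if v]
```

With this data, `misaligned` is `[0, 22, 44]` (the sums of the wrong clocks), while `aligned` is `[11, 33]`. The adder and its register are fine in both cases; only the forgotten delay on `valid` makes the difference.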
Sounds strange - sounds uncalled-for - sounds familiar
- Placing flip-flops "manually" sounds a bit like the first PONG implementation.
- Of course, there is always a more comfortable way: MATLAB and Simulink, for example. There even seem to be some more open alternatives by now.
- The problem with this cross-language approach might be less-than-optimal resource usage and some missing configuration options (bit width, pixels/clock ...) that are hard to express in an abstract language (although I regret to say that I haven't given them an honest try).
- It's a bit like people using pointer arithmetic in C. They know it might be dangerous and a pain in the ... but on the other hand, it is as fast as it can get, and sometimes that's what counts.
Okay. Here's the problem...
- Eventually you succeed in placing the logic and pipeline registers together and everything works.
- After some months, you just want to make a small addition to the circuit and insert a register stage at some point.
- Now you have to check and adjust all the other signals once more ... and after some months, the code might look quite new to you ;-)
- So the little change becomes a big change and bears the risk of something going wrong along the way, which is hard to explain to "normal" programmers.
Staying in VHDL but trying to make one's life easier
- So the idea popped up to let the computer find out on its own how long the delay lines for the signals should be.
- Being quite paranoid about automation that you cannot inspect transparently, I considered an external program that parses the code, fills in the registers on its own and writes the extended code back to the same file.
- The adjustments would sit between comment markers, just like GUI code in the good old days of MFC (/** DONT TOUCH THE FOLLOWING CODE. I - BILL GATES - HAVE FILLED IT IN **/), when Visual Studio did some magic to it in the background...
- That way, the resulting file would be self-contained and would work with all other software (Simulation, SOPC-Generator, ...), and whenever changes to the register stages were needed, they would also show up in the source file.
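The core of that idea, letting the computer work out the length of each delay line, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the tool described here: the function `balance`, the edge format `(source, destination, latency)` and the signal names are all made up, every input is assumed to enter at clock 0, and the signal graph must be free of cycles.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def balance(edges):
    """edges: list of (src, dst, latency_in_clocks).

    Returns (cycle, extra): the clock cycle in which each signal is valid,
    and the number of additional registers each connection needs so that
    all inputs of a signal arrive in the same cycle.
    """
    preds = {}
    for src, dst, lat in edges:
        preds.setdefault(src, [])
        preds.setdefault(dst, []).append((src, lat))

    # Visit every signal after all of its inputs.
    order = TopologicalSorter(
        {node: [s for s, _ in p] for node, p in preds.items()}
    ).static_order()

    cycle = {}
    for node in order:
        # A signal is valid once its slowest input has arrived.
        cycle[node] = max((cycle[s] + lat for s, lat in preds[node]), default=0)

    extra = {(s, d): cycle[d] - cycle[s] - lat for s, d, lat in edges}
    return cycle, extra

# Hypothetical example: a registered adder whose result is combined with a flag.
cycle, extra = balance([
    ("a", "sum", 1),
    ("b", "sum", 1),
    ("sum", "result", 0),
    ("flag", "result", 0),
])
```

Here `extra[("flag", "result")]` comes out as 1: the flag needs one register to meet the registered sum, while the other connections need none.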
Shrinking the solution, hoping to get it done
- Writing a full-fledged VHDL parser isn't that easy, so I had already thought about some code style rules that you would simply have to obey if you wanted support for the register stages.
- The connection where the N register stages might turn up could be marked using special postfixes: some_signal__ac (for auto-connect) where the signal is used, and plain some_signal where it is declared and assigned a value...
- The input signals and derived signals would have to be marked in order to sort them into groups that are synchronous to each other. And for synchronous signals that have entered an entity together, the parser can keep track of their delays ...
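With such a naming rule, the tool would not need a real parser to find the places where delay lines belong. A rough sketch in Python, assuming the `__ac` postfix from above and a purely textual scan; the function name `auto_connect_uses` and the sample VHDL snippet are hypothetical:

```python
import re

# An identifier followed by the auto-connect postfix; group 1 is the base name.
AC_USE = re.compile(r"\b([A-Za-z]\w*?)__ac\b")

def auto_connect_uses(vhdl_source):
    """Return the sorted base names of all signals used with the __ac postfix."""
    # Drop VHDL line comments so that markers inside comments are ignored.
    code = re.sub(r"--[^\n]*", "", vhdl_source)
    return sorted({m.group(1) for m in AC_USE.finditer(code)})

sample = """
signal some_signal : unsigned(7 downto 0);
some_signal <= a + b;
result <= some_signal__ac + offset__ac;  -- legacy__ac is ignored here
"""
names = auto_connect_uses(sample)
```

For the sample text, `names` is `["offset", "some_signal"]`. One caveat I'm fairly sure of: VHDL basic identifiers may not contain two adjacent underscores, so a real tool would probably need a different marker (for example a single-underscore postfix) to keep the file compilable.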
Shrinking the hopes
- The first candidate for this automation was the design that had started the whole train of thought.
- So I didn't see it coming that, after an hour of rewriting, I would just quit voluntarily.
- There were signals that would have been delayed unnecessarily (since they only change every millionth clock, and _then_ in an uncritical time span) ... and you would surely get register stages for them even though they aren't needed at all.
- There were entities that brought in additional delays, which would have to be described in hand-written, formal parser comments so that the automation knows how to handle them.
- There were edge detections on signals (where you look at the state of a signal now and one clock earlier), which broke the synchrony completely.
- You might move the edge detection into a sub-entity (with a corresponding comment for its delay) to circumvent the last problem.
- .... but I'm sure, there will be other problems that are as bad or worse...
Back to the roots - Back to "paper"
- Up above I wrote "Eventually you succeed in placing the logic...".
- Normally this involved a big sheet of paper where you sketched the signals and the flip-flops to see where you had missed one.
- After some months this sheet of paper is normally lost and you have to start anew.
- To make life at least a little bit better, I switched to Graphviz for the sketches:
- The sketch can be checked in alongside the code.
- You don't start to cry when you have to squeeze some more logic into the completed sketch.
- The signals that belong to the same clock cycle can be grouped using subgraphs:
digraph G {
    // Blocks that add latency of their own
    DELAY_0 [ label="Delay by 2 clocks" shape="rectangle" ];
    DELAY_1 [ label="Delay by 2 clocks" shape="rectangle" ];
    SubEntity_0 [ label="SubEntity with 2 clks delay" shape="rectangle" ];
    SubEntity_1 [ label="SubEntity with 2 clks delay" shape="rectangle" ];
    STATE_M_0 [ label="Statemachine registered" ];

    // One cluster per clock cycle: signals inside a cluster are synchronous
    subgraph cluster_0 {
        style=filled;
        color=lightgrey;
        node [style=filled,color=white];
        input_pixel;
        input_pixel_padded;
        input_ddr_content;
        arithemetics1;
        good_pixel_at_start;
        label = "clock cycle 0";
    }
    subgraph cluster_1 {
        style=filled;
        color=lightgrey;
        node [style=filled,color=white];
        input_is_good_enough;
        good;
        bad;
        label = "clock cycle 1";
    }
    subgraph cluster_2 {
        style=filled;
        color=lightgrey;
        node [style=filled,color=white];
        output_ddr_state;
        input_ddr_content_reg2;
        keep_ddr_content;
        changed_pixel;
        use_changed_4_ddr_content;
        org_after_arithmetic;
        label = "clock cycle 2";
    }
    subgraph cluster_4 {
        style=filled;
        color=lightgrey;
        node [style=filled,color=white];
        output_ddr_content;
        result_pixel_14b;
        result_pixel;
        label = "clock cycle 4";
    }

    // Connections; edge labels give the number of registers on that path
    changed_pixel -> SubEntity_1;
    org_after_arithmetic -> SubEntity_1;
    SubEntity_1 -> result_pixel_14b;
    result_pixel_14b -> result_pixel;
    input_pixel -> input_pixel_padded;
    input_ddr_content -> SubEntity_0;
    input_pixel_padded -> SubEntity_0;
    SubEntity_0 -> changed_pixel;
    input_ddr_content -> input_ddr_content_reg2 [ label="2xReg" ];
    input_is_good_enough -> use_changed_4_ddr_content [ label="1xReg" ];
    good -> STATE_M_0;
    bad -> STATE_M_0;
    input_ddr_content -> STATE_M_0 [ label="1xReg" ];
    STATE_M_0 -> output_ddr_state;
    STATE_M_0 -> keep_ddr_content;
    use_changed_4_ddr_content -> output_ddr_content_comb;
    keep_ddr_content -> output_ddr_content_comb;
    output_ddr_state -> output_ddr_content_comb;
    changed_pixel -> output_ddr_content_comb;
    input_ddr_content_reg2 -> output_ddr_content_comb;
    org_after_arithmetic -> output_ddr_content_comb;
    output_ddr_content_comb -> DELAY_1;
    DELAY_1 -> output_ddr_content;
    input_ddr_content -> arithemetics1;
    input_pixel_padded -> arithemetics1;
    arithemetics1 -> good_pixel_at_start;
    good_pixel_at_start -> input_is_good_enough [ label="1xReg" ];
    input_is_good_enough -> good;
    input_is_good_enough -> bad;
    input_pixel_padded -> DELAY_0;
    DELAY_0 -> org_after_arithmetic;
}
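Once the clock cycles are written down like this, a few lines of script can at least double-check the bookkeeping. The following Python sketch is an assumption of mine rather than part of the workflow above: it does not parse the DOT file, the function name `check_sketch` is made up, and the cycle numbers and register counts are copied by hand from three labelled edges of the sketch.

```python
def check_sketch(cycle_of, edges):
    """Return the edges whose register count contradicts the cluster labels.

    cycle_of: signal name -> clock cycle of the cluster it sits in
    edges:    list of (src, dst, number_of_registers)
    """
    problems = []
    for src, dst, regs in edges:
        expected = cycle_of[dst] - cycle_of[src]
        if expected != regs:
            problems.append((src, dst, regs, expected))
    return problems

# Hand-copied from the sketch: cluster membership and "NxReg" edge labels.
cycle_of = {
    "input_ddr_content": 0,
    "good_pixel_at_start": 0,
    "input_is_good_enough": 1,
    "input_ddr_content_reg2": 2,
    "use_changed_4_ddr_content": 2,
}
edges = [
    ("input_ddr_content", "input_ddr_content_reg2", 2),
    ("good_pixel_at_start", "input_is_good_enough", 1),
    ("input_is_good_enough", "use_changed_4_ddr_content", 1),
]
problems = check_sketch(cycle_of, edges)
```

For these three edges the list of problems is empty. An edge that claimed one register between cycle 0 and cycle 2 would be reported together with the expected count of 2.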
The Plea
... If there is somebody out there who knows an easier way to do this, please tell me.


