This paper extends previous work by proposing a comprehensive framework for modeling and estimating the system-level power consumption of an embedded industrial parallel processor. Experimental results show that the instruction-level estimation engine deviates from the RTL engine by 5% on average, while delivering an average speed-up of four orders of magnitude.