Exploring Processor Parallelism: Estimation Methods and Optimization Strategies

Jordans, R.; Corvino, R.; Jóźwiak, L.; Corporaal, H.

Content

Full texts:

Jordans_Corvino_Jozwiak_Corporaal_Exploring_2_2013.pdf

Abstract

Automatic optimization of application-specific instruction-set processor (ASIP) architectures mostly focuses on the design of the internal memory hierarchy or on the extension of reduced instruction-set architectures with complex custom operations. This paper focuses on very long instruction word (VLIW) architectures and, more specifically, on automating the selection of an application-specific VLIW issue-width. The issue-width selection strongly influences all the important processor properties (e.g., processing speed, silicon area, and power consumption). Accurate and efficient issue-width estimation and optimization are therefore among the most important aspects of VLIW ASIP design. In this paper, we first compare different methods for estimating the required issue-width, and subsequently introduce a new force-based parallelism estimation method that is capable of estimating the required issue-width with only 3% error on average. Furthermore, we present and compare two techniques for estimating the required issue-width of software pipelined loop kernels and show that a simple utilization-based measure provides an error margin of less than 1% on average.
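
To make the idea of a utilization-based measure for software pipelined loops more concrete, the following is a minimal sketch and not the authors' method: the abstract does not give the formula. It assumes the common resource-bound reasoning from modulo scheduling, in which a loop body with a given number of operations, started once every initiation interval (II) cycles, needs at least ceil(operations / II) issue slots per cycle. The function names and the example numbers are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def required_issue_width(ops_per_iteration: int, initiation_interval: int) -> int:
    """Lower bound on the issue width of a software pipelined loop kernel.

    Assumption (not taken from the paper): every operation occupies exactly
    one issue slot for one cycle, and a new iteration starts every
    `initiation_interval` cycles.
    """
    if ops_per_iteration < 0 or initiation_interval <= 0:
        raise ValueError("need ops_per_iteration >= 0 and initiation_interval > 0")
    return ceil_div(ops_per_iteration, initiation_interval)


def resource_min_ii(ops_per_iteration: int, issue_width: int) -> int:
    """Inverse view: the smallest II that a given issue width can sustain."""
    if ops_per_iteration < 0 or issue_width <= 0:
        raise ValueError("need ops_per_iteration >= 0 and issue_width > 0")
    return ceil_div(ops_per_iteration, issue_width)


if __name__ == "__main__":
    # Hypothetical loop body with 14 operations at several target IIs.
    for ii in (1, 2, 4, 7):
        print(f"II={ii}: issue width >= {required_issue_width(14, ii)}")
```

Under these assumptions, halving the initiation interval roughly doubles the required issue-width, which illustrates the trade-off between processing speed and silicon area that the abstract refers to. Dependence-carried recurrences and heterogeneous issue slots are ignored in this sketch.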

Keywords

design automation; parallelism estimation; very long instruction word; VLIW

Publisher

Lodz University of Technology. Department of Microelectronics and Computer Science

Journal

International Journal of Microelectronics and Computer Science

Year

2013

Volume

Vol. 4, no. 2

Pages

55–64

Physical description

Bibliography: 32 items.

Contributors

Author

Jordans R

Electronic Systems group at the Faculty of Electrical Engineering, Eindhoven University of Technology, The Netherlands

Author

Corvino R

Electronic Systems group at the Faculty of Electrical Engineering, Eindhoven University of Technology, The Netherlands

Author

Jóźwiak L.

Electronic Systems group at the Faculty of Electrical Engineering, Eindhoven University of Technology, The Netherlands

Author

Corporaal H

Electronic Systems group at the Faculty of Electrical Engineering, Eindhoven University of Technology, The Netherlands

Bibliography

[1] ASAM, “Project website.” [Online]. Available: http://www.asam-project.org
[2] Synopsys, “Synopsys Processor Designer.” [Online]. Available: http://www.synopsys.com
[3] Target, “Target Compiler Technologies: IP Designer.” [Online]. Available: http://www.retarget.com/
[4] FlexASP project, “TTA-based co-design environment.” [Online]. Available: http://tce.cs.tut.fi/
[5] S. Aditya, B. Rau, and V. Kathail, “Automatic architecture synthesis and compiler retargeting for VLIW and EPIC processors,” in ISSS 1999 - 12th International Symposium on System Synthesis. IEEE, November 1999, pp. 107–113.
[6] V. Kathail, S. Aditya, R. Schreiber, B. Ramakrishna Rau, D. Cronquist, and M. Sivaraman, “PICO: automatically designing custom computers,” IEEE Computer, vol. 35, no. 9, pp. 39–47, September 2002.
[7] H. Corporaal and J. Hoogerbrugge, “Cosynthesis with the MOVE framework,” in CESA 1996 - Multiconference on Computational Engineering in Systems Applications - Symposium on Modeling, Analysis, and Simulation. IEEE, July 1996, pp. 184–189.
[8] L. Pozzi, K. Atasu, and P. Ienne, “Exact and approximate algorithms for the extension of embedded processor instruction sets,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 7, pp. 1209–1229, July 2006.
[9] C. Wolinski and K. Kuchcinski, “Automatic selection of application-specific reconfigurable processor extensions,” in DATE 2008 - Design, Automation & Test in Europe Conference & Exhibition. IEEE, March 2008, pp. 1214–1219.
[10] J. Matai, J. Oberg, A. Irturk, T. Kim, and R. Kastner, “Trimmed VLIW: Moving application specific processors towards high level synthesis,” in ESLsyn 2012 - The Electronic System Level Synthesis Conference. IEEE, June 2012, pp. 11–16.
[11] A. Irturk, J. Matai, J. Oberg, J. Su, and R. Kastner, “Simulate and eliminate: A top-to-bottom design methodology for automatic generation of application specific architectures,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 8, pp. 1173–1183, August 2011.
[12] P. Qiao, “Design and optimization of digital hearing aid system based on Silicon Hive technology,” Master’s thesis, Eindhoven University of Technology, Eindhoven, The Netherlands, August 2010. [Online]. Available: http://alexandria.tue.nl/repository/books/709025.pdf
[13] P. Qiao, H. Corporaal, and M. Lindwer, “A 0.964 mW digital hearing aid system,” in DATE 2011 - Design, Automation & Test in Europe Conference & Exhibition. IEEE, March 2011, pp. 1–4.
[14] Y. Okmen, “SIMD floating point processor and efficient implementation of ray tracing algorithm,” Master’s thesis, TU Delft, Delft, The Netherlands, October 2011. [Online]. Available: http://repository.tudelft.nl/assets/uuid:b0a8ae03-18b9-4a0e-9761-64ffd2851074/YunusOkmenMScThesis.pdf
[15] E. Diken, R. Jordans, R. Corvino, and L. Jóźwiak, “Application analysis driven ASIP-based system synthesis for ECG,” in Embedded World Conference, February 2012, pp. 1–8.
[16] G. S. Tjaden and M. J. Flynn, “Detection and parallel execution of independent instructions,” IEEE Transactions on Computers, vol. 19, no. 10, pp. 889–895, October 1970.
[17] D. W. Wall, “Limits of instruction-level parallelism,” in ASPLOS 1991 - 4th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, April 1991, pp. 176–188.
[18] T. M. Austin and G. S. Sohi, “Dynamic dependency analysis of ordinary programs,” in ISCA 1992 - 19th Annual International Symposium on Computer Architecture. ACM, May 1992, pp. 342–351.
[19] K. B. Theobald, G. R. Gao, and L. J. Hendren, “On the limits of program parallelism and its smoothability,” in MICRO 1992 - 25th Annual International Symposium on Microarchitecture. ACM, December 1992, pp. 10–19.
[20] V. C. Cabezas and P. Stanley-Marbell, “Parallelism and data movement characterization of contemporary application classes,” in SPAA 2011 - 23rd Symposium on Parallelism in Algorithms and Architectures. ACM, June 2011, pp. 95–104.
[21] R. Jordans, R. Corvino, and L. Jóźwiak, “Algorithm parallelism estimation for constraining instruction-set synthesis for VLIW processors,” in DSD 2012 - 15th Euromicro Conference on Digital System Design. IEEE, September 2012, pp. 152–155.
[22] R. Jordans, R. Corvino, L. Jóźwiak, and H. Corporaal, “Exploring processor parallelism: Estimation methods and optimization strategies,” in DDECS 2013 - 16th Symposium on Design and Diagnostics of Electronic Circuits and Systems. IEEE, April 2013, pp. 18–23.
[23] M. Lam, “Software pipelining: An effective scheduling technique for VLIW machines,” ACM SIGPLAN Notices, vol. 23, no. 7, pp. 318–328, July 1988.
[24] B. R. Rau, “Iterative modulo scheduling: An algorithm for software pipelining loops,” in MICRO 1994 - 27th Annual International Symposium on Microarchitecture. ACM, December 1994, pp. 63–74.
[25] S. Carr, C. Ding, and P. Sweany, “Improving software pipelining with unroll-and-jam,” in HICSS 1996 - 29th Hawaii International Conference on System Sciences. IEEE, January 1996, pp. 183–192.
[26] E. M. Riseman and C. C. Foster, “The inhibition of potential parallelism by conditional jumps,” IEEE Transactions on Computers, vol. 21, no. 12, pp. 1405–1411, December 1972.
[27] P. Paulin and J. Knight, “Force-directed scheduling for the behavioral synthesis of ASICs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 8, no. 6, pp. 661–679, June 1989.
[28] LLVM, “Project website.” [Online]. Available: http://www.llvm.org
[29] The R project for statistical computing, “Project website.” [Online]. Available: http://www.r-project.org/
[30] M. Smotherman, S. Krishnamurthy, P. S. Aravind, and D. Hunnicutt, “Efficient DAG construction and heuristic calculation for instruction scheduling,” in MICRO 1991 - 24th Annual International Symposium on Microarchitecture. ACM, November 1991, pp. 93–102.
[31] A. M. Malik, J. McInnes, and P. van Beek, “Optimal basic block instruction scheduling for multiple-issue processors using constraint programming,” International Journal on Artificial Intelligence Tools, vol. 17, no. 1, pp. 37–54, February 2008.
[32] L.-N. Pouchet, “Polybench/C 3.2,” 2013. [Online]. Available: http://www.cse.ohio-state.edu/pouchet/software/polybench/

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-76aa3d11-68cd-40cf-8c64-19289bef1b63