Refereed Conferences

“Performance tuning of matrix multiplication in OpenCL on different GPUs and CPUs,” K. Matsumoto, N. Nakasato, S. G. Sedukhin,
In Proceedings of the 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC 2012),
IEEE CS’s Conference Publishing Service, Salt Lake City, UT, USA, Nov. 2012 (in press).

“Implementing a code generator for fast matrix multiplication in OpenCL on the GPU,” K. Matsumoto, N. Nakasato, S. G. Sedukhin,
In Proceedings of IEEE 6th International Symposium on Embedded Multicore SoCs (MCSoC12),
IEEE Computer Society Press, pp. 198-204, Aizu-Wakamatsu City, Japan, Sep. 2012.

Object Oriented Model of Generalized Matrix Multiplication. Maria Ganzha, Stanislav Sedukhin, Marcin Paprzycki,
FedCSIS 2011,
pp. 439-442.

“Generalizing Matrix Multiplication for Efficient Computations on Modern Computers,” Stanislav Sedukhin, Marcin Paprzycki,
In Proceedings of PPAM (1),
2011, pp. 225-234.

Blocked All-Pairs Shortest Paths Algorithm for Hybrid CPU-GPU System. Kazuya Matsumoto, Naohito Nakasato, Stanislav G. Sedukhin,
In Proceedings of HPCC 2011,
Japan,
pp. 145-152.

Image scrambling based on a new linear transform.
Abhijeet A. Ravankar, Stanislav G. Sedukhin,
International Conference on Multimedia Technology (ICMT),
Hangzhou, 26-28 July 2011,
pp. 3105-3108, DOI: 10.1109/ICMT.2011.6002034

An O(n) Time-Complexity Matrix Transpose on Torus Array Processor. Abhijeet A. Ravankar, Stanislav G. Sedukhin,
In Proceedings of The First International Conference on Networking and Computing (ICNC),
Japan,
2011, pp. 242-247, IEEE Computer Society Press.

“Mesh-of-Tori”: A Novel Interconnection Network for Frontal Plane Cellular Processors, Abhijeet A. Ravankar, Stanislav G. Sedukhin,
In Proceedings of The First International Conference on Networking and Computing (ICNC),
Higashi Hiroshima, Japan, November 17-19, 2010, pp. 281-284, IEEE Computer Society Press.

Matrix Multiply-add in min-plus algebra on a short-vector SIMD processor of Cell/B.E., Kazuya Matsumoto, Stanislav G. Sedukhin,
In Proceedings of The First International Conference on Networking and Computing (ICNC),
Higashi Hiroshima, Japan, November 17-19, 2010, pp. 272-274, IEEE Computer Society Press.

"Orbital Algorithms and Unified Array Processor for Computing 2D Separable Transforms", Stanislav G. Sedukhin, Ahmed S. Zekri, Toshiaki Miyazaki,
In Proceedings of the 2010 39th International Conference on Parallel Processing Workshops (ICPPW),
San Diego, CA, USA,
September 13-16, pp. 127-134,
2010.

Rapid * Closure: Algebraic Extensions of a Scalar Multiply-add Operation, Stanislav G. Sedukhin and Toshiaki Miyazaki,
In Proceedings of the 25th ISCA International Conference on Computers and their Applications (ISCA),
Honolulu, Hawaii, USA, pp. 19-24, March 24-26, 2010.

"Matrix Inversion on the Cell/B.E. Processor," Shodai Yokoyama, Kazuya Matsumoto, Stanislav G. Sedukhin,
In Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications (HPCC09),
Seoul, Korea, pp. 148-153, June 2009.

Blocked Matrix Inversion on PlayStation 3, Shodai Yokoyama, Kazuya Matsumoto, Stanislav G. Sedukhin,
In Proceedings of SACSIS 2009, Hiroshima, Japan, pp. 175-176, May 2009.

2D Separable Transforms on Matrix Processor, with A.S. Zekri,
Proc. of the International Conference on Computers and their Applications in Industry and Engineering (CAINE2008), November 12-14, 2008,
Honolulu, Hawaii, USA, Intern. Society for Computers and their Applications (ISCA), pp. 106-111.

3D Toroidal Array Processor for Multidimensional DSP Transforms, with T. Miyazaki, K. Kuroda,
Proc. of the 2nd International 3D System Integration Conference (3DSIC 2008), May 12-13, 2008, Tokyo, Japan, ASET, pp. 401-410.

Array Processor Featuring an Effective FIFO-based Data Stream Management, T. Miyazaki, Y. Nomoto, Y. Sato, and S. Sedukhin,
Proc. of the IEEE 8th International Conference on Computer and Information Technology (CIT2008), pp. 255-260, Sydney, July 2008.

A 3D Array Processor Tuned to 3D DCT, Yuki Ikegaki, Naoto Takeishi, Toshiaki Miyazaki, Stanislav Sedukhin,
Proc. of the 39th Meeting of the IEICE Technical Group on Functional Integrated Information Systems (FIIS), FIIS09260, June 2009.

Computationally Efficient Parallel Matrix-Matrix Multiplication on the Torus, with A.S. Zekri,
LNCS 4759, J. Labarta, K. Joe, T. Sato (Eds.), 2008, Springer, pp. 219-226.

Evaluating the Performance of Basic Linear Algebra Subroutines on a Torus Array Processor, with Ahmed S. Zekri,
Proc. of the 7th IEEE International Conference on Computer and Information Technology (CIT 2007), 16-19 Oct. 2007, Aizu-Wakamatsu, Japan, pp. 300-305.

Transitive Closure on the PlayStation 3, with K. Matsumoto and D. Vazhenin,
Proc. of the 2nd International Workshop on Automatic Performance Tuning (iWAPT 2007),
Tokyo, Japan, September 2007.

Performance Evaluation of Basic Linear Algebra Subroutines on a Matrix coprocessor, with A. Zekri,
Proc. of the PPAM 2007 - The 7th International Conference on Parallel Processing and Applied Mathematics,
LNCS Vol. 4967, Eds. R. Wyrzykowski, J. Dongarra, K. Karczewski, and J. Wasniewski, Springer-Verlag, Gdansk, Poland, Sept. 9-12, 2007, pp. 1190-1199.

Fine-grained Matrix Multiply-add on a Torus Array Processor, with Ahmed Zekri,
Proc. of the 22nd International Conference on Computers and Their Applications (CATA2007),
Honolulu, Hawaii, USA, March 2007, pp. 44-51.

Matrix Transpose on 2D Torus Array Processor, with Ahmed S. Zekri,
Proc. of the Sixth IEEE International Conference on Computer and Information Technology (CIT'06), Korea, Sept. 2006, p. 45.

The General Matrix Multiply-add Operation on 2D Torus, with Ahmed S. Zekri,
Proc. of the 20th IEEE IPDPS Symposium, DSEC06 Workshop, Rhodes Island, Greece, Apr. 2006.

A Matrix Processor for Math-intensive Applications,
CAINE 2005: 109-114.

A Highly Efficient Implementation of Back Propagation Algorithm by Matrix ISA, with M. Soliman, S. Mohammed, S. Hassan,
Proc. of the 8th International Conference on Human and Computers (HC2005), Aug.-Sept. 2005, Aizu-Wakamatsu, Japan, pp. 312-319.

Parallel Blocked Algorithm for Solving the Algebraic Path Problem on a Matrix Processor, with A. Takahashi,
Lecture Notes in Computer Science, Springer, Vol. 3726, 2005, Proc. of the First International Symposium on High Performance Computing and Communications (HPCC 2005), L.T. Yang et al. (Eds.),
Sorrento, Italy, September 2005, pp. 786-795.

Computationally Efficient Parallel Matrix-Matrix Multiplication on the Torus, with A. Zekri,
Proc. of the 6th International Symposium on High-Performance Computing (ISHPC-VI), Nara, Japan, September 2005.

A Technology-scalable Matrix Processor for Data Parallel Applications, with M. Soliman,
HPC Asia 2004, Omiya, Japan, July 20-22, 2004.

Highly-Scalable Array Processor for Data Parallel Applications, with M.
Soliman, Proc. of the International Symposium on Information Science and
Electrical Engineering 2003 (ISEE 2003), Fukuoka, Japan, November 13-14,
2003, pp. 167-170.

Parallel LU-decomposition on Pentium Streaming SIMD Extensions, with A.
Takahashi, M. Soliman, Lecture Notes in Computer Science, Springer,
Vol. 2858, 2003, Proc. of the 5th International Symposium on High
Performance Computing (ISHPC 2003), Tokyo-Odaiba, Japan, October 20-22,
2003, pp. 423-430.

Trident: Technology-Scalable Architecture for Data Parallel Applications,
with M. Soliman, Proc. of the 17th International Parallel & Distributed
Processing Symposium (IPDPS 2003), Nice, France, 22-26 April, 2003, IEEE
Computer Society Press, CD-ROM Edition.

Matrix Bidiagonalization on the Trident Processor, with M. Soliman, Proc.
of the 17th International Parallel & Distributed Processing Symposium
(IPDPS 2003), Nice, France, 22-26 April, 2003, IEEE Computer Society Press,
CD-ROM Edition.

BLAS on the Trident Processor: Implementation and Performance Evaluation,
with M. Soliman, Proceedings of the ISCA 18th International Conference on
Computers and their Applications (CATA2003), Honolulu, Hawaii, USA,
March 26-28, 2003, ISCA Press, pp. 359-364.

Performance Analysis of SVD Algorithm on the Trident Processor, with M.
Soliman, Proc. of the First International Symposium on Cyber Worlds: Theory
and Practices (CW 2002), Eds.: S. Peng, V. Savchenko, S. Yukita, Tokyo,
November 6-8, 2002, IEEE Computer Society Press, pp. 95-102.

Design of Array Processors for Multidimensional Image/Signal Processing,
with S. Peng, Proc. of the Second International Conference on Neural,
Parallel, and Scientific Computations, Dynamic Publ., Atlanta, Georgia,
August 7-10, 2002, pp. ???-???.

Multilevel ISA Processor for Accelerating Data Parallel Applications,
with M. Soliman, Proceedings of the International Conference on Parallel
and Distributed Techniques and Applications (PDPTA'2002), CSREA Press,
Ed. H.R. Arabnia, Las Vegas, Nevada, USA (June 25-June 28, 2002),
pp. 1492-1498.

Trident: A Scalable Architecture for Scalar, Vector and Matrix Operations,
with M. Soliman, Proceedings of the Seventh Asia-Pacific Computer System
Architecture Conference, Monash University, Melbourne, Jan./Feb. 2002,
F. Lai and J. Morris, Eds., Published by the Australian Computer Society
Inc., pp. 91-99.

Pattern Dependent Reconstruction of Raster Digital Elevation Models from
Contour Maps, with V. Savchenko, Proceedings of the IASTED International
Conference "Visualization, Imaging, and Image Processing (VIIP 2001)",
Marbella, Spain, Sept. 3-5, 2001, pp. 237-244.

Multicast Based Cluster Web Server, with T. Takigahira, Proceedings of the
International Conference on Advances in Infrastructure for Electronic
Business, Science, and Education on the Internet (SSGRR 2001), L'Aquila,
Italia, August 6-12, 2001, p. 74.

Design of Multidimensional DCT Array Processors for Video Applications,
with S. Peng, Proceedings of the 6th International Euro-Par Conference,
Munich, Germany, August/September 2000 (Euro-Par 2000), Eds. A. Bode, Th.
Ludwig, W. Karl, R. Wismuller, Lecture Notes in Computer Science, Vol. 1900,
Springer, pp. 1086-1094.

Performance Evaluation of the Clustered Web Server, with T. Takigahira,
Proceedings of the International Conference on Parallel and Distributed
Techniques and Applications (PDPTA'2000), CSREA Press, Ed. H.R. Arabnia,
Las Vegas, Nevada, USA (June 26-June 29, 2000).

Design of Efficient Array Processors for Multidimensional Image Transforms,
with S. Peng, Proceedings of the IASTED International Conference on Parallel
and Distributed Computing and Systems (PDCS'99), IASTED/ACTA Press,
November 3-6, 1999, Cambridge, Massachusetts, USA, pp. 543-548.

A New Scalable Array Processor for Two-dimensional Discrete Fourier
Transform, with S. Peng, H. Nagata, in the book: Parallel Computing:
Fundamentals & Applications, Proceedings of the International Conference
ParCo99, Delft, The Netherlands, 17-20 August 1999, Eds. E.H. D'Hollander,
G.R. Joubert, F.J. Peters, H.J. Sips, Imperial College Press, 2000,
pp. 358-365.

Design of I/O efficient, Scalable Array Processors for Multidimensional DFT,
with S. Peng, Proceedings of the International Conference on Parallel and
Distributed Techniques and Applications (PDPTA'99), CSREA Press, Ed.
H.R. Arabnia, Las Vegas, Nevada, USA (June 28-July 1, 1999),
pp. 1544-1550.

A Model of Cluster of Computers, with S. Peng, Proceedings of the Second
IASTED International Conference on Parallel and Distributed Computing and
Networks (PDCN'98), Eds. G. Gupta, P. Pritchard, H. Shen, IASTED Publ.,
Brisbane, Queensland, Australia, Dec. 14-16, 1998, pp. 154-159.

Synthesis of Size-Optimal Toroidal Array Processor for Solving Linear System
of Equations, with T. Takigahira, Proceedings of the International
Conference on Parallel and Distributed Techniques and Applications
(PDPTA'98), CSREA Press, Ed. H.R. Arabnia, Las Vegas, Nevada, USA
(July 13-July 16, 1998), pp. 1229-1235.

Array Processors Based on Gaussian Fraction-free Method, with Peng S.,
Sedukhin I., Proceedings of the First JAERI-Kansai International Workshop
on Ultrashort-pulse Ultrahigh-power Lasers and Simulation for Laser-Plasma
Interactions, July 14-18, 1997, Kyoto, Japan, JAERI, March 1998,
pp. 112-117.

An Interactive Graphic Tool for Systematic Design and Analysis of VLSI Array
Processor, with I. Sedukhin, Proceedings of the International
Conference on Parallel and Distributed Techniques and Applications
(PDPTA'97), CSREA Press, Ed. H.R. Arabnia, Las Vegas, Nevada, USA
(June 30-July 3, 1997), pp. 41-49.

Parallel Algorithm and Architectures for Two-step Division-free Gaussian
Elimination, with S. Peng, The IEEE Third International Conference on
Algorithms and Architectures for Parallel Processing (ICA3PP'97), Melbourne,
Australia, Dec. 1997, World Scientific, pp. 489-502.

Householder Bidiagonalization on Parallel Computers with Dynamic Ring
Architecture, with Peng S., Sedukhin I., Proceedings of the Second
Aizu International Symposium on Parallel Algorithms/Architecture Synthesis
(pAs'97), Aizu, Japan, IEEE Computer Society Press, 1997, pp. 182-191.

Systematic Array Processors Design for Fraction-free Algorithm, with S.
Peng and I. Sedukhin, Proceedings of the Intern. Computing Symposium
(ICS'96), International Conference on Algorithms, Kaohsiung, Taiwan, R.O.C.
(Dec. 19-21, 1996), pp. 35-42.

Parallel Rendering with the Network Linda System, with Sedukhin I.,
Proceedings of the International Conference on Parallel and Distributed
Techniques and Applications (PDPTA'96), CSREA Press, Sunnyvale,
California, USA (Aug. 9-11, 1996), pp. 879-889.

Parallel Algorithm and Architecture for Two-step Division-free Gaussian
Elimination, with S. Peng and I. Sedukhin, Proceedings of the Intern.
Conference on Application-specific Systems, Architectures and Processors
(ASAP'96), Chicago, USA (Aug. 19-21, 1996), IEEE Computer Society Press,
pp. 183-192.

Systolic Algorithms/Architectures for Division-free Linear System Solving,
with S. Peng, Proceedings of the 1996 IEEE Intern. Conference on Algorithms
and Architectures for Parallel Processing (ICA3PP'96), Singapore
(June 11-13, 1996), IEEE Computer Society Press, pp. 201-208.

An Algorithm and Array Processors for Solving the Systems of Linear
Equations, Proceedings of the International Conference on Parallel and
Distributed Techniques and Applications (PDPTA'95), CSREA Press, Athens,
Georgia, USA (Nov. 3-4, 1995), pp. 307-316.

Design of Optimal Systolic Arrays for 2D Discrete Fourier Transform, with
S. Peng and I. Sedukhin, Proceedings of the International Symposium on
Parallel and Distributed Supercomputing (PDSC'95), Fukuoka, Japan
(Sept. 26-28, 1995), pp. 61-69.

Systematic Approach and Software Tool for Systolic Design, with I.
Sedukhin, Proceedings of Third Joint International Conf. on Vector and
Parallel Processing, CONPAR 94-VAPP VI, Eds. Buchberger, B. and Volkert,
J., Lecture Notes in Computer Science, 854, Springer-Verlag, 1994,
pp. 172-183.

A New Systolic Architecture for Prime Factor DFT-algorithm, Proceedings of
Fourth Great Lakes Symposium on VLSI, Notre Dame, IN, USA, IEEE Computer
Society Press, 1994, pp. 40-45.

Software Tool for Systolic Design, with I. Sedukhin, World Computer
Congress, Proceedings of the Workshop on CASE Tools for Parallel Systems
Development, Eds.: Jelly, I. and Gorton, I., Sept. 1994, pp. 74-78.

High-Performance Computing Systems of Combined Architecture, with Fet,
Ya.; Vazhenin, A., Proceedings International Conference "Parallel
Computing Technologies", Ed. N. Mirenkov, September 7-11, 1991,
Novosibirsk, Russia, World Scientific, pp. 246-257.

Organization of Systolic Computations on a Ring of Computers,
Proceedings of Fifth International Workshop on Parallel Processing by
Cellular Automata and Arrays - PARCELLA'90, Eds. G. Wolf, T. Legendi,
U. Schendel, Berlin, September 17-21, 1990, Akademie-Verlag, pp. 273-278.

Systolic Array Architecture for Two-Dimensional Discrete Fourier Transform,
Proceedings of Joint International Conference on Vector and Parallel
Processing: CONPAR 90 - VAPP IV, Ed. H. Burkhart, September 10-13, 1990,
Zurich, Switzerland, Lecture Notes in Computer Science, 457, 1990,
Springer-Verlag, pp. 682-691.

Systolic Processor for Two-Dimensional Fourier Transform, Proceedings
International Latvian Signal Processing Conference, 2, Riga, May 3-7,
1990, pp. 123-128.

The Organization of Systolic Processing on a Ring of Computers,
Proceedings of the First All-Union Conference "Homogeneous Computing Systems
and Systolic Structures", Lviv, USSR, The Institute of Applied Mechanics
and Mathematics of the Ukrainian Academy of Sciences, 1990, pp. 37-42.

Systolic Processor for Two-dimensional DFT, Proceedings of First World
Conference on Parallel Computing in Engineering and Engineering Education,
UNESCO, Paris, France, 1990, pp. 299-303.

An Automated Procedure for Synthesis of Systolic/Wavefront Arrays, with
Trishina E.V., The British Computer Society Workshop Series, Cambridge
University Press, CONPAR 88, International Conference on Drawing
Together the Threads of Parallelism in Research and Practice, 12-16 Sep.,
1988, Manchester, GB, pp. 735-742.

The Problem-oriented Homogeneous Parallel Processors for DFT, with Semashko
A., Demidov A., Proceedings of the All-Union Conference "Methods and
Microprocessors for the Digital Signal Processing", Riga, USSR, 1989,
pp. 18-22.

The Systolic Array Processor for the Fast Fourier Transform, Proceedings
of the All-Union Conference "Methods and Microprocessors for the Digital
Transformations and Signal Processing", Riga, USSR, 1989, pp. 125-128.

The Design of Highly-parallel Algorithms and Architectures for Digital
Signal Processing, Proceedings of the VII All-Union Conference "Distributed
Data Processing", Lviv, USSR, 1989, pp. 32-37.

Design Highly-parallel Algorithms and Processors for Digital Signal
Processing, Proceedings VII All-Union Conf. on the Distributed Data
Processing, Lviv, USSR, 1989, pp. 63-67.

The Interactive CAD for Systolic Array Processors, with E. Trishina,
Proceedings of the VII All-Union Conference "Parallel Programming and
High-performance Computers", Kiev, USSR, 1988, pp. 124-129.

The Design of Optimal Systolic Array Processors, Proceedings II All-Union
Conference "Pipeline Computing Systems", Kiev, USSR, 1988, pp. 15-21.

The Design of Systolic/Wavefront Array Processors, Proceedings of
All-Union Conference on the Logic Methods of Design Homogeneous and Systolic
Processors, Moscow, USSR, 1988, pp. 135-140.

The Architecture of Array Processors for Convolution and Deconvolution, with
Jakush V., Proceedings All-Union Conference "Software for Multiprocessor
Systems", Kalinin, USSR, 1988, pp. 78-82.

The Design of Parallel Algorithms and Architectures for Solving the Graphs
Problems, Proceedings IV All-Union Conference "Distributed Data
Processing", Lviv, USSR, 1987, pp. 10-15.

Highly parallel algorithms and the architecture of a computer system for
solving large matrix problems, Artificial intelligence and
information-control systems of robots, Proc. 3rd Int. Conf., 1984,
Smolenice, Czechoslovakia, pp. 319-323.

The Parallel Algorithms for Solving the Mathematical Physics Problems,
Proceedings of the XXIVth Regional Conference, Novosibirsk, USSR, 1981,
pp. 67-71.

The Operating System SUMMA, with Kashun I., Proceedings International
Conference "The Problems of Design and Applying of the Discrete Systems",
Minsk, USSR, 1977, pp. 125-130.

The Investigation of Decentralized Communications Among Processing Elements
of a Homogeneous Computing System, with Vorob'ev V., Kashun I.,
Proceedings IV All-Union Conference on Homogeneous Computing Systems,
Vol. 1, "Naukova Dumka" Publisher, Kiev, USSR, 1975, pp. 36-38.

The Organization of Communications in the Homogeneous Computing System
SUMMA, Proceedings IV All-Union Conference on Homogeneous Computing Systems,
Vol. 1, "Naukova Dumka" Publisher, Kiev, USSR, 1975, pp. 63-65.

The Model of Homogeneous Computing System, with Kashun I., Proceedings IV
All-Union Conference on Homogeneous Computing Systems, Vol. 1, "Naukova
Dumka" Publisher, Kiev, USSR, 1975, pp. 120-122.

System Device for the Homogeneous Computing System SUMMA, with Afanas'ev V.,
Il'in M., Shum L., Proceedings IV All-Union Conference on Homogeneous
Computing Systems, Vol. 1, "Naukova Dumka" Publisher, Kiev, USSR, 1975,
pp. 47-50.