Introduction:
The Computer Architecture Department at the University of Malaga (AC-UMA) is an academic and research group with particular experience in experimental and theoretical work in high-performance computing. This work includes the development of new parallel compilers, the design of VLSI circuits, and the mapping of applications onto clusters, multiprocessor machines and Grid-based environments. The department also has strong in-house computing support.
BitLAB (Bioinformatics and Information Technologies Laboratory) is an internal group at AC-UMA. It focuses on applying advanced computing to data management and data analysis problems in the bioinformatics and biomedical fields, especially in Input/Output-bound applications. Bioinformatics is a data-driven science characterised by high volumes of heterogeneous data sets and by the geographical dispersion of its services, and it is used as the benchmark application domain in BitLAB's research lines.
The BitLAB group has also trained staff to bring data, medical and biological expertise together into an integrated perspective for solving biomedical problems. A strong collaborative environment among multidisciplinary teams allows us to develop user-friendly platforms that help fill the gap between theoretical research and applied sciences.
Keywords:
High-performance and parallel computing (HPC); Grid technology; Integrated bioinformatics (distributed data and services integration); Automatic knowledge discovery (KDD); Gene-expression data analysis; Training and e-learning spaces.
Main Research Lines:
The main research lines in BitLAB are described below.

The catalogued scientific knowledge needed to produce a more complete view of any biological process is scattered across various external, heterogeneous and geographically distributed databases. Heterogeneity in formats and storage media, together with the diversity and dispersion of the data, makes this wealth of interrelated information difficult to use. Integrating these information sources for unified access, independently of possible internal changes in each source, is a clear and important technological priority. In this research area we focus on the design and deployment of integration software architectures. These architectures are based on metadata repositories and programmatic interfaces that provide the necessary functionality, such as discovery, invocation and documentation of tools, and data persistence systems. The resulting platform can access, link, combine and query biological data sets easily and efficiently by integrating a large number of dispersed web-based services.
The general goal of Knowledge Discovery in Databases (KDD) technologies is to extract and abstract any type of pattern, perturbation, relationship or association from the analysed data. Some of the most useful KDD outcomes take the form of association rules. These rules disclose hidden co-occurrences of items as a set of antecedents and associated consequents. We have applied KDD techniques extensively to gene-expression experiments (DNA arrays). The outputs from the reading devices, each representing one experiment, are combined to form a consolidated table with the expression values of genes (rows) across the different experiments (columns). The main component of this table is the gene-expression matrix. However, each row frequently also contains additional information about the particular properties of the gene, such as function, pathway, chromosome location and GO terms; this information is called gene metadata. Likewise, the experiments or samples are described using different descriptors of origin, morphology, clinical information, etc., which are called sample metadata. See ARco: Association rule collaborative tool.

A large number of genomics projects are well underway, producing large quantities of comparative genomic data and creating growing information-management problems. In this context, experiment data management becomes a time-consuming and error-prone task. The main causes are the diversity of interrelated data types, the high volume of data produced in geographically dispersed laboratories, and the need for flexible and powerful ways to query, combine, share and protect the information. In this research area we have designed a generic, customisable and flexible genomics project-management system named pUMA (projects at the UMA). pUMA is a web-based open-source application that supports information management in genomic projects.
pUMA has been used to manage several ongoing interdisciplinary projects, including the Spanish Solanaceae project (ESP-SOL), olive, RIRAAF, etc.

Genomics has also challenged high-performance computing with a broad spectrum of highly demanding applications: CPU-intensive ones, ones with huge memory and storage requirements, and I/O-bound algorithms. We develop new solutions to problems that seemed unaffordable only a few years ago. These strategies use parallel computers to solve computationally expensive algorithms. Their computational patterns range from regular, as in database searching applications, to very irregularly structured ones (phylogenetic trees). Both fine- and coarse-grained parallel strategies have been addressed for this very diverse set of applications. Diverse computer architectures have also been studied, ranging from networks of commodity multicomputers and more powerful workstations to supercomputers. Our parallel approaches include dynamic load distribution, speculative computation, network-bandwidth optimisation and intelligent task scheduling, as ways to avoid the most common sources of inefficiency in parallel computing. References: [1], [2], [3], [4], [5].

In this area we apply automatic video analysis systems to recognise semantic content in videos of living organisms. The system extracts from such videos metadata describing the actions and behavior of individual organisms (cells, bacteria, mice, etc.). It starts with image-processing procedures that identify the objects of interest. It then tracks the movements and actions of each organism through the video sequences, detecting and recording the relevant behavioral events. The resulting metadata describe the organism behavior present in the video content. They are also organized in a database that allows content-based access to the videos, so that subsequent queries yield important analytical information. The database search system enables the retrieval and visualization of selected video sequences that match given query-by-content criteria. The efficiency and accuracy of the system increase the analytical power available for uncovering and studying the effect of drugs on a variety of behavioral models. Keywords: Computer vision, video analysis, object recognition, tracking, medicine. References: [8], [9], [10].

We work on the development of a controlled collaborative teaching tool specifically focused on emulating a real classroom environment. The main features this tool offers are: a) controlled access based on a central registry; b) a distributed architecture that allows simultaneous online classrooms; c) centralized control of the students' intercommunications; d) simulation of requests to speak ("raise hands") in the classroom; e) optimized bandwidth usage that reduces unneeded transmission, offering fluid communication between the participants; f) an easy and intuitive interface, even for very young children; g) a set of teaching resources such as a blackboard and presentation display. See: ODISEA: On-Line synchronous distance learning. We also maintain a virtual campus at: www.bitlab-es.com/e-learning

Our group focuses on computational methodologies for the analysis and interpretation of large-scale expression data sets generated by DNA microarray experiments (see the PreP and double-Scan software at www.bitlab-es.com/prep and the engene software at www.bitlab-es.com/engenet). Moreover, our group has a long tradition of working with life-sciences teams to offer both technological support for data management and support for data analysis. Refs: tomato (ESP-SOL project), porcine (infection process in pigs by PCV2), strawberry (factors involved in strawberry ripening) and olive tree (OLIGEN project).

Image processing and intelligent systems are of essential importance in industrial and biomedical research, in clinical diagnostic applications in dermatology and aesthetic surgery, and in assessing the results of that surgery. Particularly in clinical applications, there is a clear need for an accurate and objective method of measuring the state of skin or lesions (e.g. coarseness, pore size, degree of wrinkling) and pigment changes using measurable quantitative parameters.
The true value of skin or lesion "quality" can only be realized in this manner. Visual inspection and grading are subjective and are affected by several factors, including viewing geometry, ambient illumination, and the assessor's experience and visual acuity. As a result, they lack accuracy and cannot record and reproduce measurements consistently. The general approach in our developments in this area is to build a sampling catalog based on both expert assessment and automatic computer assessment of tissue quality. By 'tissue quality' we mean mainly color and texture (coarseness, wrinkles, directionality, etc.). Based on this catalog, differences in tissue quality can be assessed from the differences in catalog position of different samples (e.g. before and after treatment). These differences can serve as an objective comparative measure of the improvement obtained by treatment.
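The catalog-position comparison described above can be sketched as a nearest-neighbour lookup. This is a minimal illustration under stated assumptions. Each catalog entry is a (position, feature vector) pair, with position 0 meaning the best tissue quality. The features (for example colour and texture measures) are hypothetical and assumed already normalised. Plain Euclidean distance stands in for whatever similarity measure the actual system uses.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical catalog entry: (position, feature vector). Position 0 = best tissue quality.
# Features are illustrative only, e.g. (mean redness, texture coarseness, wrinkle density).
CatalogEntry = Tuple[int, Sequence[float]]

def catalog_position(sample: Sequence[float], catalog: List[CatalogEntry]) -> int:
    """Position of the catalog entry closest (Euclidean distance) to the sample."""
    return min(catalog, key=lambda entry: math.dist(sample, entry[1]))[0]

def improvement(before: Sequence[float], after: Sequence[float],
                catalog: List[CatalogEntry]) -> int:
    """Difference in catalog position; positive means the treated sample
    moved towards better-quality positions."""
    return catalog_position(before, catalog) - catalog_position(after, catalog)
```

For example, with a three-entry catalog `[(0, (0.1, 0.1)), (1, (0.4, 0.4)), (2, (0.8, 0.8))]`, a sample measured at `(0.75, 0.7)` before treatment and `(0.35, 0.45)` after it moves from position 2 to position 1, an improvement of one catalog step. A finer-grained variant could interpolate between neighbouring entries instead of snapping to the nearest one.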
• Data and services integration
MOWServ at the National Institute for Bioinformatics:
Advancing Clinico Genomics Trials (ACGT EU-project):
jORCA: easy integration at:
In this research area we develop new algorithms that extend the correlation analysis beyond the expression values to include both gene and sample metadata.

Tomato project (Genoma-España):
Olive project (Genoma-España):
Adverse Reactions to Allergens and Drugs (Reacciones Adversas a Alergenos y Fármacos):
See VideoCatalyzer: 
Here we survey the long road we have travelled in the bioinformatics domain, an important component of our computational perspective:
Early in the 1990s we were working on a typical task for a computer architecture department: the development of parallel compilers focused on optimising sparse-matrix computations. Input/Output-bound applications are the converse type of process, that is, software characterised by a low computational load over large sets of data. In bioinformatics, an emerging field in those days, we identified a diverse set of applications sharing this particular feature. Their datasets were large and ever-growing, dispersed worldwide, and heterogeneous both in data type (sequences, structures, etc.) and in format. We had found the vein we were looking for. Our initial research centred on the parallelisation of "regular" I/O applications on commodity multicomputers built by linking together simple PCs and more powerful workstations. Database searching fits this description well. It allowed us to test our generic parallel approach, which included dynamic load distribution [1], speculative computation in phylogenetic tree construction [2], network-bandwidth optimisation, and intelligent task scheduling to minimise idle CPU power [3]. In other words, it was a model for avoiding the most common sources of inefficiency in parallel computing. These developments gave us an efficient parallel computational model for addressing more CPU-demanding problems, such as searching structural databases, and especially data-mining algorithms. A summary of the research conducted in this first period can be found in [4, 5].
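The dynamic load distribution mentioned above can be illustrated with a minimal self-scheduling sketch. Idle workers pull the next database entry from a shared queue instead of receiving a fixed block in advance, so a few expensive comparisons do not leave other processors idle. The sketch rests on several assumptions. The function names are hypothetical. A toy score (number of matching positions) replaces a real alignment algorithm. Python threads are used only to show the scheduling pattern; the original work targeted message-passing multicomputers, where a master process would hand out work units to slave processes.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple

def score_pair(query: str, target: str) -> int:
    """Toy similarity score (hypothetical stand-in for a real alignment):
    number of positions where the two sequences carry the same residue."""
    return sum(a == b for a, b in zip(query, target))

def dynamic_search(query: str, database: List[Tuple[str, str]], n_workers: int = 4,
                   score: Callable[[str, str], int] = score_pair) -> Dict[str, int]:
    """Self-scheduling database search: each idle worker takes the next
    (id, sequence) pair from a shared queue until the queue is empty."""
    tasks: "queue.Queue[Tuple[str, str]]" = queue.Queue()
    for entry in database:
        tasks.put(entry)
    results: Dict[str, int] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                seq_id, seq = tasks.get_nowait()
            except queue.Empty:
                return  # no work left: this worker finishes
            s = score(query, seq)
            with lock:  # protect the shared result table
                results[seq_id] = s

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `dynamic_search("ACGT", [("s1", "ACGT"), ("s2", "ACGA"), ("s3", "TTTT")])` scores the three entries as 4, 3 and 1. Handing out one sequence at a time maximises balance. Handing out small chunks instead would reduce the per-request overhead on a real network.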
To date, searching for homologies and evolutionary relationships between sequences is the most frequently used strategy for assigning function to new sequences. However, when query sequences have no clear homologues in the sequence databases, functional annotation becomes especially difficult. For these situations we proposed a strategy based on identifying small fragments shared by the query and several database sequences [6]. Association rules [7] are then used to correlate the fragments with biological annotations.
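The fragment-identification step can be sketched with fixed-length fragments (k-mers). This is a simplified illustration, not the method published in [6]. It assumes exact k-mer matches, a hypothetical fragment length `k` and a minimum number of shared fragments `min_shared`. The function names are also hypothetical. The inverted index at the end shows which database sequences carry each shared fragment. These are the items whose annotations could later be correlated with fragments by association rules.

```python
from typing import Dict, List, Set

def kmers(seq: str, k: int) -> Set[str]:
    """All distinct fragments of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_fragments(query: str, database: Dict[str, str], k: int = 3,
                     min_shared: int = 2) -> Dict[str, Set[str]]:
    """Map each database sequence to the fragments it shares with the query,
    keeping only sequences that share at least `min_shared` fragments."""
    q = kmers(query, k)
    hits: Dict[str, Set[str]] = {}
    for seq_id, seq in database.items():
        common = q & kmers(seq, k)
        if len(common) >= min_shared:
            hits[seq_id] = common
    return hits

def fragment_index(hits: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert the hits: for each shared fragment, the database sequences containing it."""
    index: Dict[str, List[str]] = {}
    for seq_id in sorted(hits):
        for frag in hits[seq_id]:
            index.setdefault(frag, []).append(seq_id)
    return index
```

Because different fragments of one query can match different database proteins, the query can collect evidence from several partial matches rather than one full-length homologue. This is the situation the text describes for multidomain or poorly conserved sequences.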
In the same field of data mining, we started analysing biological video with the aim of identifying basic animal behaviour. Image segmentation and tracking were used to characterise simple movements, and image-understanding procedures were used to identify more complex events (tumbling, stopped, moving, etc.) [8, 9]. As a collateral result of this work, we moved into the field of medical imaging, in particular the objective assessment of laser tissue treatments [10, 11]. These collaborations with medical teams led us to start developing a numerical model that simulates the effect of laser radiation on tissue [12].
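Turning a tracked trajectory into behavioural events can be sketched as follows. This is a hypothetical, simplified rule set, not the procedure of [8, 9]. It assumes that each organism's centroid has already been tracked frame by frame. A step counts as "stopped" when the displacement falls below a speed threshold. It counts as "tumbling" when the heading changes by more than an angle threshold relative to the previous non-stopped step. Both thresholds are illustrative. Runs of identical states are then collapsed into (label, first step, last step) records, the kind of metadata that could be stored in a database for query-by-content.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def frame_states(track: List[Point], speed_thr: float = 0.5,
                 turn_thr: float = math.radians(90)) -> List[str]:
    """Label each step between consecutive frames as 'stopped', 'moving' or
    'tumbling' (hypothetical thresholds in pixels/frame and radians)."""
    states: List[str] = []
    prev_heading: Optional[float] = None
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        if math.hypot(dx, dy) < speed_thr:
            states.append("stopped")
            continue
        heading = math.atan2(dy, dx)
        if prev_heading is None:
            states.append("moving")
        else:
            # smallest absolute angle between the two headings, in [0, pi]
            turn = abs((heading - prev_heading + math.pi) % (2 * math.pi) - math.pi)
            states.append("tumbling" if turn > turn_thr else "moving")
        prev_heading = heading
    return states

def to_events(states: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse runs of identical states into (label, first_step, last_step) events."""
    events: List[Tuple[str, int, int]] = []
    for i, s in enumerate(states):
        if events and events[-1][0] == s:
            events[-1] = (s, events[-1][1], i)  # extend the current run
        else:
            events.append((s, i, i))  # start a new event
    return events
```

For a track that moves right for two frames, pauses, and then reverses direction, `to_events(frame_states(...))` yields a "moving" event, a "stopped" event and a "tumbling" event, in that order.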
Returning to bioinformatics, our path evolved in the natural direction: towards gene-expression data processing, with a particular focus on two-colour experiments with cDNA biological material. Error removal and data quality [13] were the focus of our research. Because we were involved in research projects, all of our applications were implemented with the end user in mind. An important contribution to the field was a new protected procedure for extending the dynamic range of acquired gene-expression values [14]. This procedure removes quantization and saturation sources of error.
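The idea behind extending the dynamic range with two scans at different sensitivities [14] can be illustrated with a deliberately simplified sketch. This is not the published model. It assumes that the high-sensitivity scan is precise but saturates at a hypothetical 16-bit ceiling of 65535. It also assumes that the low-sensitivity scan never saturates but is coarsely quantized at low intensities, and that the two scans are related linearly. The line is calibrated on spots that are reliable in both scans. Saturated spots are then replaced by rescaled low-sensitivity values. The function names and the `min_low` noise floor are hypothetical.

```python
from typing import List, Tuple

SATURATION = 65535.0  # assumed 16-bit scanner ceiling

def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("calibration spots must span a range of intensities")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def combine_scans(high: List[float], low: List[float], sat: float = SATURATION,
                  min_low: float = 100.0) -> List[float]:
    """Merge a saturating high-sensitivity scan with a coarse low-sensitivity scan."""
    # calibrate only on spots that are unsaturated in `high` and above the
    # quantization-dominated range in `low`
    pairs = [(l, h) for h, l in zip(high, low) if h < sat and l >= min_low]
    if len(pairs) < 2:
        raise ValueError("not enough reliable spots to calibrate the scans")
    a, b = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    # keep high-scan values where valid; extrapolate saturated spots from the low scan
    return [h if h < sat else a * l + b for h, l in zip(high, low)]
```

For example, if the high scan reads `[1000, 2000, 4000, 65535]` and the low scan `[250, 500, 1000, 20000]`, the calibration finds a factor of about 4. The saturated fourth spot is then restored to roughly 80000, a value beyond the scanner's ceiling.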
The next step is easy to imagine: completing the analysis loop by exploiting these high-quality gene-expression values through clustering and classification procedures [15]. To combine and complete the analysis of gene-expression matrices, we introduced the concept of association rules on gene-expression data, combining expression ratios with biological annotations [16]. The latter requires both sequence annotation and biological enrichment procedures.
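Association rules that combine expression ratios with biological annotations can be sketched in their simplest form. Each gene becomes a set of items, mixing discretised expression states and annotation terms. Rules A → B are kept when their support (fraction of genes containing both items) and confidence (fraction of genes with A that also have B) pass the thresholds. This brute-force pair counting is a stand-in for a full Apriori-style miner and for the specific method of [16]. The item names (for example `heat:up`, `GO:stress`) and the discretisation of ratios into up/down states are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)

def pair_rules(transactions: List[Set[str]], min_support: float = 0.3,
               min_confidence: float = 0.7) -> List[Rule]:
    """Mine single-item rules A -> B from per-gene itemsets such as
    {'exp1:up', 'exp2:down', 'GO:0006950'}."""
    n = len(transactions)
    item_count: Dict[str, int] = {}
    pair_count: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            item_count[item] = item_count.get(item, 0) + 1
        for a, b in combinations(sorted(t), 2):
            key = frozenset((a, b))
            pair_count[key] = pair_count.get(key, 0) + 1
    rules: List[Rule] = []
    for pair, count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        a, b = sorted(pair)
        for ante, cons in ((a, b), (b, a)):  # test both rule directions
            confidence = count / item_count[ante]
            if confidence >= min_confidence:
                rules.append((ante, cons, support, confidence))
    return sorted(rules)
```

With four genes, where three are up-regulated under heat and three carry a stress annotation (two of them both), `pair_rules(..., min_confidence=0.6)` returns `heat:up → GO:stress` and its converse, each with support 0.5 and confidence 2/3. Raising the confidence threshold to 0.7 removes both rules.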
At the beginning of the 21st century we were selected to lead the development of the software integration architecture for the National Institute of Bioinformatics in Spain. In this work we addressed the design and implementation of an integration platform [17] and several versatile web clients [18] for accessing BioMOBY-compatible services. BioMOBY is a system through which a client can interact with multiple sources of biological data regardless of the underlying format or schema, using the service descriptions stored in the BioMOBY catalogue.
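Catalogue-driven discovery can be illustrated with a small in-memory model. Services are registered together with the datatypes they consume and produce. A client asks which services accept its data, taking the datatype hierarchy into account, so a service declared for a general sequence type also accepts a DNA sequence. This is a hypothetical sketch of the concept only, not the BioMOBY API. The class and method names are invented. The datatype names are loosely modelled on BioMOBY's object ontology, with the hierarchy simplified. The `run` callables stand in for remote service invocations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class ServiceDescription:
    name: str
    input_type: str
    output_type: str
    run: Callable[[str], str]  # hypothetical local stand-in for a remote call

@dataclass
class Catalogue:
    parents: Dict[str, Optional[str]] = field(default_factory=dict)  # is-a hierarchy
    services: List[ServiceDescription] = field(default_factory=list)

    def lineage(self, datatype: str) -> List[str]:
        """The datatype followed by all of its ancestors in the hierarchy."""
        chain: List[str] = []
        current: Optional[str] = datatype
        while current is not None:
            chain.append(current)
            current = self.parents.get(current)
        return chain

    def discover(self, datatype: str) -> List[str]:
        """Names of services that accept this datatype or any of its ancestors."""
        accepted = set(self.lineage(datatype))
        return [s.name for s in self.services if s.input_type in accepted]

    def invoke(self, name: str, payload: str) -> str:
        """Run a registered service by name."""
        service = next((s for s in self.services if s.name == name), None)
        if service is None:
            raise KeyError(f"unknown service: {name}")
        return service.run(payload)
```

For example, in a catalogue where `DNASequence` is-a `GenericSequence` is-a `Object`, a DNA sequence can be sent both to a reverse-complement service declared for `DNASequence` and to a length service declared for `GenericSequence`. A service that expects an amino-acid sequence is not offered. Keeping these descriptions in the catalogue rather than in each client is what lets a single client work with services it was not written for.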
We are currently working on scientific workflows, a paradigm that provides crucial support in helping researchers use distributed and heterogeneous resources in a repeatable and well-defined way. Our workflow platform offers a way to execute workflows using the tools provided by the Spanish National Institute of Bioinformatics (INB). The platform is based on a flexible, lightweight architecture for publishing biological data and services. It is designed to use BioMOBY services, extending them with persistence and privacy for the information retrieved by each user. It can also handle long-running services by means of the asynchronous BioMOBY specification.
References:
[1] O.Trelles-Salazar, E.L.Zapata, J.M.Carazo; “On an efficient parallelization of exhaustive sequence comparison algorithms”; Computer Applications in BioSciences (1994) 10(5):509-511
[2] Ceron, C., Dopazo, J., Zapata, E.L., Carazo, J.M. and Trelles, O.; “Parallel Implementation for DNAml Program on Message-Passing Architectures”; Parallel Computing and Applications, vol 24 (5-6),June 98 pp.701-716
[3] Trelles O., Andrade M.A., Valencia A., Zapata E.L., and Carazo J.M.; “Computational Space Reduction and Parallelization of a new Clustering Approach for Large Groups of Sequences”; Bioinformatics vol.14 no.5 1998 (pp.439-451); (formerly CABIOS)
[4] Trelles O.; “On the parallelisation of bioinformatics applications”; Briefings in bioinformatics (May, 2001) vol.2 (2) pp. 181-194
[5] Trelles Oswaldo and Rodríguez Andrés; “Parallel Metaheuristics in Bioinformatics: A new class of algorithms”; Wiley Series on Parallel and Distributed Computing, Edited by: Enrique Alba, (ISBN-13-978-0-471-67806-9) pp: 517-549
[6] Perez A.J., Rodríguez, A., Trelles O., Thode G.; “A computational strategy for protein function assignment which addresses the multidomain problem”; Comparative and Functional Genomics , 2002, (3) pag.423-440
[7] Rodriguez, A.; Carazo, J.M. and Trelles, O.; “Mining Association Rules from Biological Databases”; Journal of the American Society for Information Science and Technology 56(5):493–504, 2005
[8] Rodriguez, A.; Shotton,D.; Guil, N. and Trelles, O.; “Automatic analysis of the content of cell biological video and database organization of their metadata descriptors”; IEEE Transactions on Multimedia (February 2004) Vol 6, No. 1, pages 119-128
[9] Rodríguez, A.; Shotton,D.; Guil, N. and Trelles, O.; “Analysis and Description of the Semantic Content of Cell Biological Videos”; Multimedia Tools and Applications, 25, 37-58, 2005
[10] Manuel J. Martín-Vázquez; Mario A. Trelles; Alejandro Sola; R. Glen Calderhead and Oswaldo Trelles; "A new user-friendly software platform for systematic classification of skin lesions to aid in their diagnosis and prognosis"; Lasers in Medical Science (2006) 21: 54–60
[11] M.A.Trelles, M.D, PhD; X.Alvarez, M.J.Martín-Vázquez, O.Trelles, M.Velez, J.L. Levy and I.Allones M.D.; “Assessment of the Efficacy of Nonablative Long-Pulsed 1064-nm Nd:YAG Laser Treatment of Wrinkles Compared at 2, 4 and 6 Months”; Facial Plastic Surgery; vol. 21, num. 2, 2005 (special issue: Lasers: New Technology and Emerging Trends)
[12] L.F. Romero, A. Rodríguez, A. Muñoz C., M.A.Trelles, E.L.Zapata and O.Trelles; (1996); "Efficient Computational Parallel Solutions for Laser/Tissue Interaction Modelling"; BIOS-Europe'96; Vienna, Austria
[13] Jorge García de la Nava, Sacha van Hijum and Oswaldo Trelles; “PreP: gene expression data pre-processing”; Bioinformatics 2003 Nov 22; 19 (17): 2328-2329
[14] Jorge García de la Nava, Sacha A.F.T. van Hijum and Oswaldo Trelles; “Saturation and quantization reduction in microarray experiments using two scans at different sensitivities”; Statistical Applications in Genetics and Molecular Biology; vol 3. Issue 1, article 11. The Berkeley Electronic Press. ISSN: 1544-6115. (2004)
[15] Jorge García de la Nava, Daniel Franco Santaella, Jesús Cuenca Alba, José María Carazo, Oswaldo Trelles, Alberto Pascual-Montano; “Engene: The processing and exploratory analysis of gene expression data”; Bioinformatics vol.19 no.5 (2003) pp.657-658
[16] P. Carmona-Sáez, M. Chagoyen, A. Rodríguez, O. Trelles, J. M. Carazo and A. Pascual-Montano.; “Integrated analysis of gene expression by association rules discovery”; BMC Bioinformatics 2006, 7:54
[17] J.F. Aldana, M. Roldán-Castro, I. Navas, M.M. Roldán-García, M. Hidalgo-Conde, and O.Trelles; “Bio-Broker: Integration of Biological Data Sources and Data Analysis Tools”; Software, Practice and Experience 2006; 36:1585-1604. Published Online: www.interscience.wiley.com
[18] Ismael Navas-Delgado, Maria del Mar Rojano-Muñoz, Sergio Ramírez, Antonio J. Pérez, Eduardo Andrés León; Jose F. Aldana-Montes, and Oswaldo Trelles; “Intelligent client for integrating bioinformatics services”; Bioinformatics, vol.22 no.1 2006 pages 106-111