Thursday, June 11, 2015

Mobile Computing and Corporate Data Protection: Bring Your Own Device Security Issues and Challenges

The following post describes the strategic importance of information technology in the context of corporate data protection, mobile computing security, and heterogeneous data storage and isolation. For that purpose we discuss the work of Wang, Wei & Vangury in their paper "Bring Your Own Device Security Issues and Challenges".

Enterprise collaboration and the exchange of corporate information have increased thanks to the proliferation of mobile computing technologies and the massive adoption of mobile devices. The concept of Bring Your Own Device (BYOD) is appealing because it allows corporate users to share, query and use data anytime, anywhere, from the same devices they use for their personal activities. However, it also poses challenges in terms of network access and data protection: how can information be shared and stored safely in a BYOD environment while preventing the leakage of sensitive data?

What if the device is lost or stolen? Who can access the data? Why, when and where? What if an employee leaves the company with his BYOD device and the corporate data on it? These questions represent the biggest challenges for the security of corporate data in an increasingly mobile and flexible environment. Therefore, "a set of principles that any organization should follow before implementing the BYOD framework must include availability, usability, mobility and security" (AlHarthy & Shawkat, 2013).

In terms of vulnerabilities, BYOD is very sensitive to confidentiality issues, data isolation and compliance with security policies. "A few solutions exist for BYOD security. However, limitations and drawbacks have been found in these solutions." (Wang, Wei & Vangury, 2014)

The research shows that an ideal BYOD solution must be able to separate the corporate space from the personal space, protect corporate data, and monitor and reject unauthorized or illegal data access. Therefore, a BYOD security framework is proposed based on three operational layers: space isolation, network access control and a security policy database.

In general terms, Wang, Wei & Vangury approached their experiments by comparing two techniques: an agent-based BYOD discovery system and a scanning-based BYOD discovery system. An agent-based BYOD discovery system requires a mobile app installed on the BYOD device. This piece of software is responsible for reporting the device status to a centralized network management and monitoring system, which enforces password rules and a full set of security policies. Under a scanning-based BYOD discovery system, on the other hand, no application is required and a network scanning tool is responsible for detecting BYOD devices. However, this method is only feasible within small network areas, takes extra time to perform the discovery, and adds considerable traffic to the corporate network.
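As a purely hypothetical illustration of the agent-based reporting mechanism (this is not code from Wang, Wei & Vangury; the management endpoint, the payload fields and the use of WebClient are assumptions), a minimal C# sketch of an agent posting its device status to a centralized monitoring service could look like this:

using System;
using System.Net;

public class ByodAgent
{
    // Hypothetical management endpoint; a real MDM product defines its own API.
    private const string ManagementUrl = "https://mdm.example.com/api/device-status";

    public static void ReportStatus(string deviceId, bool passcodeEnabled, string osVersion)
    {
        // Simple JSON payload describing the current state of the device.
        string payload = "{ \"deviceId\": \"" + deviceId + "\", " +
                         "\"passcodeEnabled\": " + passcodeEnabled.ToString().ToLower() + ", " +
                         "\"osVersion\": \"" + osVersion + "\" }";

        using (var client = new WebClient())
        {
            client.Headers[HttpRequestHeader.ContentType] = "application/json";
            // The centralized monitoring system evaluates this report against its security policies.
            client.UploadString(ManagementUrl, "POST", payload);
        }
    }
}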

In conclusion, a BYOD security framework was evaluated by the authors with the intention of providing guidance for enterprises willing to adopt BYOD. It has also been stated by Wang, Wei & Vangury (2014) that an ideal BYOD solution must be able to separate the corporate space from the personal space and, at the same time, protect corporate data and monitor and reject unauthorized and illegal data access. Limitations and drawbacks were found in the technologies evaluated. Therefore, further work is required to design a framework and develop tools to protect BYOD networks, which also represents a potential area for research.

References

AlHarthy, K., & Shawkat, W. (2013, November 29 - December 1). Implement network security control solutions in BYOD environment. Paper presented at the 2013 IEEE International Conference on Control System, Computing and Engineering (ICCSCE).

Wang, Y., Wei, J., & Vangury, K. (2014, January 10-13). Bring your own device security issues and challenges. Paper presented at the 2014 IEEE 11th Consumer Communications and Networking Conference (CCNC).

Social collaboration via social tools – Social media to solve business problems

The following article describes potential areas for research in the field of social networks and social collaboration in the enterprise. The work of Smith, Hansen & Gleave (2009) in their paper "Analyzing Enterprise Social Media Networks" is used as a research case that shows how social networks and Internet systems can be used as a strategic source of information in the company.

The phenomenon of social networking has moved into the workplace. According to a report from Next Vision IT Security, it is currently estimated that more than 300,000 companies worldwide use social networks for business purposes or as a tool for internal communication. Enterprise social networks are poised to revolutionize how people interact in the workplace. Therefore, there is a pressing need to understand how people are using these social networks (Jin, Hongyu & Friedman, 2013).

According to Smith, Hansen & Gleave (2009) “Social media tools provide a wealth of data that can be transformed into insights about the structure and dynamics of an enterprise or organization… Managers and analysts can use these metrics to better understand organizational dynamics, allowing them to better measure the effects of interventions and events.”

Smith, Hansen & Gleave (2009) grounded the relevance of their research in the fact that social network structures are created when people connect to one another through a range of ties. Therefore, extracting, processing and analyzing these networks can reveal important patterns in the structure and dynamics of institutions.

In general terms, Smith, Hansen & Gleave (2009) approached their experiments by implementing various pieces of software known as social sensors, clickstream captures, feed subscription analysis and data mining. These computational mechanisms collected, from the Internet, various aspects of people's activities in the enterprise. Once the information was captured, mathematical analysis was applied in the form of graph, behavior and content semantics. In their conclusions, the authors were able to determine the effect of social networking in the enterprise setting on revenues and project performance.
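To make the graph-analysis step concrete, the following minimal C# sketch (invented data, not code from Smith, Hansen & Gleave) computes one of the simplest structural metrics, degree, from a list of "who replied to whom" ties:

using System;
using System.Collections.Generic;
using System.Linq;

public class SocialGraphMetrics
{
    public static void Main()
    {
        // Invented reply ties between members of a hypothetical enterprise forum.
        var ties = new List<Tuple<string, string>>
        {
            Tuple.Create("alice", "bob"),
            Tuple.Create("carol", "bob"),
            Tuple.Create("bob", "alice"),
            Tuple.Create("dave", "alice")
        };

        // Degree = number of ties in which each member participates.
        var degree = ties
            .SelectMany(t => new[] { t.Item1, t.Item2 })
            .GroupBy(name => name)
            .ToDictionary(g => g.Key, g => g.Count());

        foreach (var member in degree.OrderByDescending(d => d.Value))
            Console.WriteLine(member.Key + ": " + member.Value);
    }
}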

A similar study was performed by Ta-Shun, Hsin-Yu, Ling-Ching (2010). They observed a high-tech firm in Taiwan and demonstrated that the social network attributes of the firm itself can be examined to determine the relationships with the firm's profit and research and development capability.

In conclusion, Smith, Hansen & Gleave (2009) built a formal model to relate the social interactions of the members of a company with the company's performance metrics. Regarding future challenges in the area, opportunities were found to integrate teleconference (video and voice) and voice data into the network analysis. This area could provide an opportunity for the development of new algorithms for real-time data extraction from streaming technologies. In addition, following the recommendation of Ta-Shun, Hsin-Yu & Ling-Ching (2010), a future research agenda on the subject should include quantitative analysis of other high-tech industries. Finally, as stated by Smith, Hansen & Gleave (2009), network visualization, especially at large scale and in an evolutionary manner, is also a challenge.

References

Jin, C., Hongyu, G., Li, L. E., & Friedman, B. (2013, April 14-19). Enterprise social network analysis and modeling: A tale of two graphs. Paper presented at the 2013 Proceedings IEEE INFOCOM.

Next Vision IT Security (n.d.). Use of Social Networks in the Enterprise Setting. Retrieved from http://www.nextvision.com/img/pdf/informe-redessociales.pdf

Smith, M., Hansen, D. L., & Gleave, E. (2009, August 29-31). Analyzing Enterprise Social Media Networks. Paper presented at the 2009 International Conference on Computational Science and Engineering (CSE '09).

Ta-Shun, C., Hsin-Yu, S., & Ling-Ching, Y. (2010, July 18-22). Social network analysis of directors and supervisors in Taiwan semiconductor industry. Paper presented at the Technology Management for Global Economic Growth (PICMET '10) Proceedings.

Wednesday, March 12, 2014

AzureBench: Benchmarking the Storage Services of the Azure Cloud Platform and an Application Framework for Scientific Applications on Windows Azure Cloud Platform

Experiment reproduction and evaluation, based on the work of Dinesh Agarwal & Sushil Prasad. The current scientific computing scenario faces the need to deal with enormous sets of data-intensive computational tasks. These activities require massive storage resources as well as immense computational power. While the largest and best funded research projects are able to afford expensive computing infrastructure, most other projects are forced to opt for cheaper resources such as commodity clusters or simply limit the scope of their research (Lu, Jackson & Barga, 2010).

However, a less well-known characteristic of cloud computing environments, and a major concern for the consumer segment, is performance. This topic is a permanent subject of discussion in various scientific forums. Current providers are unable to make solid commitments regarding performance guarantees; common practice is to guarantee availability, portability and compatibility instead.

Since 2010, Microsoft has offered a Platform as a Service model with an associated pay-as-you-go financial model. Under the name of the Microsoft Azure platform, this cloud computing platform offers a set of cloud computing services and might provide an option for the computational and storage needs of such scientific computation activities.

Windows Azure allows users to lease Windows virtual machine instances according to a Platform as a Service model and offers the .NET runtime as the platform through two programmable roles called Worker Roles and Web Roles. Azure also supports VM roles, enabling users to deploy virtual machine instances and thus supporting an Infrastructure as a Service model as well.

Azure cloud computing seems to promise a viable solution for the computational demands of the scientific community. However, many questions and concerns arise regarding the performance guarantees and capabilities of such a system. The consulted literature suggests a generalized sense in the community that a deeper understanding of the performance variables of cloud systems is needed, and recent work in the area of cloud computing has focused on this concern.

Azure network and storage services are optimized for scalability and cost efficiency. However, the primary performance factors in an HPC system are communication latency and bisection bandwidth. The Windows Azure cloud platform lacks the traditional HPC software environment, including MPI and map-reduce. On the other hand, while it might not be the best fit for some scientific applications that essentially require MPI-like functionality, it can provide a very simple model for a wide variety of scientific applications (Agarwal & Prasad, 2012).

There are striking differences between scientific application workloads and the workloads for which Azure and other cloud platforms were originally designed, namely long-lived web services with modest intra-cluster communication. Scientific application workloads, on the other hand, span a wide spectrum of system requirements. There is a very large and important class of "parameter sweep" or "ensemble computation" workloads that only require a large number of fast processors and have little inter-process communication. At the other extreme there are parallel computations, such as fluid flow simulations, where the messaging requirements are so extensive that execution time is dominated by communication (Lu, Jackson & Barga, 2010).

The importance of the Azure platform has been recognized by industry as well as academia, as is evident from the rare partnership of the National Science Foundation (NSF) and Microsoft in funding scientific research on the Azure cloud (Agarwal & Prasad, 2012).

In 2012, an open-source system called AzureBench was presented to the community as an aid to HPC developers and as a contribution to the scientific community in need of HPC resources and facing the task of choosing the most suitable provider for their needs. This system provides a benchmark suite for performance analysis of the Azure cloud platform and its various storage services. The authors also provide a generic open-source application framework that can be a starting point for application development over Azure.

AzureBench provides a comprehensive scalability assessment of the Azure platform's storage services using up to 100 processors. It provides updated, realistic performance measurements, as it utilizes the latest APIs released after significant changes were made to the Azure cloud platform since 2010. The AzureBench work extends the preliminary work of Hill et al. in their 2010 publication "Early observations on the performance of Windows Azure".


Prior research

As mentioned before, a significant concern about cloud systems is performance. Nowadays, few if any of the major providers are able to offer a solid guarantee on application performance. Due to this concern, Hill et al. initiated their research in 2010 under the title "Early Observations on the Performance of Windows Azure", the term "early" reflecting the fact that the Azure system had only just been launched in 2010.

In their research they initially tried to address the issue that developers face as more cloud providers and technologies enter the market: deciding which vendor to choose for deploying their applications. Their work was therefore developed under the premise that one critical step in evaluating the various cloud offerings is to determine the performance of the services offered and how that performance matches the requirements of the consumer application.

Regardless of the potential advantages of the cloud in comparison to enterprise-deployed applications, cloud infrastructures may ultimately fail if deployed applications cannot predictably meet behavioral requirements (Hill et al., 2010).

Hill et al. presented results from experiments providing an exhaustive performance evaluation of each of the integral parts of the Azure platform: virtual machines and the table, blob, queue and SQL services. Based on these experiments, their work provided a list of performance-related recommendations for users of the Windows Azure platform.

One interesting element found during the analysis was related to Blob storage performance. It was observed that, depending on the number of concurrent clients and the size of the data objects to be stored, Blob was between 35% and 3 times faster than Table. However, due to the black-box nature of the cloud services, the researchers were unable to provide an explanation for this behavior.

Regarding queue behavior, it was found that multiple queues should be used to support many concurrent readers/writers, as performance degraded when concurrent readers and/or writers were added. They also found that message retrieval was more affected by concurrency than message put operations, so users cannot assume similar scaling at each end of the queue.

In general, it was concluded that performance over time was consistent, although there are rare occasions (less than 1.6% occurrence for a 50% slowdown or worse, 0.3% for a 2x slowdown or worse) where performance degraded significantly. Hill et al. therefore saw no reason to provision assuming a worst-case behavior in Azure significantly worse than average-case behavior (Hill et al., 2010).
Methodology

The Windows Azure Platform is composed of three services: Windows Azure, SQL Azure and AppFabric. Agarwal & Prasad's research focuses on Windows Azure, which encompasses both compute resources and scalable storage services.

In order to evaluate the Windows Azure storage mechanisms, Agarwal & Prasad deployed 100 Small VMs. A typical Small VM for Windows Azure has 1 CPU core, 1.75 GB of RAM and a 225 GB hard disk. These virtual machines read from and write to Azure storage concurrently.

The AzureBench application workflow starts with a web interface where users can specify the parameters for background processing. The VM configuration for the web role depends on the intensity of the tasks it has to handle. For applications where the web role performs computationally intensive operations, a fat VM configuration should be chosen. Similarly, if the web role needs to access large data items from cloud storage, a fat VM allows it to upload/download data to/from storage using multiple threads. To communicate tasks to the worker roles, the web role puts a message on a task assignment queue, as sketched below.
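As an illustration only (not the AzureBench source), a minimal C# sketch of this step, assuming the classic Microsoft.WindowsAzure.StorageClient API that shipped with the 1.x SDKs; the connection string and queue name are placeholders:

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class TaskDispatcher
{
    // Placeholder values for illustration only.
    private const string ConnectionString = "UseDevelopmentStorage=true";
    private const string TaskQueueName = "taskassignment";

    public static void EnqueueTask(string taskParameters)
    {
        // Resolve the storage account and get a reference to the task assignment queue.
        CloudStorageAccount account = CloudStorageAccount.Parse(ConnectionString);
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference(TaskQueueName);
        queue.CreateIfNotExist();

        // The web role hands the benchmark parameters to the worker roles
        // by putting a message on the task assignment queue.
        queue.AddMessage(new CloudQueueMessage(taskParameters));
    }
}

Worker roles would then poll the same queue, retrieve the message, and delete it once the task has been processed.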


They also selected several metrics, structured according to the execution phases of scientific applications. A first step consists of deploying the customized environment and fetching the initial data. In a second phase the application is executed, so they were also interested in the computation performance and the efficiency of the network transfers. The following table summarizes the procedures executed during the evaluation process.




Each experiment is repeated five times with entity sizes of 4 KB, 8 KB, 16 KB, 32 KB and 64 KB.


Blob
Each VM uploads one 100 MB blob to the cloud in 100 chunks of 1 MB each. Workers are synchronized to wait until all VMs are done uploading blobs before the download phase starts. There is no API in Azure to manage process synchronization; this is worked around by implementing a queue as a shared memory resource.
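A minimal sketch of this chunked upload, assuming the classic StorageClient block-blob API (PutBlock/PutBlockList) rather than the actual AzureBench source; the container and blob names are assumptions:

using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class BlobBench
{
    // Uploads a 100 MB blob as 100 blocks of 1 MB each.
    public static void UploadInChunks(CloudStorageAccount account, int workerId)
    {
        CloudBlobContainer container =
            account.CreateCloudBlobClient().GetContainerReference("azurebench");
        container.CreateIfNotExist();

        CloudBlockBlob blob = container.GetBlockBlobReference("worker-" + workerId + ".dat");
        var blockIds = new List<string>();
        byte[] chunk = new byte[1024 * 1024];    // 1 MB of dummy data

        for (int i = 0; i < 100; i++)
        {
            // Block IDs must be Base64 strings of equal length.
            string blockId = Convert.ToBase64String(BitConverter.GetBytes(i));
            blob.PutBlock(blockId, new MemoryStream(chunk), null);
            blockIds.Add(blockId);
        }

        // Commit the block list so the 100 blocks become a single 100 MB blob.
        blob.PutBlockList(blockIds);
    }
}

The missing synchronization barrier can then be approximated by having each worker add a "done" message to a shared queue and poll the queue's message count until every worker has reported.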



Queue
Three operations were tested: inserting a message using the PutMessage API, reading a message using the GetMessage API, and reading a message using the PeekMessage API. The evaluation was done under two scenarios: each worker working with its own dedicated queue, and all workers accessing the same queue. For both experiments, a total of 20K messages was first inserted in the queue, then read using both APIs, and finally deleted from the queue.
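In the classic StorageClient library these REST operations surface as AddMessage, GetMessage and PeekMessage. A sketch of the timed cycle follows (illustrative only, not the AzureBench code; the payload format is an assumption):

using System.Diagnostics;
using Microsoft.WindowsAzure.StorageClient;

public class QueueBench
{
    // Times put / peek / get+delete cycles against a single queue.
    public static void Run(CloudQueue queue, int messageCount)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < messageCount; i++)
            queue.AddMessage(new CloudQueueMessage("payload-" + i));   // PutMessage
        Debug.WriteLine("Put: " + watch.ElapsedMilliseconds + " ms");

        watch.Restart();
        for (int i = 0; i < messageCount; i++)
            queue.PeekMessage();                                       // PeekMessage (non-destructive)
        Debug.WriteLine("Peek: " + watch.ElapsedMilliseconds + " ms");

        watch.Restart();
        for (int i = 0; i < messageCount; i++)
        {
            CloudQueueMessage msg = queue.GetMessage();                // GetMessage (hides the message)
            if (msg != null)
                queue.DeleteMessage(msg);                              // remove it once processed
        }
        Debug.WriteLine("Get+Delete: " + watch.ElapsedMilliseconds + " ms");
    }
}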



Table
Each worker role instance inserts 500 entities in the table, all of which are stored in a separate partition of the same table. Once the insertion completes, the worker role queries the same entities 500 times. After the querying phase ends, the worker role updates all 500 entities with newer data. Finally, all of these entities are deleted.
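A minimal sketch of this insert/query/update/delete cycle using the SDK 1.x-era TableServiceContext and its LINQ interface; the entity type and table name here are assumptions, not the actual AzureBench schema:

using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Illustrative entity type; the real AzureBench schema may differ.
public class BenchEntity : TableServiceEntity
{
    public string Payload { get; set; }

    public BenchEntity() { }
    public BenchEntity(string partition, string row, string payload)
        : base(partition, row)
    {
        Payload = payload;
    }
}

public class TableBench
{
    public static void Run(CloudStorageAccount account, int workerId, string payload)
    {
        const string tableName = "azurebenchtable";           // assumed name
        CloudTableClient tableClient = account.CreateCloudTableClient();
        tableClient.CreateTableIfNotExist(tableName);

        TableServiceContext context = tableClient.GetDataServiceContext();
        string partition = "worker-" + workerId;               // one partition per worker role

        // Insert 500 entities into this worker's partition.
        for (int i = 0; i < 500; i++)
            context.AddObject(tableName, new BenchEntity(partition, i.ToString(), payload));
        context.SaveChangesWithRetries();

        // Query, update and finally delete the same entities.
        var entities = context.CreateQuery<BenchEntity>(tableName)
                              .Where(e => e.PartitionKey == partition)
                              .ToList();
        foreach (BenchEntity e in entities) { e.Payload = payload + "*"; context.UpdateObject(e); }
        context.SaveChangesWithRetries();

        foreach (BenchEntity e in entities) context.DeleteObject(e);
        context.SaveChangesWithRetries();
    }
}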


The methodology implemented by Agarwal & Prasad allows for a deep understanding and analysis of the Azure platform's middleware. They perform exhaustive activities over the Queue storage mechanism, the query-like interface to store data through Table storage, and persistent random access to hierarchically stored data through Blob storage. In addition, the utility-based framework facilitates experimenting with a large amount of compute power, obviating the need to own a parallel or distributed system.

The experimentation focuses on the Azure cloud storage services, which are its primary artifacts for inter-processor coordination and communication. The iterative algorithm provides a full cycle of tests from different perspectives of data size and number of parallel VM workers. This approach allowed the researchers to point out various bottlenecks in parallel access to the storage services.

Contribution

The major contribution of this work, aside from the results and guidance provided for HPC developers, is that AzureBench is an open-source benchmark suite hosted in a Codeplex repository and available under GPLv2. The open-source nature of the project should therefore motivate further research in this direction.

Another important factor is that Agarwal & Prasad's assessment of the Microsoft Azure cloud platform provides updated and realistic performance measurements. This was verified not only through the presented results but also because the source code demonstrates that all utilized APIs were released after the significant changes made to the Azure cloud platform since 2010.

They also provided the community with a template for a generic application framework for the Azure cloud. This will help interested developers to shorten the learning curve by offering a clear and solid starting point for designing their own applications.

Finally, Agarwal & Prasad provide a summary of their findings and make recommendations for developers on how to efficiently leverage the maximum throughput of the storage services.

Personal Experiment Discussion

In the personal experiment we intended to reproduce the execution of the algorithms used for the blob, table and queue performance analysis. To do so, we downloaded and installed the open-source version (Alpha) of AzureBench from the Codeplex repository. The available source code is based on C#/ASP and had its last revision in January 2012.

Resources used during the experiment reproduction are the following:

Microsoft Azure Account: Trial
Cluster Affinity Group: East USA
VM Size: Small (1 Core + 1.75 GB RAM)
Maximum VM Cores: 20
Development Platform: Visual Studio 2010 + Azure SDK 1.7
VM WebRoles: 2
VM WorkerRoles: 2-4-8-16

The experiment was divided into 10 steps:
1. Source Code Deployment
2. Azure Account Setup
3. Source Code analysis and framework interpretation
4. Source tweaking and modification for Local Emulation
5. Source tweaking and modification for Source Deployment
6. Source Code compilation and binary built
7. Azure Application Package Generation
8. Cloud Application Deployment
9. VM Workers Provisioning 
10. Application Execution and Benchmark Execution
a. Behavior Observation
b. Analysis of Results

Our testing environment was the same as the one used in the original paper, except for the total number of VM workers available for testing. The original work performed tests with 1 to 96 VM workers (cores); our scenario was limited to 20 VM workers due to budget restrictions. Our execution included the iterations over message data sizes from 4 KB to 64 KB.

It was observed that the behavior obtained during the tests for tables and queues, using 2, 4, 8 and 16 VM workers, followed the same pattern (though not the same values) as the results presented in the original paper. The projection shows that this pattern would hold for tests with more than 16 cores.

We were able to reproduce 2 out of 3 experiments presented in the original paper. The source available for the Blob performance analysis has bugs and falls into infinite loops when executed in the cloud. By analyzing the source we estimate that this condition is due to an error in the index boundaries used to keep track of the pages during the synchronization phase. We intend to correct this issue in later versions of this work.

Figure 2 shows our results when running the Azure Blob experimental bench. The application crashed; however, the VM workers kept trying to recover (heal) themselves from the failure. Between 50% and 90% of the VM workers went offline. The application ran for 14 hours without showing any progress in the recovery action.


Figure 3 shows that updating a table is the most time-consuming process. As in the original paper, it was evident that for entity sizes of 32 KB and 64 KB the time taken by all four operations increases drastically with an increasing number of worker role instances.




The Windows Azure platform maintains three replicas of each storage object with strong consistency. Figure 4 shows the time to put a message on the queue. For the Put Message operation, the queue needs to be synchronized among replicated copies across different servers. We were able to reproduce the behavior of the Peek Message operation, noticing that it is the fastest of all three operations. According to Agarwal & Prasad, this is because no synchronization is needed on the server end. On the other hand, the Get Message operation is the most time consuming: in addition to synchronization, the message also becomes invisible to all other worker role instances, so this new state needs to be maintained across all copies.
Figure 5 shows a result that is also consistent with the original paper. During the bench for the Azure shared queue, we were able to reproduce the behavior Agarwal and Prasad observed for Queue storage when multiple workers access a queue in parallel. We can observe that shared access increases contention at the queue and, as a consequence, the time taken by each operation is greater than when each worker accesses its own queue.

The learning curve for the Azure platform, its operational architecture and the Visual Studio development framework is very steep, unless the developer has solid knowledge of Windows WCF. Once the architecture is understood, the deployment process is very simple and almost seamless. However, one of the drawbacks learned during the experimentation phase was the application deployment time and role instantiation. These activities are time consuming and their duration is unpredictable; they can take minutes or even hours depending on the size of the project (number of VM workers) and the current state of the cloud. Microsoft charges deployment time as computing time, so there is a cost implied.

The results show that Azure offers good performance for the benchmarks tested. However, as stated by Microsoft Research, Windows Azure is not designed to replace the traditional HPC supercomputer. In its current data center configuration it does not have the high-bandwidth, low-latency communication model appropriate for tightly coupled jobs. Nevertheless, Windows Azure can be used to host large parallel computations that do not require MPI messaging.
Further Research

As future work, additional services provided by the Windows Azure platform, such as local drives, caches and the SQL Azure database, should be studied in terms of performance. Another aspect that is outside the scope of the present research, but could be an expansion of Agarwal & Prasad's work, is the study of resource provisioning times and application deployment timings, as well as Azure AppFabric.


Instance acquisition and release times are critical metrics when evaluating the performance of dynamic scalability for applications. Other subjects of study derived from this work could be the evaluation of Azure Drives and the NTFS storage abstraction.

Another aspect suitable for deeper study is the evaluation of direct instance-to-instance TCP performance, as this mechanism provides a lower-latency alternative to the storage services for communication between instances.

Finally, benchmarking suites for other cloud offerings from different vendors could be incorporated. However, comparison with other cloud platforms is not studied in this paper, primarily due to the differences in architectures.

References
Agarwal, D., & Prasad, S. K. (2012). AzureBench: Benchmarking the Storage Services of the Azure Cloud Platform. 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (Vol. 0, pp. 1048–1057). Los Alamitos, CA, USA: IEEE Computer Society. 

Gunarathne, T., Zhang, B., Wu, T.-L., & Qiu, J. (2011). Portable Parallel Programming on Cloud and HPC: Scientific Applications of Twister4Azure. Utility and Cloud Computing, IEEE Internatonal Conference on (Vol. 0, pp. 97–104). Los Alamitos, CA, USA: IEEE Computer Society. 

Hill, Z., Li, J., Mao, M., Ruiz-Alvarez, A., & Humphrey, M. (2010). Early observations on the performance of Windows Azure. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC  ’10 (pp. 367–376). New York, NY, USA: ACM. 

Iosup, A., Ostermann, S., Yigitbasi, M. N., Prodan, R., Fahringer, T., & Epema, D. H. J. (2011). Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing. IEEE Transactions on Parallel and Distributed Systems, 22(6), 931–945. 

Lu, W., Jackson, J., & Barga, R. (2010). AzureBlast: a case study of developing science applications on the cloud. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC  ’10 (pp. 413–420). New York, NY, USA: ACM. 

Roloff, E., Birck, F., Diener, M., Carissimi, A., & Navaux, P. O. A. (2012). Evaluating High Performance Computing on the Windows Azure Platform. 2012 IEEE Fifth International Conference on Cloud Computing (pp. 803–810). Los Alamitos, CA, USA: IEEE Computer Society. 





Sunday, February 23, 2014

Latest-Generation Ultra-Slim USB 3.0 Hard Drives

Western Digital is securing another market segment with the launch of the new portable hard drives in its "Passport" line, which incorporate USB 3.0 technology as an essential feature, allowing data transfer rates of up to 5 Gbps when connected to a system equipped with those ports.

The new models, My Passport Ultra and My Passport Slim, come with a powerful software suite for encryption and dedicated backup management, as well as a tool for monitoring the drive's operational status at the hardware level. They are quite compact and come in a metal chassis that dissipates the little heat the internal unit may generate when running for long periods of time.

As with most of these devices, it is not advisable to drop them or expose them to extreme pressure or temperature. They come with a 3-year limited warranty and include a pouch, or carrying case, to protect their finish.

You can find more information about these new ultra-portable drives here:

 My Passport model comparison table

Until next time, friends!

Video Game Development: 93 Billion During 2013

The year 2013 was an important moment for the video game industry, as major firms such as Microsoft, Sony and Nintendo launched their latest-generation consoles, marking the extension of digital entertainment services beyond the classic experience of traditional console gaming.

According to a recent Gartner Group study, the global video game market reached a record high in 2013, growing 18% to a total of 93 billion dollars in sales during the year. This figure is expected to reach 111 billion by 2015. The revenue and growth of this sector of the entertainment industry have surpassed those of the film, music and video industries. Even more impressive, mobile games are the fastest-growing segment: in the United States, they have gone from generating a total of 11 billion to 21 billion in just under two years.

Another interesting aspect of this technological phenomenon in the video game industry is that its influence is not only economic; it also drives, in an integral way, the development of the technologies that currently converge to deliver the experience users demand. By this I mean four fundamental elements: computational processing power, content, devices and bandwidth. Entertainment software is responsible for generating a significant share of all sales and innovation in each of these related industries.

Another segment that does not escape the effects of this industry's development and growth is the labor market. A few weeks ago a study of the video game industry labor market (the Gamasutra Salary Survey) was released. Its objective was to evaluate the current income range of the different workers who make up the video game industry. The study analyzed the impact of the economic crisis on the sector from the point of view of the earnings of salaried workers in the video game labor market. The conclusion is that average salaries grew by around 7% during 2013, rising to USD 79,000 per year (in 2012 the average income was only USD 74,000).

Who earns the highest salaries? Programmers? Designers? Perhaps the sales team? Well, yes, clearly the business side takes the biggest slice. However, from the software development point of view, which is the part that interests us, we find the following results:
  • Programming: Programmers represent some of the best-paid talent in the video game industry; their average salary rose to 92,962 dollars, compared with 85,733 dollars in previous years.
  • Art and Animation: For artists and animators, average salaries rose to 75,780, compared with 71,354 the previous year.
  • Design: Game designers, writers and creative directors averaged a package of 73,386, up from 70,223 the previous year.
  • Testing and Quality Assurance: Quality assurance professionals (QA testers) are the lowest-paid workers in the games industry; their average salary fell to 47,910 dollars, compared with 49,009 dollars the previous year.
  • Business: Business and legal workers remain the best paid in the industry, but their salaries averaged 106,452 in 2010.
The Dominican Republic case:

In the specific case of the Dominican Republic, the first steps are just being taken to encourage job creation and the establishment of companies of this kind. VAP Dominicana is the first locally staffed company that aims to do video game outsourcing from the country. It will be a kind of free-trade zone that manufactures parts of video games for export, on commission, to studios in the United States and Europe, where they will be "assembled".

With seed capital of US$100 thousand for the first year, the project is one of the 12 run by the technology business incubator Emprende, located inside the Parque Cibernético. VAP was created in response to a proposal from what will be its first client, Trilogy Studio, whose president, Michael Pole, visited the country in 2006. After two fruitful meetings with nearly a hundred young people who presented quality video games built with "rustic" tools, Pole offered to hire whichever company was willing to build video games for the studio. Trilogy's founders are former Electronic Arts employees who started their own company three years ago, leaving behind the studio where they created video games as famous as Halo3 and Medal of Honor. The studio's bet is to develop virtual worlds: low-cost video games that charge a subscription, in the style of World of Warcraft.

The startup aims to produce US$20 million within five years and to train, in that period, between 150 and 200 people, who will be able to earn salaries rising from an average of US$15 thousand to US$40 thousand per year.

Outsourcing is to the video game industry what cement and rebar are to a building. The market is estimated at 30% of the industry and generates jobs in specialized areas such as programming, scriptwriting, design, character creation and music.

The leading countries in video game outsourcing are China, Ireland, Eastern Europe and India. "For companies in the United States it is complicated to deal with a 13-hour time difference with China and certain cultural and language differences that prevent the constant communication this kind of work demands." Since 2006 the Dominican Republic has aspired to capture a share of that market, thanks to the country's strategic position, and to encourage the development of other similar companies that drive this activity in the country, together with animation for advertising and film.

But the main limitation the company originally faced must still be overcome: finding qualified personnel to reach the necessary production levels. National educational institutions must develop strategies to create an effective mechanism for recruiting talent and training people in the areas the industry requires, especially in complex software development. We will go into this in depth in a future post. According to current demands, young people interested in entering the game development industry should master a variety of technologies, including C++, Java, OpenGL, DirectX, Blender, Maya and Photoshop, among others.

Thursday, February 13, 2014

Collaborative Filtering Algorithms for Automatic Recommender Systems

In recommender systems there are two paradigms for selecting items: content-based filtering and collaborative filtering. In content-based systems the user receives information similar to that in which he or she has shown interest in the past, while in collaborative filtering the suggestions are items that have been liked by people with interests similar to the user's.

The existing literature describes collaborative filtering (CF) recommender systems as systems that work by collecting human judgments, expressed as ratings, on a series of items in a given domain, and that try to match people who share the same needs or tastes [Herlocker et al. 1999; Pazzani 1999; Adomavicius and Tuzhilin 2005; Breese et al. 1998]. Users of a collaborative system share their ratings and opinions on the items they know so that other users can decide which choice to make. In exchange for sharing this information, the system provides personalized recommendations for items that may be of interest to the user.

The basic process is to match the profile information of the current user against the stored profiles of other users whose preferences are known; this is called nearest-neighbour collaborative filtering.



CF algorithms can be grouped into two general classes [Adomavicius and Tuzhilin 2005; Breese et al. 1998]: memory-based algorithms, which rely on a complete neighbourhood of users and their ratings to compute predictions [Herlocker et al. 1999; Adomavicius and Tuzhilin 2005], and model-based algorithms, which use those ratings to learn a model that is then used to predict [Ungar and Foster 1998; Kim and Yum 2005; Breese et al. 1998]. The information handled in CF consists of a set of items, users, and the ratings provided by the users on those items: the problem space is defined as a matrix of users versus items, in which each cell represents the rating of a specific user for a specific item.



Solving a typical CF problem involves predicting the values a user would assign to the items he or she has not yet rated, based on the ratings previously provided by the user community [Adomavicius and Tuzhilin 2005; Herlocker et al. 1999].
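As a standard illustration only (the usual neighbourhood-based formulation popularized by Herlocker et al., not necessarily the exact formula used by any system cited here), the predicted rating of user u for item i can be written as:

\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in N(u)} \mathrm{sim}(u,v)\,(r_{v,i} - \bar{r}_v)}{\sum_{v \in N(u)} |\mathrm{sim}(u,v)|}

where N(u) is the neighbourhood of users most similar to u, r_{v,i} is the rating given by neighbour v to item i, and \bar{r}_u, \bar{r}_v are the users' mean ratings.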

Filtering systems
There are various ways of filtering information, depending on the learning algorithm used. According to [Velez & Santos, 2006] there are two ways of filtering information:

  •  Collaborative filtering: based on the ratings users provide over a domain.
  •  Content filtering: based on the traditional keyword-based information retrieval approach.


Collaborative filtering can be implemented using several algorithmic approaches:

Horting algorithm: a graph-based technique in which the nodes are objects and the edges between nodes indicate the degree of similarity between two objects. Predictions are produced by traversing the graph between nearby nodes and combining the information from nearby objects.

Bayesian belief networks: Bayesian belief networks (BBNs) are also known as belief networks, causal probabilistic networks, or graphical probabilistic networks. A BBN is a graphical network that represents probabilistic relationships between variables. BBNs make it possible to reason under uncertainty and combine the advantages of an intuitive visual representation with a mathematical foundation in Bayesian probability: P(A|B) = P(A,B)/P(B).

Cosine-based similarity: this similarity gives a good measure of how "alike" two vectors are in a multidimensional space; the space may describe features of users or of items, such as keywords. The similarity between two items is measured by computing the cosine of the angle between them, using the following equation:
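In its standard form, as commonly given in the collaborative filtering literature, this is:

\mathrm{sim}(i,j) = \cos(\vec{i},\vec{j}) = \frac{\vec{i} \cdot \vec{j}}{\lVert \vec{i} \rVert \, \lVert \vec{j} \rVert}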

Neural networks: neural networks (NNs) provide a very convenient form of knowledge representation, where nodes represent objects of the information retrieval process, such as keywords, and links represent their weighted association (relevance). NNs applied to collaborative filtering are of recent use; in [Nasraoi, 2004] an application is developed that predicts URLs to be given as recommendations to users according to their profile.

Pearson correlation: a typical metric of similarity between user preference functions or distances between vectors. The compared vectors are scored on a scale from zero (not similar) to one (complete agreement), with -1 indicating complete disagreement.
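For reference, the standard Pearson correlation between users u and v over the set I of items both have rated is:

w_{u,v} = \frac{\sum_{i \in I} (r_{u,i} - \bar{r}_u)\,(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^2}\; \sqrt{\sum_{i \in I} (r_{v,i} - \bar{r}_v)^2}}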


From a scientific and technical point of view, this proposal aims to address some of the challenges identified as possible improvements to filtering mechanisms. Initially, our system model will be based on Granovetter's theory of weak ties, which states that the degree of overlap between two individuals' networks varies directly with the strength of the tie that binds them. Our decision is based on the fact that most systemic models of collaborative filters employ strong-tie models. Another fundamental flaw of current models is that they do not convincingly relate micro-level interactions to macro-level models. Statistical as well as qualitative studies offer a good body of research on this phenomenon.

References:

[Adomavicius and Tuzhilin, 2005] Adomavicius, G., and A. Tuzhilin. 2005. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. Ieee Transactions on Knowledge and Data Engineering 17 (6):734-749.

[Herrera-Viedma et. al., 2003] E. Herrera-Viedma, L. Olvera, E. Peis, C. Porcel. 2003. Revisión de los sistemas de recomendaciones para la recuperación de información. Tendencias de investigación en organización del conocimiento. Trends in knowledge organization research, José Antonio Frías, Ed. Críspulo Travieso, Universidad de Salamanca, 507-513.

[Herlocker et al. 1999] Herlocker, J. L., J. A. Konstan, A. Borchers, and J. Riedl. 1999. An algorithmic framework for performing collaborative filtering. In Sigir'99: Proceedings of 22nd International Conference on Research and Development in Information Retrieval, edited by M. Hearst, F. Gey and R. Tong. New York: Assoc Computing Machinery, 230-237.

[Ungar and Foster, 1998] Ungar, L. H., and D. P. Foster. 1998. Clustering Methods for Collaborative Filtering. Paper read at Proceedings of the Workshop on Recommendation Systems.

[Velez & Santos, 2006] Velez, O., & Santos, C. 2006. Sistemas Recomendadores: Un enfoque desde los algoritmos genéticos. Industrial Data, vol. 9, no. 1. Universidad Nacional Mayor de San Marcos, Lima, Perú. 23-31.

Scientific Collaboration Networks for Academic Research

In the 21st century and the centuries to come, collaboration will be essential to carry out large-scale projects in any field, especially when it comes to science and technology projects.

Traditionally, higher education institutions in the Dominican Republic have operated in a spirit of competition. This phenomenon reflects the relative youth of most of the universities, which are still in a stage of consolidation and positioning at the national and regional level. This initial need for differentiation creates powerful barriers to integration and inter-institutional mobility.

In the Dominican reality, competition within higher education markets is, above all, positional, in a double sense. At the highest levels, institutions compete for the most sought-after students and students compete for prestigious opportunities (highly reputed, selective, high-quality institutions, etc.). In contrast, as one descends the institutional hierarchy, competition takes on a different meaning and becomes, basically, competition for enrollment rather than quality. At the bottom of the market, Dominican universities can no longer be concerned with whom they serve; they must operate with an open-door policy and compete simply to attract students.

Nevertheless, the demands of the educational models of societies trying to join the currents of development require the generation of knowledge through research and the synergies produced by an institution with multiple purposes and multiple relationships.

Research in the Dominican Republic is scattered across the different higher education institutions and research centers, where specialists work individually (scientific islands), depriving themselves of the opportunity to share achievements and results that, by joining efforts, could generate significant contributions to the development of science and technology in the country.

Despite the opportunities the national scenario offers for accessing advanced knowledge, developing strategies and participating in research projects, several factors prevent the country's lines of research from developing adequately. The main problem we must solve in order to join the current advance of science and technology is internal collaboration. Hence the importance of joining our own efforts in the production of scientific and technological knowledge.

As the Ministry of Higher Education, Science and Technology has stated, the creation of research and development networks is fundamental to amplifying the impact of the results obtained from research projects; hence the importance of building tools that facilitate the linking and formation of these groups of professionals with "common and/or complementary interests" in research.

We face a great opportunity to advance in the fields of science and technology at the national and regional level. The great challenge is how to turn the avalanche of information spread through the different communication media into useful knowledge, and how to take advantage of the process of knowledge generation and appropriation to induce dynamic processes of social change, through which knowledge creates and strengthens the capacities and skills of the people and organizations that appropriate it, becoming a factor of change. More importantly still, the challenge is how to join isolated efforts around a collective scientific interest that allows achievements of greater impact through inter-institutional collaboration.

In this sense, the formation of scientific networks through digital information networks is called to play an important role in the processes of knowledge generation and appropriation.

It has been shown that linking institutions engaged in research through networks enables one of the greatest flows of cooperation and information exchange. Fostering the creation of academic research networks at the national level introduces a dynamic component that favors interactions among the different actors. These networks provide an ideal mechanism for actors who are isolated, even in regions with less scientific, technological or social development. This problem of asymmetric distribution of scientific and technological capacities is present, to a greater or lesser extent, throughout the region. That is why networks that locate and associate individuals with common interests constitute an alternative for alleviating this problem, especially in countries such as the Dominican Republic, where the critical mass is insufficient and research and development groups are weak.

This kind of solution makes it possible not only to take advantage of the existing critical mass, but also to develop synergies derived from collaboration among research groups in order to tackle subjects and projects of greater scope and complexity, and of greater scientific, technological, economic and social impact.