Dr Shufan Yang

School of Technology

Dr Shufan Yang is a lecturer in Computer Science in the School of Mathematics and Computer Science, University of Wolverhampton. Before joining Wolverhampton, she worked on projects on FPGA-based hardware modelling of neural networks at the University of Ulster. She focuses on developing methodologies to support the large-scale simulation of Spiking Neural Networks (SNNs) on multiple-FPGA platforms. She undertakes research in the general area of embedded system design, with particular interests in bio-inspired computing, neuro-engineering and the associated algorithmic issues.

Shufan obtained her Ph.D. in Computer Science from the University of Manchester (2010), supervised by Professor Steve Furber. Her Ph.D. project implemented a high-performance multiprocessor communication system for SpiNNaker chips, which are used for the real-time modelling of large systems of spiking neurons. She received her B.Sc. and M.Sc. degrees in China in 1998 and 2003 respectively, and has worked on embedded system projects for more than ten years. Major projects in which she has been involved include a handset access-layer implementation under VxWorks, an embedded firewall on PowerPC boards, and an IDS implementation on the Intel IXP420 network processor.

Neuro-Inspired Computational Model of Situated Cognition for Flexible Visual Object Detection and Tracking

1.  EXECUTIVE SUMMARY

1.1 Introduction

In computer image processing, visual tracking of moving objects in cluttered scenes remains a challenge, both in terms of algorithms and of computational efficiency. In this work, I propose a novel object-tracking method that combines a level-set method with an attractor neural field model. Specifically, unlike traditional methods that depend only on the colour histogram of the object, a level-set method is used to measure the weights of samples, and the attractor states of the neural network dynamics are used to refine the curve evolution during tracking.

1.2 Aim

To advance the next generation of visual object recognition and tracking systems, the research goal is to incorporate neuro-inspired, situated-cognition computational principles into a programmable-hardware-based object recognition system using low-resolution digital cameras.

1.3 Objectives

The first objective of this project is to offer a novel, brain-inspired and computationally efficient neural model, built on scalable reconfigurable devices, with visual object recognition and monitoring functions. The second objective is to balance computational efficiency against biological realism for active vision: our modelling approach (e.g. Wong and Wang, 2006) lies somewhere between previous abstract connectionist models (Cope, 2012) and realistic but computationally intensive spiking neural network models (Haykin, 1999; Pearson et al., 2007; Yang and McGinnity, 2011). This will help reduce power consumption while retaining core biological features. Furthermore, our approach to depth perception, inspired by recent neuroscience findings (Tsutsui et al., 2005; Orban, 2011; Huk, 2012), will complement 3-dimensional image processing in machine vision, which is still in its infancy (Davies, 2012). In the longer term, this work will also provide a realistic implementation of visual attention, an important high-level visual cognition involving the fields of neuroscience, psychology, computer science and engineering.

1.4 Methods

In this project, I apply a twofold methodology based on situated cognition theory and computational modelling. Situated cognition theory states that knowing and doing are inseparable, and that knowledge is situated in activity connected to the environment. Its advantages include flexible, rapid, context-based processing of information, which can overcome some of the current computational limits of artificial visual systems. This project makes use of our current knowledge of the primate visual and oculomotor systems to provide a neural representation of situated cognition, implemented on reconfigurable and programmable devices for efficient visual object recognition and monitoring. Key brain regions involved in such cognition include the thalamocortical and corticocortical systems.

1.5 Research design

In this project, I successfully demonstrate an integrated approach that accurately and robustly tracks moving human subjects in various natural visual scenes with occlusion and clutter. Importantly, the model is able to process, in real time, a 640-by-480 video stream at 30 frames per second using a single low-power Xilinx Zynq-7000 system-on-chip platform. This proof-of-concept work shows the advantage of incorporating neuro-inspired features when solving image-processing problems under low-power operation.

1.6 Summary findings

To summarize, I have integrated traditional mean shift tracking and level-set methods with a dynamical neural field model and successfully solved, as a proof of concept, various occlusion and clutter problems when visually tracking a moving object. The integrated approach has also allowed implementation on a low-power system-on-chip platform, hence offering high potential for multiple future applications.

 

2.  MAIN BODY

2.1 Introduction

In realistic visual scenes, moving objects may be partially or wholly occluded in the visual field (Nguyen and Smeulders, 2004). These occlusion issues in object tracking are generally addressed using some form of prediction or estimation method (Lee et al., 2014). For example, a common approach in visual tracking is to assume constant velocity or acceleration and project the object's position from the previous frame to a new position during occlusion (Yilmaz et al., 2006). However, in realistic scenarios those assumptions can be violated by background clutter and occlusion, so a sequential prediction model does not always deliver good tracking accuracy.
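As an illustration of the constant-velocity assumption mentioned above, the following Python sketch projects an occluded object's position forward from its recent track; the function name and the simple two-point velocity estimate are illustrative only, not part of the proposed system.

```python
import numpy as np

def predict_during_occlusion(track, dt=1.0):
    """Project the next position of an occluded object by assuming constant
    velocity, estimated from the last two observed positions (illustrative)."""
    track = np.asarray(track, dtype=float)
    if len(track) < 2:
        return track[-1]                      # no motion history: stay put
    velocity = track[-1] - track[-2]          # displacement per frame
    return track[-1] + velocity * dt          # extrapolated position

# The target was last seen at (100, 80) and then (104, 83);
# during occlusion it is projected to (108, 86) in the next frame.
print(predict_during_occlusion([(100, 80), (104, 83)]))
```

This is exactly the kind of extrapolation that breaks down when the object changes direction or speed while hidden, which motivates the approach proposed below.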

To address those problems, depth images that use distance information and multiple cameras with large fields of view have been proposed (Szeliski, 2010). With the availability of off-the-shelf real-time depth-recovery cameras (such as Microsoft's Xbox Kinect and Samsung's Smart TV), multi-camera systems can demonstrate superior performance in object tracking (Helten et al., 2013). However, those systems assume that the object's track is present within each camera, and the observer needs to define the different camera views manually. The performance of the required algorithms depends greatly on whether the tracked objects actually follow the estimated paths and whether their moving speeds across the cameras' instantaneous fields of view are sufficiently regular. Situations in which objects move randomly in the non-overlapping regions remain challenging. Moreover, those approaches require a large quantity of training samples and a complex camera-calibration process (Nguyen and Smeulders, 2004).

Furthermore, the majority of designs for visual object tracking are highly specialised and involve computationally expensive real-time algorithms (Yilmaz et al., 2006). Real-time performance and reliable tracking have to be achieved using high-performance microprocessors, at the cost of high power consumption and manufacturing costs (García et al., 2014). For integration into mobile and portable device platforms, such as robotic and other automated applications, conventional image-processing approaches can be adapted only in very limited ways.

2.2 Need for the research

We propose a mean-shift-based visual tracking framework that uses a level-set active contour method together with a recurrent attractor neural network. The new target location in the current frame is calculated using the mean shift procedure, which computes the translational offset of the target location in each frame (Comaniciu and Meer, 2002). The traditional mean shift tracker suffers from an inaccurate representation of the object because of its fixed kernel bandwidth and limited feature-space abstraction, and its tracking deteriorates further when occlusion happens. To overcome this inaccuracy in the presence of total occlusion, I combine a level-set-based active contour with the colour histogram as the object representation. The advantage of a level-set-based active contour is its effective handling of topological changes, which is important for tracking moving objects in cluttered scenes. However, the curve evolution needs to be re-initialised in each iteration of the loop, which is computationally expensive. To avoid this cumbersome numerical computation, the attractor state of the neural network is included in the curve evolution to improve the convergence of the level-set-based active contour. In other words, the attractor property of the network dynamics can be used to "store" the location of an occluded object over time in order to keep tracking it. The re-initialisation process is switched on only when the network is not in an attractor state, as sketched below.
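The following Python sketch illustrates, under simplifying assumptions, the two ingredients just described: a single mean shift location update driven by histogram weights (after Comaniciu and Meer, 2002, with the spatial kernel omitted for brevity), and a gate that skips the costly level-set re-initialisation while the network remains in an attractor state. The helper names and the greyscale histogram are illustrative placeholders, not the actual implementation.

```python
import numpy as np

def mean_shift_step(patch, coords, target_hist, n_bins=16):
    """One mean shift update of the target centre: each pixel is weighted by
    sqrt(q_u / p_u), the ratio of target to candidate histogram values for its
    bin (Comaniciu and Meer, 2002); greyscale bins stand in for colour bins."""
    bins = np.clip((patch * n_bins).astype(int), 0, n_bins - 1)
    cand_hist, _ = np.histogram(bins, bins=n_bins, range=(0, n_bins), density=True)
    ratio = np.sqrt(np.divide(target_hist, cand_hist,
                              out=np.zeros_like(target_hist),
                              where=cand_hist > 0))
    weights = ratio[bins]
    return (coords * weights[:, None]).sum(axis=0) / max(weights.sum(), 1e-9)

def maybe_reinitialise(phi, in_attractor_state, reinit):
    """Gate the expensive level-set re-initialisation on the network state:
    while the attractor 'stores' the target, the contour is left untouched."""
    return phi if in_attractor_state else reinit(phi)

# Toy usage with a random 20x20 grey-level patch as the candidate region.
rng = np.random.default_rng(0)
target = rng.random((20, 20))
t_bins = np.clip((target * 16).astype(int), 0, 15)
q, _ = np.histogram(t_bins, bins=16, range=(0, 16), density=True)
coords = np.stack(np.meshgrid(np.arange(20), np.arange(20)), -1).reshape(-1, 2)
print(mean_shift_step(rng.random((20, 20)).ravel(), coords, q))
```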

2.3 Methods

Our proposed integrated approach retains the ease of implementation of the mean shift method while reducing the iterative computational cost during curve evolution. Hence, computational efficiency is greatly improved while real-time tracking is maintained, as we demonstrate in our hardware implementation.

 

3.  FINDINGS

We successfully demonstrate our integrated approach, which accurately and robustly tracks moving human subjects in various natural visual scenes with occlusion and clutter. Importantly, the model is able to process, in real time, a 640-by-480 video stream at 30 frames per second using a single low-power Xilinx Zynq-7000 system-on-chip platform. This proof-of-concept work shows the advantage of incorporating neuro-inspired features when solving image-processing problems under low-power operation.

Since the dynamical neural network can exhibit winner-take-all behaviour, it can be used to selectively enhance one part of the external input (Vanrullen and Thorpe, 2001). Hence, this work may also be extended to visual search paradigms, since the model can be modified into an accumulator-type model that robustly accounts for behavioural data in visual search experiments (Yang et al., 2012).
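A minimal rate-based sketch of such winner-take-all attractor dynamics is given below; the connectivity (self-excitation plus uniform lateral inhibition) and the parameter values are illustrative and are not those of the model used in this work.

```python
import numpy as np

def winner_take_all(inputs, steps=300, dt=0.1, tau=1.0, w_exc=2.0, w_inh=1.5):
    """Rate-based recurrent network in which each unit excites itself and
    inhibits the others; the unit receiving the strongest input wins, and its
    activity persists as an attractor state even after the input is removed."""
    n = len(inputs)
    W = w_exc * np.eye(n) - w_inh * (np.ones((n, n)) - np.eye(n))
    f = lambda x: np.clip(x, 0.0, 1.0)     # simple saturating rate function
    u = np.zeros(n)
    for _ in range(steps):
        u += dt / tau * (-u + W @ f(u) + inputs)
    return f(u)

# Three competing candidate locations; the middle one has the strongest
# evidence and ends up selectively enhanced while the others are suppressed.
print(winner_take_all(np.array([0.4, 0.9, 0.5])))   # approx. [0, 1, 0]
```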

 

4.  SUMMARY

To summarize, we have integrated traditional mean shift tracking and level-set methods with a dynamical neural field model and successfully solved, as a proof of concept, various occlusion and clutter problems when visually tracking a moving object. Our integrated approach has also allowed implementation on a low-power system-on-chip platform, hence offering high potential for multiple future applications.

 

5.  REFERENCES

Amari, S. I. (1975). Homogeneous nets of neuron-like elements. Biol. Cybern., 17(4), 211-220.

Amit, D. J. (1989) Modeling brain function: The world of attractor neural networks, Cambridge University Press.

Cavanagh, P., & Alvarez, G. A. (2005). Tracking multiple targets with multifocal attention. Trends Cogn. Sci., 9(7), 349-354.

Cossart, R., Aronov, D., & Yuste, R. (2003). Attractor dynamics of network UP states in the neocortex. Nature, 423(6937), 283-288.

Cehovin, L., Kristan, M., & Leonardis, A. (2013). Robust visual tracking using an adaptive coupled-layer visual model. IEEE Trans. Pattern Anal. Mach. Intell., 35(4), 941-953.

Cremers, D. (2006). Dynamical statistical shape priors for level set-based tracking. IEEE Trans. Pattern Anal. Mach. Intell., 28(8), 1262-1273.

Cremers, D. (2013). Shape Priors for Image Segmentation. In: Shape Perception in Human and Computer Vision, pp. 103-117, Springer, London.

Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell., 24(5), 603-619.

Comaniciu, D., Ramesh, V., & Meer, P. (2000). Real-time tracking of non-rigid objects using mean shift. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000), Vol. 2, pp. 142-149.

Epanechnikov, V. A. (1969). Non-parametric estimation of a multivariate probability density. Theory of Probability and its Applications, 14, 153-158. doi:10.1137/1114019.

Frintrop, S., Rome, E., & Christensen, H. I. (2010). Computational visual attention systems and their cognitive foundations: A survey. ACM Trans. Appl. Percept., 7(1).

Geiger, D., Gupta, A., Costa, L. A., & Vlontzos, J. (1995). Dynamic programming for detecting, tracking, and matching deformable contours. IEEE Trans. Pattern Anal. Mach. Intell., 17(3), 294-302.

García, G. J., Jara, C. A., Pomares, J., Alabdo, A., Poggi, L. M., & Torres, F. (2014). A Survey on FPGA-Based Sensor Systems: Towards Intelligent and Reconfigurable Low-Power Sensors for Computer Vision, Control and Signal Processing. Sensors, 14(4), 6247-6278.

García, G. M., Frintrop, S., & Cremers, A. B. (2013). Attention-Based Detection of Unknown Objects in a Situated Vision Framework. KI-Künstliche Intelligenz, 27(3), 267-272.

Helten, T., Müller, M., Seidel, H. P., & Theobalt, C. (2013). Real-time Body Tracking with One Depth Camera and Inertial Sensors. In: IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1105-1112.

Jin, J., Lee, S., Jeon, B., Nguyen, T. T., & Jeon, J. W. (2013). Real-time multiple object centroid tracking for gesture recognition based on FPGA. In: Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication, article 80.

Kruger, N., Janssen, P., Kalkan, S., Lappe, M., Leonardis, A., Piater, J., ... & Wiskott, L. (2013). Deep hierarchies in the primate visual cortex: What can we learn for computer vision? IEEE Trans. Pattern Anal. Mach. Intell., 35(8), 1847-1871.

Kritikakou, A., Catthoor, F., Kelefouras, V., & Goutis, C. (2013). Near-optimal and scalable intrasignal in-place optimization for non-overlapping and irregular access schemes. ACM Transactions on Design Automation of Electronic Systems (TODAES), 19(1), 4.

Lee, B. Y., Liew, L. H., Cheah, W. S., & Wang, Y. C. (2014, February). Occlusion handling in videos object tracking: A survey. In IOP Conference Series: Earth and Environmental Science (Vol. 18, No. 1, p. 012020). IOP Publishing.

Malamas, E. N., Petrakis, E. G., Zervakis, M., Petit, L., & Legat, J. D. (2003). A survey on industrial vision systems, applications and tools. Image Vis. Comput., 21(2), 171-188.

Han, M., Xu, W., & Gong, Y. (2004). An algorithm for multiple object trajectory tracking. In: Proceedings of the IEEE Computer Society Conference, 1: 864-871.

Nuechterlein, K. H., Parasuraman, R., & Jiang, Q. (1983). Visual sustained attention: Image degradation produces rapid sensitivity decrement over time. Science, 220(4594): 327-329.

Neftci, E., Binas, J., Rutishauser, U., Chicca, E., Indiveri, G., & Douglas, R. J. (2013). Synthesizing cognition in neuromorphic electronic systems. Proc. Natl. Acad. Sci., 110(37), E3468-E3476. doi: 10.1073/pnas.1212083110.

Niyogi, R. K., & Wong-Lin, K. (2013). Dynamic excitatory and inhibitory gain modulation can produce flexible, robust and optimal decision-making. PLoS computational biology, 9(6), e1003099.

Pearson, M., Pipe, A.G., Mitchinson, B., Gurney, K., Melhuish, C., Gilhespy, I., and Nibouche, M. (2007) Implementing spiking neural networks for real-time signal processing and control applications: a model validated FPGA approach. IEEE Trans. Neural Netw., 18(5): 1472-1487.

Perez, P., Vermaak, J., & Blake, A. (2004). Data fusion for visual tracking with particles. Proceedings of the IEEE, 92(3), 495-513.

Rutishauser, U., & Douglas, R. J. (2009). State-dependent computation using coupled recurrent networks. Neural Comput., 21(2), 478-509.

Szeliski, R. (2010). Computer vision: algorithms and applications. Springer.

Sussman, M., Smereka, P., & Osher, S. (1994). A level set approach for computing solutions to incompressible two-phase flow. Journal of Computational physics, 114(1), 146-159.

Vanrullen, R., & Thorpe, S. J. (2001). The time course of visual processing: from early perception to decision-making. J. Cogn. Neurosci., 13(4), 454-461.

Xu, Y., & Chun, M. M. (2009). Selecting and perceiving multiple visual objects. Trends Cogn. Sci., 13(4), 167-174.

Xilinx (2013). Xilinx XPower Analyzer [Online]. Available: <www.xilinx.com/products>.

Yang, S., and McGinnity, T. M. (2011). A biologically plausible real-time spiking neuron simulation environment based on a multiple-FPGA platform. ACM SIGARCH Computer Architecture News, 39(4).

Yang S., McGinnity T. M., and Wong-Lin, K. (2012). Adaptive Proactive Inhibitory Control for Embedded Real-time Applications. Front. Neuroeng. 5:10. doi: 10.3389/fneng.2012.00010.

Yilmaz, A., Javed, O., & Shah, M. (2006). Object tracking: A survey. ACM Computing Surveys (CSUR), 38(4), 13.