I have a dictionary in the following format, i split the different elements (where a comma(,) occured) using a split function and am now trying to extract the names from the list…i am trying to use regular expression but obviously am miserably failing being new to python… the names are in the following formats…
- firstname(space)last name
- name(space)name(space)name
- x.name
- x.y.name
- name(space) x.(space)(name)
where x and y represent the an name initial like J. for john etc.
also if you can guide me in removing the “\t” keeping other information intact would also be great.
any sort of help would be more than welcome…thank you all.
[[' I. Antonov', ' I. Antonova', ' E. R. Kandel', ' and R. D. Hawkins. Activity-dependent presynaptic facilitation and hebbian ltp are both required and interact during classical conditioning in aplysia. Neuron', ' 37(1):135--47', ' Jan 2003.'], ['\tSander M. Bohte ', ' Joost N. Kok', ' Applications of spiking neural networks', ' Information Processing Letters', ' v.95 n.6', ' p.519-520'], [' L. J. Eshelman. The CHC Adaptive Search Algorithm: How to Have Safe Search When Engaging in Nontraditional Genetic Recombination. Foundations Of Genetic Algorithms', ' pages 265-283', ' 1990.'], ['Wulfram Gerstner ', ' Werner Kistler', ' Spiking Neuron Models: An Introduction', ' Cambridge University Press', ''], [' D. O. Hebb. Organization of behavior. New York: Wiley', ' 1949.'], [' D. Z. Jin. Spiking neural network for recognizing spatiotemporal sequences of spikes. Physical Review E', '69', ' 2004.'], ['Wolfgang Maass ', ' Christopher M. Bishop', ' Pulsed Neural Networks', ' MIT Press', ' '], ['Wolfgang Maass ', ' Henry Markram', ' Synapses as dynamic memory buffers', ' Neural Networks', ' v.15 n.2', ' p.'], [' H. Markram', ' Y. Wang', ' and M. Tsodyks. Differential signaling via the same axon of neocortical pyramidal neurons. Neurobiology', ' 95:5323--5328', ' April 1998.'], ['\t\tD. E. Rumelhart ', ' G. E. Hinton ', ' R. J. Williams', ' Learning internal representations by error propagation', ' Parallel distributed processing: explorations in the microstructure of cognition', ' vol. 1: foundations', ' MIT Press', ' Cambridge', ' MA', ' 1986 </a> \t\t\t\t\t\t\t\t\t'], ['\t J. D. Schaffer', ' L. D. Whitley', ' and L. J. Eshelman. Combinations of genetic algorithms and neural networks: A survey of the state of the art. In Combinations of Genetic Algorithms and NeuralNetworks', ' 1992.', ' COGANN-92. International Workshop on', ' pages 1--37', ' Philips Labs.', ' Briarcliff Manor', ' NY', ' 6 Jun 1992.'], ['\t S. Song', ' K. D. Miller', ' and L. F. Abbott. Competitive hebbian learning through spike-timing-dependent synaptic plasticity. Nature Neuroscience', ' 3(9):919--926', ' 2000.'], ['\t L. Watts. Event-driven simulation of networks of spiking neurons. Advances in Neural Information Processing Systems', ' 6:927--934', ' 1994.']]
It looks like you’re going to have to tailor this pretty heavily to your input. Because there are so many different words and constructs in the text you’re parsing, you’re probably not going to get 100% accuracy with the rules you create. Here’s an example, though, assuming your original input text is called
input_text(and I don’t think using the split() method is really all that useful, because the commas don’t just delimit names):You’d obviously want to write additional specific regexes for your vaious name types. This does a good job of finding names, but also comes up with a lot of false positives (
Information Processinglooks a lot like a name based on these rules). This should give you a starting point though.