We propose a new unsupervised approach to identifying Named Entities (NEs) in resource-poor languages using bilingual parallel corpora that pair a resource-poor language with a resource-rich one. In this work, we select Farsi as the resource-poor language and English as the resource-rich language, because their characteristics make them good representatives of other languages. We use lexical and contextual filters to identify NEs in both languages. As part of the filtering stage, we present a new distance function, called M-distance, that computes the edit distance between a word written in a Latin script and a word written in a non-Latin script. To further improve the results, we apply a bootstrapping method based on graph propagation. Our final result for Farsi, obtained without using any Farsi-specific features, is an F1 measure of 0.74.
Named Entity Recognition, Machine Learning, Natural Language Processing, NLP, Graph Propagation, Farsi, Persian
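The abstract does not define M-distance, so the following is only a minimal sketch of the general idea of an edit distance across a Latin and a non-Latin script. It is not the authors' M-distance. It assumes a small, hypothetical Farsi-to-Latin character map (`FARSI_TO_LATIN`) and a hypothetical reduced cost (0.5) for inserting a Latin vowel, on the assumption that Farsi script usually leaves short vowels unwritten. Both the map and the costs are illustrative choices.

```python
# Hypothetical single-character correspondences from Farsi letters to Latin letters.
# This illustration only; it is not the mapping used by M-distance.
FARSI_TO_LATIN = {
    "ا": {"a", "e", "o"},
    "ب": {"b"},
    "پ": {"p"},
    "ت": {"t"},
    "د": {"d"},
    "ر": {"r"},
    "ز": {"z"},
    "س": {"s"},
    "ف": {"f"},
    "ک": {"k", "c"},
    "گ": {"g"},
    "ل": {"l"},
    "م": {"m"},
    "ن": {"n"},
    "و": {"v", "w", "o", "u"},
    "ه": {"h", "e"},
    "ی": {"y", "i", "e"},
}

LATIN_VOWELS = set("aeiou")


def cross_script_distance(farsi, latin, char_map=FARSI_TO_LATIN, vowel_insert_cost=0.5):
    """Levenshtein-style distance between a Farsi word and a Latin-script word.

    Substituting a Farsi letter with a Latin letter is free if the map allows it,
    otherwise it costs 1. Deleting a Farsi letter costs 1. Inserting a Latin letter
    costs 1, except vowels, which cost `vowel_insert_cost` (hypothetical value).
    """
    latin = latin.lower()
    n, m = len(farsi), len(latin)

    def insert_cost(ch):
        return vowel_insert_cost if ch in LATIN_VOWELS else 1.0

    # d[i][j] = distance between farsi[:i] and latin[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + 1.0
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + insert_cost(latin[j - 1])

    for i in range(1, n + 1):
        allowed = char_map.get(farsi[i - 1], set())
        for j in range(1, m + 1):
            sub = 0.0 if latin[j - 1] in allowed else 1.0
            d[i][j] = min(
                d[i - 1][j] + 1.0,                           # delete a Farsi letter
                d[i][j - 1] + insert_cost(latin[j - 1]),     # insert a Latin letter
                d[i - 1][j - 1] + sub,                       # align the two letters
            )
    return d[n][m]


print(cross_script_distance("نادر", "Nader"))  # 0.5: only the unwritten "e" is inserted
```

A small distance relative to word length would suggest that an English NE and a Farsi token in an aligned sentence pair are transliterations of each other. Any acceptance threshold is left open here, since the abstract does not state one.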