【2020 Application Example】 AI Address Parsing: No More Hitting a Wall When Searching for Coordinates
Giving addresses spatial coordinates to help advance the 'Open Data' policy
In recent years, the government has been promoting 'Open Data', hoping that opening up data will facilitate inter-agency data flow, improve administrative efficiency, meet public needs, and strengthen public oversight of the government. Transportation data is especially close to daily life. Incidents reported by the public often specify only a prominent local landmark or an address, and the public has also complained that traffic reports on police radio lack actual coordinates. Bringing these addresses, which originally had no spatial attributes, into a geographic coordinate system is one step toward 'Smart Spatial Decision Making'.
However, unstructured addresses cannot be located accurately unless their inconsistent formats are corrected by hand, so data quality and usability must improve before the potential applications of open data can be realized. This in turn supports policy promotion and wider application in sectors such as tourism, employment, birth and adoption.
Inconsistent, non-standard address formats lead to low location accuracy
Address Locator is a 'stand-alone address locating software' jointly developed by SongXu Information Co., Ltd. and YanDing Intelligent Co., Ltd. (GOLiFE) that provides single or batch address location services. To give address data spatial attributes, Address Locator's core technology works in two stages: 'Address Parsing' and 'Address Location'. First, 'Address Parsing' splits each address to be located according to administrative-hierarchy keywords: province/city, township/district, village, road/street, lane, alley, and number. Then 'Address Location' matches the split address against the parent addresses to obtain the matched location level and its coordinates.
In actual business integration, however, address sources are maintained separately by different authorities, so a lack of consistent standards is a common problem. Issues include special characters (in addresses from specific regions), omitted administrative units, repeated administrative-hierarchy keywords, special road and lane segment formats, mixed Chinese and Arabic numerals, and outdated addresses. Together these produce complex address formats that are hard to split accurately.
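The two stages described above can be illustrated with a minimal Python sketch. It assumes Taiwanese-style addresses and uses a hand-written regular expression with one optional group per hierarchy level; the pattern, the `parse_address` and `locate` helpers, and the toy gazetteer are hypothetical and are not Address Locator's actual implementation. Stage 1 returns `None` when the address cannot be split (a 'split error'); stage 2 falls back to coarser levels until a known prefix matches, which is one plausible reading of obtaining the location level.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical pattern: one optional named group per hierarchy level, in the
# order Taiwanese addresses are written (city first, door number last).
ADDRESS_RE = re.compile(
    r"(?P<city>\D{2}[市縣])?"
    r"(?P<district>\D{1,3}?[區鄉鎮市])?"
    r"(?P<village>\D{1,3}?[里村])?"
    r"(?P<road>\D{1,6}?(?:路|街|大道))?"
    r"(?P<section>[一二三四五六七八九十]{1,2}段)?"
    r"(?P<lane>\d+巷)?"
    r"(?P<alley>\d+弄)?"
    r"(?P<number>\d+(?:之\d+)?號)?"
)
LEVELS = ["city", "district", "village", "road", "section", "lane", "alley", "number"]

Parts = Dict[str, Optional[str]]
Gazetteer = Dict[Tuple[str, ...], Tuple[float, float]]


def parse_address(addr: str) -> Optional[Parts]:
    """Stage 1: split an address by hierarchy keywords; None signals a split error."""
    m = ADDRESS_RE.fullmatch(addr)
    return m.groupdict() if m else None


def locate(parts: Parts, gazetteer: Gazetteer) -> Optional[Tuple[str, Tuple[float, float]]]:
    """Stage 2: return (matched level, coordinates) for the most specific known prefix."""
    present = [(level, parts[level]) for level in LEVELS if parts.get(level)]
    for depth in range(len(present), 0, -1):
        key = tuple(value for _, value in present[:depth])
        if key in gazetteer:
            return present[depth - 1][0], gazetteer[key]
    return None


if __name__ == "__main__":
    parts = parse_address("臺中市西屯區臺灣大道三段99號")
    # Placeholder coordinates, not real survey data.
    toy_gazetteer = {("臺中市", "西屯區", "臺灣大道", "三段"): (0.0, 0.0)}
    print(parts)
    print(locate(parts, toy_gazetteer))  # falls back from the number to the section level
```

Falling back from the door number to the road section, as in the example, shows why a location level is reported alongside the coordinates: a caller can tell an exact door-number hit from a road-section approximation.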
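Mixed Chinese and Arabic numerals are one of the problems that simple rules can normalize. The Python sketch below folds full-width digits into ASCII with Unicode NFKC normalization and rewrites Chinese numerals that directly precede 巷 (lane), 弄 (alley), 號 (number) or 樓 (floor) into Arabic digits. The function names, the choice to leave section numbers such as 三段 untouched, and the below-10,000 limit are assumptions for illustration, not the product's documented rules.

```python
import re
import unicodedata

CN_DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "兩": 2, "三": 3, "四": 4,
             "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
CN_UNITS = {"十": 10, "百": 100, "千": 1000}
# Chinese-numeral runs immediately before a lane/alley/number/floor keyword.
# Section numbers ("三段") are assumed to stay in Chinese and are not matched.
CN_NUMBER_RE = re.compile(r"[零〇一二兩三四五六七八九十百千]+(?=[巷弄號樓])")


def cn_to_int(text: str) -> int:
    """Convert a Chinese numeral below 10,000 into an int."""
    if not any(ch in CN_UNITS for ch in text):
        # Digit-by-digit style, e.g. "一二三" -> 123.
        return int("".join(str(CN_DIGITS[ch]) for ch in text))
    total, digit = 0, 0
    for ch in text:
        if ch in CN_DIGITS:
            digit = CN_DIGITS[ch]
        else:
            # A bare unit, such as the leading "十" in "十二", means 1 x unit.
            total += (digit or 1) * CN_UNITS[ch]
            digit = 0
    return total + digit


def normalize_numerals(addr: str) -> str:
    """Fold full-width digits and rewrite Chinese lane/alley/number values."""
    addr = unicodedata.normalize("NFKC", addr)  # e.g. "１００" -> "100"
    return CN_NUMBER_RE.sub(lambda m: str(cn_to_int(m.group())), addr)


if __name__ == "__main__":
    print(normalize_numerals("中山路一段１００巷五弄十二號"))
```

Running the numerals through one normalizer before parsing means the lane, alley and number patterns only ever need to recognize Arabic digits.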
Building an address tokenization model for precise location matching!
To handle the many messy address formats and ease the matching difficulties of the existing Address Locator, AI and natural language processing are applied through 'Address Normalization' and a 'Chinese Tokenization Tool' to improve address location. 'Address Normalization' deals with missing keywords, variant character forms, and missing administrative areas, while the 'Chinese Tokenization Tool' resolves the 'split errors' caused by special address formats, which would otherwise make positioning fail.
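The 'Address Normalization' tasks named above can also be approximated with rules. The Python sketch below is a stand-in, not the product's AI-based method: it maps a few variant spellings (台 to the official 臺), collapses a repeated leading city, and fills in a missing city from a district lookup. Both lookup tables and the `normalize_address` name are hypothetical and deliberately tiny. A real table would also need to handle district names such as 中正區 that exist in more than one city, which is why only nationally unique districts are listed here.

```python
import re

# Hypothetical variant table: common "台" spellings -> official "臺" form.
VARIANTS = {"台北": "臺北", "台中": "臺中", "台南": "臺南", "台東": "臺東", "台灣": "臺灣"}
# Hypothetical district -> city lookup, restricted to district names that are
# unique nationwide so the missing city can be filled in unambiguously.
DISTRICT_TO_CITY = {"西屯區": "臺中市", "板橋區": "新北市"}
CITY_RE = re.compile(r"^\D{2}[市縣]")
REPEATED_CITY_RE = re.compile(r"^(\D{2}[市縣])\1+")


def normalize_address(addr: str) -> str:
    # 1. Replace variant character forms.
    for variant, official in VARIANTS.items():
        addr = addr.replace(variant, official)
    # 2. Collapse a repeated leading city, e.g. "臺中市臺中市..." -> "臺中市...".
    addr = REPEATED_CITY_RE.sub(r"\1", addr)
    # 3. Fill in a missing city when the address starts with a known district.
    if not CITY_RE.match(addr):
        for district, city in DISTRICT_TO_CITY.items():
            if addr.startswith(district):
                return city + addr
    return addr


if __name__ == "__main__":
    print(normalize_address("台中市台中市西屯區台灣大道三段99號"))
    print(normalize_address("西屯區台灣大道三段99號"))
```

Normalizing before tokenization keeps the downstream splitter simpler: it can assume one official spelling and a city-first address.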
▲ Successful address parsing through AI tokenization technology
In the past, address location services required manual preprocessing to standardize the data, so the locator was not sold as a stand-alone product but was bundled into project plans that offered address location services. After address normalization and AI tokenization were incorporated, it became a complete product that greatly reduces the time users spend on manual adjustments while achieving the intended location accuracy. The AI-enhanced Address Locator is now featured on the SongXu Information Co., Ltd. website, with product descriptions and official listings.
After four months of testing and revision, AI technology was successfully incorporated into the existing address location product, covering tokenization tool selection, corpus building, model training, integration with product features, and a complete test plan. Testing used more than 62 datasets and over 300,000 addresses collected from the 'Government Data Open Platform' and the 'Taichung City Government Data Open Platform', achieving a complete match rate of 90.08% and a fuzzy match rate of 98%, far outperforming the original product in both match rate and processing time!
To promote AI applications in the information services sector, the AI-enhanced address location service is positioned as a new solution and showcased on the SongXu website. The page starts with a product introduction explaining the address normalization method and address location features, then guides potential customers to envision applicable scenarios such as decision analytics and precision marketing. By attaching spatial information to addresses, the product helps data from many sectors reveal its context and trends in two-dimensional space.
▲ Address Location Solution
Providing spatial coordinates for attractions, intersections, and points of interest
Developing and deploying the AI-enhanced product showed the company, which focuses on smart transportation systems for the domestic market, that although the product effectively solves address location, spatial information is often described in ways other than addresses, such as attractions, intersections, and points of interest. Going forward, applying AI more broadly to 'Entity Recognition' will be an important application that extends beyond address location. In an era of information overload, collecting data is easy; identifying the keywords that matter is the key. Future development will focus on optimizing these products and creating more business opportunities!
「Translated content is generated by ChatGPT and is for reference only. Translation date: 2024-05-19」