How can we take advantage of GIS and Big Data?



Practically everything has an explicit or implicit geospatial location. Everything that happens, happens somewhere. It’s the where that provides context.

Big Data is junk if you can’t understand it, but a map is a pattern made understandable.

What’s getting better is our ability to model and forecast our cultural, physical, and biological futures. The difference is real-time data. You have sensors pulling in information about traffic, noise, air pollution, water quality — even conversations on Twitter. GIS has always been about data, but now GIS is getting filled with streams of real-time data. We’re able to integrate that data from different sources and analyze it against historical patterns to make predictions.

Big Data can easily become unmanageable and useless without the proper tools to analyze it fast. How can this be managed?

In most organizations, data goes to “die” on USB drives. What I mean is that the traditional storage mechanisms today’s organizations use cannot handle the massive amounts of data they are flooded with. So what do they do? They move their data to USB drives to make room for new data and then shelve the drives with the idea that one day the data will be restored to traditional storage, but it never is. The operative word here is “traditional,” and that is very important.

For me, understanding Big Data usage is about more than the traditional volume, velocity, and variety (the 3 Vs); it’s the realization that sometimes the 3 Vs just don’t apply. For example, imagine (God forbid) that another Fukushima Daiichi event happens. The sparse data from sensors in a remote village is not coming in fast, is not big, and is highly structured. However, the window of opportunity to respond to such an event is so small that I need new ways to determine whether that village needs to be evacuated, and I need certainty and confidence in my decision making. This is where geospatial analytics is most relevant, in the form of Bayesian kriging with regression executed very quickly by non-traditional means.
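
As a rough illustration of that kriging step, here is a minimal sketch using the open-source PyKrige library as a stand-in (this is not Esri’s Empirical Bayesian Kriging tool, and every coordinate, reading, and threshold below is invented for the example):

```python
import numpy as np
from pykrige.ok import OrdinaryKriging

# Hypothetical sparse sensor readings near a remote village: (lon, lat, reading).
lons = np.array([140.91, 140.95, 141.02, 140.88, 140.99])
lats = np.array([37.42, 37.45, 37.40, 37.48, 37.44])
vals = np.array([0.8, 1.9, 3.1, 0.5, 2.4])  # made-up readings

ok = OrdinaryKriging(lons, lats, vals, variogram_model="exponential")

# Interpolate onto a grid; `ss` is the kriging variance, which carries the
# confidence the decision maker needs alongside the prediction itself.
grid_lon = np.linspace(140.85, 141.05, 50)
grid_lat = np.linspace(37.38, 37.50, 50)
z, ss = ok.execute("grid", grid_lon, grid_lat)

# Act only where the predicted value exceeds an (assumed) evacuation threshold
# AND the prediction is tightly constrained.
threshold = 2.0
alarm = (z > threshold) & (np.sqrt(ss) < 0.5)
print(f"{alarm.sum()} grid cells exceed the threshold with high confidence")
```

The point is the pairing: the prediction says whether the threshold is crossed, and the kriging variance says how much to trust that answer in a data-sparse area.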

Here’s another example of non-traditional data management: an organization needed to meet the requirements of a service-level agreement (SLA) that all data be stored online for a specific amount of time, so that any geospatial data collected during that time could be immediately visualized on a map. A traditional data storage vendor could have helped the organization meet the SLA requirements, but at an exorbitant cost. The organization decided to take a bold step and try something new, Big Data (i.e., Hadoop), to store and process that information. Again, it’s not about volume, velocity, and variety; it is simply the cost of doing business. Hadoop provides a way to deal with Big Data issues, and lately I’ve been combining it with other tools such as Cassandra, Elasticsearch, and Apache Spark to surpass the traditional means.
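
To make that concrete, here is a minimal sketch of the kind of “immediately map-ready” query such a setup serves, assuming the tracking data has landed in HDFS as Parquet; the path, column names, retention window, and extent are all hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sla-geo-store").getOrCreate()

# Hypothetical layout: points written to HDFS as Parquet by the ingest pipeline.
points = spark.read.parquet("hdfs:///data/tracks/")

# SLA: everything inside the retention window must be queryable for the map,
# so a map-extent request becomes a cheap distributed filter.
recent = points.where(F.col("event_date") >= F.date_sub(F.current_date(), 90))
in_extent = recent.where(
    (F.col("lon").between(-122.6, -122.2)) & (F.col("lat").between(37.6, 37.9))
)
in_extent.select("track_id", "lon", "lat", "event_ts").show(20)
```

Because the data sits on commodity nodes instead of a high-end storage appliance, the same filter scales by adding machines rather than by paying the vendor more.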

What software and tools are on the Big Data roadmap?

The ArcGIS platform continues to evolve in a number of areas. Here are some highlights:

First, a new Big Data extension for ArcGIS for Server, GeoAnalytics, has been published. It leverages a new class of technologies, including distributed computing and storage frameworks, to analyze and visualize very large datasets. Examples include the analysis and visualization of large volumes of real-time streaming data (e.g., data from moving vehicles/GPS sensors, connected devices, and social media events), batch analytics on high-volume spatiotemporal data, and raster analytics on very large collections of imagery.
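
One of those batch workloads, aggregating a large pile of GPS points into grid cells for visualization, is easy to sketch in plain PySpark; this mimics the idea rather than the extension’s actual API, and the input path, schema, and cell size are assumptions:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("aggregate-points").getOrCreate()
gps = spark.read.parquet("hdfs:///data/vehicle_gps/")  # hypothetical dataset

cell = 0.01  # bin size in degrees, roughly 1 km at mid-latitudes
binned = (
    gps.withColumn("cell_x", F.floor(F.col("lon") / cell))
       .withColumn("cell_y", F.floor(F.col("lat") / cell))
       .groupBy("cell_x", "cell_y")
       .agg(F.count("*").alias("n_points"), F.avg("speed").alias("avg_speed"))
)
binned.orderBy(F.desc("n_points")).show(10)  # densest cells first
```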

Esri is introducing capabilities across the ArcGIS platform that make it easy for our users to take advantage of this new approach and scale it to their needs. The combination of the ArcGIS GeoEvent Extension for Server and GeoAnalytics functionality will support high-velocity, real-time data ingestion; high-volume storage; and real-time and batch analytics on the same data. The combination of imagery and GeoAnalytics functionality will support data dissemination, on-the-fly analysis, and batch analysis for large collections of imagery gathered by drone, aerial, and satellite sensors.

The GeoEvent Extension is being enhanced to support high-velocity data ingestion and streaming, handling hundreds of thousands of events per second on similar infrastructure. To support spatiotemporal archiving, analysis, and visualization, we will also introduce a bundled spatiotemporal Big Data store in ArcGIS for Server based on distributed storage technology. This new data store will scale in capacity and throughput by leveraging additional infrastructure. In the area of large data visualization, the platform is also improving how these massive datasets are rendered on the client.
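
To give a feel for that ingestion pattern outside the GeoEvent Extension itself, here is a minimal Spark Structured Streaming sketch of a geofence alert over a position stream; the Kafka broker, topic, schema, and fence coordinates are all assumptions for illustration:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("geoevent-style-ingest").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("lon", DoubleType()),
    StructField("lat", DoubleType()),
    StructField("ts", TimestampType()),
])

# Hypothetical Kafka topic of device positions (requires the Spark-Kafka package).
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "positions")
       .load())

events = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
             .select("e.*"))

# Geofence: keep events inside a rectangular alert zone, count per device per minute.
alerts = (events
          .where((F.col("lon").between(-74.05, -73.95)) &
                 (F.col("lat").between(40.70, 40.80)))
          .withWatermark("ts", "2 minutes")
          .groupBy(F.window("ts", "1 minute"), "device_id")
          .count())

query = alerts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```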

In conclusion, ArcGIS can store massive geospatial data on less expensive hardware and scale horizontally to handle more volume. It can also process and analyze data in a distributed environment via a Spark cluster, converting more geoprocessing tasks from serial processing on a single node to parallel processing across multiple nodes.
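
The serial-to-parallel shift is easy to illustrate: a geoprocessing loop that would run feature by feature on one machine becomes a map over partitions of a cluster. A minimal sketch with PySpark and Shapely (assuming Shapely is installed on every executor; the geometry and numbers are synthetic):

```python
from pyspark.sql import SparkSession
from shapely.geometry import Point

spark = SparkSession.builder.appName("parallel-buffer").getOrCreate()
sc = spark.sparkContext

# 100,000 synthetic points; a serial tool would buffer them one by one on a
# single node, while here each of the 64 partitions buffers its share in parallel.
coords = [(x * 0.01, y * 0.01) for x in range(1000) for y in range(100)]
total_area = (sc.parallelize(coords, numSlices=64)
                .map(lambda c: Point(c).buffer(0.005).area)
                .sum())
print(f"total buffered area: {total_area:.2f}")
```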

Finally, we should take advantage of advanced in-browser capabilities, such as 3D and local GPU processing, to render that massive data in a fluid and expressive way.
