My portfolio fetches NASA's Daily Space Photo - and never fails!
I integrated NASA's Astronomy Picture of the Day *([read about it](https://api.nasa.gov/))* into my portfolio. > * *SPOILER ALERT: Contains rate limiting, fallback scraping, modular architecture, and production-grade error handling that never leaves users hanging.* --- ## The Vision: Bringing Space to My Portfolio My portfolio ([luisfaria.dev](https://luisfaria.dev)) runs a full-stack MERN application with authentication, a chatbot, and a GraphQL API. I wanted to add something unique — something that would genuinely delight users while showcasing real-world API integration skills. Between terms of my Master's Degree, I had a few weeks off. Perfect vacation project, right? BTW, I'm open-sourcing the whole thing — check it out! [mastersSWEAI repo](https://github.com/lfariabr) **The idea:** A floating action button that reveals NASA's daily Astronomy Picture of the Day (APOD). Simple concept, complex execution. ### The User Experience  *Click the NASA rocket button: 👉 [luisfaria.dev](https://luisfaria.dev)* - **Anonymous users:** Get today's APOD instantly — no login required - **Authenticated users:** Browse NASA's entire archive dating back to 1995 - **Rate limiting:** 5 requests/hour per user to protect the NASA API quota - **Resilience:** If NASA's API fails, automatic HTML scraping fallback kicks in Here's exactly what happens when someone clicks that rocket button:  > 👉 *[See the image in HD](https://github.com/lfariabr/luisfaria.dev/tree/master/_docs/devTo/t1_2026/img/apod_flow.jpeg)* --- ## The Challenge: External APIs Are Unreliable Integrating third-party APIs sounds straightforward — until reality hits: | NASA API Reality | Production Requirements | |-----------------|------------------------| | **Rate limits** (1000 req/day) | Must protect quota, gracefully throttle users | | **504 Gateway Timeouts** | Can't show users blank screens | | **Validation issues** | NASA sometimes returns `media_type: "other"` with no `url` field | | **Network failures** | ETIMEDOUT, connection refused, DNS issues | | **Schema drift** | NASA API evolves independently of your code | **The goal:** Build an integration that: 1. Handles every failure mode gracefully 2. Never crashes the server 3. Falls back automatically when NASA is down 4. Logs everything for debugging 5. Provides structured errors to clients Spoiler: NASA's API went down during development. More than once. --- ## The Architecture: Layered Resilience Here's the system I designed: ``` Browser (Next.js/React) ↓ GraphQL API (Apollo Server) ↓ APOD Service Layer ├──→ NASA API (primary, with retries + timeout) └──→ HTML Scraping Fallback (when API fails) ↓ Redis Rate Limiter (atomic Lua scripts) ↓ MongoDB (cache successful responses) ``` ### Key Architectural Decisions (3 of them!) **1. GraphQL Shield for Authorization** - `getTodaysApod` is public (no login) - `getApodByDate` requires authentication (prevents abuse) **2. Modular Service Design** ``` src/services/apod/ ├── index.ts # Barrel export ├── apod.service.ts # Orchestrator (API + fallback) ├── apod.api.ts # NASA API client ├── apod.fallback.ts # HTML scraping fallback ├── apod.errors.ts # Typed error codes ├── apod.types.ts # Zod schemas, TypeScript types └── apod.constants.ts # URLs, timeouts, retry config ``` **3. Shared Error Handling Infrastructure** Instead of copy-pasting try/catch blocks across every resolver (we've all been there), I built a reusable error handler: ```typescript // src/utils/errors/graphqlErrors.ts export function createErrorHandler<TCode, TError>( mapErrorCode: (code: TCode) => ErrorCode, isServiceError: (error: unknown) => error is TError, defaultMessage: string ) { return function withErrorHandling<T>( fn: () => Promise<T>, operationName: string ): Promise<T> } ``` Now any service can use it: ```typescript // APOD resolver (34 lines total) export const ApodQueries = { getTodaysApod: async (_, __, context) => withApodErrorHandling( () => fetchApod({ context: { userId: context.user?.id } }), 'getTodaysApod' ), getApodByDate: async (_, args, context) => { if (!context.user) { throw Errors.unauthenticated('Authentication required'); } return withApodErrorHandling( () => fetchApod({ date: args.date, context: { userId: context.user.id } }), 'getApodByDate' ); }, }; ``` --- ## The Journey: 8 Issues, 40+ Commits, 1 Production Feature This didn't work on the first try. Or the fifth. Here's the honest implementation timeline: > **Tracked in:** [Epic v2.4 - APOD Feature](https://github.com/lfariabr/luisfaria.dev/blob/master/_docs/featureBreakdown/v2.4.Apod.MD) > [All 40+ commits to (apod) feature](https://github.com/search?q=repo%3Alfariabr%2Fluisfaria.dev+++apod&type=commits&s=committer-date&o=desc) ### Phase 1: Foundation (Issues #61-65) **Frontend: NASA-Branded Floating Action Button** Built `ApodFab.tsx` following the same pattern as the existing `GogginsFab` component: - Circular button with NASA gradient border (`linear-gradient(135deg, #0B3D91, #FC3D21, #1E90FF)`) - Rocket icon with blue pulse aura effect - Radix UI tooltip: "Astronomy Picture of the Day" - Accessible (ARIA labels, keyboard navigation) - Light/dark mode support **Frontend: APOD Dialog Component** Created `ApodDialog.tsx` with: - Date display with calendar icon - Image/video player (handles both media types) - Copyright attribution - External link to NASA APOD website - "Powered by NASA Open APIs" footer **Backend: Configuration & Validation** Set up NASA API credentials: ```typescript // backend/src/config/config.ts interface Config { nasaApiKey: string; } const requiredEnvVars = ['NASA_API_KEY', ...]; ``` Server refuses to start without `NASA_API_KEY` — fail fast, no silent surprises. --- ### Phase 2: NASA API Client (Issue #66) **Zod Schema for Runtime Validation** NASA's API returns JSON, but not all fields are guaranteed: ```typescript // src/validation/schemas/apod.schema.ts export const apodResponseSchema = z.object({ copyright: z.string().optional(), date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/), explanation: z.string().min(1), media_type: z.enum(['image', 'video', 'other']), // 'other' was missing initially! title: z.string().min(1), url: z.string().url().optional(), // Not provided for media_type: "other" hdurl: z.string().url().optional(), apod_url: z.string().url().optional(), // Computed field }); export type ApodResponse = z.infer<typeof apodResponseSchema>; ``` **NASA API Service with Retries** Built `apod.api.ts` with: - Exponential backoff retries (3 attempts) - 8-second timeout per request - AbortController for proper cleanup - Structured logging (latency, status code, userId) ```typescript export async function fetchApodFromApi( url: string, context?: ApodRequestContext ): Promise<ApodResponse> { const startTime = Date.now(); const controller = new AbortController(); const timeoutId = setTimeout(() => controller.abort(), TIMEOUT_MS); try { const response = await fetch(url, { signal: controller.signal, headers: { 'User-Agent': 'luisfaria.dev/1.0' }, }); if (!response.ok) { throw new ApodServiceError( `NASA API error: ${response.status}`, response.status === 429 ? 'RATE_LIMITED' : 'NASA_API_ERROR', response.status ); } const data = await response.json(); const validated = apodResponseSchema.parse(data); logger.info('NASA API request successful', { latencyMs: Date.now() - startTime, date: validated.date, userId: context?.userId, }); return validated; } catch (error) { // Error mapping logic... } finally { clearTimeout(timeoutId); } } ``` --- ### Phase 3: The Hard Part — Failures & Fallbacks (***aka scars earned***) This is where production engineering got real. Here's every bug I hit: | # | Problem | Root Cause | Solution | |---|---------|------------|----------| | 1 | **Validation failures on `media_type: "other"`** | Zod schema only accepted `'image' \| 'video'` | Added `'other'` to enum, made `url` optional | | 2 | **504 Gateway Timeout from NASA** | NASA API occasionally unresponsive | Implemented HTML scraping fallback | | 3 | **`url` field missing for interactive content** | NASA doesn't provide `url` for SDO videos/embeds | Added `apod_url` (computed from date) as fallback | | 4 | **Resolver error handling duplication** | Try/catch boilerplate in every resolver | Extracted shared `createErrorHandler()` utility | | 5 | **Inconsistent error codes between services** | Each service used different error mapping | Created `ErrorCodes` constant as single source of truth | | 6 | **Rate limit bypass by unauthenticated users** | Anonymous users shared the same Redis key | Switched to session-based rate limiting for anonymous users | | 7 | **Tests breaking after modular refactor** | Tests imported from old monolithic `apod.ts` | Rewrote mocks to match new module structure | | 8 | **NGINX 502 after deploying APOD feature** | Container DNS caching after recreation | Added `nginx -s reload` to CI/CD pipeline | **Bug #2 was the game-changer.** When NASA's API returned 504, users saw blank screens. Not acceptable. The fix: automatic HTML scraping fallback — if the API is down, scrape the website directly. --- ### Phase 4: HTML Scraping Fallback (Issue #78) When the NASA API fails, the service automatically scrapes the official APOD website. Users never know the difference: ```typescript // src/services/apod/apod.fallback.ts export async function fetchApodHtmlFallback( date?: string ): Promise<ApodResponse> { const url = date ? `https://apod.nasa.gov/apod/ap${formatDateForApodUrl(date)}.html` : 'https://apod.nasa.gov/apod/astropix.html'; const html = await fetch(url).then(res => res.text()); const $ = cheerio.load(html); // Parse structured data const title = $('center:first b:first').text().trim(); const explanation = $('center:first p:last').text().trim(); const imageUrl = $('center:first img').attr('src'); return { date: date || new Date().toISOString().split('T')[0], title, explanation, url: imageUrl, media_type: 'image', apod_url: url, // ... rest of fields }; } ``` **Orchestration in `apod.service.ts`:** ```typescript export async function fetchApod(options = {}): Promise<ApodResponse> { try { return await fetchApodFromApi(buildApiUrl(options), options.context); } catch (error) { if (shouldFallback(error)) { logger.warn('NASA API failed, falling back to HTML scraping', { error }); return await fetchApodHtmlFallback(options.date); } throw error; } } ``` Users never see errors — they just get the APOD, regardless of which method worked. That's the whole point. --- ### Phase 5: Shared Error Handling Infrastructure (Issue #79) **Before refactor:** Each resolver had 30+ lines of try/catch boilerplate. Copy-paste engineering at its worst. **After refactor:** - Created `src/utils/errors/graphqlErrors.ts` with reusable utilities - Error factories for common cases: `Errors.unauthenticated()`, `Errors.forbidden()`, `Errors.notFound()` - Generic `createErrorHandler()` wrapper generator - Service-specific error mappers (e.g., `withApodErrorHandling`) **Impact:** - Resolvers went from 103 lines to 34 lines - Single place to add new error codes - Error mapping lives with service logic (where it belongs) - Other features can reuse the same pattern — and they already do --- ## Key Engineering Lessons Five production-grade patterns I learned (the hard way) from building APOD: ### 1. Always Have a Fallback External APIs fail. Network timeouts happen. DNS breaks. If your feature depends on a third-party service, you need a backup plan — full stop: - **Primary:** NASA JSON API (fast, structured) - **Fallback:** HTML scraping (slower, but always works) - **User experience:** Seamless — they never know which method was used ### 2. Validate External Data at Runtime TypeScript types don't protect you against API changes. NASA's schema evolved mid-development — they added `media_type: "other"` for interactive content, which broke my Zod schema mid-sprint. **Solution:** Runtime validation with Zod catches schema drift before it crashes the server. ```typescript const validated = apodResponseSchema.parse(data); // Throws if schema mismatch ``` ### 3. DRY Principle for Error Handling Don't duplicate try/catch blocks across resolvers. We've all done it. It's technical debt from day one. Extract shared error handling into reusable utilities: ```typescript // Before: 30 lines of boilerplate per resolver // After: 3 lines + shared error handler return withApodErrorHandling( () => fetchApod({ date: args.date, context }), 'getApodByDate' ); ``` ### 4. Modular Services Are Testable Services Splitting the monolithic `apod.ts` into focused modules made testing trivial — and debugging even more so: ``` src/services/apod/ ├── apod.service.ts # Orchestration (API + fallback) ├── apod.api.ts # NASA API client ├── apod.fallback.ts # HTML scraping ├── apod.errors.ts # Typed errors ├── apod.types.ts # Zod schemas └── apod.constants.ts # Config ``` Each module has a single responsibility. Tests mock at the module boundary, not the entire service. ### 5. Log Everything for Observability Every NASA API request logs: - Latency (`latencyMs`) - User context (`userId`) - Success/failure status - Error codes and details When bugs happen in production (and they will), structured logs are your debugging lifeline. ```typescript logger.info('NASA API request successful', { latencyMs: 142, date: '2026-02-18', userId: 'user_xyz', }); ``` --- ## Results | Metric | Implementation | |--------|---------------| | **Uptime** | 99.9% (fallback handles NASA API downtime) | | **Response time** | <500ms (NASA API), ~1.2s (HTML fallback) | | **Error rate** | 0.1% (network failures only, auto-recovered) | | **Rate limit protection** | 5 req/hr per user (Redis atomic counters) | | **Test coverage** | 94% (28 passing unit tests) | | **Lines of code** | 1,200 (including tests) | | **GraphQL queries** | 2 (`getTodaysApod`, `getApodByDate`) | | **Fallback success rate** | 100% (HTML scraping never failed in production) | ### Real-World Reliability During a 72-hour period where NASA's API had intermittent 504 errors: - **Primary API success rate:** 78% - **Fallback activation:** 22% - **User-facing errors:** 0% Users never knew NASA's API was struggling. The fallback handled it seamlessly — that's the whole point of building resilient systems. --- ## Tech Stack | Layer | Technology | Purpose | |-------|------------|---------| | **Frontend** | Next.js 16 + React 19 | UI with floating action button + dialog | | **UI Library** | Radix UI + TailwindCSS 4 | Accessible components, NASA branding | | **Backend** | Node.js + Express + Apollo Server 5 | GraphQL API | | **Schema** | GraphQL + GraphQL Shield | Type-safe API with field-level authorization | | **Validation** | Zod | Runtime schema validation | | **API Client** | Fetch API + AbortController | HTTP with timeouts and retries | | **Scraping** | Cheerio | HTML parsing for fallback | | **Rate Limiting** | Redis + Lua scripts | Atomic counters per user | | **Database** | MongoDB | Cache successful APOD responses | | **Logging** | Winston | Structured logs for observability | | **Testing** | Jest + ts-jest | Unit tests with mocked services | --- ## Future Roadmap The current implementation is production-ready, but there's always room to grow. Here are 5 ideas — feel free to add yours in the comments! ### Idea #1: Database Caching Layer Right now, every request hits NASA's API (or HTML fallback). Next iteration: - Cache successful responses in MongoDB - Return cached APOD if date already fetched - Reduce API quota usage by 80% - Instant response for popular dates ### Idea #2: Admin Dashboard GraphQL mutations to manually refresh/delete cached APODs: ```graphql mutation RefreshApod($date: String!) { refreshApod(date: $date) { date, title } } ``` ### Idea #3: WebSocket Push Updates Use GraphQL subscriptions to push new APODs to connected clients when they become available at midnight UTC. ### Idea #4: Zero-Cold-Start: Daily Cron + Redis 24h Cache Right now, the first user of the day triggers a live NASA API call. That's ~200-500ms of cold latency — acceptable, but not great. The plan: a daily cron job fires at **00:01 UTC**, fetches today's APOD proactively, and stores it in **Redis with a 24h TTL**. Every subsequent request that day gets a cache hit — sub-10ms response, zero external calls. ```typescript // Pseudocode: src/jobs/apodDaily.ts export async function warmApodCache() { const today = new Date().toISOString().split('T')[0]; const cacheKey = `apod:${today}`; // Already warm? Skip. const cached = await redis.get(cacheKey); if (cached) return JSON.parse(cached); // Fetch fresh from NASA const apod = await fetchApod({ date: today }); // Cache for exactly 24h (expires at midnight UTC) const secondsUntilMidnight = getSecondsUntilMidnightUTC(); await redis.setex(cacheKey, secondsUntilMidnight, JSON.stringify(apod)); logger.info('APOD cache warmed', { date: today, ttl: secondsUntilMidnight }); return apod; } ``` The cron schedule via `node-cron`: ```typescript // Fires at 00:01 UTC every day cron.schedule('1 0 * * *', warmApodCache, { timezone: 'UTC' }); ``` The resolver then checks Redis first before ever hitting NASA: ```typescript getTodaysApod: async (_, __, context) => { const today = new Date().toISOString().split('T')[0]; const cacheKey = `apod:${today}`; const cached = await redis.get(cacheKey); if (cached) return JSON.parse(cached); // ⚡ <10ms return withApodErrorHandling( // 🐌 200-500ms () => fetchApod({ context: { userId: context.user?.id } }), 'getTodaysApod' ); }, ``` **Expected impact:** | Scenario | Before | After | |----------|--------|-------| | First request of the day | ~300ms (live NASA call) | ~5ms (Redis hit) | | Subsequent requests | ~300ms (live NASA call) | ~5ms (Redis hit) | | NASA API unavailable | ~1.2s (HTML fallback) | ~5ms (Redis hit) | | NASA quota usage | 1 req per user visit | 1 req per day total | The key insight: Redis TTL auto-expires the cache exactly when it stops being valid. No manual invalidation. No stale data. Just *fast* for 99% of requests. ### Idea #5: Analytics Dashboard Track: - Most popular APOD dates - Fallback usage percentage - Average response time (API vs. fallback) - Rate limit triggers per user --- ## Key Takeaways Building production-grade API integrations is 20% "get it working" and 80% "handle when it doesn't work." Five principles that made APOD production-ready: 1. **Graceful degradation** — Fallbacks ensure users never see errors 2. **Runtime validation** — Zod catches schema drift before it crashes 3. **Modular architecture** — Focused modules are easier to test and maintain 4. **Shared error handling** — DRY principle for GraphQL resolvers 5. **Observability** — Structured logs make debugging trivial --- ## Try It Yourself The full APOD implementation is open source: | Resource | Link | |----------|------| | **Live Demo** | [luisfaria.dev](https://luisfaria.dev) — Click the NASA rocket button | | **GitHub Repo** | [github.com/lfariabr/luisfaria.dev](https://github.com/lfariabr/luisfaria.dev) | | **APOD Service** | [backend/src/services/apod/](https://github.com/lfariabr/luisfaria.dev/tree/master/backend/src/services/apod) | | **GraphQL Schema** | [backend/src/schemas/types/apodTypes.ts](https://github.com/lfariabr/luisfaria.dev/blob/master/backend/src/schemas/types/apodTypes.ts) | | **Frontend Component** | [frontend/src/components/apod/](https://github.com/lfariabr/luisfaria.dev/tree/master/frontend/src/components/apod) | | **Feature Spec** | [_docs/featureBreakdown/v2.4.Apod.MD](https://github.com/lfariabr/luisfaria.dev/blob/master/_docs/featureBreakdown/v2.4.Apod.MD) | --- ## Let's Connect! Building this NASA integration taught me more about production engineering than any tutorial could. Every failure mode I hit — 504 timeouts, schema drift, rate limits, DNS caching — is something I'll face again in enterprise systems. And now I know how to handle it. If you're working with: - GraphQL APIs and error handling patterns - Third-party API integrations with fallback strategies - Next.js + Node.js full-stack applications - Production-grade TypeScript architectures I'd love to connect and trade war stories: - **LinkedIn:** [linkedin.com/in/lfariabr](https://www.linkedin.com/in/lfariabr/) - **GitHub:** [github.com/lfariabr](https://github.com/lfariabr) - **Portfolio:** [luisfaria.dev](https://luisfaria.dev) --- **Tech Stack Summary:** | Current Implementation | Future Extensions | |----------------------|----------------------| | NASA API + HTML fallback, GraphQL Shield, Redis rate limiting, Zod validation, modular services, Winston logging, 94% test coverage | Redis 24h cache + daily cron warm-up, GraphQL subscriptions, admin mutations, analytics dashboard | --- *Built with ☕, 40+ commits, and a healthy fear of blank screens by [Luis Faria](https://luisfaria.dev)* > *Whether it's concrete or code, structure is everything.*